SQLite: Cache prepared statements behind `sql.exec()`. #2970

kentonv · 2024-10-22T04:02:16Z

sql.exec() currently parses the query every time it is invoked. This change makes it so that we retain some number of prepared statements in a cache and reuse them, keyed by the query text. Since query strings are likely commonly internalized strings, cache lookups should be O(1) most of the time, as internalized strings compare equal by identity without the need to compare content.

kentonv · 2024-10-22T16:48:04Z

Added a couple commits to fix a problem where queries were very commonly being left open until the cursor was GC'd, which made the statement cache ineffective.

justin-mp · 2024-10-22T12:27:38Z

src/workerd/util/sqlite.c++

+          KJ_IF_SOME(p, prelude) {
+            p.add(Statement(*this, regulator,
+                StatementAndEffect{.statement = kj::mv(ownResult),
+                  .stateChange = kj::mv(parseContext.stateChange)}));


parseContext was just reset on line 642, so parseContext.stateChange is going to be an uninitialized OneOf. Is that what we want here?

Eek, no, that is a bug. Good catch. I'd better add a test that does transactions.

(Though this bug wouldn't be possible to hit from JavaScript since we don't allow people to execute transaction statements from JS.)

(But it's not actually an uninialized OneOf. The default value for ParseContext.stateChange is NoChange().)

(Ah, yes, I jumped 1 level too far when reading that code.)

justin-mp · 2024-10-22T12:43:00Z

src/workerd/api/sql.h

@@ -191,7 +185,7 @@ class SqlStorage::Cursor final: public jsg::Object {

    void ensureInitialized(jsg::Lock& js, SqliteDatabase::Query& source);

-    JSG_MEMORY_INFO(cachedColumnNames) {
+    JSG_MEMORY_INFO(CachedColumnNames) {


Huh. I guess this worked before because cachedColumnNames is accessible in the scope of this class and is the same size.

Yeah I was surprised by this, but I guess it just feeds into sizeof(). (I guess the size was even correct in this case, though if it wasn't, the result would presumably only be that memory was miscounted. No one would ever have noticed.)

justin-mp · 2024-10-22T12:54:26Z

src/workerd/api/sql.h

+    }
+  };
+
+  using StatementMap = kj::Table<kj::Rc<CachedStatement>, kj::HashIndex<StatementCacheCallbacks>>;


For my own education, why do we have to use kj::Table with a kj::HashIndex instead of using kj::HashMap? Is it because the key could be either a jsg::JsString or a jsg::HashableV8Ref<v8::String>?

It's because I wanted the key to be stored as a member of CachedStatement which is held by kj::Rc, not inlined into the table. HashMap doesn't quite have that structure. (I'll add a comment.)

jclee · 2024-10-23T04:31:20Z

src/workerd/api/sql.h

@@ -284,6 +288,11 @@ class SqlStorage::Cursor final: public jsg::Object {
  kj::Maybe<CachedColumnNames> ownCachedColumnNames;
  CachedColumnNames& cachedColumnNames;

+  // Invoke when `query.isDone()`, or when we want to prematurely cancel the query. This records
+  // row counters and then sets `stateRef` to `none` to drop the query and return the prepared


I guess technically it sets state to none, not stateRef, although they refer to the same thing in most cases.

jclee · 2024-10-23T04:55:08Z

src/workerd/api/sql.h

+  struct StatementCache {
+    StatementMap map;
+    kj::List<CachedStatement, &CachedStatement::lruLink> lru;
+    size_t totalSize = 0;


Should totalSize maybe feed back to jsg memory tracking metrics somehow? Intuitively, I'd guess one could maybe add it to the jsgGetMemorySelfSize() return value, but I'm not familiar with exactly how that gets used.

Hmm I looked at SqlStorage::visitForMemoryInfo and I frankly have no idea what's going on there. It seems like it's double-counting pointer sizes but not counting the things they point to. Also I suppose it cannot dereference IoOwn because visitForMemoryInfo() is not necessarily called from within any particular IoContext? So is it even possible to count the statement cache?

I think I'm just not going to touch it.

jclee · 2024-10-23T05:10:27Z

src/workerd/util/sqlite-test.c++

+      value INTEGER
+    );
+    INSERT INTO things(value) VALUES (123);
+    INSERT INTO things(value) VALUES (456);


Would there be value in testing a multi statement that fails partway through? I guess it wouldn't make a difference at construction time, since it's lazy. And I guess it would just throw the appropriate exception at run time?

Good call. I think the prelude vector would end up incorrect... the queries before the one that failed would get duplicated on the next attempt.

kentonv · 2024-10-28T20:31:54Z

Fixups:

However, the D1 API test is still failing. Going to fix that next.

kentonv · 2024-10-29T02:06:25Z

OK, the D1 tests caught a real regression and the last commit fixes it.

justin-mp · 2024-10-29T10:38:20Z

src/workerd/util/sqlite-test.c++

+  KJ_EXPECT_THROW_MESSAGE("SQLITE_CONSTRAINT", stmt.run());
+  KJ_EXPECT_THROW_MESSAGE("SQLITE_CONSTRAINT", stmt.run());
+
+  // We ran the statement three times. Each time it should have inserted a new row containing


We talked offline yesterday about exec()ing multiple statements and how if one fails in the middle, we should probably roll back all of the statements. For my own education, in a world where we fix that bug, would prepareMulti implement the rollback or would something higher than prepareMulti handle the rollback?

If it's prepareMulti, should we drop a TODO(soon) here about how even this behavior is buggy?

Technically prepareMulti() itself doesn't parse or execute anything. But I think your question is, would SqliteDatabase handle the rollback internally or would the caller have to do it. I think SqliteDatabase should handle it. I suspect logic will be needed in all of prepareSql(), Statement::prepareForExecution(), and ~Query(), annoyingly.

If it's prepareMulti, should we drop a TODO(soon) here about how even this behavior is buggy?

Added comment:

https://github.com/cloudflare/workerd/pull/2970/files#diff-2b573b1c6fa7749e40c7da2c29f37f0f4accf53f2406bdab6ca5378755260fecR1173-R1177

(No other changes, just rebased.)

…ltiple statements. `prepare()` does not allow you to pass code that contains multiple statements (separated by semicolons). Extending it to allow this isn't straightforward, because later statements in the block might not be possible to parse if earlier statements haven't yet executed. For example, the first statement might create a table, and the next statement might insert into it. If you try to parse each statement before executing any of them, SQLite will throw an error on the second statement, because the table it's trying to insert into doesn't exist. To support a multi-statement prepare, we must change the contract such that the whole thing is parsed lazily, on the first actual execution. This is needed in order to support caching of prepared statements behind `exec()`, since `exec()` supports passing multiple statements at once.

This API is deprecated and won't be made public, so let's simplify the implementation to minimize maintenance burden. This commit also changes `exec()` to take `JsString` instead of `kj::String` because caching will be keyed by the JS string handle, so that's what prepared statements need to hold onto.

…chedColumnNames Will need to be here so it can be kept in the statement cache.

…JsValue. This is much cleaner as `SqlValue` may contain pointers that are invalidated when we iterate the statement. Previously, we were counting on such iteration not happening before JSG had a chance to convert the values to JavaScript, which was a little precarious. This will also allow us -- in a later commit -- to implement an optimization where we iterate the statement more eagerly. With this change, the logic of `wrapSqlRow{,Raw}()` ends up being merged into `{row,raw}IteratorNext()` so the former can be deleted.

By always iterating the underlying query one row ahead of what has been returned, we can discover when the query is done and return it to the statement cache more proactively. Without this optimization, statements very commonly don't get returned to the cache until the cursor is GC'd -- especially statements that return no results at all. In theory this would inflate the "rows read" metric for an application that commonly creates cursors and doesn't iterate them to completion. But, that should be uncommon, and "buffering ahead" is hardly an unreasonable thing for the platform to be doing. I verified this optimization works by checking how often sql-test.js is unable to reuse a cache entry because it's still "in use" (`isShared()` returns true). Before this, there were dozens of cases, but after, there are only two, and they are intentional cases of overlapping queries.

We now clean up queries more eagerly, but until now `.columnNames` couldn't be accessed after cleanup. This broke D1 tests. Additionally, I had forgotten to actually store the cached column names along with the cached stament (I created a CachedColumnNames object in CachedStatement but never used it!). This change makes it so "cached column names" is just a JsArray rather than a C++ object. We can then more easily hold onto it past the end of the query. Code also generally gets simpler. One caveat is that we must initialize the column names at the start of the query rather than lazily. This is because we cannot predict in advance whether `.columnNames` will be accessed after the query ends. So, we always need a copy of column names just in case.

…ent-cache-3" This reverts commit 9326b8a, reversing changes made to bbde7e4. We believe that the SQLite query cache has lead to a latency increase for many workloads.

kentonv requested review from jclee and justin-mp October 22, 2024 04:02

kentonv requested review from a team as code owners October 22, 2024 04:02

kentonv requested review from ketanhwr and Frederik-Baetens October 22, 2024 04:02

kentonv force-pushed the kenton/sqlite-statement-cache-3 branch from abdd91a to ff99401 Compare October 22, 2024 04:04

justin-mp approved these changes Oct 22, 2024

View reviewed changes

jclee reviewed Oct 23, 2024

View reviewed changes

jclee approved these changes Oct 23, 2024

View reviewed changes

anonrig approved these changes Oct 23, 2024

View reviewed changes

kentonv force-pushed the kenton/sqlite-statement-cache-3 branch from 16f2f4a to 7061747 Compare October 28, 2024 20:31

justin-mp approved these changes Oct 29, 2024

View reviewed changes

kentonv added 8 commits October 29, 2024 08:57

Drive-by: Fix comment about SqliteDatabase::prepare().

8156a08

SQLite: Move: SqlStorage::Cursor::CachedColumnNames -> SqlStorage::Ca…

d676e3c

…chedColumnNames Will need to be here so it can be kept in the statement cache.

SQLite: Cache prepared statements behind exec().

e9df309

kentonv force-pushed the kenton/sqlite-statement-cache-3 branch from 17c88eb to aa744f9 Compare October 29, 2024 13:57

kentonv merged commit 9326b8a into main Oct 29, 2024
13 checks passed

kentonv deleted the kenton/sqlite-statement-cache-3 branch October 29, 2024 15:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQLite: Cache prepared statements behind `sql.exec()`. #2970

SQLite: Cache prepared statements behind `sql.exec()`. #2970

kentonv commented Oct 22, 2024

kentonv commented Oct 22, 2024

justin-mp Oct 22, 2024

kentonv Oct 23, 2024

kentonv Oct 23, 2024

justin-mp Oct 23, 2024

justin-mp Oct 22, 2024

kentonv Oct 23, 2024

justin-mp Oct 22, 2024

kentonv Oct 23, 2024

jclee Oct 23, 2024

kentonv Oct 23, 2024

jclee Oct 23, 2024

kentonv Oct 28, 2024

jclee Oct 23, 2024

kentonv Oct 23, 2024

kentonv commented Oct 28, 2024

kentonv commented Oct 29, 2024

justin-mp Oct 29, 2024

kentonv Oct 29, 2024

SQLite: Cache prepared statements behind sql.exec(). #2970

SQLite: Cache prepared statements behind sql.exec(). #2970

Conversation

kentonv commented Oct 22, 2024

kentonv commented Oct 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kentonv commented Oct 28, 2024

kentonv commented Oct 29, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SQLite: Cache prepared statements behind `sql.exec()`. #2970

SQLite: Cache prepared statements behind `sql.exec()`. #2970