-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLite: Cache prepared statements behind sql.exec()
.
#2970
Conversation
abdd91a
to
ff99401
Compare
Added a couple commits to fix a problem where queries were very commonly being left open until the cursor was GC'd, which made the statement cache ineffective. |
KJ_IF_SOME(p, prelude) { | ||
p.add(Statement(*this, regulator, | ||
StatementAndEffect{.statement = kj::mv(ownResult), | ||
.stateChange = kj::mv(parseContext.stateChange)})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseContext
was just reset on line 642, so parseContext.stateChange
is going to be an uninitialized OneOf
. Is that what we want here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, no, that is a bug. Good catch. I'd better add a test that does transactions.
(Though this bug wouldn't be possible to hit from JavaScript since we don't allow people to execute transaction statements from JS.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But it's not actually an uninialized OneOf. The default value for ParseContext.stateChange
is NoChange()
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Ah, yes, I jumped 1 level too far when reading that code.)
src/workerd/api/sql.h
Outdated
@@ -191,7 +185,7 @@ class SqlStorage::Cursor final: public jsg::Object { | |||
|
|||
void ensureInitialized(jsg::Lock& js, SqliteDatabase::Query& source); | |||
|
|||
JSG_MEMORY_INFO(cachedColumnNames) { | |||
JSG_MEMORY_INFO(CachedColumnNames) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. I guess this worked before because cachedColumnNames
is accessible in the scope of this class and is the same size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was surprised by this, but I guess it just feeds into sizeof()
. (I guess the size was even correct in this case, though if it wasn't, the result would presumably only be that memory was miscounted. No one would ever have noticed.)
} | ||
}; | ||
|
||
using StatementMap = kj::Table<kj::Rc<CachedStatement>, kj::HashIndex<StatementCacheCallbacks>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own education, why do we have to use kj::Table
with a kj::HashIndex
instead of using kj::HashMap
? Is it because the key could be either a jsg::JsString
or a jsg::HashableV8Ref<v8::String>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because I wanted the key to be stored as a member of CachedStatement
which is held by kj::Rc
, not inlined into the table. HashMap
doesn't quite have that structure. (I'll add a comment.)
src/workerd/api/sql.h
Outdated
@@ -284,6 +288,11 @@ class SqlStorage::Cursor final: public jsg::Object { | |||
kj::Maybe<CachedColumnNames> ownCachedColumnNames; | |||
CachedColumnNames& cachedColumnNames; | |||
|
|||
// Invoke when `query.isDone()`, or when we want to prematurely cancel the query. This records | |||
// row counters and then sets `stateRef` to `none` to drop the query and return the prepared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess technically it sets state
to none, not stateRef
, although they refer to the same thing in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo.
struct StatementCache { | ||
StatementMap map; | ||
kj::List<CachedStatement, &CachedStatement::lruLink> lru; | ||
size_t totalSize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should totalSize maybe feed back to jsg memory tracking metrics somehow? Intuitively, I'd guess one could maybe add it to the jsgGetMemorySelfSize()
return value, but I'm not familiar with exactly how that gets used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I looked at SqlStorage::visitForMemoryInfo
and I frankly have no idea what's going on there. It seems like it's double-counting pointer sizes but not counting the things they point to. Also I suppose it cannot dereference IoOwn
because visitForMemoryInfo()
is not necessarily called from within any particular IoContext
? So is it even possible to count the statement cache?
I think I'm just not going to touch it.
value INTEGER | ||
); | ||
INSERT INTO things(value) VALUES (123); | ||
INSERT INTO things(value) VALUES (456); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be value in testing a multi statement that fails partway through? I guess it wouldn't make a difference at construction time, since it's lazy. And I guess it would just throw the appropriate exception at run time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I think the prelude
vector would end up incorrect... the queries before the one that failed would get duplicated on the next attempt.
16f2f4a
to
7061747
Compare
OK, the D1 tests caught a real regression and the last commit fixes it. |
KJ_EXPECT_THROW_MESSAGE("SQLITE_CONSTRAINT", stmt.run()); | ||
KJ_EXPECT_THROW_MESSAGE("SQLITE_CONSTRAINT", stmt.run()); | ||
|
||
// We ran the statement three times. Each time it should have inserted a new row containing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked offline yesterday about exec()
ing multiple statements and how if one fails in the middle, we should probably roll back all of the statements. For my own education, in a world where we fix that bug, would prepareMulti
implement the rollback or would something higher than prepareMulti
handle the rollback?
If it's prepareMulti, should we drop a TODO(soon) here about how even this behavior is buggy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically prepareMulti()
itself doesn't parse or execute anything. But I think your question is, would SqliteDatabase
handle the rollback internally or would the caller have to do it. I think SqliteDatabase
should handle it. I suspect logic will be needed in all of prepareSql()
, Statement::prepareForExecution()
, and ~Query()
, annoyingly.
If it's prepareMulti, should we drop a TODO(soon) here about how even this behavior is buggy?
Added comment:
(No other changes, just rebased.)
…ltiple statements. `prepare()` does not allow you to pass code that contains multiple statements (separated by semicolons). Extending it to allow this isn't straightforward, because later statements in the block might not be possible to parse if earlier statements haven't yet executed. For example, the first statement might create a table, and the next statement might insert into it. If you try to parse each statement before executing any of them, SQLite will throw an error on the second statement, because the table it's trying to insert into doesn't exist. To support a multi-statement prepare, we must change the contract such that the whole thing is parsed lazily, on the first actual execution. This is needed in order to support caching of prepared statements behind `exec()`, since `exec()` supports passing multiple statements at once.
This API is deprecated and won't be made public, so let's simplify the implementation to minimize maintenance burden. This commit also changes `exec()` to take `JsString` instead of `kj::String` because caching will be keyed by the JS string handle, so that's what prepared statements need to hold onto.
…chedColumnNames Will need to be here so it can be kept in the statement cache.
…JsValue. This is much cleaner as `SqlValue` may contain pointers that are invalidated when we iterate the statement. Previously, we were counting on such iteration not happening before JSG had a chance to convert the values to JavaScript, which was a little precarious. This will also allow us -- in a later commit -- to implement an optimization where we iterate the statement more eagerly. With this change, the logic of `wrapSqlRow{,Raw}()` ends up being merged into `{row,raw}IteratorNext()` so the former can be deleted.
By always iterating the underlying query one row ahead of what has been returned, we can discover when the query is done and return it to the statement cache more proactively. Without this optimization, statements very commonly don't get returned to the cache until the cursor is GC'd -- especially statements that return no results at all. In theory this would inflate the "rows read" metric for an application that commonly creates cursors and doesn't iterate them to completion. But, that should be uncommon, and "buffering ahead" is hardly an unreasonable thing for the platform to be doing. I verified this optimization works by checking how often sql-test.js is unable to reuse a cache entry because it's still "in use" (`isShared()` returns true). Before this, there were dozens of cases, but after, there are only two, and they are intentional cases of overlapping queries.
We now clean up queries more eagerly, but until now `.columnNames` couldn't be accessed after cleanup. This broke D1 tests. Additionally, I had forgotten to actually store the cached column names along with the cached stament (I created a CachedColumnNames object in CachedStatement but never used it!). This change makes it so "cached column names" is just a JsArray rather than a C++ object. We can then more easily hold onto it past the end of the query. Code also generally gets simpler. One caveat is that we must initialize the column names at the start of the query rather than lazily. This is because we cannot predict in advance whether `.columnNames` will be accessed after the query ends. So, we always need a copy of column names just in case.
17c88eb
to
aa744f9
Compare
sql.exec()
currently parses the query every time it is invoked. This change makes it so that we retain some number of prepared statements in a cache and reuse them, keyed by the query text. Since query strings are likely commonly internalized strings, cache lookups should be O(1) most of the time, as internalized strings compare equal by identity without the need to compare content.