Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate and verify CRC32 checksums for snapshot SST files #2436

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pcholakov
Copy link
Contributor

I'd like to ensure that we can detect corruption either in-place, or during download, of snapshot data files.

So far this is just a draft; I'd love some input on these points:

  • I picked CRC32 because that's what RocksDB uses for its SST files - but open to suggestions! If we stay with CRC, what would be a good way to represent it in the serialized JSON? Numeric / hex?
  • I am computing the checksums on the fly but I believe we can read them from the SSTs themselves; it seems like it would be just as involved, if not more, to open the exported snapshot as a read-only database, for the sake of reading the checksums though - I feel like we can burn a few CPU cycles on this since we have the data in memory during upload/download anyway?
  • I'm a bit shaky on serde usage - I am having to use a wrapper to extend the RocksDB type, as well as a separate remote type to customize the serialization formatting of certain fields; are there opportunities to streamline this? Thoughts on where I've put the checksum in the metadata?

Copy link

github-actions bot commented Dec 17, 2024

Test Results

  7 files  ±0    7 suites  ±0   4m 57s ⏱️ - 9m 43s
 47 tests  - 2   46 ✅ +34  1 💤 +1  0 ❌ ±0 
182 runs  +4  179 ✅ +70  3 💤 +1  0 ❌ ±0 

Results for commit 400400c. ± Comparison against base commit 3da5356.

This pull request removes 2 tests.
dev.restate.sdktesting.tests.CallOrdering ‑ ordering(boolean[], Client)
dev.restate.sdktesting.tests.CancelInvocation ‑ cancelInvocation(BlockingOperation, Client, URL)

♻️ This comment has been updated with latest results.

@AhmedSoliman
Copy link
Contributor

I am computing the checksums on the fly but I believe we can read them from the SSTs themselves; it seems like it would be just as involved, if not more, to open the exported snapshot as a read-only database, for the sake of reading the checksums though - I feel like we can burn a few CPU cycles on this since we have the data in memory during upload/download anyway?

I'm in favor of getting this from rocksdb itself, having two potentially disagreeing values will only make debugging/repair harder in my opinion.

@AhmedSoliman
Copy link
Contributor

I picked CRC32 because that's what RocksDB uses for its SST files - but open to suggestions! If we stay with CRC, what would be a good way to represent it in the serialized JSON? Numeric / hex?
A number is probably the best option here

pub crc32: u32,
}

impl From<&LiveFile> for ExtendedFileMetadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementing From on a reference is not conventional. From by convention consumes its input. Do not hide the clone, instead, let the cost be visible to users of the API.

@@ -43,7 +43,7 @@ pub(crate) async fn run_tests(manager: PartitionStoreManager, mut partition_stor
key_range: key_range.clone(),
min_applied_lsn: snapshot.min_applied_lsn,
db_comparator_name: snapshot.db_comparator_name.clone(),
files: snapshot.files.clone(),
files: snapshot.files.iter().map(|f| f.into()).collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, fyi, another way to do it:

Suggested change
files: snapshot.files.iter().map(|f| f.into()).collect(),
files: snapshot.files.iter().map(Into::into).collect(),

@AhmedSoliman
Copy link
Contributor

Did a quick scan, left a couple of nits. Happy to do a real review once you decide on whether using rocksdb's built-in sst crc checking (you need to enable this if it's not enabled by default) or not.

@pcholakov
Copy link
Contributor Author

Perfect, this is exactly what I was looking for! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants