chore: even more responsive compaction cancellation (#8725)
Some benchmarks and tests might still fail because of #8655 (tracked in
#8708), since we are not fast enough to shut down ([one example]).
This is partly explained by the current validation mode of the streaming
k-merge, but otherwise by the fact that this loop is where compaction
spends a lot of its time. Outside of L0 => L1 compaction, image layer
generation is already guarded by the cancellation checks done by
vectored reads.

32768 is a rough guess based on how many keys we put in each layer in a
bench (1-2 million), but I assume it will be a good enough divisor.
Checking more often would start to show up as contention, which we
cannot currently measure. Checking less often might also be reasonable.
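
As a standalone sketch of the amortized check (the function name, the iterator, and the error type here are placeholders, not the actual compaction loop), the pattern boils down to:

    use tokio_util::sync::CancellationToken;

    fn drain_keys<I, K>(keys_iter: I, cancel: &CancellationToken) -> Result<usize, &'static str>
    where
        I: IntoIterator<Item = K>,
    {
        let mut keys = 0usize;
        for _key in keys_iter {
            keys += 1;
            // Checking the token on every key would be wasted work; once every
            // 32_768 keys amounts to only tens of checks for a 1-2 million key layer.
            if keys % 32_768 == 0 && cancel.is_cancelled() {
                return Err("shutting down");
            }
            // ... write the key/value into the output layer here ...
        }
        Ok(keys)
    }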

[one example]:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/10384136483/index.html#suites/9681106e61a1222669b9d22ab136d07b/96e6d53af234924/

Earlier PR: #8706.
koivunej authored Aug 14, 2024
1 parent 36c1719 commit 60fc1e8
Showing 2 changed files with 19 additions and 1 deletion.
7 changes: 6 additions & 1 deletion pageserver/src/tenant/timeline.rs
@@ -4540,7 +4540,12 @@ impl Timeline {
         new_images: &[ResidentLayer],
         layers_to_remove: &[Layer],
     ) -> Result<(), CompactionError> {
-        let mut guard = self.layers.write().await;
+        let mut guard = tokio::select! {
+            guard = self.layers.write() => guard,
+            _ = self.cancel.cancelled() => {
+                return Err(CompactionError::ShuttingDown);
+            }
+        };

         let mut duplicated_layers = HashSet::new();
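The hunk above turns the plain `self.layers.write().await` into a `tokio::select!` over the lock future and the cancellation token, so shutdown no longer has to wait for the layer map lock. A minimal standalone sketch of that pattern, assuming the `tokio` and `tokio_util` crates (the names and error type are illustrative, not the pageserver's actual ones):

    use tokio::sync::{RwLock, RwLockWriteGuard};
    use tokio_util::sync::CancellationToken;

    #[derive(Debug)]
    struct ShuttingDown;

    async fn write_lock_or_bail<'a, T>(
        layers: &'a RwLock<T>,
        cancel: &CancellationToken,
    ) -> Result<RwLockWriteGuard<'a, T>, ShuttingDown> {
        tokio::select! {
            // Whichever completes first wins: either we get the write guard...
            guard = layers.write() => Ok(guard),
            // ...or the token fires and we stop waiting for the lock.
            _ = cancel.cancelled() => Err(ShuttingDown),
        }
    }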
13 changes: 13 additions & 0 deletions pageserver/src/tenant/timeline/compaction.rs
@@ -1048,11 +1048,22 @@ impl Timeline {
         let mut dup_end_lsn: Lsn = Lsn::INVALID; // end LSN of layer containing values of the single key
         let mut next_hole = 0; // index of next hole in holes vector

+        let mut keys = 0;
+
         while let Some((key, lsn, value)) = all_values_iter
             .next(ctx)
             .await
             .map_err(CompactionError::Other)?
         {
+            keys += 1;
+
+            if keys % 32_768 == 0 && self.cancel.is_cancelled() {
+                // Avoid hitting the cancellation token on every key: in benches we end
+                // up shuffling on the order of a million keys per layer, so this checks
+                // the token only tens of times per layer.
+                return Err(CompactionError::ShuttingDown);
+            }
+
             let same_key = prev_key.map_or(false, |prev_key| prev_key == key);
             // We need to check key boundaries once we reach next key or end of layer with the same key
             if !same_key || lsn == dup_end_lsn {
@@ -1157,6 +1168,8 @@
                     .await
                     .map_err(CompactionError::Other)?,
                 );
+
+                keys = 0;
             }

             writer

1 comment on commit 60fc1e8

@github-actions

2244 tests run: 2165 passed, 0 failed, 79 skipped (full report)


Code coverage* (full report)

  • functions: 32.4% (7199 of 22244 functions)
  • lines: 50.4% (58241 of 115603 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
60fc1e8 at 2024-08-14T15:46:17.664Z :recycle:
