storcon: timeline table, deletion and creation #10440
Just took a cursory look to prepare for tomorrow's meeting.
storage_controller/src/service.rs
Outdated
// Can't do return Err because of async block, must do ? plus unreachable!()
return Err(ApiError::InternalServerError(anyhow!(
    "import pgdata doesn't specify the start lsn, aborting creation on safekeepers"
)))?;
Don't understand this comment.
ah sorry I will remove it, it was from an earlier state where there was an async block instead of an async fn
// notify cplane about creation
// TODO
This todo indicates that this method is in the wrong place.
Really, this method belongs in the impl Reconciler, where we have self.compute_hook.notify(ShardUpdate{...}).
Today that calls cplane, but architecturally cplane doesn't actually need to know which safekeepers a compute should connect to; cplane only needs to funnel the ShardUpdate through to the compute. It could/should be an opaque blob to cplane.
In fact I've been thinking about extending compute_hook for this purpose, because that one already more or less has a connection to the control plane.
As for where to put the hook call, I think we can already call the hook the moment a quorum of safekeepers is reached. How to determine that is a good question; I suppose we can run db queries for that. But it needs to be free of race conditions, i.e. we don't want two reconcilers that finish in parallel to both issue the reconcile (or the opposite: two reconcilers that finish in parallel and neither issues one).
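The "exactly one caller issues the hook call once a quorum is reached" requirement can be sketched in memory as follows. This is only an illustration under stated assumptions: `TimelineCreation`, `record_ack` and `quorum_size` are hypothetical names, a majority quorum is assumed, and the state lives in atomics rather than in the database. With the state persisted in the db, the analogous move would be a conditional update (set a "notified" flag only where it is still unset) and checking whether a row was affected.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Majority quorum for `n` safekeepers (assumption: simple majority).
pub fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

/// Hypothetical in-memory stand-in for the per-timeline creation state.
pub struct TimelineCreation {
    total: usize,
    acked: AtomicUsize,
    notified: AtomicBool,
}

impl TimelineCreation {
    pub fn new(total: usize) -> Self {
        Self {
            total,
            acked: AtomicUsize::new(0),
            notified: AtomicBool::new(false),
        }
    }

    /// Record one safekeeper's successful creation.
    /// Returns `true` for exactly one caller: the one that should issue the hook call.
    pub fn record_ack(&self) -> bool {
        let acked = self.acked.fetch_add(1, Ordering::SeqCst) + 1;
        if acked < quorum_size(self.total) {
            return false;
        }
        // Every caller at or past quorum attempts the swap; only the first succeeds,
        // so the notification is issued once and is never skipped.
        self.notified
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
    }
}
```

Both failure modes from the comment are covered: two reconcilers finishing in parallel cannot both win the compare-and-swap, and the first one at or past quorum always wins it.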
storage_controller/src/service.rs
Outdated
    tenant_id: TenantId,
    timeline_info: &TimelineInfo,
    create_mode: models::TimelineCreateRequestMode,
) -> Result<(u32, Vec<NodeId>), ApiError> {
This naked u32 should be wrapped in a newtype so it's clear what it means.
Had to jump to caller to learn it's the safekeeper generation.
Generation numbers are super important to get right, correctness of the service depends on them.
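A minimal sketch of the suggested newtype, to make the return type self-describing. The name `SafekeeperGeneration`, its methods, and the `choose_safekeepers` function are assumptions for illustration, and `NodeId` here is a local stand-in rather than the real type.

```rust
use std::fmt;

/// Hypothetical newtype for the safekeeper generation (name is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct SafekeeperGeneration(u32);

impl SafekeeperGeneration {
    pub const fn new(raw: u32) -> Self {
        Self(raw)
    }

    pub const fn into_inner(self) -> u32 {
        self.0
    }

    /// The following generation, or `None` if the counter would overflow.
    pub fn bump(self) -> Option<Self> {
        self.0.checked_add(1).map(Self)
    }
}

impl fmt::Display for SafekeeperGeneration {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Local stand-in for the real `NodeId` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(pub u64);

/// Illustrative function: the tuple no longer contains a bare `u32`,
/// so the caller can see what the first element means.
pub fn choose_safekeepers() -> (SafekeeperGeneration, Vec<NodeId>) {
    (
        SafekeeperGeneration::new(1),
        vec![NodeId(1), NodeId(2), NodeId(3)],
    )
}
```

A side benefit is that the compiler rejects passing some other `u32` where a generation is expected.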
pub safekeepers: Option<Vec<NodeId>>,
pub safekeepers_generation: Option<u32>,
I think it would actually be better to use the same data structure here that the reconciler upcall will use (the TODO: notify cplane in tenant_timeline_create_safekeepers_reconcile).
submitting superficial things on a first glance
@@ -280,6 +280,18 @@ pub struct TimelineCreateRequest {
    pub new_timeline_id: TimelineId,
    #[serde(flatten)]
    pub mode: TimelineCreateRequestMode,
    /// Whether to also create timeline on the safekeepers (specific to storcon API)
I was thinking to have this as a global command line argument -- it avoids cplane change and it doesn't seem useful to enable this only for some timelines.
#[serde(flatten)]
pub timeline_info: TimelineInfo,

pub safekeepers: Option<Vec<NodeId>>,
We agreed that it might be useful to also pass hostname here, see
https://github.com/neondatabase/cloud/pull/23181/files#diff-e89538b4960047b678a0a0b78fc0cee4449b07a4d4e9c6e47b54549c8ef3f23b
@@ -1135,6 +1198,189 @@ impl Persistence {
        })
        .await
    }

    /// Timelines must be persisted before we schedule them for the first time.
    pub(crate) async fn insert_timeline(&self, entry: TimelinePersistence) -> DatabaseResult<()> {
Let's call it upsert.
It's not really an upsert call though, it only inserts. If the timeline already exists, it returns an error.
I think upsert-like functionality is dangerous, especially for something that happens regularly. Upsert APIs are prone to race conditions.
That being said, we should return something nicer than a plain error if the entry already exists, so that we can use this information to make higher-level calls idempotent.
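One possible shape for such an insert-only call, sketched with an in-memory map instead of the database. `InsertOutcome`, `TimelineStore` and `TimelineRow` are hypothetical names, and the fields are simplified stand-ins; the point is only that "already exists" is reported as a value the caller can act on rather than as an error, while the stored row is never overwritten.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Simplified stand-in for the persisted timeline row (fields are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TimelineRow {
    pub safekeepers: Vec<u64>,
    pub generation: u32,
}

/// Hypothetical result type: distinguishes a fresh insert from a pre-existing row.
#[derive(Debug, PartialEq, Eq)]
pub enum InsertOutcome {
    Inserted,
    /// Nothing was written; carries the row that was already stored.
    AlreadyExists(TimelineRow),
}

/// In-memory stand-in for the timelines table, keyed by (tenant id, timeline id).
#[derive(Default)]
pub struct TimelineStore {
    rows: HashMap<(String, String), TimelineRow>,
}

impl TimelineStore {
    /// Insert-only: never overwrites an existing row.
    pub fn insert_timeline(
        &mut self,
        tenant_id: &str,
        timeline_id: &str,
        row: TimelineRow,
    ) -> InsertOutcome {
        match self.rows.entry((tenant_id.to_string(), timeline_id.to_string())) {
            Entry::Vacant(slot) => {
                slot.insert(row);
                InsertOutcome::Inserted
            }
            Entry::Occupied(existing) => InsertOutcome::AlreadyExists(existing.get().clone()),
        }
    }
}
```

A retried creation request can then compare the returned row with what it intended to write and treat a match as success, which keeps the higher-level call idempotent without upsert semantics.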
    .do_nothing()
    .execute(conn)?;

if inserted_updated != 1 {
What is inserted_updated if DO NOTHING happened?
probably it's 0. yeah, this case should probably be improved.
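For context: PostgreSQL reports the number of rows actually inserted, so an `INSERT ... ON CONFLICT DO NOTHING` that hits a conflict reports 0 affected rows. Assuming `inserted_updated` is that affected-row count, the `!= 1` check could be split into explicit cases along these lines. `InsertResult` and `classify_insert` are hypothetical names, not code from the PR.

```rust
/// Hypothetical classification of a single-row `INSERT ... ON CONFLICT DO NOTHING`.
#[derive(Debug, PartialEq, Eq)]
pub enum InsertResult {
    /// One row was written.
    Inserted,
    /// The conflict clause fired and nothing was written.
    Conflict,
}

/// Map the affected-row count of a single-row insert to an explicit result.
/// Any count other than 0 or 1 is unexpected for a single-row insert.
pub fn classify_insert(rows_affected: usize) -> Result<InsertResult, String> {
    match rows_affected {
        0 => Ok(InsertResult::Conflict),
        1 => Ok(InsertResult::Inserted),
        n => Err(format!(
            "unexpected affected-row count {n} for a single-row insert"
        )),
    }
}
```

With this split, the "row already exists" case stops being an error and becomes something the caller can handle deliberately.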
macro_rules! measured_request {
    ($name:literal, $method:expr, $node_id: expr, $invoke:expr) => {{
        let labels = crate::metrics::PageserverRequestLabelGroup {
PageserverRequestLabelGroup
This PR adds an optional setting to the storage controller's timeline creation endpoint to also create the timeline on the safekeepers.
In order to support individual safekeepers going offline, we return from the HTTP endpoint as soon as a quorum of safekeepers has reported that it successfully created the timeline.
We introduce a new safekeeper reconciler task to the storage controller which reconciles outstanding timelines. For durably storing which timelines to reconcile, it uses two specific columns in the newly added timelines table: status_kind and status. The former is for quickly determining which status we want to do reconciliations for (we ask the db to maintain an index on it so that we can query specific states); the latter is for status-specific metadata in JSON format.
TODO until ready for review:
For follow-ups:
COUNT(*) for determining target safekeepers
deleted_at
Part of #9011
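The split between status_kind and status described above can be pictured with a small sketch. The state names and metadata fields here are invented for illustration (the description does not list them), and the JSON is formatted by hand only to keep the example free of dependencies.

```rust
/// Hypothetical reconciliation states; names and fields are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TimelineStatus {
    Creating { pending_safekeepers: Vec<u64> },
    Created,
    Deleting { pending_safekeepers: Vec<u64> },
}

impl TimelineStatus {
    /// Short tag for the indexed `status_kind` column.
    pub fn kind(&self) -> &'static str {
        match self {
            TimelineStatus::Creating { .. } => "creating",
            TimelineStatus::Created => "created",
            TimelineStatus::Deleting { .. } => "deleting",
        }
    }

    /// Status-specific metadata for the JSON `status` column.
    pub fn to_json(&self) -> String {
        match self {
            TimelineStatus::Creating { pending_safekeepers }
            | TimelineStatus::Deleting { pending_safekeepers } => {
                let ids: Vec<String> =
                    pending_safekeepers.iter().map(|id| id.to_string()).collect();
                format!("{{\"pending_safekeepers\":[{}]}}", ids.join(","))
            }
            TimelineStatus::Created => "{}".to_string(),
        }
    }

    /// Whether the reconciler still has work to do for this timeline.
    pub fn needs_reconcile(&self) -> bool {
        !matches!(self, TimelineStatus::Created)
    }
}
```

The reconciler would then select rows by the indexed kind and only parse the JSON metadata for the rows it actually picks up.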