Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHCLOUD-37466 - Implement zookie into api #352

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

lennysgarage
Copy link
Contributor

@lennysgarage lennysgarage commented Jan 29, 2025

PR Template:

Describe your changes

NEW RPC: CheckForUpdate

  • Full consistency Check. Same behaviour as Check but always fully consistent
  • Still returns a zookie

  • Adds a zookie to each create/del/check/lookup requests as an optional field
  • Adds a zookie to each create/del/check/lookup response
    • Zookie is always returned from a read/write request. (Even if full consistency is enabled.)
  • Tests updated to at least work with zookies. 1 test case so far to verify read after write consistency with zookie (TestSpiceDbRepository_CheckPermissionZookie)

Example output of running a create & check immediately:

CreateTuples:
{ 
  "createdAt": { 
    "token": "GgYKBENNY0c=" 
  } 
}
Check (passing zookie in request):
{
  "allowed": "ALLOWED_TRUE",
  "checkedAt": {
    "token": "GgYKBENNY0c="
  }
}

Ticket reference (if applicable)

Fixes # https://issues.redhat.com/browse/RHCLOUD-37466

Checklist

  • Are the agreed upon acceptance criteria fulfilled?

  • Was the 4-eye-principle applied? (async PR review, pairing, ensembling)

  • Do your changes have passing automated tests and sufficient observability?

  • Are the work steps you introduced repeatable by others, either through automation or documentation?

    • If automation is possible but not done due to other constraints, a ticket to the tech debt sprint is added
    • An SOP (Standard Operating Procedure) was created
  • The Changes were automatically built, tested, and - if needed, behind a feature flag - deployed to our production environment. (Please check this when the new deployment is done and you could verify it.)

  • Are the agreed upon coding/architectural practices applied?

  • Are security needs fullfilled? (e.g. no internal URL)

  • Is the corresponding Ticket in the right state? (should be on "review" now, put to done when this change made it to production)

  • For changes to the public API / code dependencies: Was the whole team (or a sufficient amount of ppl) able to review?

Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Copy link

github-actions bot commented Jan 29, 2025

The latest Buf updates on your PR. Results from workflow buf-pull-request / build (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedJan 31, 2025, 6:36 PM

Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
… requests

Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Copy link
Contributor

@merlante merlante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good @lennysgarage. It has everything I was expecting. Let's wait for a couple of more reviews on this one.

internal/service/lookup_test.go Show resolved Hide resolved
@@ -36,5 +37,6 @@ message CheckResponse {
// e.g. ALLOWED_CONDITIONAL = 3;
}
Allowed allowed = 1;
Zookie checked_at = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value to having checked_at, deleted_at, etc., rather than just zookie? Just wondering why you named them like this?

I'm wondering whether we want to give the impression that zookies are like timestamps, or to stay completely opaque as to what they are. I don't know the answer, just thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question, it would not make much of a difference if it were all just zookie. The reasoning is due to the fact that I was following how authzed had setup their zedtoken. With looked_up_at for lookup responses https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.LookupResourcesResponse and checked_at for https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.CheckPermissionResponse etc.

Considering it is just an opaque string, I think having names that sound like timestamps can be misleading. I think it would be good to just have a single name like zookie for requests and response fields and won't have users thinking they can grab some extra timestamp value from what it is. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that spicedb do it like this. :) I'm on the fence with this one, so whichever way you like is good with me. If nobody else comments, feel free to resolve one way or the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we want to give the impression that zookies are like timestamps, or to stay completely opaque as to what they are. I don't know the answer, just thinking out loud.

This – we should be very opaque about them IMO, because we can't guarantee they'll always be timestamps.

See my comment below on the "Zookie" name itself – I think we need a little less specific of a field name for these because across other FGA projects, the token we'd use isn't actually a logical timestamp, so there is arguably a better name for this. Maybe simply "consistency_token" or "read_token" (similar to what you had originally).

deploy/kessel-relations.yaml Outdated Show resolved Hide resolved
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
@wscalf
Copy link
Contributor

wscalf commented Jan 30, 2025

Two thoughts around usecases for fullyconsistent alongside zookies:

  • Moving more toward Zanzibar..compliance? Normalness?..we'll need to be able to do fully consistent checks for non-view operations. Can we do that with this API change? It seems like it's either all fully consistent (by configuration) or minimize latency/zookie depending on whether or not one's passed.
  • ReadRelationships may need to be fully consistent more often than not (or even always.) Rationale: the main usecase for directly reading relationships (vs subjects/resources) is if a client intends to follow it with a WriteRelationships call (+ precondition?) to make some change relative to the current state. A stale read would be a problem there. (Though open to there being other usecases. Also not sure anything calls ReadRelationships atm, or that anything outside the management fabric should.)

@lennysgarage
Copy link
Contributor Author

Two thoughts around usecases for fullyconsistent alongside zookies:

  • Moving more toward Zanzibar..compliance? Normalness?..we'll need to be able to do fully consistent checks for non-view operations. Can we do that with this API change? It seems like it's either all fully consistent (by configuration) or minimize latency/zookie depending on whether or not one's passed.

Thanks for raising these points and with this change, no we won't be able to do that. We don't expose the consistency level in the API, so like you said we're either always fully consistent or minimize latency/zookie. I guess we would need to implement a consistency level similar to spicedb to determine when we want fully consistent checks versus ones with a zookie/no zookie.

  • ReadRelationships may need to be fully consistent more often than not (or even always.) Rationale: the main usecase for directly reading relationships (vs subjects/resources) is if a client intends to follow it with a WriteRelationships call (+ precondition?) to make some change relative to the current state. A stale read would be a problem there. (Though open to there being other usecases. Also not sure anything calls ReadRelationships atm, or that anything outside the management fabric should.)

ReadRelationships can be passed a zookie in the request as well. Since we're not exposing the consistency to the user we don't have a way for them to determine what they're going to do next (unless we modify the api like above). If the main usecase is for directly following up with a WriteRelationships call then it would be appropriate to change to full consistency always.

@merlante
Copy link
Contributor

Two thoughts around usecases for fullyconsistent alongside zookies:

  • Moving more toward Zanzibar..compliance? Normalness?..we'll need to be able to do fully consistent checks for non-view operations. Can we do that with this API change? It seems like it's either all fully consistent (by configuration) or minimize latency/zookie depending on whether or not one's passed.

Thanks for raising these points and with this change, no we won't be able to do that. We don't expose the consistency level in the API, so like you said we're either always fully consistent or minimize latency/zookie. I guess we would need to implement a consistency level similar to spicedb to determine when we want fully consistent checks versus ones with a zookie/no zookie.

I think this is a good point. We have assumed the existence of fully consistent checks in our candidate solutions, so I think we need them. Even if we were to suggest to notifications that they move from fully consistent to zookies/at least as fresh (with them managing zookies ad hoc for now), we couldn't actually support their use case fully with the current implementation.

The simplest change for right now might be to just add boolean fully_consistent to check (partially exposing consistency). What do you/others think?

(Other options include creating a new fully consistent check endpoint that could, for example, be locked down to inventory-api -- non-inventory uses maybe never get offered fully consistent anything? -- or merge the "fully consistent check" into the write endpoints, since a lot of the time we're using them to check access for a write. But my vote would be to keep things simple and just do the above.)

  • ReadRelationships may need to be fully consistent more often than not (or even always.) Rationale: the main usecase for directly reading relationships (vs subjects/resources) is if a client intends to follow it with a WriteRelationships call (+ precondition?) to make some change relative to the current state. A stale read would be a problem there. (Though open to there being other usecases. Also not sure anything calls ReadRelationships atm, or that anything outside the management fabric should.)

ReadRelationships can be passed a zookie in the request as well. Since we're not exposing the consistency to the user we don't have a way for them to determine what they're going to do next (unless we modify the api like above). If the main usecase is for directly following up with a WriteRelationships call then it would be appropriate to change to full consistency always.

If we add a fully consistent check (above), we'd have the option of always performing that and using the zookie for read and write. Yes, it's an extra hop, but we can functionally do everything and can optimise in the future when we'll have a better idea of our requirements.

@lennysgarage
Copy link
Contributor Author

Two thoughts around usecases for fullyconsistent alongside zookies:

  • Moving more toward Zanzibar..compliance? Normalness?..we'll need to be able to do fully consistent checks for non-view operations. Can we do that with this API change? It seems like it's either all fully consistent (by configuration) or minimize latency/zookie depending on whether or not one's passed.

Thanks for raising these points and with this change, no we won't be able to do that. We don't expose the consistency level in the API, so like you said we're either always fully consistent or minimize latency/zookie. I guess we would need to implement a consistency level similar to spicedb to determine when we want fully consistent checks versus ones with a zookie/no zookie.

I think this is a good point. We have assumed the existence of fully consistent checks in our candidate solutions, so I think we need them. Even if we were to suggest to notifications that they move from fully consistent to zookies/at least as fresh (with them managing zookies ad hoc for now), we couldn't actually support their use case fully with the current implementation.

The simplest change for right now might be to just add boolean fully_consistent to check (partially exposing consistency). What do you/others think?

(Other options include creating a new fully consistent check endpoint that could, for example, be locked down to inventory-api -- non-inventory uses maybe never get offered fully consistent anything? -- or merge the "fully consistent check" into the write endpoints, since a lot of the time we're using them to check access for a write. But my vote would be to keep things simple and just do the above.)

Big agree on adding a fully_consistent bool to the check request that can be utilized to handle fully consistent checks when needed.

Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
@wscalf
Copy link
Contributor

wscalf commented Jan 31, 2025

The one thing I'll say on a bool versus a separate endpoint (like the ContentChangeCheck endpoint from Zanzibar) is it may be misleading to callers in a way that trends toward misuse- it's framing the request around how fresh you want the results to be (which is always the freshest possible, right?) rather than what you intend to do with them (authorize a view request vs authorizing something else.)

We'll almost have to make that distinction in inventory because what it does with those two requests is pretty different, but I think it makes the API a little more self-evident here as well. (Put another way- I think it'd feel more natural to tell people to call CheckForUpdate or ContentChangeCheck when authorizing a change and Check or even CheckForRead/ReadCheck/etc to authorize reading data vs telling them to set fullyconsistent to true when making changes and false otherwise, though at the end of the day the two approaches are effectively the same.)

Signed-off-by: Jonathan Marcantonio <jmarcant@redhat.com>
@@ -49,3 +49,8 @@ message ObjectType {
string namespace = 1 [(buf.validate.field).string.min_len = 1];
string name = 2 [(buf.validate.field).string.min_len = 1];
}

// The Zookie is used to provide consistency between write and read requests.
message Zookie {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about exposing the "Zookie" name (and the exact concept) itself directly in the API.

What I mean by this is that we will probably benefit from making a bit more generic concept that fits better with other Zanzibar/FGA implementations, but still provides the same or similar API semantics.

e.g. something like maybe Revision or ReadToken or ConsistencyToken ... etc. We need to convey, "Here's your response and a token. You can use this token later to get data at least as recent as from this request. We promise nothing else about what this token means."

This value needn't be a logical timestamp as in a Zookie. It could simply be an opaque string that means "for this request, use full consistency," which would be compatible with OpenFGA, for example. OSO Cloud's "consistency token" is similar as well – it just guarantees reading-your-writes, but otherwise doesn't refer to a particular snapshot (in their case looks like a particular shard). For completeness, Permify, SpiceDB, and likely Warrant (now WorkOS FGA) use more by-the-book zookies.

The guarantees may not all be rigorously the same with different implementations, so I don't want to go down that rabbit hole. But for now we have to remain open to working with different implementations, even if there is the caveat that they have slightly different semantics. (In these examples though, for what we need, I think they would all work.)

This point is true regardless of whether we expose this in protos or not – even in our internal abstraction, we should remain as agnostic of the "Zanzibars" as possible until we absolutely have to commit to limiting that.

@@ -29,11 +29,13 @@ message LookupResourcesRequest {
string relation = 2 [(buf.validate.field).string.min_len = 1];
SubjectReference subject = 3 [(buf.validate.field).required = true];
optional RequestPagination pagination = 4;
optional Zookie zookie = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want a little more indirection here, although if this becomes an internal API this is not a big deal because we can refactor easily. But as a proto, I'd probably want more like a Consistency message, of which one option is "use this token to get me at least as recent as whatever that token is bound to," leaving room for other options in the future without having to update every request where the token is appropriate input. (This is similar to how SpiceDB does it in this case, for good reason I think.)

Comment on lines 29 to 30
Zookie zookie = 4;
bool fully_consistent = 5; // defaults to false
Copy link
Member

@alechenninger alechenninger Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below on a "Consistency" option – however in the case of check, I think we should go the Zanzibar route here and have an explicitly different RPC for a fully consistent check. This is easier to use correctly, and more obvious if you misuse it.

Offering "fully_consistent" as an option is an attractive footgun that gets you (from a client perspective) full consistency (yay) at the cost of killing our scalability (ouch), from our perspective.

So for a regular check, I'd offer an optional consistency option, of which fully consistent is not an option. And then for fully consistent checks, I'd offer a "ContentChangeCheck" (or similarly named) which is what you should use when you are well... changing content! (See internal design doc)

(Technically maybe another, later option is that we actually keep one endpoint, but automatically detect the consistency level to use based on schema. E.g. you can annotate a relation as "this one allows writes/content changes" and we use that. A bit more advanced though, so I'd probably save that for later if at all.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, agree on the points listed here. Added in the latest commit another rpc for full consistency check updates CheckForUpdate that remove the need of a consistency option bool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants