Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add identifier syntax #296

Open
wants to merge 9 commits into
base: persisted-documents
Choose a base branch
from

Conversation

martinbonnin
Copy link
Contributor

Copy link
Contributor

@Shane32 Shane32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps only use quotes around strings and characters?

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
martinbonnin and others added 2 commits July 21, 2024 15:52
@benjie
Copy link
Member

benjie commented Jul 22, 2024

It's interesting you should raise this now; I was working on Friday on a reworking of the persisted documents spec that relates to using the /graphql/<hash>/<operationId> URLs and so I also felt this need to define the identifiers. I think we should encourage these URLs as the default going forward, but that's a topic for a different PR; I'm not ready to open that PR just yet but here's the commit where I added the syntax (note that I also narrowed the allowed characters to make them URL-safe):

a3d9ecb

I present this only for comparison, in case it's helpful.

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved

IdentifierCharacter :: SourceCharacter but not `:`

SourceCharacter :: Any Unicode scalar value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incredibly broad; I know that's effectively what we have currently but I feel like we should lock it down a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we could allow all Unicode characters, the main spec defines a specific set of characters for use in that document. Something similar could be done here.

SourceCharacter
U+0009
U+000A
U+000D
U+0020–U+FFFF

On a side note, the spec is defining these characters as UTF-16 codes, not Unicode code points.

I’m fine with allowing any Unicode character in this context, or any non control character would be fine too.

Copy link
Contributor Author

@martinbonnin martinbonnin Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only alphanumericals initially and then backtracked to "any unicode" because "why not". But I think your point about having the identifier in url path is the important one. I'll move to alphanumerical + a bunch of unreserved chars like you did here. Questions:

  • do we even need ~, ., _, -?
  • do we need uppercase letters? This might cause issues on case-insensitive filesystems (hello macOS 👋 )

Edit: updated in 5d48e0e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend a wide range of characters. URLs can escape any Unicode character so there need be no limit on what characters can be used for document ids. Also, url paths are by definition case sensitive; but they may be treated with no case sensitivity. For instance, if the document id was a guid it would need dashes. If it was base64 it would need lowercase and uppercase and a couple extra characters. If it was readable names, it would need to support Unicode for foreign languages. And a custom implementation may decide to treat the document id with no case sensitivity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the document id was a guid it would need dashes

Excellent point.

If it was base64 it would need lowercase and uppercase and a couple extra characters

Another excellent point. +, / and = would be needed too. Which would make them a bit awkard to use as url path segments... Interesting!

If it was readable names, it would need to support Unicode for foreign languages

I think I disagree on this one. I don't really see anyone using readable names as identifiers? Sounds like a footgun to me that if we can, we should discourage. Just like the relay cursors are encouraged to be opaque, I would encourage the document ids to be similarly opaque.

Tangential: I have a GraphQL-over-HTTP meeting in my calendar on Thursday. Is that still happening? Want to discuss over there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @Shane32; : sits in that grey area where you don't strictly need to escape it, but it's best practice to do so. We should change that to a URL-safe character.

Copy link
Member

@benjie benjie Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the ! suggestion best I think.

I think we can change the spec text to that without making existing : usage non-compliant - it just becomes a custom ID rather than prefixed ID.

Looks like ! gets encoded to %21 via new URLSearchParams({"documentId":"sha256!abcdef"}).toString()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* seems to work with both URLSearchParams and encodeURIComponent, and I don't think it's normally used in encodings so should be safe... I'll need to look up in the relevant specs to see if it's not meant to be used for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec for URLSearchParams:

The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).

So ~ is probably encoded also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, yeah. Actually I remember I deliberately chose a character outside of unreserved in RFC3986 for the separator, so I'm okay with : being URL encoded. But * might be a better choice to avoid it.

GitHub renders the URLs okay:

spec/Appendix A -- Persisted Documents.md Outdated Show resolved Hide resolved
- move `x-` and `sha256` to validation
- allow `:` in prefixed identifier payloads
- add link to RFC3986
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants