Relatively small Unicode wins #290

Draft: wants to merge 5 commits into base: main

@hildjj (Contributor) commented Jun 11, 2022

A prototype for dealing with Unicode code points above U+FFFF. In JavaScript, you can spell these with escapes such as \u{1f41d} (U+1F41D: HONEYBEE 🐝).

Added:

  • Identifiers and "strings" can use escapes like \u{1f41d}. For non-BMP characters, these escapes generate two UTF-16 code units (a surrogate pair).
  • Updated the definitions of character classes in parser.pegjs to match Unicode 14, which allows (for example) identifiers containing non-BMP characters without escaping
  • . now matches full non-BMP characters
  • Ensured that Unicode escapes can't be used in character classes
  • Ensured that unescaped non-BMP characters can't be used in character classes
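
To make the first bullet concrete, here is a minimal sketch in plain JavaScript (not Peggy's implementation) of how an escape such as \u{1f41d} ends up as two UTF-16 code units. The helper name toUtf16CodeUnits is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper for illustration only; not part of Peggy.
// Splits a Unicode code point into its UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (codePoint <= 0xffff) {
    return [codePoint]; // BMP: a single code unit
  }
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // high (lead) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // low (trail) surrogate
  return [high, low];
}

const bee = "\u{1f41d}"; // U+1F41D HONEYBEE
const units = toUtf16CodeUnits(0x1f41d).map((u) => u.toString(16));

console.log(units);           // [ 'd83d', 'dc1d' ]
console.log(bee.length);      // 2 (code units)
console.log([...bee].length); // 1 (code point)
```

String length counts code units, while iterating the string yields whole code points, which is why the two numbers differ.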

@hildjj hildjj marked this pull request as draft June 11, 2022 17:01
@hildjj (Contributor Author) commented Jun 11, 2022

We might decide to take some pieces of this, all of it, or continue on to add (perhaps-optional?) support for non-BMP character classes.

@hildjj (Contributor Author) commented Jun 11, 2022

The MATCH_ANY portion of generate-js.js also needs to be fixed.
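
As a rough illustration of what a code-point-aware "match any" step could look like, the sketch below advances by one full code point instead of one UTF-16 code unit. The function name matchAny and its return shape are hypothetical; this is not the code that generate-js.js emits.

```javascript
// Hypothetical sketch; not the code generated by Peggy.
// Returns the character at `pos` and the position after it,
// or null when `pos` is at or past the end of the input.
function matchAny(input, pos) {
  if (pos >= input.length) {
    return null;
  }
  const codePoint = input.codePointAt(pos); // reads a surrogate pair as one value
  const text = String.fromCodePoint(codePoint);
  return { text, next: pos + text.length }; // advance by 1 or 2 code units
}

const input = "a\u{1f41d}b";
const first = matchAny(input, 0);           // "a", one code unit
const second = matchAny(input, first.next); // U+1F41D, two code units
const third = matchAny(input, second.next); // "b", one code unit

console.log(first.next, second.next, third.next); // 1 3 4
```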

@@ -33,12 +33,15 @@ line
/ buzz

fizzbuzz = f:fizz _ b:buzz { return f + b }
fizz = @"fizz"i !{ return currentNumber % 3 }
buzz = @"buzz"i !{ return currentNumber % 5 }
\u0066izz 'fizz' = @"fizz"i !{ return currentNumber % 3 }

@reverofevil reverofevil Jun 13, 2022


(oh god not another language of this please)

image


Honestly I'd rather ban everything outside of [a-z_]i[a-z0-9_]i.

@hildjj hildjj force-pushed the astral branch 3 times, most recently from cc5094e to fa92718 Compare June 24, 2022 06:55
@hildjj hildjj force-pushed the astral branch 2 times, most recently from d9787e0 to c21d7f6 Compare July 27, 2022 17:34
@frostburn (Contributor) commented

Breaks on this grammar: SharpSign = [#♯x𝄪]
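
One plausible reason a class like this is troublesome, assuming its last character is U+1D12A MUSICAL SYMBOL DOUBLE SHARP (a non-BMP character): a JavaScript regular expression without the u flag sees that character as two separate surrogate code units. The sketch below uses plain RegExp objects, not Peggy, to show the difference.

```javascript
// Assumption: the last class member is U+1D12A, which is two UTF-16 code units.
const doubleSharp = "\u{1d12a}";
// \u266f is MUSIC SHARP SIGN, a BMP character.
const source = "^[#\u266fx" + doubleSharp + "]$";

const byCodeUnit = new RegExp(source);       // class holds two lone surrogates
const byCodePoint = new RegExp(source, "u"); // class holds one code point

console.log(byCodeUnit.test(doubleSharp));  // false
console.log(byCodePoint.test(doubleSharp)); // true
console.log(byCodeUnit.test("\ud834"));     // true
```

Without the u flag the class accepts half of the character on its own and rejects the whole character, which is the kind of mismatch that code-point-aware character classes would need to avoid.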
