Support non-ascii case folding within i modifier #90
'pattern': '(?i:[є-ґ])',
'options': { modifiers: 'transform' },
'matches': ['\u0462', '\u0463', '\u1C87'],
'expected': '(?:[\\u0404-\\u040F\\u0454-\\u0491\\u1C87])',
Here U+1C87 ᲇ should be matched because the uppercase of U+1C87 is U+0462 Ѣ, well within the range [є-ґ], while the lowercase of U+0462 Ѣ is U+0463 ѣ, not U+1C87. So the String#toLowerCase approach in #80 will miss cases like this, which is why we have introduced iu-mappings.
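To make the U+1C87 case concrete, here is a minimal sketch. Both helpers, `forwardExpand` and `reverseScan`, are hypothetical and are not code from this PR or from #80. `forwardExpand` stands in for a toLowerCase/toUpperCase-style expansion that starts from the class members; `reverseScan` instead tests every BMP code unit against the class. Both rely on the host engine's case mapping, so the result is only as good as the engine's Unicode data.

```javascript
// Hypothetical "forward" expansion: start from the members of [lo-hi] and
// add whatever toLowerCase/toUpperCase map them to (single code units only).
function forwardExpand(lo, hi) {
  const set = new Set();
  for (let cp = lo; cp <= hi; cp++) {
    const ch = String.fromCharCode(cp);
    for (const s of [ch, ch.toLowerCase(), ch.toUpperCase()]) {
      if (s.length === 1) set.add(s.charCodeAt(0));
    }
  }
  return set;
}

// Hypothetical "reverse" scan over the whole BMP: keep every code unit whose
// own value, uppercase, or lowercase lands inside [lo-hi].
function reverseScan(lo, hi) {
  const inRange = (s) =>
    s.length === 1 && s.charCodeAt(0) >= lo && s.charCodeAt(0) <= hi;
  const set = new Set();
  for (let cp = 0; cp <= 0xffff; cp++) {
    const ch = String.fromCharCode(cp);
    if (inRange(ch) || inRange(ch.toUpperCase()) || inRange(ch.toLowerCase())) {
      set.add(cp);
    }
  }
  return set;
}

// [є-ґ] is U+0454..U+0491.
const forward = forwardExpand(0x0454, 0x0491);
const reverse = reverseScan(0x0454, 0x0491);
console.log(forward.has(0x1c87)); // false: no member of [є-ґ] case-maps to U+1C87
console.log(reverse.has(0x1c87)); // true on a Unicode 9+ engine: '\u1C87'.toUpperCase() is '\u0462'
```

On a recent engine the reverse scan should produce U+0404-U+040F, U+0454-U+0491 and U+1C87 for this class, in line with the expected output of the test case, but it is a simplification that happens to fit this range, not a general case-closure algorithm.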
'expected': '(?:[Kk\\u212A])',
},
{
'pattern': '(?i:\\u2C2F)',
This is also an example of why a legacy ES5 engine might not support all case foldings in the Basic Multilingual Plane: U+2C2F was only introduced in Unicode 14.
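A quick way to see this dependency is to probe the host engine directly. In the sketch below, `hostHasUnicode14CaseData` is a hypothetical helper name; it checks whether the engine maps U+2C2F GLAGOLITIC CAPITAL LETTER CAUDATE CHI to its lowercase U+2C5F, both added in Unicode 14. A native `/\u2C2F/i` pattern inherits the same dependency, so its result can differ between old and new engines.

```javascript
// True when the engine's case mapping knows U+2C2F <-> U+2C5F (Unicode 14).
// Engines with older Unicode data are expected to return false.
function hostHasUnicode14CaseData() {
  return '\u2C2F'.toLowerCase() === '\u2C5F';
}

// A native i-flag match relies on the same engine data, so this result is
// platform dependent as well.
const nativeMatch = /\u2C2F/i.test('\u2C5F');

console.log(hostHasUnicode14CaseData(), nativeMatch);
```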
This looks good in principle, but I don't have enough domain knowledge to do a full review.
// https://mths.be/es6#sec-runtime-semantics-canonicalize-abstract-operation
(
if(
Nit: there is some weird formatting going on here
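The `mths.be/es6` link in the diff points to the ES2015 Canonicalize(ch) abstract operation. For patterns without the u flag, it runs ch through toUpperCase, keeps ch if the result is not a single code unit, and also keeps ch if the mapping would turn a non-ASCII character into an ASCII one. Below is a sketch of that non-u branch; `canonicalize` and `sameUnderI` are hypothetical names, and the u-flag branch (which uses Unicode simple case folding instead) is not shown.

```javascript
// Non-u branch of ES2015 Runtime Semantics: Canonicalize(ch), for a single
// UTF-16 code unit passed as a one-character string.
function canonicalize(ch) {
  const u = ch.toUpperCase();
  // Multi-unit results such as 'ß' -> 'SS' are rejected: keep ch.
  if (u.length !== 1) return ch;
  // Never map a non-ASCII code unit onto ASCII (e.g. 'ſ' -> 'S'): keep ch.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

// Under the i flag, a character matches when the canonical forms agree.
const sameUnderI = (a, b) => canonicalize(a) === canonicalize(b);

console.log(canonicalize('\u1C87') === '\u0462'); // true on a Unicode 9+ engine
console.log(sameUnderI('\u1C87', '\u0463'));       // true: both canonicalize to U+0462
console.log(sameUnderI('k', '\u212A'));            // false: KELVIN SIGN uppercases to itself
```

This is why U+1C87 belongs to `(?i:[є-ґ])` even though no member of the range lowercases to it.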
(✔️ I had completely forgotten the context, so here is a recap to make a review possible.)
The idea of this PR is to extend the current iu-mappings to BMP characters (U+0080 to U+FFFF). We need this for the i modifier: we generate a regex without the i flag and simulate the i behaviour inside modified groups. While the approach in #80 works for most common characters, it introduces platform-dependent behaviour, because older platforms do not support newer Unicode characters.
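The transform described above can be pictured as a table lookup followed by printing a character class, so the generated pattern no longer depends on the engine's own case data. In this sketch, `CASE_EQUIVALENTS`, `toCharacterClass` and `transformSingleChar` are hypothetical names, and the table is a tiny hand-written excerpt following Unicode simple case folding, not the real iu-mappings data or regexpu-core's code. It only handles a single BMP character inside `(?i:...)`.

```javascript
// Hypothetical excerpt of a precomputed case-equivalence table
// (code point -> other code points in the same simple case folding class).
const CASE_EQUIVALENTS = new Map([
  [0x004b, [0x006b, 0x212a]], // K: k, KELVIN SIGN
  [0x006b, [0x004b, 0x212a]], // k
  [0x212a, [0x004b, 0x006b]], // KELVIN SIGN
  [0x2c2f, [0x2c5f]],         // GLAGOLITIC CAPITAL LETTER CAUDATE CHI (Unicode 14)
  [0x2c5f, [0x2c2f]],         // its lowercase
]);

// ASCII letters and digits print as-is; everything else as \uXXXX (BMP only).
function printCodePoint(cp) {
  const ch = String.fromCharCode(cp);
  return /^[A-Za-z0-9]$/.test(ch)
    ? ch
    : '\\u' + cp.toString(16).toUpperCase().padStart(4, '0');
}

// Sort and dedupe code points; runs of 3 or more become a-b ranges.
function toCharacterClass(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  let out = '';
  for (let i = 0; i < sorted.length; ) {
    let j = i;
    while (j + 1 < sorted.length && sorted[j + 1] === sorted[j] + 1) j++;
    if (j - i >= 2) {
      out += printCodePoint(sorted[i]) + '-' + printCodePoint(sorted[j]);
    } else {
      for (let k = i; k <= j; k++) out += printCodePoint(sorted[k]);
    }
    i = j + 1;
  }
  return '[' + out + ']';
}

// Rewrite (?i:X) for one BMP character X into a non-i group.
function transformSingleChar(ch) {
  const cp = ch.charCodeAt(0);
  const equivalents = CASE_EQUIVALENTS.get(cp) || [];
  return '(?:' + toCharacterClass([cp, ...equivalents]) + ')';
}

console.log(transformSingleChar('k')); // (?:[Kk\u212A])
```

Because the equivalences come from a table shipped with the tool, `new RegExp(transformSingleChar('k'))` matches U+212A without the i flag, regardless of which Unicode version the host engine implements.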
17d07d1 to 22a5353
Fixes #79
Closes #80
Most test cases are inherited from #80; I have also commented on some differences.
@stulov Thank you for your work, which is very helpful.