Support non-ascii case folding within i modifier #90

JLHwung · 2023-09-30T00:07:10Z

Fixes #79
Closes #80

Most test cases are inherited from #80; I have commented on the differences inline.

@stulov Thank you for your work, which is very helpful.

JLHwung · 2023-09-30T00:13:35Z

tests/fixtures/modifiers.js

+		'pattern': '(?i:[є-ґ])',
+		'options': { modifiers: 'transform' },
+		'matches': ['\u0462', '\u0463', '\u1C87'],
+		'expected': '(?:[\\u0404-\\u040F\\u0454-\\u0491\\u1C87])',


Here U+1C87 ᲇ should be matched: its uppercase is U+0462 Ѣ, which is well within the range [є-ґ], whereas the lowercase of U+0462 Ѣ is U+0463 ѣ, not U+1C87. The String#toLowerCase approach in #80 therefore misses cases like this one, which is why we introduced iu-mappings.
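To make the difference concrete, here is a minimal sketch, not the code from this PR or from #80. `canonicalize` is a simplified version of the spec's Canonicalize step for the non-unicode `i` flag; `naiveExpand` and `closureExpand` are hypothetical helper names used only for illustration. The result for U+1C87 assumes the engine ships Unicode 9 or later case data.

```javascript
// Simplified Canonicalize for the non-unicode `i` flag, working on one BMP code unit.
function canonicalize(cu) {
  const upper = String.fromCharCode(cu).toUpperCase();
  if (upper.length !== 1) return cu;          // multi-unit uppercase: keep as-is
  const mapped = upper.charCodeAt(0);
  if (cu >= 128 && mapped < 128) return cu;   // never map non-ASCII into ASCII
  return mapped;
}

// Hypothetical naive expansion: each range member plus its own lower/upper case.
function naiveExpand(start, end) {
  const out = new Set();
  for (let cu = start; cu <= end; cu++) {
    const ch = String.fromCharCode(cu);
    for (const variant of [ch, ch.toLowerCase(), ch.toUpperCase()]) {
      if (variant.length === 1) out.add(variant.charCodeAt(0));
    }
  }
  return out;
}

// Closure-based expansion: every BMP code unit whose canonical form equals
// the canonical form of some range member.
function closureExpand(start, end) {
  const targets = new Set();
  for (let cu = start; cu <= end; cu++) targets.add(canonicalize(cu));
  const out = new Set();
  for (let cu = 0; cu <= 0xFFFF; cu++) {
    if (targets.has(canonicalize(cu))) out.add(cu);
  }
  return out;
}

// [є-ґ] is U+0454..U+0491
const naive = naiveExpand(0x0454, 0x0491);
const closure = closureExpand(0x0454, 0x0491);
console.log(naive.has(0x1C87), closure.has(0x1C87));
```

The naive set only follows mappings that start from a range member, so it never reaches a character that merely maps into the range. The closure scan does, at the cost of depending on the engine's own case data, which is the reason for shipping precomputed mappings instead.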

JLHwung · 2023-09-30T00:17:50Z

tests/fixtures/modifiers.js

+		'expected': '(?:[Kk\\u212A])',
+	},
+	{
+		'pattern': '(?i:\\u2C2F)',


This example also shows that a legacy ES5 engine might not support every case folding in the Basic Multilingual Plane: U+2C2F was only introduced in Unicode 14.
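A quick way to see the platform dependency is to ask the runtime whether it knows a given case pair. This is only an illustrative sketch; `runtimeKnowsCasePair` is a hypothetical helper, not part of regexpu-core. U+2C5F is the lowercase counterpart of U+2C2F.

```javascript
// Returns true when the running engine maps `upper` and `lower` to each other.
function runtimeKnowsCasePair(upper, lower) {
  return upper.toLowerCase() === lower && lower.toUpperCase() === upper;
}

// ASCII pair: known to every engine.
console.log(runtimeKnowsCasePair('K', 'k'));

// Unicode 14 pair: the result depends on the Unicode version the engine ships.
console.log(runtimeKnowsCasePair('\u2C2F', '\u2C5F'));
```

An engine with older Unicode data leaves both characters unchanged, so any expansion computed at run time from String#toLowerCase or String#toUpperCase would differ between platforms.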

nicolo-ribaudo

This looks good in principle, but I don't have enough domain knowledge to do a full review.

nicolo-ribaudo · 2024-09-12T09:28:12Z

scripts/case-mappings.js

 		// https://mths.be/es6#sec-runtime-semantics-canonicalize-abstract-operation
-		(
+		if(


Nit: there is some weird formatting going on here

JLHwung

(✔️ I have completely forgotten the context so I can give a review)

The idea of this PR is to extend the current iu-mappings to the non-ASCII BMP characters (U+0080 - U+FFFF). We need this to handle the i modifier, because we generate a regex without the i flag and simulate the i behaviour inside modified groups. While the approach in #80 works for most common characters, it introduces platform-dependent behaviour, because older platforms do not support newer Unicode characters.
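As a rough sketch of what "simulate the i behaviour" means for the output, the set of case variants can be emitted as an escaped character class that works without the i flag. `toCharacterClass` and `escapeCodeUnit` are hypothetical names for illustration and the input list is assumed to be precomputed; the real implementation builds its output differently.

```javascript
// Format one BMP code unit as a \uXXXX escape.
function escapeCodeUnit(cu) {
  return '\\u' + cu.toString(16).toUpperCase().padStart(4, '0');
}

// Turn a list of code units into a character class, merging runs of three or
// more consecutive code units into ranges.
function toCharacterClass(codeUnits) {
  const sorted = Array.from(new Set(codeUnits)).sort((a, b) => a - b);
  let body = '';
  let i = 0;
  while (i < sorted.length) {
    const start = sorted[i];
    let end = start;
    while (i + 1 < sorted.length && sorted[i + 1] === end + 1) {
      end = sorted[++i];
    }
    i++;
    body += escapeCodeUnit(start);
    if (end > start) {
      body += (end > start + 1 ? '-' : '') + escapeCodeUnit(end);
    }
  }
  return '[' + body + ']';
}

// The variants K, k and U+212A from the fixture above, as a flagless group.
const kClass = toCharacterClass([0x4B, 0x6B, 0x212A]);
const kRegex = new RegExp('(?:' + kClass + ')');
console.log(kClass, kRegex.test('\u212A'));
```

Because the class lists every variant explicitly, the resulting regex behaves the same on any engine, regardless of which Unicode version that engine implements.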

nicolo-ribaudo approved these changes Sep 12, 2024

JLHwung commented Sep 18, 2024

JLHwung requested a review from mathiasbynens September 18, 2024 15:15

JLHwung added 2 commits September 18, 2024 14:06

Support non-ASCII case folding within i modifier

28ad019

rename iu-mappings to case-mappings

22a5353

JLHwung force-pushed the support-non-ascii-case-folding-i-modifier branch from 17d07d1 to 22a5353 Compare September 18, 2024 18:07

add data/i-bmp-mappings.js to the publish files

ad86715

JLHwung merged commit 26e03fc into mathiasbynens:main Oct 2, 2024
3 checks passed

JLHwung deleted the support-non-ascii-case-folding-i-modifier branch October 2, 2024 17:35
