- Generally added some documentation.
- Added
CharMap#{size,entryCount}
to get the size of a map. - Added
CharMap#copy
to create a (mapped) copy of a map. - Added
StringSet#{is{Proper,}{Subset,Superset}Of,isDisjointWith}
for set relations. - Added
UnicodeSet#{is{Proper,}{Subset,Superset}Of,isDisjointWith}
for set relations. - Added
UnicodeSet#{wordSets,maximum}
for a more ergonomic API. UnicodeSet#{equals,union,intersect,without}
allowCharSet
s.
- Made
StringSet#words
private. This should have never been public in the first place. - Made
StringSet.empty
readonly. - Removed ignored
range
parameter fromCharMap#entries
. - Fixed some bugs in
StringSet#{equals,union}
. - Removed
StringSet#{filter,map}
. These methods should have never been public in the first place.
- Added support for the
v
flag.- This significantly changes the interfaces and types around the parser.
- There are now 2 never classes
JS.UnicodeSet
andJS.StringSet
to represent a Unicode set with strings. - Much more.
combineTransformers
has been deprecated. UseCombinedTransformer
instead.
JS.parseCharSet
andJS.parseUnicodeSet
have been added to easily parse a character AST into aCharSet
orUnicodeSet
.JS.toLiteral
now supports thev
flag.- Added
CharSet#fromCharacter
to easily create a character set from a single character. - Allow string argument for
JS.Parser.fromLiteral
. - Added
CombinedTransformer
class to combine multiple transformers into one. - Added transform events which allows the caller to observe the transformation.
- Added
Transformers.makeGreedy
to make quantifiers greedy whenever possible. - Added
Transformers.simplify
as a stable way to get the best combination of transformers to simplify a regex.
- Transformers can now have an optional name.
- Major internal improvements to some transformers, especially
applyAssertions
.
isEmpty
has been fixed.Transformers.moveUpEmpty
should now work correctly.- Transformers are now guaranteed to be called with the correct
this
argument.
- Upgraded to
@eslint-community/regexpp
v4.5.0 and dropregexpp
. - Drop support for NodeJS 10.
- Changed default character-set-to-string function of
{DFA,ENFA,NFA}#toDot
toCharSet#toUnicodeString
. - Changed character-set-to-string function of
{DFA,ENFA,NFA}#toString
toCharSet#toUnicodeString
. - Renamed
ToDotInfo
toNodeInfo
. - Removed
createSimpleToDotOptions
.
- Added
toMermaid
as part of theFAIterators
namespace andFiniteAutomaton
interface. - Added a unified interface for the namespaced
toDot
andtoMermaid
functions. - Many DFA, ENFA, and NFA operations now take optional node factory arguments to control the number of nodes created. All operations that create nodes no take factory arguments.
{DFA,ENFA,NFA}.emptyWord
will create a new FA that makes exactly the empty word.- Added
withInitial
,withGetOut
, andwithIsFinal
to more easily derive new FA iterators. - Added
assertions: "ignore"
to JS parser,ENFA.fromRegex
, andNFA.fromRegex
. This is mostly for convenience and performance. The same behavior could previously be achieved using transformers. - Added
CharSet#toUnicodeString
to provide an easy way to convert a character set into a human-readable string. - Added
CharSet#isProper{Subset,Superset}Of
. CharSet#equals
now supportsCharRange
s.
- Fixed
ENFA#isEmpty
for non-normalized graphs.
{DFA,NFA}.fromCharSet
and{ENFA,NFA}.all
now use constructions with fewer states.approximateRejectingWordSet
will now returnundefined
instead of throwing an error if the input character set is empty.- Changed behavior of
ENFA#countNodes
to be consistent with NFA and DFA. - Upgraded from Unicode 13.0.0 to Unicode 15.0.0.
- Generally added and improved documentation.
CharSet
: Theintersect
andwithout
methods now only takeCharSet
s andCharRange
s as arguments.{DFA,ENFA,NFA}#{isDisjointWith,getIntersectionWords,getIntersectionWordSets}
were removed.- Removed
NodeList
API for all FA implements. This is a very significant change as to how FAs are implemented but doesn't affect the main FA APIs too much. This change gives users a lot more control over FA implements. - Removed
{DFA,ENFA,NFA}.CreationOptions
interfaces. Use the newNodeFactory
API instead. - Removed
{DFA,ENFA,NFA}#options
. Just pass the FA as is instead. - Removed
FACreationOptions
interface. getIntersection{Iterator,Words,WordSets}
,isDisjointWith
: Replaced optionalFACreationOptions
parameter with optionalmaxNodes
parameter.- DFA nodes can now only be linked using
CharSet
s. Linking withChar
s andCharRange
s is no longer supported. - Removed
ENFA
'sunorderedResolveEpsilon
. UseresolveEpsilon
instead. FAIterator.MapFABuilder
: Removed optionalkind
argument.
NodeFactory
interface. This new interface is the basis of allFABuilder
s and FA implemented.{DFA,ENFA,NFA}.nodeFactory
: This is an unlimited node factory.{DFA,ENFA,NFA}.LimitedNodeFactory
: This node factory can be used to limit the number of nodes an FA operation is allowed to create.CharSet#resize
CharSet.fromRange
FAIterator.fromWords
will create a new iterator from a list of words.FAIterator.fromWordSets
will create a new iterator from a list of word sets.{DFA,ENFA,NFA}.fromCharSet
will create new FAs from a givenCharSet
.{DFA,ENFA,NFA}.fromWordSets
will create new FAs from a list of word sets.{DFA,ENFA,NFA}#countNodes
will return the number of nodes in the FA.{DFA,ENFA,NFA}#nodes
will iterate through all nodes in the FA.ENFA.withoutEmptyWord
ENFA#{append,prepend,union}Into
will move the nodes of the given ENFA instead of copying them. This can be used to improve performance.
JS.toLiteral
: Settingunicode: false
in theflags
option will now always succeed.
CharMap
is now implemented using a sorted array instead of an AVL tree. This is significantly faster. Most DFA operation are now 10% faster.FAIterators.iterateWordSets
will now use the natural iteration order of the given FAIterator for words of the same length. This makesENFA#wordSets
a lot more logical.ENFA
'sresolveEpsilon
is now implemented non-recursively.- The docs now have a dark mode thanks to TypeDoc v0.22.0.
- Fixed that some ENFA operations created unnecessary states.
- Fixed internal functions used to traverse graphs. This fixes the bug that some
FAIterators
functions had trouble with falsy state values.
JS.toLiteral
: The heuristic used to decide flags has been improved to prevent unnecessaryi
flags.
FAIterators.intersection
no longer accepts options.- Removed the
IntersectionOptions
interface. Use the newFACreationOptions
interface or any of the FA-specific interfaces instead. - Removed custom equality functions for
CharMap
. - The constructor of
FAIterators.MapFABuilder
changed slightly. It now accepts arguments as parameters instead of as an object. - Some renaming:
FAIterator#deterministicOut
->FAIterator#stableOut
FAIterators.ensureDeterministicOut
->FAIterators.ensureStableOut
CharMap#{delete,set}Every
->CharMap#{delete,set}Range
- Added support for the new JS RegExp
hasIndices
flag. - New
WordSet
andReadonlyWordSet
types. - New
CharBase
class. This provides methods to remap alphabets. - Added
CharMap#clear
. - Added
CharMap#filter
. - Added
CharMap#invert
to convertCharMap
s toMap
s. - Added
CharMap#setCharSet
to more efficiently set many ranges. - Added
CharSet#characters
to iterate over all characters in a set. - Added
CharSet#toRangesString
to print only the ranges of a set. - Added
CharSet.fromCharacters
to create a set from a collection of characters. FAIterators.shortestAcceptingPath
returns the shortest accepting path of arbitrary iterators.FAIterators.shortestWordSet
returns the shortest accepted word set of an iterator.FAIterators.makeInitialFinal
andFAIterators.makeInitialNonFinal
changes whether the initial state is also a final state.FAIterators.approximateRejectingWordSet
tries to find a rejected word set.FAIterators.makeDeterministic
builds a deterministic version of an iterator. This is a general DFA construction.Words.wordSetsToWords
converts a collections of word sets into a collection of words.
- refa is now allowed to assume all given
Char
s andCharRange
s conform to the guarantees given by the interface. This includes guarantees that cannot be verified at compile time (e.g.min <= max
forCharRange
). Words.pickMostReadableWord
will now always return a word.{DFA,ENFA,NFA}#{isDisjointWith,getIntersectionWords,getIntersectionWordSets}
are now deprecated and will be removed in future releases.Words.wordSetToWords
is now deprecated. UseWords.wordSetsToWords
instead (mind the s).
ReadonlyCharMap#isEmpty
is now a readonly property.- Fixed
JS.Parser
incorrectly caching parsed characters. - Fixed
JS.Parser
incorrectly canonicalizing Unicode property escapes. - Fixed
DFA.NodeList#removeUnreachable
removing reachable states sometimes.
- Many, many minor improvements (code quality, documentation, etc.).
- 10x faster
wordSetToWords
. All methods iterating words will now be faster. - Pretty much all DFA operations will be faster. DFA minimization is up to 20x faster.
- Faster NFA creation.
- New RE AST node:
Unknown
. This node is used to represent parts of a regex that cannot be represented using RE AST. - Removed
JS.ParseOptions.disableOptimizations
. UseJS.ParseOptions.simplify
instead. - Removed
TransitionIterableFA
interface. TransitionIterable
is now generic over the state type.- Renamed
{DFA,NFA}.intersectionWordSets
togetIntersectionWordSets
. - Renamed
{DFA,NFA}.intersectionWords
togetIntersectionWords
.
- ENFA - a non-deterministic finite automaton with epsilon transitions.
- FAIterators - a new namespace containing methods can consume and produce FA iterators.
- New
toDot
method for finite automata. This will make it easier to visualize the state machines. - New
isDisjointWith
,getIntersectionWordSets
, andgetIntersectionWords
functions. These free functions can be used with any FA types. - New
JS.ParseOptions.simplify
option. - New
FAIterator.deterministicOut
property. - New
TransitionIterator
. (This only gives an already commonly used type a name.) - New
MaxCharacterError
for incompatible finite automata. - New
FABuilder
interface to allow algorithms to construct FA without knowing the actual FA implementation. {DFA,ENFA,NFA}.NodeList
: Added staticwithLimit
method to be able to limit the number of nodes aNodeList
is allowed to create.
JS.Parser.parseElement
now accepts more parsable elements.JS.toLiteral
now accepts any RE AST node.NFA.fromRegex
now accepts any RE AST node.{DFA,ENFA,NFA}.NodeList
now implement theFABuilder
interface.- Many, many internal changes that do not affect the API.
DFA.fromIntersection
now correctly computes the intersection for non-DFA arguments.getBaseSets
(a util function) now guarantees O(n*log n) run time. I accidentally implemented this in O(n^2) before which caused some DFA operations to be extremely slow.iterateWordSets
(a util function) now correctly eliminates dead states. This fixes the bug that some FA with infinite languages only yielded finitely many words when iterating over them.
- Backreferences in ignore-case JS RegExps are now correctly resolved (#25).
FiniteAutomaton.test
now requires aReadonlyArray
instead of anIterable
.Words.wordSetToWords
now returns anIterable
instead of anIterableIterator
.- Removed
toPatternString
function. - Removed
NFA.FromRegexOptions.disableLookarounds
. UseNFA.FromRegexOptions.assertions
instead. - AST format: Quantifier nodes now have a lazy property to enable non-greedy quantifiers.
JS.Parser
no longer implementsJS.Literal
. Use theJS.Parser.literal
property instead.JS.Parser
now resolves backreferences differently. It now supports resolving capturing groups with finite small languages. How small the language is required to be can be controlled via the newJS.ParseOptions.maxBackreferenceWords
option (defaults to 100 words).JS.ParseOptions.backreferences
also works differently now. See theJS.ParseOptions
documentation for more details.- Some renaming:
JS.ParseOptions.lookarounds
->JS.ParseOptions.assertions
ToRegexOptions.maximumNodes
->ToRegexOptions.maxNodes
ToRegexOptions.maximumOptimizationPasses
->ToRegexOptions.maxOptimizationPasses
Words.fromStringToUTF16
now works properly.JS.toLiteral
will now properly detect predefined character sets in character classes. This didn't work properly before.
- Documentation. A lot of code documentation and a TypeDoc-generated website have been added.
- New
Char
,Word
, andReadonlyWord
types replace the old plain number and iterable types. - AST transformers. They can efficiently modify a given AST and are used to e.g. apply assertions.
JS.ParseOptions
now has amaxNodes
option to limit the size of the parsed AST.JS.Parser
now has amaxCharacter
property.
NFA.test
now implements Thompson's algorithm which guarantees efficient execution.- The
toRegex
methods of the DFA and NFA classes now use AST transformers under the hood to produce smaller ASTs. - The default value of
ToRegexOptions.maxOptimizationPasses
is now implementation-defined.
DFA#clone
has been renamed toDFA#copy
to be compatible withNFA#copy
.- The
source
property of RE AST nodes is now optional. This results in the removal/change of several types and functions. TheSimple
type has been removed; useNoParent
instead.
JS.toLiteral
now has aflags
options to force/disallow certain flags and afastCharacters
options for up to 10x better performance.JS.toLiteral
now detects builtin assertions.
- All DFA and NFA creation methods now have safe defaults and will throw if the FA that is being created is too large. The limit can be controlled using the
maxNodes
option.
- Remove
NFA.fromDFA
andDFA.fromNFA
. Use{DFA,NFA}.fromFA
instead. - Renamed
NFA#removeEmptyWord
toNFA#withoutEmptyWord
. - All
NFA
/DFA
-specific interfaces/classes now live in theNFA
/DFA
namespace.
- New
FAIterator
interface. - New options for
#toRegex
functions. NFA#reverse
will reverse the accepted language.CharSet#commonCharacter
can used to return any character two sets have in common.- New
Words
methods to pick as-readable-as-possible words or characters from a set of words or characters.
- Unified
DFA
/NFA
API. This givesDFA
a lot of functionality that used to beNFA
-exclusive. JS.toLiteral
will now print smaller character classes.- New build system.
- A few minor bug fixes.
- NFA intersection performance optimizations by using lazy data structures.
NFA#disjointWith
is now a lot faster.
- Fixed NFA intersection always discarding the empty word. The intersection can now contain the empty word.
- Removed
JS.toSource
. UseJS.toLiteral
instead.
NFA#suffixes()
will change the NFA to accepts all suffixes of its language.- New options for intersection operations.
- Added a method to count the number of nodes in DFA and NFA.
DFA#complement()
will make the DFA match the opposite language.- New options for the DFA creation operation.
- New
ReadonlyDFA
interface. - New
ReadonlyCharMap<T>
interface. CharSet#compare(CharSet)
will compare two character sets. This can be used to sort character sets.- Added regex stress test. This will check common operations on all 2.5K regexes of the PrismJS syntax highlighter.
- The RE produced by the
toRegex
functions is now optimized to be as minimal as possible. JS.toLiteral
will now make efficient use of flags to produce smaller literals.
- Fixed and improved
DFA.minimize()
. - Fixed
CharMap
's AVL tree implementation. - Underlying implementation of the
toRegex
functions will now the correct AST and that much faster. - Fixed
filterMut
in util. toPatternString
now correctly handles quantified empty concatenations.
- The prefix and suffix optimization removed final states from the graph but not from the set of final states.
- The intersection algorithm can now handle final states not reachable from the initial states of either NFAs.
- New
JS.Parser
API. - Removed
MutSimple
interface.
CharSet#size
returns the number of characters in the set.NFA#prefixes()
will change the NFA such that it accepts all prefixes of itself.- New
ReadonlyNFA
interface. - New
FAIterator
interface as an abstraction over different FA representations. This is now the basis for virtually all FA operations that don't change the underlying data structure. - New
NoParent
andNoSource
interface for AST nodes.
- Better
Flags
interface for JS functions. - Some minor improvements.
- The prefix and suffix optimizations of the NFA construction didn't check the equivalence of states correctly.
- Added a
prepublish
script to actually publish changes.
- Calling
CharSet#isSubsetOf
caused a stack overflow because of infinite recursion.
- New
CharSet
API. This renames some ofhas*
methods, to instead use names based on set relations. NFA#concat
is now calledappend
.NodeList#final
is now calledfinals
for both DFA and NFA nodes lists.NFANode#in
andNFANode#out
are now of typeReadonlyMap
instead ofMap
.
- New options for
NFA.fromRegex
to disable assertions instead of throwing an error and to simplify quantifiers with a huge maximum. - New
NFA#prepend
method. - Added changelog.
CharSet#toString
now uses a simple hexadecimal format.
NFA#append
modified the given NFA.
- Fixed repository link in
package.json
.
Initial release