Regex Flavors Family Index
type: language-family-index family: regex-flavors languages_catalogued: 22 tags: [language-reference, family-index, regex-flavors, regular-expressions, pattern-matching, pcre, re2, redos, unicode]
Family overview
Regex has two histories that don’t quite fit together. The theoretical history starts with Stephen Kleene’s 1956 paper on “regular sets”: a closure-algebra description of the languages recognised by finite automata, equivalent in expressive power to NFAs and DFAs and matchable in O(n) time once compiled to a DFA. The practical history starts with Ken Thompson’s 1968 paper “Regular Expression Search Algorithm,” which became the regex engine in ed, then grep, then the IEEE POSIX.2 standard of 1992 (POSIX BRE and ERE). Up through this point, “regex” still largely meant “regular language” in the formal sense: apart from BRE back-references (\1 through \9), every supported feature could be expressed as a finite automaton and matched in linear time.
Then came Perl. Larry Wall shipped Perl 1.0 in 1987 with a pragmatic regex engine, and across Perl 2/3/4/5 (1988–1994) and the later 5.x releases it accreted features that broke true regularity or linear-time matching: backreferences (\1, which require remembering arbitrary captured substrings and make matching NP-complete in the worst case), lookaround assertions ((?=...), (?<=...), (?!...), (?<!...)), atomic groups ((?>...)), possessive quantifiers (*+, ++, ?+), conditionals ((?(1)yes|no)), and recursion ((?R), (?1)). Philip Hazel extracted this dialect into the standalone PCRE library in 1997, and PCRE became the de facto Ur-flavor that everyone else cloned, extended, or deliberately departed from. PCRE2 (the 10.x series; 10.47, released October 2025, is current) is the modern continuation; the original PCRE 8.x is end-of-life.
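A few of the Perl-era extensions listed above can be tried in Python’s standard `re` module, which accepts the same syntax for these features. This is only an illustrative sketch; the sample strings and variable names are made up for the example.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured (a doubled word).
doubled = re.compile(r"\b(\w+) \1\b")

# Lookbehind and lookahead: zero-width assertions that consume no input.
price = re.compile(r"(?<=\$)\d+(?=\.\d\d)")

# Backreferences can recognise the non-regular "copy" language ww.
square = re.compile(r"^(.+)\1$")

print(doubled.search("it was the the best").group(1))   # the
print(price.search("total: $42.50").group())            # 42
print(bool(square.match("abcabc")), bool(square.match("abcabd")))  # True False
```

The last pattern is the interesting one: no finite automaton can check that the second half of a string repeats the first, which is why automata-based engines reject `\1` outright.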
The Perl/PCRE family of engines is implemented as backtracking NFAs: when a quantifier is ambiguous, the engine tries one path and rewinds if it fails. This is fast on most inputs but has a notorious worst case, catastrophic backtracking, where a pattern like ^(a+)+$ matched against "aaaaaaaaaaaaaaa!" takes time exponential in the length of the run of a’s. Russ Cox documented this in his 2007 series “Regular Expression Matching Can Be Simple and Fast”, reviving Thompson’s NFA-simulation algorithm and showing that for the truly regular subset (no backreferences, no lookarounds), linear-time matching is straightforward. This work became RE2 at Google (2010, C++) and inspired a whole lineage of “RE2-style” engines: Go’s regexp (standard library), Rust’s regex crate, and, via a different derivative-based construction, .NET’s RegexOptions.NonBacktracking mode. The modern split is: backtracking engines (PCRE2, .NET default, Java java.util.regex, JavaScript V8/JSC, Python re and regex, Ruby Onigmo) accept the full Perl-extended dialect including backreferences and lookarounds and risk ReDoS; automata engines (RE2, Go, Rust, Hyperscan, .NET non-backtracking) refuse the irregular features and guarantee linear time. ReDoS became a routinely CVE-assigned vulnerability class around 2012–2017 and is now a routine finding in static analysis tools.
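The blow-up described above is easy to observe with a backtracking engine. The sketch below uses Python’s standard `re` module and a hypothetical helper, `time_failed_match`, to time the failing match for growing inputs. Absolute timings depend on the machine and Python version, so none are shown; the point is that each extra `a` roughly doubles the work.

```python
import re
import time

# Nested quantifiers: (a+)+ can split a run of n 'a's in 2**(n-1) ways,
# and a backtracking engine tries them all before giving up.
pattern = re.compile(r"^(a+)+$")

def time_failed_match(n: int) -> float:
    """Return the seconds spent failing to match n 'a's followed by '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' makes a match impossible
    return elapsed

# Kept small on purpose: larger n quickly becomes impractical.
for n in (8, 12, 16, 20):
    print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```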
Another axis is Unicode. The Unicode Consortium’s UTS #18 defines three conformance levels for regex Unicode support: Level 1 (basic code-point handling, simple loose matches, basic property classes like \p{L}), Level 2 (extended grapheme clusters, full case folding, named characters, default-ignorable handling), and Level 3 (locale-tailored). Most engines hit Level 1; only a few (ICU, Python regex, .NET, increasingly JS with /v) push into Level 2. JavaScript’s /v flag (ES2024, Stage 4 in 2023, shipping in V8 11.2 / Chrome 112 / Safari 17 / Node.js 20) is the most consequential recent addition: it enables Unicode “set notation”, meaning nested character classes, set difference ([A--B]), set intersection ([A&&B]), multi-character matches via properties of strings (\p{...}) and string literals (\q{...}), and proper case-insensitive matching for negated property sets. For new Unicode-aware regex work it replaces the older ES2015 /u flag; the two flags cannot be combined.
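Engines without set notation can often emulate a set difference with a lookahead. The sketch below uses Python’s standard `re` module, where `\d` on a `str` pattern matches any Unicode decimal digit, to approximate the JavaScript class `[\p{Decimal_Number}--[0-9]]`. The sample text is invented for the example, and the emulation is an assumption-laden stand-in for real set operations, not an equivalent feature.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE; "\u00e9" is a precomposed e-acute.
text = "Room 42, floor \u0663, caf\u00e9"

# For str patterns, \w and \d are Unicode-aware by default in Python 3.
print(re.findall(r"\w+", text))

# re.ASCII (or the inline flag (?a)) restricts them to ASCII.
print(re.findall(r"\w+", text, flags=re.ASCII))

# Emulated set difference: a negative lookahead removes the ASCII digits
# from the Unicode-aware \d class, leaving only non-ASCII decimal digits.
non_ascii_digit = re.compile(r"(?![0-9])\d")
print(non_ascii_digit.findall(text))
```

The lookahead trick works for single code points only; it does not cover nested classes or multi-character strings, which is the gap `/v` closes in JavaScript.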
In our deep library
Languages with first-class regex stories that have their own deep notes:
- perl: the Ur-flavor; PCRE descends from Perl 5.x, and many Perl extensions were retrofitted into PCRE2.
- python: built-in `re` (limited; `\w` is Unicode-aware by default for `str` patterns and ASCII-only under `re.ASCII`) and the third-party `regex` module (variable-length lookbehinds, possessive quantifiers, atomic groups, full Unicode property support, `concurrent=True` GIL release).
- javascript: V8/JSC regex engines; `/u` (ES2015) and `/v` (ES2024) flags, lookbehinds (ES2018), named groups (ES2018), Unicode property escapes (ES2018).
- java: `java.util.regex.Pattern`, backtracking NFA, `UNICODE_CHARACTER_CLASS` flag for UTS #18 Level 1 conformance, named groups since Java 7.
- csharp: `System.Text.RegularExpressions`, source-generated regexes via `[GeneratedRegex]` (.NET 7+), `RegexOptions.NonBacktracking` (.NET 7+, derivative-based linear-time engine from MSR).
- ruby: Onigmo (since Ruby 2.0), forked from Oniguruma which was archived in April 2025; backtracking NFA with broad encoding support.
- go: `regexp` package, RE2 lineage, deliberately rejects backreferences and lookarounds for O(n) guarantees.
- rust: `regex` crate by Andrew Gallant, RE2-style with Rust-specific optimisations, paired with the lower-level `regex-automata` and the third-party `fancy-regex` for Perl-like features.
- cpp: `std::regex` (C++11, ECMAScript dialect by default + POSIX modes), Boost.Xpressive (compile-time + run-time regex via expression templates), Boost.Regex.
- php: `preg_*` family wraps PCRE2 directly.
- bash: `=~` operator uses POSIX ERE; `grep`, `sed`, `awk` flavors covered below.
- lua: uses Lua patterns, deliberately not a full regex flavor (no alternation, no quantifiers on groups; see “Notable threads” below).
Adjacent Tier 3 notes:
- query: SQL `LIKE`, `SIMILAR TO` (POSIX-flavor in PostgreSQL/Snowflake), and dialect-specific `REGEXP_*` functions; Splunk SPL’s `rex`; Lucene query parser regex.
- config-and-dsl: `.gitignore`, Apache `mod_rewrite`, and assorted config DSLs that embed regex or glob fragments.
- notation-spec: ABNF, EBNF, PEG; the family of grammar formalisms that subsumes regex.
Tier 3 family table
| Flavor | First appeared | Origin | Engine type | Notable features | Status (2026) | URL |
|---|---|---|---|---|---|---|
| POSIX BRE | 1992 (POSIX.2) | IEEE / Open Group | NFA-simulation, regular language | Basic Regular Expressions: backslash-escaped metacharacters (\(, \), \{, \}); the dialect of grep (no flag) and traditional sed; foundational and intentionally minimal | Stable, mostly legacy outside Unix tooling | https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html |
| POSIX ERE | 1992 (POSIX.2) | IEEE / Open Group | NFA-simulation, regular language | Extended Regular Expressions: bare metacharacters ((, ), {, }, +, ?, \|); the dialect of egrep/grep -E, awk, modern sed -E | Stable; the lingua franca of POSIX text tooling | https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html |
| Perl 5 regex | 1987 (Perl 1) → mature in Perl 5 (1994) | Larry Wall | Backtracking NFA | The Ur-flavor: lookarounds, named captures ((?<name>...)), atomic groups ((?>...)), possessive quantifiers (*+), conditionals, recursion ((?R), (?&name)), \K keep-out, embedded code (?{...}) | Active; tracks Perl release cadence | https://perldoc.perl.org/perlre |
| PCRE2 | 1997 (PCRE 1.0) → 2015 (PCRE2 10.0) | Philip Hazel, University of Cambridge | Backtracking NFA + JIT | Standalone C library cloning Perl 5 regex; UTF-8/16/32 modes; pcre2grep CLI; widely embedded (PHP, nginx, Apache, R, many languages); current 10.47 (Oct 2025) | Very active; PCRE 8.x is EOL, PCRE2 is the supported line | https://www.pcre.org/ |
| RE2 | 2010 | Russ Cox / Google | NFA-simulation, regular language | Linear-time guarantee for arbitrary input; no backreferences, no general lookarounds; bounded memory; safe for adversarial patterns; C++ library | Very active (used in Google production, Cloud Logging, code search) | https://github.com/google/re2 |
| Go regexp | 2012 (Go 1.0) | Russ Cox / Go team | RE2 port in Go | RE2 syntax exactly; same restrictions (no backreferences, no lookarounds); deliberate language-level commitment to ReDoS safety; regexp/syntax exposes the AST | Stable; standard library | https://pkg.go.dev/regexp |
| Rust regex crate | 2014 (crate v0.1) | Andrew Gallant (“BurntSushi”) | RE2-style NFA/DFA hybrid | RE2 lineage in syntax and guarantees; rewritten internally as regex-automata (multiple matching strategies); paired with fancy-regex for backref/lookaround if needed | Very active; de facto Rust ecosystem standard | https://docs.rs/regex/ |
| Java java.util.regex | 2002 (Java 1.4) | Sun Microsystems | Backtracking NFA | Pattern/Matcher API; named groups since Java 7; UTS #18 Level 1 with UNICODE_CHARACTER_CLASS flag (or (?U) inline); \p{InGreek} Unicode block syntax; lookarounds and backreferences supported | Active (tracks JDK releases; Java 21/22/23 stable) | https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html |
| .NET System.Text.RegularExpressions | 2002 (.NET 1.0) | Microsoft | Backtracking NFA (default; interpreted, compiled, or source-generated) + non-backtracking automaton mode | Default backtracking with rich Perl-like features (balancing groups for matched-paren parsing, unique to .NET); RegexOptions.Compiled (IL emit, since .NET 1.0); [GeneratedRegex] source-generated regex (.NET 7, 2022); RegexOptions.NonBacktracking linear-time mode (.NET 7, 2022, derivative-based MSR engine) | Very active | https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions |
| JavaScript regex (/u, /v) | 1997 (JavaScript 1.2) → ES3 (1999) → /u (ES2015) → /v (ES2024) | Brendan Eich → ECMA TC39 | Backtracking NFA (V8 Irregexp, JSC YarrJIT, SpiderMonkey) | Built into the language as a literal syntax (/pattern/flags); /v flag (ES2024, TC39 Stage 4 in 2023) added set notation [A--B] / [A&&B], nested classes, \q{...} string literals and properties of strings; lookbehinds + named groups since ES2018; sticky /y since ES2015 | Very active; /v shipping in V8 11.2 / Chrome 112 / Safari 17 / Node.js 20 (all 2023 era) | https://tc39.es/ecma262/#sec-regexp-regular-expression-objects |
| Python re | 1997 (Python 1.5) | Guido van Rossum / core team | Backtracking NFA | Standard library; for str patterns \w, \d, \s are Unicode-aware by default in Python 3 (re.ASCII / (?a) restricts them to ASCII); lookaheads + fixed-length lookbehinds only; named groups; possessive quantifiers and atomic groups only since Python 3.11; no recursion | Active; Python 3.14 docs current | https://docs.python.org/3/library/re.html |
| Python regex (PyPI) | 2009 | Matthew Barnett | Backtracking NFA | Drop-in re superset: variable-length lookbehinds, possessive quantifiers, atomic groups, recursive patterns, \p{Script=Greek} properties, grapheme clusters \X, concurrent=True GIL release, fuzzy matching | Very active (latest release April 2026) | https://pypi.org/project/regex/ |
| Ruby Onigmo | 2002 (Oniguruma) → 2011 (Onigmo fork) → Ruby 2.0 (2013) | K. Kosako (Oniguruma) → K. Takata (Onigmo) | Backtracking NFA | Multi-encoding (UTF-8, EUC-JP, Shift_JIS, etc.) baked in from the start; backports Perl 5.10+ features like named captures and \K; Oniguruma upstream archived April 2025, Onigmo continues for Ruby | Active (Onigmo); Oniguruma archived | https://github.com/k-takata/Onigmo |
| Vim regex | 1991 (Vi → Vim) | Bram Moolenaar (RIP 2023) | Backtracking NFA + NFA-simulation since 7.4 | Four “magicness” levels: \v very-magic (egrep-like), \m magic (default), \M nomagic, \V very-nomagic; idiosyncratic atom syntax (\(, \) for groups in default magic); since Vim 7.4 (2013) supports a Thompson NFA engine alongside the old backtracker | Active (Vim 9.x and Neovim 0.10+) | https://vimdoc.sourceforge.net/htmldoc/pattern.html |
| Emacs regex | 1985 | Richard Stallman / GNU Emacs | Backtracking NFA | Lisp-string regexes: every backslash doubles in source code ("\\(" for \(); group via \(...\), alternation via a backslash-escaped pipe; re-search-forward, looking-at, replace-regexp; the rx macro (reworked in Emacs 27) provides s-expression syntax that compiles to the underlying flavor | Active (Emacs 30) | https://www.gnu.org/software/emacs/manual/html_node/elisp/Regular-Expressions.html |
| Tcl ARE | 1999 (Tcl 8.1) | Henry Spencer | Hybrid (NFA + DFA) | “Advanced Regular Expressions”: POSIX ERE superset with Perl-style extensions (lookarounds, non-greedy, named); also used by PostgreSQL’s ~/SIMILAR TO since 7.4; Spencer’s library underpinned MySQL pre-8.0.4 too | Stable | https://www.tcl-lang.org/man/tcl/TclCmd/re_syntax.htm |
| GNU grep / sed extensions | 1988 (GNU grep) / 1992 (GNU sed) | Mike Haertel (grep), Jay Fenlason (sed), GNU project | NFA / DFA hybrid | Extends POSIX BRE/ERE with \<, \> word boundaries, \b, \B, \w, \W, \s, \S; grep -P (Perl mode) matches with the PCRE2 library; grep -E/-G/-F switch flavors | Very active (maintained as separate GNU packages, not part of coreutils) | https://www.gnu.org/software/grep/manual/grep.html |
| AWK ERE | 1977 (AWK) | Aho, Weinberger, Kernighan / Bell Labs | NFA-simulation | POSIX ERE in gawk/mawk/nawk; pattern matching as a first-class language construct (/regex/ { action } rule head); GNU gawk adds \<, \>, \B, \y word boundaries | Active (gawk 5.x) | https://www.gnu.org/software/gawk/manual/html_node/Regexp.html |
| POSIX glob | 1992 (POSIX.2) | IEEE / Open Group | Token-based, not regex | Not a regex flavor strictly but constantly conflated with one: *, ?, [abc], [!abc], [a-z] for filename matching; ** (recursive) is an extension (bash globstar, zsh, fish); fnmatch(3) and glob(3) are the C APIs | Stable, ubiquitous | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13 |
| Boost.Xpressive | 2007 (Boost 1.34) | Eric Niebler | Backtracking NFA | C++ template library: regexes as expression templates — write the pattern as C++ code at compile time (sregex re = (s1 = +_w) >> '@' >> (s2 = +_w)) or as a runtime string; same engine handles both; semantic actions in C++ | Stable (header-only, in current Boost) | https://www.boost.org/doc/libs/release/doc/html/xpressive.html |
| Hyperscan / Vectorscan | 2008 (Sensory Networks) → 2015 OSS at Intel → 2020 Vectorscan fork | Sensory Networks → Intel → VectorCamp | NFA + literal matchers, SIMD-accelerated | Multi-pattern matching: compile thousands of regexes simultaneously and stream-scan input; SSE/AVX/AVX-512 acceleration; powers Snort, Suricata, ClamAV; Hyperscan 5.4 last OSS release (Intel went proprietary at 5.5); Vectorscan is the community ARM-NEON / Power-VSX / SIMDe portable fork | Active (Vectorscan); Intel branch closed-source post-5.4 | https://github.com/intel/hyperscan |
| Lua patterns | 1993 | Roberto Ierusalimschy / PUC-Rio | Small recursive matcher; backtracks only over single-character repetitions | Deliberately not regex: no alternation, and the quantifiers ?, *, +, - apply only to single character classes, never to groups; ~500 LoC implementation; the language docs are explicit that it’s a simpler alternative trading expressive power for tininess | Active (tracks Lua releases) | https://www.lua.org/manual/5.4/manual.html#6.4.1 |
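Because globs are so often mistaken for regexes, it can help to see the same string interpreted both ways. This sketch uses Python’s standard `fnmatch` and `re` modules; the file names are invented for the example.

```python
import fnmatch
import re

# As a glob, "*" means "any run of characters" and "." is a literal dot.
print(fnmatch.fnmatchcase("notes.txt", "*.txt"))   # True
print(fnmatch.fnmatchcase("notes_txt", "*.txt"))   # False

# As a regex, a leading "*" has nothing to repeat, so re rejects it.
try:
    re.compile("*.txt")
except re.error as exc:
    print("not a valid regex:", exc)

# fnmatch.translate converts the glob into an equivalent regex string.
as_regex = re.compile(fnmatch.translate("*.txt"))
print(bool(as_regex.match("notes.txt")), bool(as_regex.match("notes_txt")))  # True False
```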
Notable threads
- ReDoS and the Cox/RE2 response (2007 → today). Russ Cox’s 2007 article series is the single most influential piece of writing in modern regex history. He showed that the algorithm Perl, Python, Ruby, Java, .NET, and JavaScript all used (backtracking NFA) had a worst case that was exponential in the input size for patterns like `(a?a?a?a?a?aaaaa)`, and that Ken Thompson’s 1968 NFA-simulation algorithm matched the same patterns in O(nm) time, setting aside backreferences and lookarounds, which the automaton approach does not cover. RE2 (2010) was Cox’s implementation at Google. Go’s standard `regexp` (2012) is RE2 in Go; Rust’s `regex` (2014) is “RE2-but-in-Rust”; .NET’s `RegexOptions.NonBacktracking` (.NET 7, 2022) is a derivative-based variant from Microsoft Research that achieves the same linear-time guarantee. The 2010s saw ReDoS become a documented attack class: Stack Overflow’s 2016 outage, caused by a single whitespace-trimming regex that backtracked badly on one malformed post, is the canonical case study, and static analyzers (CodeQL, Semgrep) and advisory tools such as npm audit now flag it as a routine finding.
- The JavaScript `/v` flag (ES2024): set notation, finally. ES2015 added `/u` for code-point-correct matching, and ES2018 added `\p{...}` Unicode property escapes. The remaining gap was that you couldn’t compose property classes: you could match `\p{Script=Greek}` and `\p{Letter}` separately, but not “Greek letters” as a single class. The `/v` flag (TC39 proposal-regexp-v-flag, Stage 4 in 2023, ECMA-262 in ES2024, V8 11.2 / Chrome 112 / Safari 17 / Node 20) added: nested character classes (`[[A-Z]&&[^AEIOU]]`), set difference (`[\p{Decimal_Number}--[0-9]]` for non-ASCII digits), set intersection (`[\p{Letter}&&\p{Script=Greek}]`), and string literals inside classes via `\q{...}` (`[\q{ng|gh|sh}]`). It’s effectively UTS #18 Level 1 done properly, with a step toward Level 2, and brings JS regex meaningfully closer to ICU and the Python `regex` module. The `/v` flag enables everything `/u` does (the two cannot be combined) and forbids the legacy “annex B” loosenesses, so it’s also a quiet cleanup of the language’s regex surface.
- POSIX ERE vs PCRE: the `\1` line. POSIX ERE without backreferences is a true regular language: you can compile any pattern to a DFA and match in O(n) time and O(1) memory. The moment a flavor accepts `\1` backreferences (matching whatever the first capture group captured; POSIX BRE already had them, and Perl made them ubiquitous), the language jumps to an expressive class that can match the copy language ww (for example `^(.+)\1$`) and is no longer regular at all. PCRE inherited this. Russ Cox’s series points out that matching with PCRE-style backreferences is NP-complete in the worst case. Most production engines that accept backreferences therefore can’t promise linear time; engines that do promise it (RE2, Go, Rust, Hyperscan) reject backreferences as a category. This is the deepest, oldest fault line in the regex world, and every flavor lives on one side of it.
- .NET’s multi-engine story is unique. Microsoft’s `System.Text.RegularExpressions` is the only mainstream engine that ships four matching strategies in one library: (1) the original interpreted backtracking engine (default); (2) a backtracker compiled to IL at run time (`RegexOptions.Compiled`, since .NET 1.0); (3) source-generated regex via the `[GeneratedRegex]` attribute (.NET 7, 2022), which emits real C# code at build time and has largely eclipsed (2); and (4) the non-backtracking derivative-based engine (`RegexOptions.NonBacktracking`, .NET 7, 2022) that gives RE2-like O(n) guarantees while preserving backtracking semantics for the supported subset (no lookarounds, no backreferences). .NET also has the only mainstream engine with balancing groups (`(?<-name>...)` pops a stack), which lets you match arbitrarily nested parentheses, the canonical example of “regex shouldn’t be able to do this.”
- Hyperscan: SIMD multi-pattern at line rate. When Snort or Suricata inspects a 100 Gbit/s network link looking for thousands of intrusion-detection signatures simultaneously, you can’t run 5000 regexes one at a time. Hyperscan (Sensory Networks, then Intel, then partly forked as Vectorscan after Intel went proprietary at version 5.5) compiles a set of regexes into a single combined matcher and uses SSE/AVX/AVX-512 instructions to advance the automaton across multiple input bytes in parallel. The trade-offs: matches are reported by end offset as the scan proceeds (start-of-match positions are not tracked by default), there are no backreferences or general lookarounds, and the compile step is slow because it does heavy literal extraction and graph optimisation up front. The Vectorscan fork (VectorCamp, 2020+) extends portability to ARM NEON and Power VSX and remains ABI-compatible with Hyperscan 5.4, the last OSS Intel release.
- Go and Rust as language-level commitments to safety. Both languages chose RE2 lineage on purpose. Go’s `regexp` (Cox, 2012) is RE2 in Go. Rust’s `regex` (BurntSushi, 2014) explicitly cites RE2 as its blueprint. The deliberate decision in both ecosystems is that the default regex library cannot be driven into exponential backtracking. If you want backreferences in Go, you reach for regexp2 (third-party, .NET-derived); in Rust, you reach for `fancy-regex` (third-party, a hybrid that falls back to the `regex` crate where possible). This is a quietly important language-design statement: in safety-conscious languages, regex is a place where you trade expressive power for predictable performance, and the standard (or de facto standard) library reflects that. Compare the Perl/PCRE/Python lineage, where the maximally permissive flavor is the default and ReDoS is left as an exercise for the user.
- Lua patterns as the contrarian case. Lua patterns are intentionally not a regex flavor at all. There is no alternation operator. The `?`/`*`/`+`/`-` quantifiers only apply to single-character classes, never to groups. The implementation is around 500 lines of C. Programming in Lua is explicit that this is a deliberate cost/value trade: a full regex implementation would dwarf the entire rest of the standard library. This makes Lua patterns the smallest practically-useful pattern language in mainstream use, and a clean argument for “regex is bigger than it needs to be.”
- The Onigmo / Oniguruma split. Ruby uses Onigmo, a fork of Oniguruma (the original by K. Kosako, used in PHP `mb_ereg`, TextMate’s grammar engine, Atom, and Ruby 1.9). The fork (Onigmo, by K. Takata, since ~2011) backports Perl 5.10+ features Oniguruma upstream didn’t take. As of April 2025, Oniguruma was archived; Onigmo carries on as the canonical Ruby regex engine. This is a quiet but meaningful event: it means GitHub’s Linguist, TextMate-grammar tooling, and PHP `mb_ereg` may need to migrate to Onigmo or absorb Oniguruma’s archive state.
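The state-set idea behind the Thompson/RE2 approach can be sketched in a few lines. The toy matcher below is not RE2’s implementation: it handles only literal characters followed by an optional `?`, `*`, or `+` (no groups, alternation, or escapes), and the names `parse`, `closure`, and `matches` are hypothetical. It tracks every pattern position the match could currently be in, so each input character costs at most one pass over the pattern and nothing is ever retried.

```python
def parse(pattern: str) -> list[tuple[str, str]]:
    """Split 'a?b+c' into (char, quantifier) items; 'x+' is rewritten as 'x' then 'x*'."""
    items = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        quant = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "?*+":
            quant = pattern[i + 1]
            i += 1
        if quant == "+":
            items.extend([(ch, ""), (ch, "*")])
        else:
            items.append((ch, quant))
        i += 1
    return items


def closure(states: set[int], items: list[tuple[str, str]]) -> set[int]:
    """Add every state reachable without consuming input (skipping '?' and '*' items)."""
    stack = list(states)
    seen = set(states)
    while stack:
        s = stack.pop()
        if s < len(items) and items[s][1] in ("?", "*") and s + 1 not in seen:
            seen.add(s + 1)
            stack.append(s + 1)
    return seen


def matches(pattern: str, text: str) -> bool:
    """Whole-string match by advancing a set of pattern positions in lockstep."""
    items = parse(pattern)
    states = closure({0}, items)
    for c in text:
        nxt = set()
        for s in states:
            if s < len(items) and items[s][0] == c:
                # A starred item may repeat; anything else moves forward.
                nxt.add(s if items[s][1] == "*" else s + 1)
        states = closure(nxt, items)
        if not states:
            return False
    return len(items) in states


n = 30
print(matches("a?" * n + "a" * n, "a" * n))              # True
print(matches("ab+c", "abbbc"), matches("ab+c", "ac"))   # True False
```

With n = 30 the first call finishes immediately, because the state set never holds more than one entry per pattern position; a backtracking engine given the same pattern and input would face on the order of 2^30 paths.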
Citations
- POSIX.2 regular expressions (Open Group): https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
- PCRE2 home and manual: https://www.pcre.org/ ; https://www.pcre.org/current/doc/html/pcre2.html
- PCRE2 release notes (10.47, October 2025): https://github.com/PCRE2Project/pcre2/blob/master/NEWS
- Russ Cox, “Regular Expression Matching Can Be Simple And Fast” (2007): https://swtch.com/~rsc/regexp/regexp1.html
- Russ Cox, regex articles index (#1, #2, #3, #4): https://swtch.com/~rsc/regexp/
- RE2 (Google): https://github.com/google/re2 and https://github.com/google/re2/wiki/Syntax
- Go `regexp` package: https://pkg.go.dev/regexp ; syntax: https://pkg.go.dev/regexp/syntax
- Rust `regex` crate: https://docs.rs/regex/latest/regex/
- Andrew Gallant, “Regex engine internals as a library”: https://burntsushi.net/regex-internals/
- Java `java.util.regex.Pattern` (JDK 21): https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html
- .NET regex options: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options
- “.NET 7 Regex Improvements” (Stephen Toub, 2022): https://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7/
- “Derivative-based Nonbacktracking Real-World Regex Matching” (MSR): https://www.microsoft.com/en-us/research/publication/derivative-based-nonbacktracking-real-world-regex-matching-with-backtracking-semantics/
- TC39 proposal-regexp-v-flag: https://github.com/tc39/proposal-regexp-v-flag
- ECMA-262 (latest): https://tc39.es/ecma262/#sec-regexp-regular-expression-objects
- MDN, “RegExp.prototype.unicodeSets”: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets
- “Regexes Got Good” (Smashing Magazine, 2024) — JS regex history overview: https://www.smashingmagazine.com/2024/08/history-future-regular-expressions-javascript/
- Python `re` (3.x): https://docs.python.org/3/library/re.html
- Python `regex` (PyPI): https://pypi.org/project/regex/
- Onigmo (Ruby’s regex engine): https://github.com/k-takata/Onigmo
- Oniguruma (archived April 2025): https://github.com/kkos/oniguruma
- Vim `:help pattern`: https://vimdoc.sourceforge.net/htmldoc/pattern.html
- Emacs Lisp regular expressions: https://www.gnu.org/software/emacs/manual/html_node/elisp/Regular-Expressions.html
- Tcl ARE syntax: https://www.tcl-lang.org/man/tcl/TclCmd/re_syntax.htm
- Boost.Xpressive: https://www.boost.org/doc/libs/release/doc/html/xpressive.html
- Hyperscan (Intel): https://github.com/intel/hyperscan
- Vectorscan (community fork): https://github.com/VectorCamp/vectorscan
- “Hyperscan: A Fast Multi-pattern Regex Matcher” (NSDI ‘19): https://www.usenix.org/conference/nsdi19/presentation/wang-xiang
- GNU grep manual: https://www.gnu.org/software/grep/manual/grep.html
- GNU gawk regexp manual: https://www.gnu.org/software/gawk/manual/html_node/Regexp.html
- Lua 5.4 patterns: https://www.lua.org/manual/5.4/manual.html#6.4.1
- Unicode UTS #18 (Unicode Regular Expressions): https://unicode.org/reports/tr18/
Caveats
- Hyperscan / Vectorscan version cadence post-2024. Intel’s continued internal Hyperscan development (5.5+) is under a proprietary licence; the open-source surface is fixed at 5.4 and Vectorscan tracks that as a portability fork. The community split is real but the precise Intel-internal version cadence isn’t publicly documented; treat any “current Intel Hyperscan version” claim above 5.4 as unverified.
- Vim regex engine internals. Vim 7.4 (2013) introduced an NFA-simulation engine that runs alongside the original backtracker; the engine selection is heuristic and the exact rules are documented only loosely in `:help two-engines`. Specific performance claims should be benchmarked rather than asserted.
- Tcl ARE / PostgreSQL regex. PostgreSQL’s `~` operator and `SIMILAR TO` use Henry Spencer’s library, but the exact feature subset has drifted across PostgreSQL versions; cite the PG docs for version-specific syntax claims rather than relying on the generic Tcl ARE description.