Enhance regular expressions

Dmitry Olshansky

Abstract

The goal is to make existing std.regex module in the D standard library more feature-rich and flexible. This includes: upgrading existing engine to support some popular extensions, providing alternative implementations based on finite state automation with their respective trade-offs. Also to provide the ability to generate the whole regex engine from pattern statically at compile time.

Additional Information

 

Having regular expression (regex) engine in standard library is totally expected for any modern language. The solid and performant implementation of regex is one of the greatest points of pride (if not of sale).

While concrete results and benchmarks are frequently biased, there are some general properties. Regexes usually came in two flavors: Finite Automation (FA, be it deterministic or non-deterministic) and backtracking, each of these with their respective set of traits. The popular implementation strategy is Virtual Machine approach, which is quite extendable for further optimization.

FA are more stable O(N) in time on input, however they are usually more limited as implementing as implementing features (such as back references) becomes problematic, also not to mention the time for constructing DFA from pattern. Backtracking allows easily implementing some interesting features like back referencing, lookahead/lookbehind and such, but has a dreadful exponential time on input in worst case.

Now speaking of D, what the standard library currently has is essentially an optimized recursive backtracking based on VM. There are plenty not implemented features that it may very well have had. Also there are a lot of issues that need proper resolution.

Basically, current goal is seen as make new fast and robust regular expression library that leverages D's great capabilities in generic programming. Also hopefully using modern D2 idioms.

 

See more info on project's GitHub page.