Skip to content

Nested Brackets and Regular Expressions

Recently on the TextMate mailing list Robin Houston generously offered an insightful explanation of matching nested brackets with regular expressions:

If you have ever studied automata theory, perhaps you remember that nested brackets are the classic example of something that can’t be matched by a regular expression. So you might be surprised (or disbelieving) to see a regular expression that does exactly that. The explanation, of course, is that theory and practice are closer in theory than in practice: Onigurama regular expressions, in common with many other flavours, are more powerful than the things that computer scientists call “regular expressions”.

The key in this case is the recursion (or “subexpression call”) feature. (For those of you who know about such things, this feature makes it possible to represent any context-free language — a step up the Chomsky hierarchy.) I like to think that I invented this feature; certainly I wrote the initial implementation for the PCRE engine, which as far as I know is the first place that it appeared. However, the Onigurama implementation may be a case of independent rediscovery: the manual describes it as “Tanaka Akira special”.

Read the full explanation in the archives of the TextMate Mailing List. Thanks Robin!

Post a Comment

You must be logged in to post a comment.
FireStats icon Powered by FireStats