Jail-breaking Parser Technology
2026-01-09
PEG (Parsing Expression Grammar) makes parsers easier to express.
OhmJS is just a new-and-improved PEG parser.
OhmJS adds several important concepts to PEG parsing:
space-skipping
rule parameterization
no-fuss match capture
locality-of-reference, leaving the grammar unsullied by semantic and implementation-level details
a grammar explorer (formerly known as a REPL) in the form of ‘ohm-editor’
left-recursion
No-fuss match capture means that the grammar is not sullied by the variable-naming syntax needed to specify what to capture and what to ignore. The programmer has more left-over brain power to devote to other issues. This is a DX (Developer eXperience) issue that enhances programmer productivity. [One might consider why so many language syntaxes exist. Assembler provides every operation necessary for programming. If DX were unimportant, there would be no need to invent new syntaxes or to use anything but assembly language.] In OhmJS, the grammar contains less noise and remains more readable.
Left-recursion allows one to specify grammars in a more natural way, which, again, gives the programmer left-over brain power for thinking about better things.
Left-recursion also extends the class of grammars that can be parsed: what OhmJS accepts is a superset of what plain PEG accepts.
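To make the left-recursion point concrete, here is a minimal Python sketch of one way a memoizing PEG-style parser can cope with a directly left-recursive rule, by "growing a seed". It is an illustration only, not a description of how OhmJS does it internally. The grammar (Expr = Expr "-" num / num) and the names `number`, `expr_body`, `expr` and `evaluate` are made up for this example.

```python
def number(text, pos):
    """num = digit+ ; returns (value, next_pos) or None."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None


def expr_body(text, pos, memo):
    """One pass over the rule body: Expr = Expr "-" num / num"""
    left = expr(text, pos, memo)      # left-recursive call, answered from the memo
    if left is not None and text.startswith("-", left[1]):
        right = number(text, left[1] + 1)
        if right is not None:
            return (left[0] - right[0], right[1])
    return number(text, pos)          # second alternative


def expr(text, pos, memo):
    """Memoized Expr; the memo is keyed by position because there is only one rule."""
    if pos in memo:                   # also hit while the rule is still in progress
        return memo[pos]
    memo[pos] = None                  # seed: pretend Expr fails at this position
    while True:
        result = expr_body(text, pos, memo)
        best = memo[pos]
        if result is None or (best is not None and result[1] <= best[1]):
            return best               # no further growth, keep the longest match
        memo[pos] = result            # the match grew, so run the body again


def evaluate(text):
    """Return the value of the whole input, or None if it does not parse."""
    result = expr(text, 0, {})
    if result is not None and result[1] == len(text):
        return result[0]
    return None
```

The rule is written the natural way, with Expr on the left, so subtraction comes out left-associative: `evaluate("10-2-3")` groups as (10-2)-3. Without left-recursion the grammar would have to be rewritten with repetition and the grouping rebuilt by hand.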
A major feature of PEG in general, and thus of Ohm, is the use of backtracking.
Backtracking essentially means that you don’t have to worry about whether the grammar is LL(1), LL(2), LL(3), etc. This equates to less fuss and less tinkering with the grammar - all of which leaves the programmer with more brain-time for thinking about better things.
Note that traditional recursive-descent parsers typically do not backtrack and rely on a fixed amount of lookahead instead.
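A minimal Python sketch of what backtracking buys you. The tiny combinators (`lit`, `seq`, `choice`, `many`) and the two-alternative statement grammar are invented for this illustration and are not OhmJS's API. Both alternatives begin with an arbitrarily long dotted path, so no fixed lookahead can choose between them; ordered choice simply tries the first one and, on failure, restarts the second from the same position.

```python
def lit(s):
    """Match the literal string s."""
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p


def letters(text, pos):
    """Match one or more alphabetic characters."""
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    return end if end > pos else None


def seq(*parsers):
    """Match every parser in order; fail if any one fails."""
    def p(text, pos):
        for q in parsers:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p


def many(q):
    """Match q zero or more times (q must consume input)."""
    def p(text, pos):
        while True:
            nxt = q(text, pos)
            if nxt is None:
                return pos
            pos = nxt
    return p


def choice(*parsers):
    """PEG ordered choice: every alternative restarts at the same pos."""
    def p(text, pos):
        for q in parsers:
            end = q(text, pos)        # backtrack: pos is never advanced on failure
            if end is not None:
                return end
        return None
    return p


# path = letters ("." letters)*
path = seq(letters, many(seq(lit("."), letters)))
# stmt = path "();"  /  path "=" letters ";"
call = seq(path, lit("();"))
assign = seq(path, lit("="), letters, lit(";"))
stmt = choice(call, assign)


def matches(text):
    """True if the whole input is one statement."""
    return stmt(text, 0) == len(text)
```

Here `matches("a.b.c=x;")` first tries `call`, gets as far as the end of the path, fails on `();`, and then re-parses from position 0 as `assign`. Nobody had to left-factor the grammar to make that work.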
Another useful aspect of PEG - and hence OhmJS - is that programmers don’t need to specify separate scanners. PEG lets them write rules all the way down to the character level. OhmJS goes further by letting programmers express both character-level (lexical) rules and more abstract, higher-level (syntactic) rules in the same grammar, with whitespace skipped automatically in the syntactic ones.
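The split between lexical and syntactic rules can be sketched in a few lines of Python. This is an illustration under assumptions, not OhmJS code: the helper names (`skip_spaces`, `token`, `number`, `plus`, `parse_sum`) and the grammar Sum = number ("+" number)* are made up. The lexical rule works character by character and allows no spaces inside a number; the syntactic rule gets space-skipping by wrapping each lexical rule in `token`, and there is no separate scanner anywhere.

```python
def skip_spaces(text, pos):
    """Advance past any whitespace starting at pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def number(text, pos):
    """Lexical rule: adjacent digits only, no spaces allowed inside."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None


def plus(text, pos):
    """Lexical rule: a single '+' character."""
    return ("+", pos + 1) if text.startswith("+", pos) else None


def token(rule):
    """Use a lexical rule at the syntactic level: skip leading whitespace first."""
    def p(text, pos):
        return rule(text, skip_spaces(text, pos))
    return p


def parse_sum(text):
    """Syntactic rule: Sum = number ("+" number)* ; returns the total or None."""
    first = token(number)(text, 0)
    if first is None:
        return None
    total, pos = first
    while True:
        op = token(plus)(text, pos)
        if op is None:
            break
        rhs = token(number)(text, op[1])
        if rhs is None:
            break                     # this repetition fails; keep the earlier match
        total += rhs[0]
        pos = rhs[1]
    # Succeed only if nothing but whitespace is left over.
    return total if skip_spaces(text, pos) == len(text) else None
```

With this, `parse_sum(" 1 + 22+3 ")` accepts the input regardless of spacing, while `parse_sum("1 2")` is rejected because the spaces are skipped between tokens but never inside one.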
These things - together - make it massively easier (guesstimate: 10x, 100x, ...) to build new syntaxes and to treat existing programming languages like assemblers (t2t - text to text transpilation, just like the original ‘cc’ used to do).
I’ve found that you can use PEG on itself to build little DSLs to help build more little DSLs. I call these things SCNs - Solution Centric Notations. There’s a joke about physicists creating “spherical cows” using the approach of laser-focusing on only one aspect of a problem. SCNs are my idea for using this same approach for thinking about software.
If you lean on existing programming languages and treat them as “assemblers”, you don’t need to go through adding the ceremony and self-flagellation of doing type-checking, code generation, etc. Lots of compilers already exist and they do all this stuff for you. You can knock off custom, little helper SCNs in hours (I want to say minutes, but, I’m feeling conservative). Turing completeness doesn’t even matter for this approach. It’s like using REGEX in your code, but, more powerful.
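As a rough sketch of the "existing language as assembler" idea, the Python below transpiles a made-up one-line-per-transition notation into Python source text and then lets Python itself do the rest. Everything here is hypothetical: the notation (`state --event--> state`), the names `transpile` and `load`, and the generated `TRANSITIONS` dict. A single regex stands in for the grammar to keep the example short; a PEG grammar would take its place in anything less trivial.

```python
import re

# Hypothetical notation: one transition per line, "state --event--> state".
LINE = re.compile(r"\s*(\w+)\s*--(\w+)-->\s*(\w+)\s*$")


def transpile(scn_source):
    """Rewrite the notation into Python source text (text in, text out)."""
    out = ["TRANSITIONS = {"]
    for line in scn_source.splitlines():
        if not line.strip():
            continue                  # ignore blank lines
        m = LINE.match(line)
        if m is None:
            raise SyntaxError(f"cannot parse: {line!r}")
        src, event, dst = m.groups()
        out.append(f"    ({src!r}, {event!r}): {dst!r},")
    out.append("}")
    return "\n".join(out) + "\n"


def load(scn_source):
    """Run the generated text through Python, which acts as the 'assembler'."""
    namespace = {}
    exec(transpile(scn_source), namespace)
    return namespace["TRANSITIONS"]
```

The transpiler never checks types or generates machine code. It only emits text, and the host language's existing toolchain handles everything after that.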
And, if one wipes the memetic indoctrination from one’s mind about “parsing” being useful for only one thing, i.e. “compiler building”, then one ends up in new and interesting places. For example, I regularly parse diagrams and transmogrify them into running code. With OhmJS, one can write “macros” for languages other than Lisp-based ones. I’ve built text document manipulators, such as one that converts Kinopio documents into markdown.
Parsing can be jail-broken from the compiler world, just like what was done with REGEX. PEG parsing is the escape key. Traditional CFG parsing is the shackle around our ankles.
Further Reading
Alan Kay on using existing PLs as assemblers
Experiments With Text to Text Transpilation (this is just a starting point; I’ve added more articles on the subject since writing it. Search my substack and blog for keywords like “t2t”, “OhmJS”, “transmogrification”, “SCN”, “RWR”, “rewriting” and “drawware”)
kinopio2md code repository (unfinished experiment)
See Also
Email: ptcomputingsimplicity@gmail.com
Substack: paultarvydas.substack.com
Videos: https://www.youtube.com/@programmingsimplicity2980
Discord: https://discord.gg/65YZUh6Jpq
Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas
Twitter: @paul_tarvydas
Bluesky: @paultarvydas.bsky.social
Mastodon: @paultarvydas
(earlier) Blog: guitarvydas.github.io
References: https://guitarvydas.github.io/2024/01/06/References.html

