What if the fundamental limitation of Abstract Syntax Trees isn’t their size or efficiency, but rather the mental framework they impose on how we think about problems? The real issue lies in the mindset that accompanies AST-based thinking—a mindset that treats memory as an infinite resource rather than a finite constraint to be managed.
Consider how the AST approach to compilation shapes algorithm design. It encourages us to assume unlimited memory availability, treating memory exhaustion as an exceptional error condition rather than a normal aspect of the problem domain. This assumption isn’t inherently wrong—it’s simply one way to frame the problem space. But it’s not the only way, and recognizing this distinction opens up entirely different algorithmic possibilities.
We can solve the same computational problems under either assumption, infinite or finite memory, but these different starting points lead us down radically different design paths. The algorithms that emerge from finite-memory thinking look very different from their infinite-memory counterparts, yet both solve the same underlying problems.
The Dominance of Infinite-Memory Thinking
Look at how modern programming culture has evolved. Our collective understanding of “what programming is” has become disproportionately shaped by the infinite-memory paradigm. We’ve invested enormous effort into managing the complexities that arise from assuming infinite memory—garbage collection, virtual memory systems, sophisticated memory allocators—while comparatively little work has explored algorithms explicitly designed for finite-memory constraints.
This represents a choice about which problem variables to suppress. Effective problem-solving often involves deliberately hiding certain domain variables to reduce the degrees of freedom we must consider. Memory availability is simply one such variable. In infinite-memory thinking, we suppress this variable entirely, making it invisible to our design process. In finite-memory thinking, we keep it visible and treat it as an integral part of the problem to be solved.
Two Paradigms for Memory Management
We can frame this choice explicitly through two distinct approaches:
The infinite-memory assumption treats virtual memory as a universal substrate underlying all aspects of an algorithm—both data and code operate under this abstraction automatically.
The finite-memory assumption—say, 128KB for code space and 128KB for data space—treats virtual memory as one tool among many available to software architects and engineers. It must be explicitly applied where needed rather than universally assumed. Programs that don’t require this tool shouldn’t pay its cost.
AST-based approaches emerge naturally from the first paradigm, regardless of how compactly we might represent those ASTs in practice. The second paradigm leads somewhere quite different.
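As a concrete illustration of the second paradigm, here is a minimal sketch in JavaScript; the 128KB budget follows the figure above, and everything else (names, the arena/flush design) is an assumption for illustration. The point is only that the memory budget stays a visible variable, and a full buffer is a normal outcome rather than an exceptional error.

```javascript
// A minimal sketch (assumed details, not from the original) of keeping the
// memory budget visible: a stage gets a fixed 128 KB data arena, and running
// out of room is reported as a normal outcome, not thrown as an error.
const DATA_BUDGET = 128 * 1024; // bytes of data space for this stage

function makeStage(budget = DATA_BUDGET) {
  const arena = new Uint8Array(budget); // the stage's entire data space
  let used = 0;

  return {
    // Try to append a chunk; report "full" instead of throwing.
    accept(chunk) {
      if (used + chunk.length > budget) return { ok: false, reason: 'full' };
      arena.set(chunk, used);
      used += chunk.length;
      return { ok: true };
    },
    // Hand the filled region downstream, then reuse the arena.
    flush(emit) {
      emit(arena.subarray(0, used));
      used = 0;
    },
  };
}
```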
What Finite-Memory Thinking Enables
When we design with finite memory in mind, different concerns come to the forefront. We naturally gravitate toward staged computation—small processing steps that don’t require tree-walking entire input structures. Each stage becomes a self-contained unit with bounded memory requirements.
Staged computation creates new opportunities. Each stage benefits from its own small parser—a streaming parser that doesn’t necessarily rely on AST structures. When each stage has its own parser, each can define its own domain-specific mini-syntax for input. This leads us to a key requirement: building these mini-syntaxes must be genuinely easy.
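Here is a minimal sketch of one such stage, assuming a hypothetical line-oriented "key = value" mini-syntax (not one from the original). The stage owns its own tiny streaming parser, handles one line at a time, and never builds a tree, so its memory needs are bounded by the longest line rather than by the size of the input.

```javascript
// A minimal sketch of one stage with its own streaming mini-syntax
// (a hypothetical "key = value" line format, not from the original).
// Memory use is bounded by the longest line, and no tree is ever built.
const readline = require('node:readline');

function runStage(input, emit) {
  const rl = readline.createInterface({ input });
  rl.on('line', (line) => {
    const m = /^\s*(\w+)\s*=\s*(.*)$/.exec(line); // this stage's mini-syntax
    if (m) emit({ key: m[1], value: m[2].trim() });
    // Non-matching lines are simply skipped; downstream stages never see them.
  });
}

// Usage: parse stdin one line at a time and print each record as it appears.
runStage(process.stdin, (record) => console.log(JSON.stringify(record)));
```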
Let’s call these Solution-Centric Notations, or SCNs. This approach mirrors fundamental techniques from physics: reduce the number of variables in a problem through “much-greater-than” analysis, understand and solve the simplified problem, then decompose the larger problem into multiple smaller islands of reduced complexity. This is “divide and conquer” in its most practical form.
Practical Tools for Building SCNs
This perspective directly addresses how we should think about programming language and DSL design. My approach diverges from conventional wisdom: don’t think of DSLs as “compilers.” Instead, strive for simplicity and rapid iteration.
Formal language techniques like context-free grammars and LR(k) parsing carry too much conceptual weight for achieving quick turnaround. CFGs push you into bottom-up parsing mindsets that demand excruciating attention to detail before producing working results.
Consider what we already recognize as DSLs: regular expressions are DSLs. BNF notation is a DSL. These succeed precisely because they’re lightweight and focused.
Top-down parsing proves far more effective for quickly building small SCNs. Until recently, recursive descent offered the best approach for creating top-down parsers, yet few recursive-descent tools ever gained widespread adoption. S/SL is one such tool, but it remains largely unknown to most software engineers. Parser combinators work as well, but they lock you into functional programming paradigms that themselves assume infinite memory.
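For illustration, here is a minimal hand-rolled recursive-descent sketch for a small arithmetic notation (the grammar is illustrative, not from the original): one small function per rule, top-down, no parser generator, and no AST; values are computed as the input is consumed.

```javascript
// A minimal recursive-descent sketch (illustrative grammar, not from the original):
//   Expr   = Term (('+' | '-') Term)*
//   Term   = Factor (('*' | '/') Factor)*
//   Factor = number | '(' Expr ')'
function parseExpr(src) {
  let pos = 0;
  const peek = () => src[pos];
  const eat = (ch) => { if (src[pos] !== ch) throw new Error(`expected '${ch}' at ${pos}`); pos++; };

  function expr() {                       // Expr rule
    let value = term();
    while (peek() === '+' || peek() === '-') {
      const op = src[pos++];
      value = op === '+' ? value + term() : value - term();
    }
    return value;
  }

  function term() {                       // Term rule
    let value = factor();
    while (peek() === '*' || peek() === '/') {
      const op = src[pos++];
      value = op === '*' ? value * factor() : value / factor();
    }
    return value;
  }

  function factor() {                     // Factor rule
    if (peek() === '(') { eat('('); const v = expr(); eat(')'); return v; }
    const start = pos;
    while (/[0-9]/.test(src[pos])) pos++;
    if (start === pos) throw new Error(`expected a number at ${pos}`);
    return Number(src.slice(start, pos));
  }

  const result = expr();
  if (pos !== src.length) throw new Error(`unexpected '${src[pos]}' at ${pos}`);
  return result;
}

console.log(parseExpr('2*(3+4)')); // 14
```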
More recently, Parsing Expression Grammars (PEGs) have made top-down parsing both practically useful and academically respectable. This matters because practical adoption and formal acceptance don’t always align initially.
OhmJS: A Practical Path Forward
OhmJS provides a practical tool built on PEG foundations while extending PEG in valuable directions—improved backtracking, parameterized rules, separation of concerns, and the Ohm-editor for development. While OhmJS builds on AST principles internally, we can treat this as an implementation detail rather than a conceptual constraint. In fact, PEG could theoretically support streaming parsers—I began exploring this direction but never completed a releasable implementation.
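A minimal sketch of OhmJS in use, assuming the ohm-js npm package and an illustrative toy grammar (not from the original): define a grammar, match some input, then attach a semantic operation rather than hand-walking a tree.

```javascript
// A minimal OhmJS sketch (toy grammar, not from the original).
const ohm = require('ohm-js');

const g = ohm.grammar(`
  Greeting {
    Exp = "hello" name
    name = letter+
  }
`);

const semantics = g.createSemantics().addOperation('shout', {
  Exp(_hello, name) { return 'HELLO, ' + name.shout() + '!'; },
  name(chars) { return chars.sourceString.toUpperCase(); },
});

const match = g.match('hello world');
if (match.succeeded()) {
  console.log(semantics(match).shout()); // "HELLO, WORLD!"
} else {
  console.log(match.message);            // a readable error with position info
}
```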
Transmogrification: Languages as Assembly Targets
Consider an alternative approach to language implementation: treat existing languages as “assemblers” and transpile new language syntaxes into existing programming languages. This is transmogrification.
Text-to-text transpilation (T2T) provides tooling for building these transmogrifiers. The T2T approach uses OhmJS for front-end parsing and RWR for back-end rewriting, transforming new syntax into established syntax.
RWR itself exemplifies this philosophy—it’s an SCN built using OhmJS that specifies rewriting operations without requiring JavaScript code. RWR serves as an SCN specifically for building backends for OhmJS-based parsers.
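To make the transmogrification idea concrete, here is a minimal sketch that rewrites a hypothetical pipeline notation into plain JavaScript. The author's actual tooling pairs OhmJS with an RWR back end; since RWR's notation isn't shown here, this sketch approximates the back end with an ordinary OhmJS semantic operation instead.

```javascript
// A minimal transmogrification sketch (hypothetical source syntax; the real
// back end would be RWR, this uses a plain OhmJS semantic operation).
//   input:  double |> increment |> print
//   output: print(increment(double(x)))
const ohm = require('ohm-js');

const g = ohm.grammar(`
  Pipe {
    Pipeline = ident ("|>" ident)*
    ident    = letter+
  }
`);

const toJS = g.createSemantics().addOperation('js', {
  Pipeline(first, _arrows, rest) {
    // Fold the stages into nested calls, innermost first.
    return rest.children.reduce(
      (acc, stage) => `${stage.js()}(${acc})`,
      `${first.js()}(x)`
    );
  },
  ident(chars) { return chars.sourceString; },
});

const match = g.match('double |> increment |> print');
console.log(toJS(match).js()); // print(increment(double(x)))
```

The emitted string is ordinary JavaScript, which then plays the role of the "assembler" in the sense described above.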
The path forward involves questioning which assumptions we’ve inherited and which we should deliberately choose based on the problems we’re actually trying to solve.
Resources
Text-to-text (t2t) tool: https://github.com/guitarvydas/pbp-dev/tree/main/t2t
RWR documentation at https://github.com/guitarvydas/pbp-dev/blob/main/t2t/doc/rwr/RWR%20Spec.pdf (RWR is the back end of t2t)
S/SL and staged computation PT Pascal compiler: https://research.cs.queensu.ca/home/cordy/pub/downloads/ssl/
https://www.txl.ca (TXL is a unique programming language specifically designed to support computer software analysis and source transformation tasks.)
See Also
Email: ptcomputingsimplicity@gmail.com
Substack: paultarvydas.substack.com
Videos: https://www.youtube.com/@programmingsimplicity2980
Discord: https://discord.gg/65YZUh6Jpq
Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas
Twitter: @paul_tarvydas
Bluesky: @paultarvydas.bsky.social
Mastodon: @paultarvydas
(earlier) Blog: guitarvydas.github.io
References: https://guitarvydas.github.io/2024/01/06/References.html

