Compilation Using Two Stage Code Emission

2026-02-14

Building compilers the way I showed in previous articles about Forth Haiku is super-simple.

The technique uses a dictionary to store code templates. The first pass, the emitter, just fetches templates and pastes them together. A second pass optimizes the code generated by the first pass.

Pass 1 can be trivially implemented in any language that has hash tables or dictionaries.
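To make the idea concrete, here is a minimal sketch of pass 1 in Python. It is not the Forth Haiku compiler's actual code: the template table, the function name `emit_pass1`, and the target "assembly" (`push`, `pop`, `add`, `mul`) are all invented for illustration. Each Forth word is looked up in a dictionary and its canned template is pasted onto the output.

```python
# Pass 1 sketch: look up each Forth word in a dictionary of canned
# templates and paste the templates together, with no cleverness at all.
# The target instructions below are hypothetical, not a real ISA.
TEMPLATES = {
    "+":   ["pop r1", "pop r0", "add r0, r1", "push r0"],
    "*":   ["pop r1", "pop r0", "mul r0, r1", "push r0"],
    "dup": ["pop r0", "push r0", "push r0"],
}

def emit_pass1(source):
    """Return a list of target-code lines for a whitespace-separated Forth string."""
    out = []
    for word in source.split():
        if word in TEMPLATES:
            out.extend(TEMPLATES[word])
        else:
            # Assumption: anything that is not a known word is an integer
            # literal. int() raises ValueError for anything else.
            out.append(f"push {int(word)}")
    return out
```

With these assumed templates, `emit_pass1("2 3 +")` returns `["push 2", "push 3", "pop r1", "pop r0", "add r0, r1", "push r0"]`. The output is deliberately naive; cleaning it up is left to the second pass.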

Pass 2 can be trivially implemented using a peepholing text-to-text rewriter, which produces locally optimized code. This is often “good enough”.
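A peepholer of this kind can be sketched in a few lines of Python using regular expressions over the emitted text. This is only an illustration under assumptions: the two rewrite rules, the function name `peephole`, and the instruction names (`push`, `pop`, `mov`) belong to an invented stack-machine text format, not to any real tool mentioned here.

```python
import re

# Each rule is (pattern over adjacent lines, replacement text).
# The instruction names are hypothetical and exist only for this sketch.
RULES = [
    # "push X" immediately followed by "pop R" is just a move into R.
    (re.compile(r"^push (\S+)\npop (\S+)\n", re.MULTILINE), r"mov \2, \1\n"),
    # Moving a register onto itself does nothing, so delete the line.
    (re.compile(r"^mov (\S+), \1\n", re.MULTILINE), ""),
]

def peephole(code):
    """Apply every rule over and over until the text stops changing."""
    while True:
        new = code
        for pattern, replacement in RULES:
            new = pattern.sub(replacement, new)
        if new == code:
            return new
        code = new
```

For example, `peephole("push r0\npop r0\npush 5\npop r1\n")` returns `"mov r1, 5\n"`: the first rule turns both push/pop pairs into moves, and the second rule then removes the useless `mov r0, r0`. The rewriter only ever looks at a small window of neighbouring lines, which is why the result is locally, not globally, optimized.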

Or … pass 2 can be more involved, using global optimization methods. This takes a lot more thinking and extra work, so it should be reserved for the parts of the code that really need it, i.e. those with a high ROI for the extra work put in. Avoid premature optimization. Don’t optimize unless you can show that a section of code needs it, and don’t optimize everything by default (my gut feel is that only 5% of code needs heavy speed optimization, whereas a lot of code could benefit from size reduction at the cost of speed). Most modern PLs encourage “premature optimization” by requiring a lot of attention to detail (types, memory management, overuse of synchronization due to use of functions, etc.). This slows down the development workflow and derails inventiveness.

This two-stage code emission technique is more obvious when Forth is used, but it can also be used to compile traditional infix languages.

When compiling Forth, we just map each Forth “word” to a canned template of code. To do this with infix languages, we need to parse the source and decide which canned templates to use, and in what order. This is slightly more difficult than compiling Forth, but not overly difficult.
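As a rough illustration of the infix case, the sketch below uses a small precedence-climbing parser to decide the order in which templates are pasted. Everything in it is an assumption made for the example: the operator set, the names `compile_infix`, `TEMPLATES` and `PRECEDENCE`, and the invented stack-machine instructions. It handles only integer literals, `+`, `-`, `*` and parentheses, and the tokenizer silently skips characters it does not recognize.

```python
import re

# Invented stack-machine templates, keyed by infix operator.
TEMPLATES = {
    "+": ["pop r1", "pop r0", "add r0, r1", "push r0"],
    "-": ["pop r1", "pop r0", "sub r0, r1", "push r0"],
    "*": ["pop r1", "pop r0", "mul r0, r1", "push r0"],
}
PRECEDENCE = {"+": 1, "-": 1, "*": 2}

def compile_infix(text):
    """Parse an infix expression and return the pasted template lines."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0
    out = []

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
        else:
            out.append(f"push {int(tok)}")  # integer literal

    def expression(min_prec):
        nonlocal pos
        primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            expression(PRECEDENCE[op] + 1)  # +1 makes operators left-associative
            out.extend(TEMPLATES[op])       # paste the template after both operands

    expression(1)
    return out
```

Calling `compile_infix("2 + 3 * 4")` emits `push 2`, `push 3`, `push 4`, then the `*` template, then the `+` template. In other words, the parser's only job is to turn infix order into the postfix order that Forth would have given us directly; after that, code emission is the same dictionary lookup as before.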

This kind of technique is used in gcc (“RTL”, based on ideas from Davidson and Fraser). Cordy’s OCG further generalizes the technique using MISTs (decision trees) and Data Descriptors. As far as I can tell, more bloated techniques based on functional programming (ASTs needing lots of memory) overtook the development of these simpler techniques.

I built my first peepholer in 1981, using about a page of AWK code. I currently use t2t to parse and rewrite text. The backtracking in OhmJS (which is based on PEG) makes this approach convenient. I’m also playing with LLMs to generate code. (Steve Phillips successfully created a new programming language using an LLM to transpile the new language into running Go code.)

Asides

As of this writing, I haven’t yet produced a video or an article about how the Forth Haiku compiler performs optimization.

I have created example code that uses T2T to do peepholing but, as of this writing, I haven’t written an article about it yet. The code, “as is”, is in a public GitHub repository.

The transcription of the above video incorrectly uses the word “fourth” when I say “Forth”. (If I understand correctly, Chuck Moore dropped the “u” due to early hardware/software constraints.)

The current trend in sports psychology towards flow (staying “in the zone”) applies to programming, too. Programmers interrupt their flow when they use premature typing, tools with modal dialogs, edit-compile cycles instead of homoiconic languages with REPLs, and tools that need too much up-front information (like asking “where should this file be saved?” too early in the programming process). When that happens, their productivity drops, their “inventiveness” drops (a tendency to use only one paradigm instead of many), and programming doesn’t “feel like fun”.

I’m kind of laid-back when it comes to just trying things out. I don’t care about “efficiency” in the same way that we did in the 1960s. If it works “fast enough” on my development machine and if it increases my productivity, then I’m OK with it. Modern machines are so fast that I can - finally - think about using computers as tools to help me develop code. Being able to slap tools together quickly is more important than worrying about every niggly detail, like bytes, types, etc., at first. Writing code that writes code is what compiler writers do. I want to find ways to enable any programmer to write code that writes code. I want to spiral in to an engineering design that I like, before worrying about how to make it fast enough for Production.

Substack: paultarvydas.substack.com

Videos: https://www.youtube.com/@programmingsimplicity2980

Discord: https://discord.gg/65YZUh6Jpq

Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas

Twitter: @paul_tarvydas

Bluesky: @paultarvydas.bsky.social

Mastodon: @paultarvydas

(earlier) Blog: guitarvydas.github.io

References: https://guitarvydas.github.io/2024/01/06/References.html
