PBP Kernel Portability
2025-10-31
In the following, PBP stands for Parts Based Programming. I used to call this 0D.
Design Note
The PBP kernel is in https://github.com/guitarvydas/pbp-dev/tree/main/kernel. It is written in a portable manner that spits out kernels in Python, JavaScript, and Common Lisp. Other languages should be easily possible (by hacking the Python version, say), but I ran out of interest (contributions welcome). The source code for the kernel is in 4 files: 0d.rt, external.rt, kernel_external.rt, stock.rt.
Portability
Portability is achieved by creating an uber-syntax instead of the usual conditional compilation messiness (#ifdef and friends). This technique is discussed in a Substack article https://programmingsimplicity.substack.com/p/portability. The uber-syntax is very reminiscent of Python, but without the hard-to-automate concept of indentation-based syntax. Indentation is meant for human-readability, but is unnecessary for machine-readability and machine-writability. Our current tools, such as parser generators, prefer brace-brackets to indentation. As long as the internal representation (.rt) is not required for programming by normal human programmers, it is OK to appease the tools in our current workflow. This opens the door to other, more convenient ways of writing code that writes code. [In fact, I consider my “programming language” to be UNIX pipelines instead of what is more commonly used. Programming languages are just 1960s IDEs for programming workflow, while UNIX pipelines are a more advanced, 1970s notion of the same ideas. Programming languages are just notations for specific paradigms, while UNIX pipelines allow programmers to compose programs using LEGO® blocks built in any language/paradigm. Functions and pipelines are not equivalent. Function-based code libraries are not LEGO® blocks.]
The uber-language approach demonstrates the utility of PEG-based parsing technology. It is easy to build a new syntax, and it is easy to transmogrify this new syntax into existing programming languages. The kernel uses t2t (pbp/t2t.bash, pbp/kernel/t2t/) to transmogrify the uber-language into Python, JavaScript, and Common Lisp. See the back-ends for each (e.g. emitPython.rewrite). The front end for all emitters can be written in a language-independent manner (see emit.ohm). Only the back ends need to be customized for each target language. Emission of custom code is done in two steps (a small sketch follows the list):
1. Parse the I.R. using a common grammar (I use OhmJS for this, see emit.ohm).
2. Create a custom rewriter for each target language (e.g. see emitPython.rewrite, emitjs.rewrite, emitcl.rewrite). [The .rewrite suffix is historical; I’ve switched to using .rwr for newer projects.]
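To make the two-step flow concrete, here is a minimal, self-contained JavaScript sketch using the ohm-js npm package. The toy grammar and the two emitters below are my illustration only, not the kernel’s actual emit.ohm or .rewrite files:

const ohm = require('ohm-js');

// Step 1: one common grammar for a toy IR (a stand-in for emit.ohm).
const g = ohm.grammar(`
  IR {
    Defn = "def" ident "{" Stmt* "}"
    Stmt = "print" ident
    ident = letter+
  }
`);

// Step 2: one rewriter per target language (stand-ins for
// emitPython.rewrite and emitjs.rewrite).
const sem = g.createSemantics();
sem.addOperation('py', {
  Defn(_def, name, _open, stmts, _close) {
    return `def ${name.sourceString}():\n` +
      stmts.children.map(s => '    ' + s.py()).join('\n');
  },
  Stmt(_print, word) { return `print("${word.sourceString}")`; },
});
sem.addOperation('js', {
  Defn(_def, name, _open, stmts, _close) {
    return `function ${name.sourceString}() {\n` +
      stmts.children.map(s => '  ' + s.js()).join('\n') + '\n}';
  },
  Stmt(_print, word) { return `console.log("${word.sourceString}");`; },
});

const m = g.match('def greet { print hello print world }');
if (m.succeeded()) {
  console.log(sem(m).py()); // Python emission
  console.log(sem(m).js()); // JavaScript emission
}

The point is the shape of the workflow: one grammar, N small rewriters, each only a page or two long.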
Some actions and features are vastly different between target languages. In these cases, I create pseudo-operations that look like macros and functions and resolve them to code in the target-specific emitters.
For example, reading a JSON file representing a little network into some internal data structure needs different code for Python than for JavaScript. This action is written as ‘#internalize_lnet_from_string ()’ in the 0d.rt source code. It produces pseudo-Python code:
try:⤷
fil = open(«fname», "r")
json_data = fil.read()
routings = json.loads(json_data)
fil.close()
return routings⤶
except FileNotFoundError:⤷
print(f"File not found: '{«fname»}'")
return None⤶
except json.JSONDecodeError as e:⤷
print(f"Error decoding JSON in file: '{e}'")
return None⤶
while it produces JavaScript code:
let jstr = undefined;
if (filename == "0") {
  jstr = fs.readFileSync(0, { encoding: 'utf8' });
} else if (pathname) {
  jstr = fs.readFileSync(`${pathname}/${filename}`, { encoding: 'utf8' });
} else {
  jstr = fs.readFileSync(`${filename}`, { encoding: 'utf8' });
}
if (jstr) {
  return JSON.parse(jstr);
} else {
  return undefined;
}
The rewrite DSL contains constructs like “«fname»”, which cause the rewriter to interpolate the parameter “fname” in-place in the generated code, like a macro expander.
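As a rough illustration (this is not the actual t2t implementation), guillemet interpolation can be as small as a single regex replace:

// Hypothetical sketch of «param» interpolation, not the actual t2t code.
function interpolate(template, bindings) {
  return template.replace(/«([^»]+)»/g, (_, name) => {
    if (!(name in bindings)) throw new Error(`unbound parameter: ${name}`);
    return bindings[name];
  });
}

console.log(interpolate('fil = open(«fname», "r")', { fname: 'arg0' }));
// -> fil = open(arg0, "r")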
The pseudo-Python code contains Unicode characters meant for directing indentation in the final output. The symbol “⤷” means indent, while the symbol “⤶” means outdent. A final fixer-upper pass resolves these symbols and indents the code to create legal Python source code. The fixer-upper filter is a tiny program, about 50 lines long.
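For flavour, here is a hypothetical JavaScript version of such a fixer-upper filter (the real one is a separate ~50-line program; the marker semantics below are my reading of the example above):

// Hypothetical sketch: resolve ⤷ (indent) and ⤶ (outdent) markers,
// emitting 4-space-indented Python. Reads stdin, writes stdout.
const fs = require('fs');

function fixup(text) {
  let depth = 0;
  const out = [];
  for (const raw of text.split('\n')) {
    const indents = (raw.match(/⤷/g) || []).length;
    const outdents = (raw.match(/⤶/g) || []).length;
    const line = raw.replace(/[⤷⤶]/g, '').trim();
    if (line.length > 0) out.push('    '.repeat(depth) + line);
    depth += indents - outdents;       // markers affect subsequent lines
    if (depth < 0) depth = 0;          // guard against stray outdents
  }
  return out.join('\n') + '\n';
}

process.stdout.write(fixup(fs.readFileSync(0, 'utf8')));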
Note that the source code in 0d.rt emphasizes machine-writability and machine-readability over human readability. There is no attempt at generalizing the uber-language. In some cases, like the previous example, single lines of .rt code expand into multiple lines of target code. The supported operations are specific to the task-at-hand, i.e. compiling the PBP kernel. Operations are tuned to fit the task and are not generalized.
Focussing on the task instead of on generalization greatly simplifies the .rt transmogrifier. Once the design of the kernel had settled down, the code for the whole kernel took only a few hours to write. The balance of brain power was devoted to iterating the design of the kernel instead of to niggly details in the code. The design required many iterations over several months and years; the final version of the code took only a few hours.
In general, I believe that we have enough “compilers” already. We don’t need to write any more compilers (or interpreters), we only need to transmogrify uber source code into a form that can be executed by already-existing compilers and interpreters.
Rewriting
Rewriting is kept simple by dealing only with string-to-string transmogrification. I used to think that we would want to cover all sorts of user-defined data structures, but in practice I’ve found that string-to-string transmogrification is enough. I’ve been doing this kind of transmogrification for several years and haven’t found a case where I needed anything more than strings. I’ve reached the conclusion that data structures (both those built into programming languages and user-defined ones) are the result of 1970s biases toward premature optimization. Simpler parsing technologies, like OhmJS, can convert any text into internal data structures on the fly. It appears that FP techniques are backing into this realization with inventions like “pattern matching”. Pattern matching was already well understood in the 1970s, but historical biases made us think that we needed something more complicated.
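For example (my illustration, not kernel code), OhmJS can turn a line of text into a live JavaScript object on the fly, without a pre-declared record type:

const ohm = require('ohm-js');

// Parse "key=number" pairs straight into an object.
const g = ohm.grammar(`
  Pairs {
    List = Pair ("," Pair)*
    Pair = key "=" digit+
    key = letter+
  }
`);
const sem = g.createSemantics().addOperation('value', {
  List(first, _commas, rest) {
    return Object.assign(first.value(), ...rest.children.map(p => p.value()));
  },
  Pair(k, _eq, ds) { return { [k.sourceString]: Number(ds.sourceString) }; },
});

const m = g.match('x=1, y=22');
if (m.succeeded()) console.log(sem(m).value()); // -> { x: 1, y: 22 }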
The historical bias towards AOT (ahead-of-time) compilation is diminishing as we encounter new, distributed problem domains. AOT compilation was effective for the single-computer mindset of the 1970s and concealed the fact that it may not be the optimal choice for more distributed problem domains. Many of the biases embedded in the AOT mindset, like the need to pre-define data structures, are dissolving as we address new kinds of problems. I conclude that the AOT (and JIT?) mindset is representative of the act of production engineering instead of program engineering.
To keep the task of rewriting simple, I developed a small DSL (I call it an SCN, a Solution Centric Notation). Again, focussing only on the task of rewriting strings into other strings keeps the design of the DSL simple and easy to implement.
The documentation for the rewriting SCN (DSL) is https://github.com/guitarvydas/pbp-dev/tree/main/t2t/doc/rwr.
Documents for text-to-text (t2t), with some overlap, can be found at:
https://github.com/guitarvydas/pbp-dev/blob/main/t2t/doc/T2T.pdf
https://programmingsimplicity.substack.com/p/experiments-with-text-to-text-transpilation.
DaS - Diagrams as Syntax for Program Architecture
The source code for the kernel is written as a set of diagrams, as described in https://github.com/guitarvydas/pbp-dev/blob/main/kernel/README.md. Most of the blocks on the diagram preprocess input code (the .rt code) in stages into internal form. The real transmogrification work happens in the blocks “RT to Javascript”, “RT to Python” and “RT to Common Lisp”, which use the t2t toolset to parse and rewrite the internal representation via a shell script, ndsl. At all stages, the internal form uses ASCII and Unicode text (instead of, say, some sort of binary representation), hence it remains viewable and understandable to human programmers, although the syntax might not be as concise as one would choose for direct, manual programming.
Make
Use the Makefile (-> make.bash) to rebuild. The kernel uses itself to build itself (kernel.py). This can result in a broken kernel if you make a serious mistake. The current solution is to manually copy the previous kernel from the ./before-??? directory into the build directory and try again. I use the kernel as a smoke test for itself, hence I haven’t switched to the “obvious” solution of using a golden copy of kernel.py for the self-build. Hmm, suggestions welcome to stop me from continuing to make excuses for this workflow.
[I haven’t yet cleaned this directory up, so it contains a lot of historical cruft. It should all be there and should work, though.]
Untested idea
An early version of the kernel was built in Odin to ensure that the kernel covered systems languages that did not do garbage collection by default. After switching to Python, it is likely that I’ve veered away from keeping the kernel honest about GC. OTOH, it seems that we don’t need full-blown GC. I think that we just need to worry about GC’ing `mevents` (messages for message-passing with event semantics and single-input-queues). I think that we don’t need generalized GC for every kind of data type. We should be able to express this kind of simplified GC in the uber-language .rt files (contributions welcome).
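As a straw-man (untested, and not in the repo), the discipline might be as simple as reference-counted mevents recycled through a free list, expressed here in JavaScript purely for illustration:

// Hypothetical sketch: reference-counted mevents recycled via a free list,
// instead of relying on a general-purpose garbage collector.
const freeList = [];

function makeMevent(port, datum) {
  const mev = freeList.pop() || {};   // reuse a dead mevent if available
  mev.port = port;
  mev.datum = datum;
  mev.refs = 1;
  return mev;
}

function retain(mev) { mev.refs += 1; return mev; }

function release(mev) {
  mev.refs -= 1;
  if (mev.refs === 0) freeList.push(mev); // recycle; no general GC needed
}

A non-GC systems language like Odin could implement the same retain/release discipline explicitly, which is the honesty the Odin version was meant to enforce.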
See Also
Email: ptcomputingsimplicity@gmail.com
Substack: paultarvydas.substack.com
Videos: https://www.youtube.com/@programmingsimplicity2980
Discord: https://discord.gg/65YZUh6Jpq
Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas
Twitter: @paul_tarvydas
Bluesky: @paultarvydas.bsky.social
Mastodon: @paultarvydas
(earlier) Blog: guitarvydas.github.io
References: https://guitarvydas.github.io/2024/01/06/References.html

