2024-07-30-Diagram Compiler Status

Jul 31, 2024

This project creates a working “Larson scanner” (seen on the T.V. show “Knight Rider”) from a diagram. This is a working paper, a report on the current status of the project.

The main source code is drawn as a diagram using the draw.io editor. The use of draw.io is historical and not based on any fundamental features of draw.io.

Fig. 1 Top Level Diagram

Fig. 2 Larson Scanner

Workflow

Fig. 3 Workflow

The programmer creates the diagram (in this case “scanner.drawio”) and a set of Leaf components (in this case “Delay”, “Count”, “Reverser”, “Decode” and “👀” [names are not restricted to using only the ASCII character set]) .

A systems programmer creates (once) the 0D kernel (“rt0d.rt”) and a mainline (in this case “main.py”).

The diagram is compiled to a JSON file.

The kernel is compiled to Python3 code.

The parts are combined together and run by make.

The output consists of numbers ranging from 0-9, counting up, then reversing and counting down. The code (in component “👀”) prettifies the output by inserting indents and exdents.

Fig. 4 Output of Larson Scanner (command line version)

Github Repository

Larson scanner: https://github.com/guitarvydas/rtlarson

‘main.py’ which includes
- the mainline
- The 5 Leaf components shown on the above workflow sketch (Delay, Count, Reverser, Decode, and, 👀).
‘rtpy0d.py’ is the generated Python 0D kernel shown as generated by ‘rt t2t’ in the above workflow sketch
Run ‘make’

Rt to Python compiler: https://github.com/guitarvydas/rt

generates ‘rtpy0d.py’
Run ‘make’ to generate rtpy0d.py, then manually copy the generated code (‘rtpy0d.py’) into ‘rtlarson’

Container and Leaf Components

Container Components

The above draw.io diagram shows 2 Containers, Fig. 1 and Fig 2.

Das2json compiles (transpiles, actually) the diagrams into JSON - scanner.drawio.json. You can use a text editor to view the generated JSON, if you wish.

Structure of Leaf Components

Each Leaf component consists of pure code in some underlying programming language, in this case Python.

Each component must be registered by name. This happens in the routine ‘main.py/components_to_include_in_project’.

Each component must supply 2 entry points:

Instantiation
Handler.

Instantiation code consists of about 3 lines of code.

Handler code varies in size, but, is fairly small. The most straight-forward component is the one named monitor. It’s handler is about 7 lines of code, consisting mostly of pretty-printing done as a Python while loop.

0D and the Kernel

Messages

Messages are objects with 2 fields

A port id (in this case, a string)
Binary data - a datum object. Not structured by the system, but, Senders and Receivers must agree on the contents of the data.

Datum objects contain 6 attributes:

The data.
Clone () - a function that clones the datum and returns the clone.
Reclaim () - a routine which garbage collects the datum. In this case, we allow the Python garbage collector to GC the datum, hence, this function is currently a no-op.
Srepr () - returns a string representation of the datum.
Kind () - returns a string, for debugging and bootstrapping, that describes the type of the datum, e.g. “string”, “bang”, etc.
Raw () - returns the data as raw bytes.

Connections and Routing

Connections are owned by Container components, which act like mini-routers. Containers route messages between children, and, their own ports.

This structure is different from the traditional 3GL view of functions.

In most modern programming languages, there is only one kind of thing which performs actions - the function.

In 0D, there are 2 kinds of things:

Leaf components
Container components.

Leaf components are essentially like functions in 3GLs. Leaves contain action code. Leaves must supply 2 callable functions (1) instantiation, and, (2) message handling.

Container components are like pre-fab functions that perform only one kind of action - routing messages. Routing is “layered” and “structured” due to the use of Container components.

This paradigm requires that connections contain 3 pieces of information, which is used by the routing action of containers:

Direction (down, across, up, through)
Sender {component, port}
Receiver {component, port}.

A component - whether Leaf or Container - cannot send messages directly to other components. Components must leave messages, in order of creation, on an output queue. Each message is tagged with a corresponding output port id.

Hence, when a Component executes, it creates a queue of output messages, where each message contains a port id that corresponds to an output queue of the component, itself.

During routing, messages are mapped to translate output port ids of the sender component, into input port ids of each receiver component.

Fan-out is allowed, i.e. one output message can be delivered to the input of more than one receiver. Fan-out implies copying or copy-on-write semantics. Fan-out is necessary for enabling abstraction, i.e. converting a block of components into a single component with fewer ports. This is a usability issue. Without fan-out, abstraction becomes impossible, and, hence, usability of component architectures is greatly diminished. This kind of abstraction allows layering of designs. Traditional function-based programming provides only one construct for abstraction - the function - which does not encourage / enforce layering, and, makes large programs more difficult to comprehend.

Other fallout from the design decision for having 2 kinds of components, and, allowing fan-out includes:

components have only one input queue, each, i.e. not the more “obvious” choice of one queue per port (think of this as a kind of “structured programming” constraint for message-passing systems)
Routing must be performed atomically - i.e. messages must be delivered to all receivers without allowing intervening message deliveries
One connection represents a single connection between one Sender and one Receiver, fan-out is represented by multiple such connections (connection descriptors), hence, atomicity is required to ensure that all receivers receive fanned-out messages without interruption nor interleaving [Note that processing message delivery can be very efficient, needing only very small windows of atomicity. Function-based programming has atomicity built into it due to underlying synchrony, but, the cost of doing so adds bloat to multi-tasking and asynchrony.]
Containers can contain any kind of component - Container or Leaf. Hence, Containers are defined recursively. Leaves, though, do not contain other components, hence, are non-recursive. Using Lisp as an analogy, Containers are like lists and Leaf components are like atoms.

Random Notes About Features and Concepts Included in this Project

The code generator generates pseudo-Python code, then formats the generated code to be compatible with Python3 indentation. I use Unicode characters to insert indents and exdents during the code generation operations. ‘Indent.js’ formats the pseudo-code into properly indented Python code.
The code generator uses existing technology, i.e. OhmJS for parsing. I added another layer to the workflow in the form of a DSL (created using OhmJS) called ‘RWR’ for rewriting parsed input strings, instead of specifying the rewrites as raw Javascript code as would be required by off-the-shelf use of OhmJS.
Diagrams are drawn using existing technology. It would be better to use a custom-built programming diagram editor, but, we felt that using existing technology for the first step would be sufficient. We chose a drawing editor which saves diagrams in graphML format (a form of XML). We parse the graphML diagrams using existing technologies, like XML parser libraries and/or PEG parsing technology. We use draw.io for editing and OhmJS for parsing. Many of the reasons for these choices are historical. We expect that any drawing editor that saves diagrams in XML format could be used. We expect that any PEG parsing technology could be used to transpile the DPL (diagrammatic programming language) code, although I have a strong preference for OhmJS because it supports a clean separation between grammar and semantics, it supports left-recursion, it sports a syntax which aids readability, and, it comes with a grammar REPL called “Ohm-editor”.
“RT” stands for “Recursive Text”, another WIP subproject in which I explore the ideas of building a higher-than-high-level-language (HHLL) that uses ideas from Lisp and the new capabilities of PEG parsing (as opposed to context-free parsing (CFGs)).
The diagrams contain feedback loops. Feedback uses queues (FIFOs). This is not the same as recursion, which uses stacks (LIFOs). Function-based programming restricts programmers to using only stacks. 0D allows programmers to use stacks inside components while using queues for inter-component messaging. Feedback is used frequently in hardware design, but, borders on being a forgotten art in software design.
I argue that messaging, using queues, enables better expression of asynchrony and multi-tasking.
The ‘done’ output of the Larson scanner is an implementation detail which allows insertion of a Delay component in the main layer.
The ‘Delay’ component is an example of how one might interface with existing software and I/O. If a component is connected to software outside of the 0D world, the component explicitly tells the 0D kernel when it is running and when it becomes idle using the ‘set_active ()’ and ‘set_idle ()’ kernel calls. Active components are continuously polled with tick events by the 0D kernel (and all intervening layers of Containers). This allows one to use existing software even when the running time cannot be determined before hand. Other techniques for accomplishing similar interfacing with existing software are being successfully used at Kagi by Zac Nowicki, using the Crystal version of 0D. Continuous polling could be subsumed by using some external event to tell the 0D kernel when to poll once.
We use the name bang to represent a datum that has no data and is used to signal certain events. A bang is even simpler than a boolean in traditional 3GLs. A bang does not have true or false values, a bang simply exists.
Message port-mapping during routing adds a layer of indirection, but, greatly enhances the reconfigurability of component-based architectures. For example, if you wish to change a design, you only need to change one Container diagram, while the code within other components - Leaf or Container - remains unaltered. This is similar in effect to referential transparency provided by Functional Programming, but, does not suffer from the same restrictions that FP imposes on referential transparency. In 0D, one can create components with multiple input ports and multiple output ports and plug them into and out of designs at will. In hardware design, this is called “pin for pin compatibility”. In FP, though, components must have exactly one input and exactly one output (disregarding the anti-structured concept of exceptions) to support referential transparency.
Input and output ports act like input APIs and output APIs. Components are completely isolated. You don’t need to care how a component is implemented internally. A component could contain pure functional code, or, state-machine code, or, a rat’s nest of logic. It doesn’t matter, as long as the component adheres to its external guarantees for operation (e.g. “truth tables”, etc.).

Future Direction

The goal of this project is to create <script>…</script> Javascript code suitable for inclusion in an .HTML file for viewing in a browser.

This goal should probably be preceded by creating all of the above, on the command-line, using node.js.
It is my experience that emitting Lisp code causes one to redefine bits of the t2t (text-to-text transpiler) to use recursive grammar rules instead of the typical Kleene and BNF grammar operators (+ / * / ?). As such, it would probably be a good idea to create a Common Lisp transpiler before creating the node.js transpiler. Basically, if the transpiler can emit Common Lisp code, then emitting Javascript (and node.js) should be easier. The project has taken some 4 days up until now, I would expect that creating a Common Lisp version should take another few days. [I am more comfortable with Common Lisp than with the other Lisp variants, like Scheme, Racket, Janet, etc. I don’t see any particular reason why Scheme, Racket, Janet, etc. couldn’t be used instead of Common Lisp].
Currently, rt0d.rt looks like Python code that uses brace-brackets instead of indentation. I expect the code to change and improve as it is used for generating 0D kernels for more and more target languages. This project started out with a working 0D kernel written in Python. The Python 0D kernel was copied into ‘rt0d.rt’ and modified by

Writing and using a small t2t transpiler to bulk-replace indentation with matched left and right brace brackets. The transpiler was used as an editing tool to generate a new version of rt0d.rt. The transpiler was fairly simple at this point - it simply looked for the Python keywords “def” and “class” and inserted brace brackets, while skipping over, and, keeping intact, all characters between two keywords. This simple heuristic turned out to cover a large number of cases, say 90% (I didn’t actually measure this). The transpiler did not convert 100% of all indentation to bracketed notation, but, it was deemed easier to simply perform the remaining replacements manually than to continue tweaking the transpiler until it did 100% of the work. In the process, the transpiler rewrote the keyword “def” to “defn” and the keyword “class” to “defclass”. The target, at this point, was not to produce human-readable code, but, only to produce code that OhmJS could parse using its space-skipping feature. As such, spaces were added liberally to the generated code and human-readable indentation was ignored. Note that even this simple heuristic could not be easily implemented using REGEX technology, since this kind of pattern-matching spans many lines. Modern variants of REGEX include multi-line capabilities as an after-thought, but, PEG is fundamentally designed to deal with this kind of pattern-matching idiom. Using CFGs for this same task would devolve into a mega-project, since CFGs encourage programmers to specify every small detail, instead of simply “skipping over” characters/phrases that don’t match. The design of this OhmJS-based (PEG-based) pattern-match is similar in mentality to using “.*” in REGEX, with the addition that PEG uses a push-down stack which encourages writing mutually-recursive sub-rules that perform recursive matching, instead of writing flat REGEX expressions that are not recursive.
Once the Python 0D kernel was converted to pseudo-code using matched left and right brace brackets, an OhmJS grammar was written to inhale the code. Testing of the grammar was done using the Ohm-editor REPL. Once the grammar appeared to be satisfactory, an RWR script (“rt.rwr”) was written to rewrite the parsed code as Python. The Python generated in this way was regression tested against the hand-written Larson Scanner code, by simply replacing ‘py0d.py’ with the generated ‘rtpy0d.py’ and re-running the scanner to ensure that it still worked.
Now, that the pseudo-code kernel is seen to generate valid Python code, I have begun to edit a new RWR script to transpile the kernel into Common Lisp. In essence, I know that the pseudo-code kernel is “sound” (at least as good as the hand-written Python code), it becomes “easy” to edit the Python-generating RWR script to tune it to generate Common Lisp. Python and Lisp are at two different ends of the textual syntax continuum. If I can generate Python, and, generate Common Lisp code from the same pseudo-code kernel, I believe that it will be fairly easy to generate code for any language that is “in between” those two extremes, like Javascript. The major difference between Lisp and Python is the use of mutation for assignment. In Python, you create new local variables by assigning values to net-yet-seen variables. In Lisp, you use let bindings to create scoped variables bound to values. The Lisp version requires that Statements be recursively parsed, whereas in Python, Statements can be parsed one-by-one using Kleene-star/plus operators (zero-or-more-of and one-or-more-of operators, resp.). Kleene-star based grammars produce flat sequences of matches, whereas Lisp let syntax requires recursive, nested matches. Modifying the pseudo-code to accommodate Lisp generation will, likely, require rewriting the ‘Statement’ rule to parse recursively instead of using Kleene-star/plus operators.

Once everything is working, we might try to generate a 0D kernel in WASM code and build a Larson scanner from a diagram which generates and uses WASM.

Collaborators and Contributors

Zac Nowicki wrote the original 0D kernel that this work is based on. The Odin programming language was used for implementing the prototype 0D kernel. A Visual SHell (VSH) was, also, built by Zac. VSH allows using components written in any existing programming language with any of the 0D kernels, regardless of implementation language.

Rajiv Abraham, Steve Phillips, and many others, provided valuable critiques, questions and observations of these concepts.

Jos’h Fuller inspired the construction of a Larson Scanner.

Contributors are welcome. This code is WIP, but, I would be happy to provide explanations and discussion. Contact me if interested (see the Programming Simplicity Discord link below).

References

Draw.io

https://app.diagrams.net/

OhmJS

https://ohmjs.org/

Ohm-editor https://ohmjs.org/editor/

Blog:

https://guitarvydas.github.io/

Videos: https://www.youtube.com/@programmingsimplicity2980

Discord: https://discord.gg/Jjx62ypR

Leanpub: https://leanpub.com/u/paul-tarvydas

Gumroad: