Procedures
A procedure is a straight-line, single-threaded set of machine code operations. The operations are called “opcodes” and are usually encoded as one or more binary words whose size depends on the CPU itself.
In the MC6800, released in the mid-1970s, words are 8 bits wide. Opcodes for the MC6800 are encoded in multiples of 8 bits. An 8-bit word is called a byte.
In many more modern CPU architectures, words are 64 bits wide.
The word width depends on a number of design trade-offs. These trade-offs involve cost of manufacturing and speed of memory access.
For example, modern programs store and access visual media that contain many colours for each pixel. Such rich data uses many bytes for storage of even single pixels. On an 8-bit machine such rich data requires at least one unit of time (a “clock cycle”) to access each byte of the data. On a 64-bit machine, eight bytes can be fetched from memory or stored into memory in one unit of time. This means that it is faster to access rich data using a 64-bit CPU than using an 8-bit CPU.
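The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an idealized model: it assumes the machine moves exactly one word per memory access and it ignores caches, alignment, and buses that are wider or narrower than the word. The function name `transfers_needed` and the 8-byte pixel are made up for illustration.

```python
def transfers_needed(data_bytes: int, word_bits: int) -> int:
    """Count the word-sized memory accesses needed to move data_bytes bytes.

    Assumption: an idealized machine that moves exactly one word per access.
    """
    word_bytes = word_bits // 8
    # Integer ceiling division: a partly filled word still costs one access.
    return (data_bytes + word_bytes - 1) // word_bytes


# Hypothetical pixel: 4 colour channels of 16 bits each = 8 bytes.
pixel_bytes = 8
print(transfers_needed(pixel_bytes, 8))   # accesses on an 8-bit machine
print(transfers_needed(pixel_bytes, 64))  # accesses on a 64-bit machine
```

Under these assumptions the 8-bit machine needs eight accesses for the pixel and the 64-bit machine needs one.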
A trade-off is that 64-bit CPUs use up more space on CPU chips, with an attendant increase in price. Evolution of techniques for miniaturizing chips has reduced the cost of manufacturing chips, making it feasible to sell affordable large-word CPUs.
Implementing Subroutines
Often, multiple procedures use the same, repetitive code sequence.
In such cases, the repetitive code sequence can be cut from all procedures and pasted into a single, CALL-able subroutine.
A subroutine is just a space-saving and maintenance convenience. To implement subroutines, CPUs must implement CALL and RETurn instructions and must designate a special place in memory to store bookmarks.
Usually the address of the bookmark area is stored in a designated CPU register, which reduces the number of registers available for general-purpose use.
There is a small cost for using subroutines: the code contains extra opcodes and it takes time to perform CALL and RETurn operations. In general, though, there is a net saving when subroutines are used.
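The CALL and RETurn mechanism can be sketched as a toy interpreter. The instruction set here (EMIT, CALL, RET, HALT) is made up for illustration and does not correspond to any real CPU. The Python list `return_stack` stands in for the bookmark area, and on real hardware a designated register would hold its current top.

```python
def run(program, entry=0):
    """Interpret a tiny, made-up instruction set with CALL and RET.

    program is a list of (opcode, operand) tuples.
    Returns the emitted values and the number of instructions executed.
    """
    pc = entry          # program counter: index of the next instruction
    return_stack = []   # the "bookmark" area: saved return addresses
    output = []
    steps = 0
    while True:
        op, arg = program[pc]
        steps += 1
        if op == "EMIT":
            output.append(arg)
            pc += 1
        elif op == "CALL":
            return_stack.append(pc + 1)  # bookmark the instruction after the CALL
            pc = arg                     # jump to the subroutine
        elif op == "RET":
            pc = return_stack.pop()      # resume at the most recent bookmark
        elif op == "HALT":
            return output, steps
        else:
            raise ValueError(f"unknown opcode: {op}")


program = [
    ("CALL", 4),          # 0: main procedure starts here
    ("EMIT", "between"),  # 1
    ("CALL", 4),          # 2
    ("HALT", None),       # 3
    ("EMIT", "hello"),    # 4: subroutine body, stored only once
    ("RET", None),        # 5
]
output, steps = run(program)
print(output, steps)
```

In this sketch the program executes 8 instructions, where an inlined version (EMIT, EMIT, EMIT, HALT) would execute 4. The difference is the CALL and RETurn overhead. With a one-instruction body the subroutine version is also longer in memory, so the space saving only appears when the shared body is larger than the CALL and RETurn opcodes that replace it.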
In the early days, machine code was hand-written by assembler programmers. This made it vital to improve maintenance by reusing and centralizing chunks of code, using a strategy of “don’t repeat yourself” called DRY. Today, most machine code is automatically written by compiler programs. DRY is not as important for automated code generation.
The terms “subroutine” and “procedure” are frequently employed interchangeably, yet in this article, I have opted to assign distinct meanings to each. A “procedure” refers to a script, recipe, or routine composed of opcodes. Conversely, a “subroutine” denotes a helper sub-procedure that facilitates maintainability by reducing redundancy and conserving code memory.
Implementing Functions
It is possible to implement functions using subroutines.
Functions receive input data. This kind of data is called parameters.
Functions produce output data. This kind of data is called return values.
To accomplish all of this with simple, off-the-shelf CPUs, such as those which were introduced in the 1970s, we need to add code and we need to designate space in memory for passing data back and forth.
To support function recursion and reentrancy, we use a chunk of memory in a stack-like manner.
Designated registers can be used to speed up binding of parameters and fetching of returned values. Again, this impacts the number of CPU registers available for general use.
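As a rough sketch of how a stack supports recursion, the following Python code computes a factorial without using Python's own recursion. It keeps one explicit stack of frames, each holding the parameter of a pending call, and a single variable that stands in for a designated return-value register. The names `factorial_with_explicit_stack`, `stack`, and `return_value` are illustrative assumptions, not a description of any particular CPU or calling convention.

```python
def factorial_with_explicit_stack(n: int) -> int:
    """Compute n! using an explicit stack of frames instead of recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")

    stack = []  # the call stack: one frame per active call

    # "Call" phase: push a frame for each nested call down to the base case.
    stack.append({"n": n})
    while stack[-1]["n"] > 1:
        stack.append({"n": stack[-1]["n"] - 1})

    # The base case (n is 0 or 1) returns 1 and discards its frame.
    stack.pop()
    return_value = 1  # stand-in for a designated return-value register

    # "Return" phase: each caller pops its own frame, reads its parameter,
    # and combines it with the value returned by its callee.
    while stack:
        frame = stack.pop()
        return_value = frame["n"] * return_value

    return return_value


print(factorial_with_explicit_stack(5))
```

Because every pending call has its own frame, the same code can be active many times at once, which is what recursion and reentrancy require. The stack grows by one frame per nested call.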
The call stack contains all sorts of data types for parameters and for return values and for continuation (return) addresses. The use of single, heterogeneous stacks was driven by cost and efficiency concerns in the early days of computing. Today, with more memory and virtualization technology, it should be possible to create separate stacks for each data type used as parameters and continuation addresses.
Fortran differentiates between subroutines and functions. Functions can be used in expressions, whereas subroutines cannot; a subroutine is invoked with a CALL statement. Fortran does not guarantee that functions are “pure” in the functional programming sense, i.e. functions in Fortran are not guaranteed to be side-effect free.
C did not differentiate between procedures and subroutines and functions at the language level. Every kind of software unit was called a function and could be used in expressions (at programmers’ peril). The first versions of C (K&R) specified that the default return type of functions was int. Later versions of C specified the use of void functions to declare subroutines that returned no value. In essence, C implements parameterized, value-returning subroutines.
Functions That Call Other Functions
The situation becomes more complicated when functions are allowed to call other functions.
Here, I’ve diagrammed the idea that function X calls function Y.
The calling procedures A and B have “no idea” that X is implemented using a call to another function. What if Y calls yet another function Z?
In general, we can’t know how deeply the function-calling goes, so we are compelled to create workarounds, like virtual memory for code, called paging.
Virtual memory techniques are indeed essential for supporting larger and more complex data formats. However, this does not necessarily imply that code itself should be virtualized. Treating all code as functions and expanding larger systems solely by adding more functions often necessitates the use of code virtualization and large, preemptive operating systems. Consequently, all programs are indirectly burdened with the amortization costs associated with code virtualization.
Simulation of Concurrency
By design, CPUs are single-threaded interpreters of operation sequences.
Due to the high cost of early CPUs, and a desire to escape the batch-processing mentality, time-sharing techniques were developed that allowed multiple programs to run on single CPUs by giving small slices of CPU time to each program.
Even early CPUs were fast enough to “fool” humans into perceiving that the programs ran in parallel, concurrently. In fact, no actual hardware parallelism occurs. This technique of time-slicing evolved into what is currently called multi-tasking and continues to give the impression to humans that actual hardware parallelism is occurring.
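The interleaving effect can be sketched with Python generators. Note the simplification: here each task gives up the CPU voluntarily with `yield` (cooperative switching), whereas a time-sharing system interrupts programs with a timer. The names `task` and `round_robin` are made up for this example.

```python
from collections import deque


def task(name, steps):
    """A made-up 'program' that does `steps` units of work, pausing after each."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(tasks):
    """Run all tasks on one simulated CPU, one time slice per task in turn."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run one slice of this task
            ready.append(current)        # then send it to the back of the queue
        except StopIteration:
            pass                         # the task has finished; drop it
    return trace


trace = round_robin([task("A", 3), task("B", 2)])
print(trace)
```

Only one task runs at any instant, yet the trace interleaves work from A and B. Run fast enough, that interleaving is what gives humans the impression of parallelism.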
Function-calling, by definition, dictates ad-hoc control-flow behaviour, i.e. callers suspend (“block”) operation while waiting for callees to conclude operation. To break and regain control over such ad-hoc control-flow decisions imposed by the use of the functional model of programming, extra software, in the form of preemptive operating system schedulers, needs to be added to software systems.
See Also
Email: ptcomputingsimplicity@gmail.com
References: https://guitarvydas.github.io/2024/01/06/References.html
Blog: guitarvydas.github.io
Videos: https://www.youtube.com/@programmingsimplicity2980
Discord: “programming simplicity” https://discord.gg/65YZUh6Jpq
Leanpub: [WIP] https://leanpub.com/u/paul-tarvydas
Twitter: @paul_tarvydas
Substack: paultarvydas.substack.com