Introduction
Somewhere in the last 30 years, building a CPU went from being something only an elite few could do to something anyone with a bit of free time could do. These days we are pretty much measuring in "CPUs per square millimetre", people compete for how many CPU cores they can fit in a FPGA (hint: thousands, with the CoreScore benchmark), and there are games taking you from relays to programming languages (inspired by courses and books taking you from NAND to Tetris).
The Zilog Z80 CPU in my first computer seemed magical technology at the time, but last year I implemented (a subset) of a Z80 CPU in Python. As but one part of an extended Rube Goldberg Machine.
The Challenge
One of the great things to come out of the 2020 Covid-19 lockdowns were entirely online conferences like ComfyCon, and PyConlineAU. These looked for ways of having "audience participation" while being entirely remote: ComfyCon had active sharing of pet pictures, home labs, etc; and PyConlineAU 2020 had an entire day -- Sunday -- dedicated to social events.
The PyConlineAU 2020 "Rube Codeberg" competition came out on the Sunday "audience participation" day, when Lilly Ryan challenged attendees to "unnecessarily elaborately" engineer a piece of Python code which:
was written in Python 3, up to 200 lines
ran successful to print out "Hello World!" somehow
used the Beautiful Soup Python library
was "understandable/easy to parse by humans", and yet ridiculously over engineered a la Rube Goldberg machines.
I did not end up writing something on the conference Sunday, as there were many other things to do that day, but since I had a great idea for what I "should have written", the following weekend I sat down for a few hours and actually implemented my solution, in Python 3.
In addition to the conference rules, I also:
wanted the text "Hello World!" to not appear anywhere in the program; and
wanted to use the (at the time not built into Python) Python switch/case implementation I had been fiddling with during PyConlineAU 2020, inspired by the talk and blog post by Christopher Neugebauer.
My solution
My brain decided the best way to combine all of this was to implement the program to print "Hello World!" in another language, and then implement a Python 3 program that could execute that other language implementation. Obviously 200 lines of "easy to parse by humans" Python code is not very much to implement an entire language in, so it was clear the best way to proceed was to implement something which was drive by byte codes. I briefly considered transpiling the language into Python byte codes, but I did not know the Python byte code implementation... which is when I realised the byte code implementation I knew best was the Z80 CPU. (I spent several years writing Z80 assembler, in the late 1980s and early 1990s, including a terminal emulation program.)
So that is how I ended up implementing (part of) a Z80 CPU, in Python 3, as a side quest of another project.
Obviously if you run the program it prints out "Hello World!" (assuming you have the required Beautiful Soup module installed from PyPI):
$ python3 hello
Hello World!
$
but the Python program does not contain that text anywhere within it, and the output takes a non-trivial amount of time to arrive:
$ time python3 hello
Hello World!
real 0m0.242s
user 0m0.199s
sys 0m0.032s
$
A quarter of a second might not seem long by historical standards of the Z80, but it is eons of time for a modern CPU.
One gets a hint at how much additional hidden depth is there by asking
for --help
:
$ python3 hello --help
usage: hello [-h] [--asm] [--bin BIN] [--html]
optional arguments:
-h, --help show this help message and exit
--asm Output assembly
--bin BIN Output binary to file
--html Output HTML
$
The assembly option gives you Z80 source code that can be assembled and run on a Z80 system:
$ python3 hello --asm
.ORG 0x100
LD HL,text
LD DE,key
decode: LD A,(HL)
OR A
JR Z,print
LD B,A
LD A,(DE)
XOR B
LD (HL),A
INC DE
INC HL
JR decode
print: LD HL,text
pc: LD A,(HL)
OR A
JR Z, done
LD E,A
LD C,2
PUSH HL
CALL 5
POP HL
INC HL
JR pc
done: RST 0
text: DB 0x18, 0x1c, 0x2f, 0x03, 0x01, 0x4c, 0x3e
DB 0x01, 0x17, 0x4c, 0x56, 0x11, 0x3f, 0x3a
DB 00
key: DB 0x50, 0x79, 0x43, 0x6f, 0x6e, 0x6c, 0x69
DB 0x6e, 0x65, 0x20, 0x32, 0x30, 0x32, 0x30
DB 00
$
in particular a Z80 CP/M system.
But you do not actually have to do that because the program can also
output the Z80 binary for you, ready to run with the --bin OUTPUT.com
option:
$ python3 hello --bin hello.com
$ ls -l hello.com
-rw-rw-r-- 1 ewen ewen 143 Sep 12 2020 hello.com
$ ../tools/cpm/cpm hello
Hello World!
$
(with the extra two newlines due to the CP/M emulator used for testing).
And finally the --html
option prints out HTML containing the
motivating text, and the assembler output
listing. (One might hope there was
a webserver providing that HTML, but alas I massively ran out of lines
for Python, so the HTML is a little BYO browser.)
The Implementation
The Z80 CP/M "Hello World" program
The first part of the implementation was that I needed a Z80 program
that would print "Hello World!", itself in a way that was
non-obvious: in particular I wanted it not to contain the literal
text as either a raw text string or in hexadecimal bytes. So there
is both the "text
" string and a "key
" string, both zero terminated,
that between the encode the bytes to print out. The "decode
"
section of the assembly simply walks the text
and key
bytes,
and XOR
s them together, forming a simple one time pad. Then the
print
section of the assembly walks of the decoded text, and
prints it out. In this case using the CP/M BDOS
conventions,
where the C
register contains the function, the E
(or DE
)
register contains the parameter, and you CALL 5
to access the
BDOS
functions. (Using the CP/M BDOS conventions was an obvious
choice both because I was familiar with them, and they are fairly
compact, but also because it enabled testing the Z80 program
independently, as shown above.)
The text
and key
were themselves generated with a small Python script:
#! /usr/bin/env python3 # XOR string A with string B, and print result as hex # # Written by Ewen McNeill , 2020-09-12 #--------------------------------------------------------------------------- import sys def xor(text, key): return [(ord(t) ^ ord(k)) for t, k in zip(text, key)] def main(): if (len(sys.argv) <= 2): print("Usage: {0} TEXT KEY".format(sys.argv[0]), file=sys.stderr) sys.exit(1) text = sys.argv[1] hexord = lambda x: hex(ord(x)) key = sys.argv[2] # Bourne style shells remove newlines in command line parameters during # command line substitution, so we may have to restore it to make our # plain text have the same length as the key # # https://unix.stackexchange.com/questions/383217/shell-keep-trailing-newlines-n-in-command-substitution # if (len(text) == (len(key) - 1)): text += "\n" if (len(text) != len(key)): print("Plaintext and key must be the same length! ({0} != {1})". format(len(text), len(key)), file=sys.stderr) sys.exit(1) print(", ".join(map(hex, xor(text, key)))) if __name__ == "__main__": main()
and for this challenge I obviously chose the required "Hello World!
"
text, and "PyConline 2020
" as an equal length string to be the one
time pad (since it is xor, the pad itself is visible in the assembly
output, but obscured by being specified as hex digits).
$ cat gentext
#! /bin/sh
./xor "$(printf 'Hello World!\r\n')" "PyConline 2020"
$ sh gentext
0x18, 0x1c, 0x2f, 0x3, 0x1, 0x4c, 0x3e, 0x1, 0x17, 0x4c, 0x56, 0x11, 0x3f, 0x3a
$
and then I translated those into Z80 Assembler formatting, and put it with the rest of the program I had written.
I assembled the initial program with the zasm -- Z80 Online Assembler for expedience, to avoid yak shaving a Z80 assembler locally as well. Which gave me the assembly listing included in my solution.
As shown above, that program prints out "Hello World!" when run under a Z80 emulator that understands CP/M BDOS printing conventions.
So I needed a Z80 emulator that understood CP/M BDOS printing conventions, written in Python.
The (partial) Z80 CPU implementation in Python
The Z80 emulator is (slightly hidden) in the hello()
function
of my implementation. Due to both time, and especially space, constraints
it implements just enough of the Z80 CPU to be able to run the example
program. The entire implementation is just a "forever" loop, that
executes one instruction per clock tick (not 100% realistic, as the Z80
instructions were variable length from 4 clock ticks upwards, but time
and space constraints limited the accuracy :-) ).
Each instruction is handled by a Switch
/Case
implementation
in Python,
inspired by Christopher Neugebauer's talk and blog
post, for
which the Switch
/Case
implementation can be seen near the top
of the
program,
and the individual implementations are compactly implemented in
the hello()
function. Since the cases all
needed distinct names (a limitation of the Switch
/Case
implementation),
they are named after the hexcode of the Z80 opcode they are implementing,
as that is both short and unique :-)
The majority of the implementations are trivial implementation of
a "Z80 State Machine" (as much as was required to execute the example
program), part from hcd()
(for CALL, addr
) which is special
cased just to handle the one CP/M BDOS printing routine required,
and hc7()
(for RST 0
) which I used as a "Z80 flavoured" end of
program sentinel, and simply leaves exits the Python program. There
are also a couple of helper
functions
to reduce the length of some of the implementations. The PUSH
/ POP
implementations are skipped for space reasons, as they turn out not
to be relevant since our CALL 5
implementation does not corrupt the
HL
variable, so we do not need special steps to preserve it (but they
are present in the original Z80 assembly, as the CP/M BDOS does corrupt
HL
).
It turned out to be important that the newline after the Python
function name is optional, in order to save some of the value 200
lines allowed, so all the short implementations are inline. But it also
turned out to be important I could support multi-line case statements,
an unstated requirement of the original
implementation.
Particularly LD C, immediate
and JR dest
/ JR condition dest
needed
multiple line implementations.
Putting it all together
At this point I had a Z80 (CP/M) program that would print the required text, and a (partial) Z80 emulator written in Python that would execute that Z80 program, and produce the output required. The remaining step was to bring the two together, and ensure that I used the required Beautiful Soup module.
Since I wanted to somewhat hide the Z80 program inside the Python program, to make it stand alone, and not immediately obviously connected, I came up with the idea of hiding the Z80 assembly listing output in a Python docstring at the start of the program. From which the obvious thing to do was to bury it in a (trivial, space constrained) HTML page which made fun of how "difficult" it was to do anything in Z80 assembler, and provided a nice counterpoint to the "understandable/easy to parse by humans" Python program that followed.
Using HTML, gave me a direct reason to use the Python Beautiful Soup module as required, which extracts the HTML, and assembly listing, without which the whole program would not work. From that assembly listing, we can get both the assembly input, and the Z80 opcodes for the program, give certain assumptions about the assembly listing output. (Fortunately it turns out the many 1970s/1980s derived output formats were very column aligned, thanks to a punch card / line printer influence!)
Which just leaves the top level program to parse the arguments and
dispatch to the right
function
which is not in main()
method, because I ran out of lines (199
out of 200!). It too uses a Switch
/Case
implementation
largely because it was there, and a fairly compact solution.
In the default run:
The
hello()
function (Z80 CPU partial implementation) is invoked on the output ofget_bin()
(which retrieves the Z80 opcodes from the assembly listing extracted from the Python docstring with Beautiful Soup)The
hello()
function then steps through the Z80 implementation to decode the string to be printed out in thetext
section of RAM, and thenThe
hello()
function steps through the Z80 implementation to print out the decodedHello World!
string, character by character
And hey presto, "Hello World!" is printed as required, simple as that. Taking only a quarter of a second, about 25x longer than a trivial implementation in Python:
$ time python -c 'print("Hello World!")'
Hello World!
real 0m0.012s
user 0m0.004s
sys 0m0.008s
$
(To be honest I am surprised it is only 25x slower, considering it has both an inefficient Switch/Case emulation, and an inefficient Z80 emulation to contend with!)
Most of the other run options also use Beautiful Soup, to get the
assembly listing, with the exception of --html
which simply prints
out the whole docstring "as is".
Despite various steps to reclaim lines of text as I was running up against the line limit, the program still feels surprisingly readable over a year later.
The original competition did not actually have
a line width limit, and Python does allow stacking multiple operations
on a single line separated by ;
-- semicolons -- but I tried to avoid
that except where required to reach the line limit. And to stick to
80 column lines for readability. Spending 50
lines on the "alternative" implementation in Z80 Assembler, and
another 20 lines on the Switch
/Case
implementation, did not
exactly help my line count constraints, but despite that I still
managed to retain another 33 entirely blank lines.
You can read my whole implementation on GitHub, and that also has links to all the other Rube Codeberg submissions I could find.
Conclusion
It was a fun program to write, and a nice distraction from the world outside in 2020. I was particularly surprised just how simple it was to implement a usable fraction of a Z80 CPU state machine in just a few hours, in a language entirely unsuited to state machine implementations. And it was a lovely loop back to one of the first open source projects I was involved with, "Libero", a Finite State Machine generator from iMatix.
It really is state machines all the way down :-)