Branches: No assembly required
Last time, we started from the barebones Miniforth kernel,1 and implemented branches by writing additional primitive words in assembly. For pragmatic reasons, that is the road I will be pursuing further, but I noticed that it is also possible to implement branches in pure Forth. I believe that this approach is quite interesting, so let's take a detour and get a closer look.
The relevant2 available primitives are:
+ - ! @ c! c@ dup drop swap emit u. >r r> [ ] : ;
Additionally, we have the implementations of the following from the previous post:
- system variables
- dictionary space access:
- defining words
- some miscellanea:
You can fire up Miniforth by following the instructions in the GitHub
repo. If you'd like to try implementing pure-Forth branches on your
own, now is the time to stop reading. Otherwise, we'll be studying the branches
purity, uh, branch.
(0branch) is compiled into a word, it will be immediately
followed by the branch target's address:
Implementing the unconditional branch isn't that complicated — manipulate the return stack to repoint the return address:
: (branch) r> \ pop the return address, which points at the cell containing jump target @ \ fetch the jump target >r \ make it the new return address ;
Choosing between values
Clearly, the difficulty in a conditional branch boils down to choosing between
the two possible values for the return address. This would be quite simple if we
0= — since Forth's
true has all bits set, we can
and with a
boolean flag to decide between a value of our choice and 0.3
This is easiest to use if we encode our branches as an offset, instead of an absolute address. In this case, the implementation would look like this:
: (0branch) 0= ( should-jump ) r> dup cell+ ( should-jump addr-of-offset retaddr-if-false ) >r @ and r> ( offset|0 retaddr-if-false ) + >r ;
Sadly, we don't have
0=. However, this is still a useful starting
point. Could we, perhaps, implement these words somehow?
Shifting the bits around
It would be nice if we could extract out individual bits out of a word. If we had that, we could implement bitwise functions by shifting out the input, computing what we need bit by bit, and shifting in the result:
Shifting left is easy enough, as that's just multiplication by two:
: 2* dup + ;
However, getting a bit to the least-significant position is trickier. If we leverage a memory access, though, we can extract the higher byte of a value:4
variable x : lobyte x ! x c@ ; : hibyte x ! x 1 + c@ ;
This is essentially an 8-bit right shift. Let's use this to check if a number is
zero. We'd need to OR all the bits together, but we don't have
Addition is somewhat similar, though, so let's count the bits in a value.
s ( c v -- c' v' ) handles one iteration of the "loop" — it will shift out a
bit out of the 8-bit wide value
v, and add it to the counter
: s 2* dup >r hibyte + r> lobyte ;
Running this 8 times will count the bits in a byte, so that's what
: nb 0 swap s s s s s s s s drop ;
bits in a
word) does the same for a full 16-bit value, by
nb on each half:
: nbw dup hibyte nb swap lobyte nb + ;
How do we turn this into a comparison with zero? We iterate
nb a few times:
nbw, you'll have a value that's at most 16,
nbw nb, you'll have a value that's at most 4,
nbw nb nb, you'll have a value that's at most 2,
nbw nb nb nb, you'll have a value that's either 0 or 1.
: 1bit nbw nb nb nb ;
Choosing between values
While we could use a similar bitshifting strategy to implement
and and choose
between the two return addresses using that, there is a simpler way: use the
1-bit value we compute to index into an array.5 We'll use a 2-entry
array called the
create bb 2 cells allot : (0branch) r> dup \ two copies @ bb ! \ bb = return address if 0 on the stack cell+ bb cell+ ! \ bb = return address if something else on the stack 1bit cells bb + @ >r ;
Other solutions: a time-memory tradeoff
While elegant, our solution is quite inefficient, executing thousands of instructions on every branch. While I wouldn't expect the best performance when we're limiting ourselves in this manner, there still are ways to make this better.
For example, we could prepare a 256-byte lookup table for
1bit. Since we don't
have any way to loop, we'll need to repeat things manually. Since 255 = 3 × 5
× 17, it could look like this:
: x 1 c, ; \ write 1 one : y x x x ; \ write 3 ones : z y y y y y ; \ write 15 ones create tab 0 c, z z z z z z z z z z z z z z z z z \ 17 times : 1bit-half tab + c@ ; : 1bit dup hibyte 1bit-half swap lobyte 1bit-half + 1bit-half ;
Is that all?
Yup, we're done. The rest of the code needed to define
and other control-flow words looks exactly like in the previous post.
You might ask, is that everything we need for Turing-completeness?6 Perhaps there's a primitive we won't be able to define for some reason? I don't think we need to worry. Our branch can be used to implement a loop-until-zero control structure, and that's all brainfuck has.
Thus, I will end this digression here and continue bootstrapping without artificially limiting my usage of assembly. Next on our agenda, we've got Forth's exception handling mechanisms, and how to extend them for better error messages than the bare minimum usually encountered in Forth.
If this is your first time here and this journey of bootstrapping sounds interesting, you can subscribe to the RSS feed or follow me on Twitter to get notified of future posts. See you next time!
The word "kernel" is used here in the language implementation sense, and not the operating system one. If you know a better term than this, please let me know, as there will be a point where I'll have to talk about both things at once...
s:, since they won't help, and describing
them is out of scope for this post. I describe them in the previous
post if you're curious.
Recall that x86 is little-endian, and as such, a value like
1234 is stored as
34 12 in memory.
I believe I learned this technique from the M/o/Vfuscator.
Well, since memory is finite, everything we've actually ever made is just a very large state machine. I suppose a closer notion would be LBA-completeness if we're being pedantic, but I wouldn't hope for a fully formal definition that captures what we usually mean by "Turing-complete" when talking about things that actually exist.
No branches? No problem — a Forth assembler