Better assembler and stack-based register file

During the last few weeks, I've made a lot of changes to the programming language I'm working on. It looks like I'm close to being able to write actually useful programs in it (at least on the level of assembly). So let's go over the interesting parts.

S-expression-based assembler

In the previous post, I described the "reader" and "writer": two modules that allow to parse and serialize data structures in the form of extended S-expressions. Originally, S-expressions are just a way to represent nested lists containing "atomic" values. My extension allows to represent dictionaries and arrays as well.

Here's an example such data structure represented as an S-expression:

;; a list
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key

Since my language would be based on Lisp, all programs will be written as sequences of S-expressions. In Lisp, function calls look like this:

(function_name arg1 arg2 arg3)

This would call function_name with 3 arguments. Because of how universal this notation is, any program can be represented this way. For example, this is what factorial implementation looks like:

(defun factorial (x)
   (if (zerop x)
       (* x (factorial (- x 1)))))

My "reader" can already successfully parse the syntax tree of this program and turn it into a data structure. I don't yet have a compiler for the high level language that would turn it into bytecode, but I already have an assembler.

The only problem with the assembler was that I had to implement a custom parser for it. For example, this is a program from one of the older posts:

; Load value '2' into register r0
li r0, 2
; Multiply register r0 by 2 and
; put result back to r0
mul r0, r0, 2

As you can see, it has a custom syntax where a mnemonic needs to be writen first, and then followed by arguments separated by a comma. Because of the subtleties of this format, it has taken around 700 lines of C to implement and was very hacky. So I thought to myself: if assembly programs are just a list of opcodes with arguments, then maybe I can write them as S-expressions as well and save on writing a custom parser?

And this is what I ended up doing. Now the same program would look like this:

; Load value '2' into register r0
(li r0 2)
; Multiply register r0 by 2 and
; put result back to r0
(mul r0 r0 2)

And here's an example of evaluating a factorial of 15 and printing it to the screen:

(const c1 "! is")
(const max 15)
(const one 1)

(sr 10)

(mov r0 max)
(mov r3 one)
(mov r2 one)

(label factorial)
(mul r2 r2 r3)
(add r3 r3 one)
(jle r3 r0 factorial)

(label end)

(lfi r5 print)
(mov r6 r0)
(mov r7 c1)
(mov r8 r2)

(call r5 3)

Stack-based register file

Initially, I designed the virual machine loosely after a RISC-V architecture. There were few opcodes, a stack, and 16 general-purpose registers. Of course, since the virtual machine is not real hardware and is targeting a dynamic language - it means that the registers and stack were tagged with a type identifier of an object they point to. But otherwise the design was quite close.

Eventually, when I started thinking how I'm going to compile actual high-level code to the virtual machine bytecode, I've found a problem called "register allocation". Usually, when you compile a C or C++ program, you have local variables in your functions. Those variables live on a stack, and are loaded into registers when your program performs computation on them (say, adding two variables). Because operating with registers is very fast, and loading data from a stack is a lot slower, compilers have an intricate optimizer that would try to use registers to store and access the intermediate results as much as possible.

On one hand, I don't want to write a complex register allocation optimizer. On the other hand, I don't want to write the virtual machine purely as a stack machine. Mostly because stack machines shuffle around values too much, and it's very hard to read their disassembly because of this. Try reading Python disassembly, and you'll see what I mean.

After thinking about this for a few evenings, I had a thought: if in my virtual machine there's no difference in performance between accessing a register and a value on the stack - maybe I can put registers on a separate stack? In this case, I can have essentially unlimited number of registers available for a function to work with, and it will have its own set of registers that are independent from the registers of the function that called it.

Excited, I did a bit of googling to see if there are any other runtimes that did this, and sure enough some did! This was the case for CLR (.NET runtime) and Lua virtual machine. I didn't want to dive into CLR, because it was too big, but this Lua 5 paper was short and lovely, and described everything I needed.

Changing the approach to accessing registers wasn't easy, and took almost a whole day of refactoring. But after finishing that, I can write things like this:

(const c1 42)
(const c2 5)
(const c3 47)
(sr 3)

;; Call a function that
;; adds two numbers and
;; puts result back to r0
(setjump r0 add2)
(mov r1 c1)
(mov r2 c2)
(call r0 2)

;; Check that r0 == 47
(aeq r0 c3)


;; Function that adds
;; two numbers
(label add2)
(sr 3)
;; These registers are
;; not the same as the
;; caller's
(add r2 r0 r1)
(ret r2)

In this example you see only a few registers used, like r0, r1 and r2. But the current implementation allows you to go all the way up to r65536, which should be plenty for addressing local variables of pretty much any program.

Now, with this freedom of using large range of registers, writing a compiler would be a lot simpler.