VM progress update: strings, slices and function bindings
For the last few days I've been working on landing initial string support in the VM. Once this is finished, it will be possible to write programs in assembly that can display something to the user.
The string implementation actually consists of two parts:
- strings themselves (heap objects that store their size and a character array)
- string slices (which reference part of a string without copying it)
String slices are convenient because, in theory, they are small enough to be kept in registers or on the stack. This means that code that walks strings (parsing, splitting, etc.) won't put much pressure on the garbage collector. And since strings are immutable, it should always be safe to keep a slice around.
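A minimal sketch of the two pieces in C, assuming a layout the post doesn't spell out: `vm_string`, `vm_slice` and the helper functions are hypothetical names, and `malloc` stands in for the VM's garbage-collected allocator. The tokenizer at the end illustrates the point about walking strings: every token it returns is a new slice, and no characters are allocated or copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed heap string layout: its size followed by the characters (not NUL-terminated). */
typedef struct {
    size_t size;
    char data[]; /* flexible array member */
} vm_string;

/* Assumed slice layout: the string it points into plus a [start, start + len) range. */
typedef struct {
    const vm_string *str;
    size_t start;
    size_t len;
} vm_slice;

/* malloc is a stand-in for the VM's GC-managed allocator in this sketch. */
static vm_string *vm_string_new(const char *chars, size_t size) {
    vm_string *s = malloc(sizeof *s + size);
    if (s == NULL) return NULL;
    s->size = size;
    memcpy(s->data, chars, size);
    return s;
}

static vm_slice vm_slice_whole(const vm_string *s) {
    vm_slice sl = { s, 0, s->size };
    return sl;
}

/* Narrow an existing slice; the result still points at the same string. */
static vm_slice vm_slice_sub(vm_slice sl, size_t start, size_t len) {
    assert(start <= sl.len && len <= sl.len - start);
    vm_slice r = { sl.str, sl.start + start, len };
    return r;
}

static const char *vm_slice_chars(vm_slice sl) { return sl.str->data + sl.start; }

/* Pop the next space-separated token off *rest without allocating.
   Returns 0 when no tokens are left. */
static int vm_next_token(vm_slice *rest, vm_slice *token) {
    const char *c = vm_slice_chars(*rest);
    size_t i = 0;
    while (i < rest->len && c[i] == ' ') i++; /* skip leading spaces */
    if (i == rest->len) return 0;
    size_t j = i;
    while (j < rest->len && c[j] != ' ') j++; /* find the end of the token */
    *token = vm_slice_sub(*rest, i, j - i);
    *rest = vm_slice_sub(*rest, j, rest->len - j);
    return 1;
}
```

Here a slice is three machine words on a typical 64-bit target, which is the kind of value that can be passed in registers or kept on the stack.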
From the garbage collector's point of view, a slice points to the beginning of the string and carries a range. This keeps the original string alive in memory as long as some slice points to it.
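To illustrate why a slice keeps its whole string alive, here is a toy mark-and-sweep sketch. It assumes every string carries a mark bit and sits on an intrusive list of allocations; `vm_heap`, `mark_slices` and `sweep` are hypothetical names, not the VM's actual collector. Marking only needs the slice's `str` pointer, and the range plays no part in reachability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed heap string with GC bookkeeping: a mark bit and a link to the next allocation. */
typedef struct vm_string {
    struct vm_string *next;
    bool marked;
    size_t size;
    char data[];
} vm_string;

/* A slice refers to the whole string plus a range inside it. */
typedef struct {
    vm_string *str;
    size_t start;
    size_t len;
} vm_slice;

/* Every string allocated so far, newest first. */
typedef struct {
    vm_string *objects;
} vm_heap;

static vm_string *heap_alloc_string(vm_heap *h, const char *chars, size_t size) {
    vm_string *s = malloc(sizeof *s + size);
    if (s == NULL) return NULL;
    s->next = h->objects;
    s->marked = false;
    s->size = size;
    memcpy(s->data, chars, size);
    h->objects = s;
    return s;
}

/* Mark phase: a slice root marks the string it points into. The range is ignored,
   so even a one-character slice keeps the entire string alive. */
static void mark_slices(const vm_slice *roots, size_t count) {
    assert(roots != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (roots[i].str != NULL) roots[i].str->marked = true;
    }
}

/* Sweep phase: free unmarked strings, clear marks on survivors, return how many were freed. */
static size_t sweep(vm_heap *h) {
    size_t freed = 0;
    vm_string **link = &h->objects;
    while (*link != NULL) {
        vm_string *s = *link;
        if (s->marked) {
            s->marked = false;
            link = &s->next;
        } else {
            *link = s->next;
            free(s);
            freed++;
        }
    }
    return freed;
}
```

The flip side of this design is that a tiny slice can pin a large string in memory for as long as the slice is reachable.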
In theory, string and array slices should work the same way. I don't have array slice support yet, but in the end it will likely be the same VM opcode for both.
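If strings and arrays shared a common header that records the element size, a single slicing handler could serve both. The sketch below assumes such a header; `vm_seq`, `op_slice` and `vm_slice_at` are hypothetical names, and the post doesn't describe what the real opcode looks like. Ranges are counted in elements, so the same code slices a string (element size 1) and an integer array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical common header shared by strings and arrays. */
typedef struct {
    size_t count;     /* number of elements */
    size_t elem_size; /* 1 for strings, e.g. sizeof(int64_t) for integer arrays */
    unsigned char data[];
} vm_seq;

/* A slice into any sequence; start and len are counted in elements, not bytes. */
typedef struct {
    vm_seq *obj;
    size_t start;
    size_t len;
} vm_slice;

static vm_seq *vm_seq_new(size_t count, size_t elem_size) {
    vm_seq *s = calloc(1, sizeof *s + count * elem_size); /* overflow check omitted */
    if (s != NULL) {
        s->count = count;
        s->elem_size = elem_size;
    }
    return s;
}

static vm_slice vm_seq_whole(vm_seq *s) {
    vm_slice r = { s, 0, s->count };
    return r;
}

/* Body of a hypothetical SLICE opcode: takes [start, end) of an existing slice.
   Returns 0 on an out-of-range request so the VM can raise an error. */
static int op_slice(vm_slice src, size_t start, size_t end, vm_slice *out) {
    if (start > end || end > src.len) return 0;
    out->obj = src.obj;
    out->start = src.start + start;
    out->len = end - start;
    return 1;
}

/* Address of element i; the element size comes from the object header. */
static void *vm_slice_at(vm_slice s, size_t i) {
    assert(i < s.len);
    return s.obj->data + (s.start + i) * s.obj->elem_size;
}
```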
The latest patches also removed the assembler's custom slice implementation; it is now built on the same functionality the VM uses. This complicated the assembler code a little, since I have to carry VM data structures around in order to allocate memory. The problem is that the memory arenas aren't flexible enough for allocations of arbitrary size: an arena has a fixed limit, and as soon as you hit it, a garbage collection is triggered. In VM bytecode that isn't a problem, because the registers and the stack serve as GC roots. In the assembler, which is written in C, the GC roots are spread across the code and aren't easy to wrap, so a collection triggered in the middle of a C procedure can miss pointers that only live in local variables.
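A minimal sketch of that hazard, assuming a bump-pointer arena with a GC hook; `arena`, `arena_alloc` and `collect_without_roots` are hypothetical names, not the VM's code. When an allocation doesn't fit, the arena runs the collector, and a collector that can't see a pointer held in a C local may hand the same memory out again. `collect_without_roots` is a deliberately naive stand-in for a GC that finds no roots at all.

```c
#include <assert.h>
#include <stddef.h>

typedef struct arena arena;

/* Hypothetical GC entry point: reclaims whatever it believes is unreachable. */
typedef void (*gc_hook)(arena *a);

struct arena {
    unsigned char *base; /* one fixed block of memory */
    size_t cap;          /* hard limit of the arena */
    size_t used;         /* bump pointer offset */
    gc_hook collect;
    unsigned gc_runs;
};

static size_t align_up(size_t n) {
    const size_t a = _Alignof(max_align_t);
    return (n + a - 1) & ~(a - 1);
}

/* Bump allocation; when the request doesn't fit, collect and try once more. */
static void *arena_alloc(arena *a, size_t size) {
    assert(a->used <= a->cap);
    size = align_up(size);
    if (size > a->cap - a->used) {
        a->gc_runs++;
        a->collect(a); /* pointers held only in C locals are invisible here */
        if (size > a->cap - a->used) return NULL; /* genuinely out of memory */
    }
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

/* Naive stand-in collector that finds no roots, like C code whose locals were
   never registered with the GC: it reclaims the whole arena. */
static void collect_without_roots(arena *a) { a->used = 0; }
```

With a 64-byte arena, allocating 40 bytes and then 32 bytes triggers a collection, and the second allocation returns the same address as the first one, which the C code may still be using.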
Fixing the GC issues will take a separate implementation, for which I'd borrow a few ideas from how memory arenas work in Zig. It would let me work with VM memory safely from C code and only run the GC once execution has fully left the C procedure. This mostly means implementing a linked list of memory pages (pretty much what malloc does).
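A linked list of pages can be sketched in a few dozen lines of C. This is a generic chunked arena under assumed names (`page_arena`, `page_arena_alloc`, `ARENA_PAGE_SIZE`), not a port of Zig's allocators: when the current page is full, a new page is chained in instead of triggering a collection. `page_arena_free` stands in for the point where control leaves the C procedure and the GC is allowed to run again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_PAGE_SIZE 4096 /* assumed default page size */

/* One page of the arena; pages form a singly linked list, newest first. */
typedef struct page {
    struct page *next;
    size_t used;
    size_t cap;
    unsigned char *data; /* separate malloc, so it is suitably aligned */
} page;

typedef struct {
    page *head;
    size_t page_count;
} page_arena;

static size_t align_up(size_t n) {
    const size_t a = _Alignof(max_align_t);
    assert((a & (a - 1)) == 0); /* alignment must be a power of two */
    return (n + a - 1) & ~(a - 1);
}

static page *page_new(size_t cap, page *next) {
    page *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->data = malloc(cap);
    if (p->data == NULL) { free(p); return NULL; }
    p->next = next;
    p->used = 0;
    p->cap = cap;
    return p;
}

/* Never triggers a collection: if the current page is full, chain a new one.
   Requests larger than ARENA_PAGE_SIZE get a page of their own. */
static void *page_arena_alloc(page_arena *a, size_t size) {
    size = align_up(size);
    page *p = a->head;
    if (p == NULL || size > p->cap - p->used) {
        p = page_new(size > ARENA_PAGE_SIZE ? size : ARENA_PAGE_SIZE, a->head);
        if (p == NULL) return NULL;
        a->head = p;
        a->page_count++;
    }
    void *ptr = p->data + p->used;
    p->used += size;
    return ptr;
}

/* Stand-in for the moment control leaves the C procedure: here the pages are
   simply freed; in a VM they could instead be handed over to the GC. */
static void page_arena_free(page_arena *a) {
    page *p = a->head;
    while (p != NULL) {
        page *next = p->next;
        free(p->data);
        free(p);
        p = next;
    }
    a->head = NULL;
    a->page_count = 0;
}
```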
All in all, a few more steps and I'll be able to implement a print function and write a "game of life" in VM assembly. Can't wait to see that working.