VM progress update: simple garbage collection

Today I've reached another milestone with the virtual machine. It can now allocate memory and perform garbage collection.

It is interesting that I don't yet have a high-level language implemented, but the garbage collection is still useful. In bare-metal assembly, you have to track memory allocations and deallocations yourself, since the memory is unstructured, and a potential garbage collector won't be able to differentiate between pointers and integers. In my virtual machine (and its assembly language), all memory is tagged and structured, and thus if you have a pointer to the virtual machine data structure, the garbage collector is able to walk it. So even the assembly/bytecode will soon be able to work as if memory is not a concern.

For the implementation of garbage collection, I've chosen a modified Cheney's algorithm. This algorithm contains two distinct "heaps", one of which is currently in use for allocation. If it runs out, the algorithm would walk all objects that are reachable from the registers and the stack of the virtual machine, and move them to the other heap. The previous heap would then contain only dead objects, and can be grown/shrunk for the next cycle if needed.

For any practical purposes, Cheney's algorithm would be quite slow, and many real programming languages contain multiple heaps (python has 3) and vary the size of the heaps to reclaim short-lived objects faster. However, I predict that the naive algorithm would get me pretty far along the path to the implementation of the actual language, so I don't care that much about it now.

What follows below is a piece of code that demonstrates the C interface for interacting with VM memory allocation and GC capabilities. It is taken almost verbatim from the unittest and annotated.


# Allocate all structures required by the virtual machine
vm_t* vm = new_vm();

# Allocate an array of 10 elements
tagged_value_t arr = vm_mk_array(vm, 10);
array_set(arr, 0, mk_i64(42));

# Save pointer to the array in a register so it survives garbage collection
vm->registers[R1] = arr;

assert(arena_generation(&vm->arena, arr) == 0,
       "Array should be in first generation\n");

# Perform garbage collection
vm_gc(vm);

# The array has moved, so retreive a pointer to it from the register
arr = vm->registers[R1];

# Check that the array contains correct value at 0-th position
tagged_value_t val = array_get(arr, 0);
assert(op_eq(val, mk_i64(42)), "0-th element of an array == %ld\n",
       val.value.i64);

# Make sure that the array has really moved between generations
assert(arena_generation(&vm->arena, arr) == 1,
       "Array should be in second generation\n");

The API is in place, so the next step would be to add a few instructions to the bytecode that can call memory allocations, and write an inefficient merge sort (the one that would copy sub-ranges).

As usual, the code can be found here (note that it's experimental, so don't expect any clarity).