VM progress update: hashes and function calls

Today I'd like to tell you about 2 new opcodes I've added to the virtual machine. The first is for calculating hashes, and the second is for calling variadic functions.

Just as a reminder, I'm designing my virtual machine to be very high-level. Even though it's a register VM, it contains opcodes for many typical data structures such as lists, arrays, strings, slices and others. Ideally, generating bytecode for such machine should be trivial in any language.

Having all data structures defined on the level of the virtual machine and assembly is important, because the VM will make heavy use of frozen data structures. Frozen data structures can be loaded from disk or sent through network, and be accessed without deserialization. To support frozen data structures, every piece of the VM needs to understand their memory layout - starting from the bytecode evaluator and up to the garbage collector and FFI. And since there is a frozen couterpart for every standard data type, it means that all such basic data types would be "fixed" in the implementation.

Now I'm working on an implementation of a hash table and a hash set, which should conclude the list of basic data structures. They turned out to be a little bit trickier than the rest.

One of the properties I would like to have in addition to freezing is total order. Which means that every frozen data structure including a hash table should be comparable for equality/inequality and greater/less. This means that hash tables must have predictable layout. This is not true for many implementations of hash tables in existing programming languages.

The reason for that is how they deal with handling collisions. If one uses open addressing, or just use lists to store elements that have the same key, the order in which you insert items would matter: keys will be in different order when you compare two hash tables.

One of the solutions to this is how C++ handles the map class: it's implemented as a red-black tree which has predictable rebalancing rules. Independently of which order you use to insert elements, you'd always end up with the same layout and order of keys.

I'm gradually leaning towards either a hash-trie or a list-based hash table where the colliding leafs would be sorted. This would be a little suboptimal speed-wise, but should be fine for now.

Anyhow, while I'm working on the hash table which I can't yet demonstrate, here are two new opcodes.

hr - opcode for calculating a hash value

This opcode has two parameters that are passed as registers. The first is where the result will be stored, and the second contains an object for which hash value would be computed. It can be an arbitrarily complex nested data structure without loops.

jafi - calls a variadic procedure with specific number of parameters

The first register passed to this opcode is the address of the procedure/closure, and the immediate parameter is the count of arguments. Currently you can pass up to 12 arguments in the registers as per convention. But as soon as I extend this, you'd be able to pass the rest through the stack. Usually, the functions won't have that many arguments, so mostly you'd just have the register parameters.

Example usage

This is a simple example that demonstrates the new opcodes:


;; Loads a constant string to register r1
loadc r1, "foobar"

;; Calculates a hash of the string and
;; stores it in r0
hr r0, r1

;; Loads the address of the "print"
;; function in register r5
lfi r5, print

;; Calls function that is stored in r5
;; with 1 argument (r0)
jafi r5, 1

;; End the execution by returning
;; from the top level
ret

As usual, all assembler instructions are turned into 32-bit opcodes and serialized to a frozen uint32 array. This array would then get loaded by the virtual machine and executed.