Calling convention dilemma

Compilers for statically typed languages usually know how to pass arguments to a function, because the function's signature is known in advance. If you look at the x86-64 System V C calling convention, for example, you'd see that parameters are passed either in registers (for small values like integers or pointers) or on the stack (for larger values, or once the argument registers run out).

Even if you don't know what exact function you're calling (in case of function pointers), the prototype of that function tells the compiler everything it needs to know to produce the platform-specific machine code.

For dynamic languages, it is different. The prototype is not known in advance, and so you'd have to rely on a higher-level construct. For example, you always pass one parameter, which is an array. Or two parameters, one of which is an array and another is a hash table (for Python-like keyword arguments).
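To make that uniform convention concrete, here is a minimal Python sketch. The names `DynFunc`, `greet` and `call_dynamic` are made up for this illustration. The only point is that every callee has the same two-parameter shape, no matter how many arguments it logically takes.

```python
from typing import Any, Callable

# Hypothetical uniform signature: every function receives one list of
# positional arguments and one dict of keyword arguments.
DynFunc = Callable[[list, dict], Any]


def greet(args: list, kwargs: dict) -> str:
    # The callee unpacks what it needs and checks the count at run time.
    if len(args) != 1:
        raise TypeError(f"greet expects 1 positional argument, got {len(args)}")
    greeting = kwargs.get("greeting", "Hello")
    return f"{greeting}, {args[0]}"


def call_dynamic(func: DynFunc, *args: Any, **kwargs: Any) -> Any:
    # The caller never needs the callee's prototype: it always passes
    # exactly two values, whatever the logical argument count is.
    return func(list(args), kwargs)
```

The cost is that every call allocates at least one container, which is exactly what passing arguments in registers would avoid.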

I've suddenly found myself somewhere in the middle. I'm designing my programming language VM specifically for dynamic languages. But still, its architecture is derived from a RISC CPU. It has a stack, but opcodes deal exclusively with registers, and memory access requires explicit load/store instructions.

Having registers means that it would be nice to be able to pass parameters in them when the number of arguments to the function is small enough. And it would help with optimizing tail calls as well (less shuffling of the stack).

The problem here can be demonstrated with a "print" function. Imagine that it's just a built-in that accepts an arbitrary number of arguments. In a made-up assembly, a call to it would look something like this:

;; Load two constant strings
;; into registers r0 and r1
loadc r0, hello
loadc r1, world

;; How to detect arg count?
call print

hello: "Hello"
world: "World"

In this example, the print function has no way to know that it should use registers r0 and r1. Even if the calling convention allows passing arguments through registers, print cannot tell that there are exactly two arguments (there could have been more). And this is not limited to "variadic" functions: the caller may hold nothing but a pointer to the function, with no way to inspect how many parameters it expects.

What I'm leaning towards in this case is to embed the argument count into the low-level "virtual CPU" calling convention. So instead of "call print", you'd have "call 2, print", with the argument count encoded into the opcode as an "immediate" value. When this opcode is executed, it would set up a new call frame and put the number of arguments into the frame itself (along with the return address and a link to the previous call frame). print can then look at the frame and deduce the correct number.
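Here is a rough Python sketch of that idea, not the actual VM. The `Frame` and `VM` classes, the eight-register file and `builtin_print` are all assumptions made for illustration, and the sketch only handles arguments that fit in registers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    argc: int                # argument count taken from the call opcode
    return_addr: int         # where to resume in the caller
    prev: Optional["Frame"]  # link to the caller's frame


class VM:
    def __init__(self) -> None:
        self.regs = [None] * 8  # register count is an arbitrary assumption
        self.pc = 0
        self.frame: Optional[Frame] = None
        self.output: list = []

    def call(self, argc: int, builtin) -> None:
        # "call argc, target": push a frame that records the immediate.
        self.frame = Frame(argc, self.pc + 1, self.frame)
        builtin(self)
        # Return: restore the caller's pc and frame.
        self.pc = self.frame.return_addr
        self.frame = self.frame.prev


def builtin_print(vm: VM) -> None:
    # The callee learns the argument count from its own frame.
    argc = vm.frame.argc
    vm.output.append(" ".join(str(v) for v in vm.regs[:argc]))


vm = VM()
vm.regs[0] = "Hello"       # loadc r0, hello
vm.regs[1] = "World"       # loadc r1, world
vm.call(2, builtin_print)  # call 2, print
```

After the call, `vm.output` holds the single line that print produced from r0 and r1, and the frame chain is back to where it was before the call.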

The benefit of this approach is that the caller always knows the number of arguments exactly. And the callee may then take up to a certain pre-defined number of arguments from the registers, and the rest from the stack.
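The register/stack split could be sketched in Python like this. The limit of four argument registers and the helper name `collect_args` are assumptions, and so is the convention that overflow arguments are pushed left to right.

```python
NUM_ARG_REGS = 4  # assumed limit; nothing above fixes a number


def collect_args(argc: int, regs: list, stack: list) -> list:
    """Gather argc arguments: the first NUM_ARG_REGS from registers,
    the remainder from the top of the stack."""
    in_regs = min(argc, NUM_ARG_REGS)
    on_stack = argc - in_regs
    args = list(regs[:in_regs])
    if on_stack:
        # Assumes the caller pushed the overflow arguments left to right,
        # so they occupy the last `on_stack` stack slots in order.
        args.extend(stack[-on_stack:])
    return args
```

With this split, a short call such as a two-argument print never touches the stack for its arguments, while a longer call still works without any change to the call opcode.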

You may be wondering -- why go to all these lengths when a purely stack-based virtual machine would be much simpler and probably already solves these problems? Well, the most straightforward answer to this would be that I want to make the VM a good target for code generation. Reading the code generated for a register-based machine is a lot easier than for a stack-based one. Same for debugging.

But at the end of the day, it's just fun to do. So let's see how it goes.