I've ported my language from C to C++ (a story of error handling)
I've been writing my programming language in pure C for quite some time, but recently I decided to port it to C++. The key problem that made me do so is error handling. While I was working on the bytecode virtual machine, it was all relatively simple. The virtual machine is just a large switch over the opcodes with relatively trivial functions for basic arithmetic operations, jumps and conditions.
As I started to work on the parser and runtime data structures, the code quickly became hard to reason about. This is in part because I decided to gracefully handle memory allocation errors. To understand the issue, let's consider a simple function, assoc_get, which takes an indexable object and returns the value at a given index:
Value obj = mk_array(10);
Value index = mk_i64(5);
Value val = mk_i64(42);
// Writes "42" at array index 5
assoc_set(obj, index, val);
Value res = assoc_get(obj, index);
Now, there are 2 possible error cases here:
- The index can be out of range
- We couldn't allocate memory for a temporary value on the garbage-collected heap
In both of these cases, what should the value of res be, and how would we know that an error has occurred? One option is to set an errno and return some sort of "placeholder" that doesn't mean anything (e.g. nil). Another is to use "out parameters" like this:
Value res = mk_nil();
ErrorCode rc = assoc_get(&res, obj, index);
if (rc) {
// clean up and return
}
There are also more obscure approaches that some interpreters use, like calling setjmp() somewhere at the entry point of the virtual machine loop, and then longjmp() if there's an error down the line. This works in some cases, but it easily leads to resource leaks, because the jump skips any cleanup code between the two points.
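To make the leak concrete, here is a minimal sketch of the setjmp()/longjmp() pattern. All names (vm_entry, tracked_alloc, parse_something, run_vm) are hypothetical and only illustrate the idea; this is not code from the interpreter. It sticks to plain pointers on purpose, since jumping over objects with non-trivial destructors is undefined behavior in C++.

```cpp
#include <csetjmp>
#include <cstdlib>

// Hypothetical names throughout; not the interpreter's real code.
static std::jmp_buf vm_entry;     // recovery point set at the VM entry
static int live_allocations = 0;  // crude leak counter for the demo

static void *tracked_alloc(std::size_t n) {
    ++live_allocations;
    return std::malloc(n);
}

static void tracked_free(void *p) {
    --live_allocations;
    std::free(p);
}

// Fails deep inside the call chain and jumps straight back to the entry point.
static void failing_op() {
    std::longjmp(vm_entry, 1);
}

static void parse_something() {
    void *scratch = tracked_alloc(64);
    failing_op();           // never returns normally
    tracked_free(scratch);  // skipped on error, so the buffer leaks
}

// Returns the number of allocations still alive after the error was handled.
int run_vm() {
    if (setjmp(vm_entry) == 0) {
        parse_something();  // normal path
    }
    // After longjmp() we land here with setjmp() returning 1.
    return live_allocations;
}
```

The cleanup in parse_something() never runs, so every frame between the failure and the entry point has to register its resources somewhere else if it wants them released.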
What would be really awesome is if C had some sort of sum types, or the ability to return two values from a function: a result and an error (much like Zig's error unions or Go's multiple return values).
Initially I tried to bolt on the sum types by introducing separate structs like:
struct ValueOrError {
Value result;
ErrorCode error;
};
Following this approach, I refactored the code so that every function that can fail returns such a type. Like this:
ValueOrError res = assoc_get(obj, index);
if (res.error) {
// clean up and return
}
// do something with res.result
This worked, but it required too much ceremony and cluttered the code. For every distinct type returned from a function, I had to create a separate "wrapper" struct that acts as its result type.
Eventually it led to a state where working on the codebase was no longer fun. Instead of implementing the logic, I had to be very verbose all the time. Worst of all, refactoring the codebase became too taxing. Since the error handling code needed to know the underlying structure of objects, every time I changed interfaces, things started to break in too many places at once (and often at runtime).
So finally, I gave up and decided to use C++, where you can implement a Result sum type. My reasoning was that I can still go pretty minimal and disable exceptions and RTTI, and probably even get rid of the standard library at some point. What I get in return is sane and clean error handling.
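For illustration, a Result type along these lines could look roughly like the sketch below. It is an assumption about the shape, not the actual implementation: the ErrorCode values and the checked_get helper are made up, and for brevity it stores the value and the error side by side (so T must be default-constructible) instead of using a real tagged union. It needs neither exceptions nor RTTI.

```cpp
#include <cassert>
#include <utility>

// Hypothetical error codes; Ok (0) means "no error".
enum class ErrorCode { Ok = 0, OutOfRange, OutOfMemory };

// Minimal Result<T>: carries either a value or an error code.
// Simplification: both fields are always present, so T must be
// default-constructible. A tagged union would be closer to a true sum type.
template <typename T>
class Result {
public:
    // Implicit on purpose, so functions can `return value;` or `return error;`.
    Result(T value) : value_(std::move(value)), error_(ErrorCode::Ok) {}
    Result(ErrorCode error) : value_(), error_(error) {}

    bool has_value() const { return error_ == ErrorCode::Ok; }
    ErrorCode error() const { return error_; }

    // Moves the value out; only valid when has_value() is true.
    T release_value() {
        assert(has_value());
        return std::move(value_);
    }

private:
    T value_;
    ErrorCode error_;
};

// Example fallible function: bounds-checked lookup in a fixed-size array.
Result<long> checked_get(const long (&items)[4], long index) {
    if (index < 0 || index >= 4) return ErrorCode::OutOfRange;
    return items[index];
}
```

A single template replaces all the hand-written wrapper structs, and the implicit constructors keep the call sites short.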
Imagine something like this:
Result<Value> sum(Value array) {
size_t size = TRY(assoc_size(array));
int res = 0;
for (size_t i = 0; i < size; ++i) {
Value val = TRY(assoc_get(array, mk_i64(i)));
res += TRY(val.get_i64());
}
return mk_i64(res);
}
The interesting part here is the TRY macro. It tries to unpack the Result object. If the Result contains an error, the macro returns that error from the current function. If not, the expression evaluates to the unpacked content of the Result.
The implementation of the TRY macro is pretty straightforward:
#define TRY(m) \
(({ \
auto ___res = (m); \
if (!___res.has_value()) return ___res.error(); \
std::move(___res); \
}).release_value())
The most interesting part here is ({ ... }). This is a so-called "statement expression": a compound statement used as an expression. It's a GCC and Clang extension that lets a single expression consist of multiple statements. The value of the last one is treated as the result of the whole expression. This is what allows you to call return from within an expression, which is otherwise not possible (since return is a statement).
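Stripped of the Result machinery, the two properties the macro relies on can be sketched as follows. The function names are hypothetical, and the code only compiles with GCC or Clang, since ({ ... }) is not standard C++.

```cpp
// Requires GCC or Clang: ({ ... }) is a compiler extension, not standard C++.

// The last expression in the block becomes the value of the whole expression.
int statement_expression_value() {
    int doubled = ({
        int base = 21;
        base * 2;  // value of the statement expression
    });
    return doubled;
}

// A return inside the block leaves the enclosing function, which is the
// behavior TRY uses to bubble an error up to the caller.
int first_negative_or_sum(int a, int b) {
    int checked_a = ({
        if (a < 0) return a;  // exits first_negative_or_sum early
        a;
    });
    int checked_b = ({
        if (b < 0) return b;
        b;
    });
    return checked_a + checked_b;
}
```

Here statement_expression_value() yields 42, and first_negative_or_sum() returns the first negative argument it sees, or the sum of both when neither is negative.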
If you use this macro, the code becomes easy to read. You immediately see which functions can fail, and can bubble up errors concisely to the place that knows how to deal with them. It is almost as easy to use as exceptions, with the added benefit of being explicit.
The main reason I want to avoid exceptions is that I would like to make my language embeddable, and exceptions don't play well when mixed with other language runtimes.