New spin on a C++ garbage collector
While doing another round of implementing my programming language in C++, I've reached a stage where garbage collection works and is integrated with the "host language". An example is worth a thousand words, so here you go:
StaticArena<64 * 1024 * 1024> arena;
TEST_CASE(dict_insert) {
auto d = DIEX(Dict::create(arena));
d = DIEX(d.insert(arena, 1, 2));
d = DIEX(d.insert(arena, 1, 3));
d = DIEX(d.insert(arena, 3, 3));
d = DIEX(d.insert(arena, 0, 4));
d = DIEX(d.insert(arena, 0, 5));
d = DIEX(d.insert(arena, 2, 6));
DIEX(arena.gc());
auto s = DIEX(write_one(arena, d));
DIEX(arena.gc());
ASSERT_EQUALS(s, "{0 5 1 3 2 6 3 3}");
}
Here, Dict::create()
creates a dictionary and places its data into a garbage-collected arena. The variable d
that is placed on the C++ stack is just a garbage collector "root". When the object is moved by the garbage collector between different arena heaps, the roots get re-pointed to the new locations, so it is safe to operate with such variables and place them to other C++ objects.
I then call d.insert()
to insert key-value pairs to the dictionary. Note that this function returns a new object that I assign back to d
. This is because in my language objects are actually immutable, and d.insert()
creates a new dictionary instead of changing the old one.
The call to arena.gc()
performs a full garbage collection cycle and switches the primary and secondary arena heaps, achieving data compaction and removal of objects that are no longer accessible (here - previous versions of dictionary d
).
Then I convert dictionary into a string with write_one()
, which is a way to get textual representation of any hierarchy of objects.
Note that arena.gc()
is not the only way to trigger garbage-collection. Any memory allocation that overflows the active heap will trigger a GC cycle, and it means that you can't normally hold any direct pointers to such heap and need to always use gc roots to refer to objects.
Most importantly, the example doesn't use any additional dynamically allocated memory apart from a contiguous region occupied by the arena
. This is a property that I would like to keep, as it would make embedding into other programs easier: you'd always keep data of the host program and the extension language separate.
The end result looks pretty satisfactory to me to a degree where I think I can start prototyping a bytecode compiler.