Virtual Machine debugging and GDB pretty printers
My virtual machine has passed the 4000 lines of code mark. Since it's written in C, I occasionally get crashes or assertion failures. In part this is mitigated by the Address Sanitizer and the GCC -fanalyzer flag, but some of the problems still trickle in.
Unit tests in both C and the VM assembly sometimes help narrow down the area, and act as a good second line of defence. But occasionally they are not enough, and then I roll up my sleeves and fire up GDB. I must admit that I like GDB a lot, and am quite proficient at navigating frames and using it in plain command-line mode. With the virtual machine, however, I've faced a tricky case.
You see, in normal C programs you usually don't have a lot of dynamism/polymorphism, because that's kind of the point of a statically typed language. You want as much information as possible available at compile time so that the compiler can figure out the control flow and apply optimizations. In my case, I have one primary value type called tagged_value_t, which can be anything. It is a tagged data structure that can be an integer, a pointer to a string, a pointer to a dictionary, a string slice, and many more. This structure is what the VM opcodes receive and return. It is also extensively used in tests and will be used as part of the SDK / C bindings.
Having a pointer to such a data structure means that you can't just print it in the debugger. What you'd get as a result would be a tag and a union of integers, floats and pointers, where you'd have to decipher the value of the tag and then follow the correct union field. This is very time-consuming, and confusing in certain cases, especially when you have to debug a problem in nested data structures (say, an array of arrays or an array of dictionaries). This is a demonstration from the GDB CLI:
(gdb) p cursor
$1 = {tag = 11 '\v',
  immediate = 0,
  value = {
    u64 = 106721347371072,
    i64 = 106721347371072,
    u32 = 64,
    i32 = 64,
    u16 = 64,
    i16 = 64,
    u8 = 64 '@',
    i8 = 64 '@',
    f64 = 5.2727351413936707e-310,
    pv = 0x611000000040,
    ov = 106721347371072
  }
}
In this case, you can probably look at the value 64 and guess that it's some kind of integer. But you would be wrong! It just so happens that the layout of memory is such that taking the lower 32 bits of the address gives you 0x40. If you look up what tag 11 is, you'd see that it's a pointer, so the field to follow is the pointer field (pv) of the value union.
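To see why every field of the dump shows a different "value", here is a small sketch that reinterprets the same eight bytes the way the union members do. It assumes a little-endian machine such as x86-64 (which is what the dump suggests), and it only uses Python's standard struct module; the variable names are mine, chosen to mirror the field names in the dump.

```python
import struct

raw = 0x611000000040                    # the pv field from the GDB dump
bits = struct.pack("<Q", raw)           # the 8 bytes as stored, little-endian (assumed)

# Each union member reads the same bytes with a different width or type.
u64, = struct.unpack("<Q", bits)
u32, = struct.unpack("<I", bits[:4])    # lower 32 bits of the address
u16, = struct.unpack("<H", bits[:2])
u8 = bits[0]
f64, = struct.unpack("<d", bits)        # same bits read as a double

print(u64, hex(u32), u16, u8, chr(u8), f64)
```

The integer views all collapse to 64 (0x40, the character '@') only because the upper bytes get cut off, and the double is a meaningless subnormal number. None of them tells you anything until you check the tag.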
Now you can see how that can become really confusing really fast. But what can you do about it? Isn't this something that other people working on complex codebases should have solved a long time ago? The answer is "yes". You need to use GDB's support for Python scripting, and specifically the pretty-printers. GDB allows you to define a function that turns a specific structure type into a string for the purposes of printing it in the debugger. You write a Python program that registers such a pretty-printer, and it saves you the manual work. Let me demonstrate the same example as above, but with the pretty-printer:
(gdb) source prettyprint.py
(gdb) p cursor
$3 = " ;; Immediate operations (64-bit signed int)
li r0, 42
addi r0, r0, 5
aeqi r0, 47
;; return to caller
ret
"
Now, this is much better! We can see that the cursor object represents a string, and contains assembly code. The source prettyprint.py here is an instruction to GDB to load my pretty-printer.
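For a rough idea of what such a file can look like, here is a minimal sketch. It is not my actual prettyprint.py. The only tag number taken from the dump above is 11 (pointer); the other tag numbers are made up for illustration. The sketch also assumes the type is visible to GDB under the name tagged_value_t, and it stops at printing the address, because following the pointer depends on the object layout behind it. The GDB calls used are gdb.printing.RegexpCollectionPrettyPrinter, add_printer and register_pretty_printer; the import is guarded so the formatting logic can also be exercised outside GDB with plain dictionaries.

```python
# Sketch of a GDB pretty-printer for tagged_value_t (illustrative, not the real file).
try:
    import gdb
    import gdb.printing
except ImportError:          # not running inside GDB; formatting logic still works
    gdb = None

TAG_POINTER = 11             # from the dump: tag 11 means "pointer"
TAG_I64 = 1                  # hypothetical tag number
TAG_F64 = 2                  # hypothetical tag number


def describe(val):
    """Format anything indexable by field name (a gdb.Value or a plain dict)."""
    tag = int(val["tag"])
    payload = val["value"]
    if tag == TAG_POINTER:
        # u64 overlays pv in the union and converts to int cleanly
        return "pointer -> 0x%x" % int(payload["u64"])
    if tag == TAG_I64:
        return "i64 %d" % int(payload["i64"])
    if tag == TAG_F64:
        return "f64 %r" % float(payload["f64"])
    return "unknown tag %d" % tag


class TaggedValuePrinter:
    """GDB calls to_string() whenever it prints a matching value."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        return describe(self.val)


def build_printer():
    # "vm" is an arbitrary collection name; the regexp is matched against the type name
    pp = gdb.printing.RegexpCollectionPrettyPrinter("vm")
    pp.add_printer("tagged_value_t", "^tagged_value_t$", TaggedValuePrinter)
    return pp


if gdb is not None:
    # current_objfile() is None when the file is loaded with `source`,
    # in which case the printer is registered globally
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_printer())
```

Keeping the formatting in a plain function like describe is a deliberate choice in this sketch: it can be unit-tested with dictionaries, and the GDB-specific part shrinks to a few lines of registration.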