Text to binary and back

In the last couple of days I've finished an important part of the virtual machine: the code that translates objects from their binary representation in memory to a text form, and back.

For example, take this data structure:


;; this is an array
[
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key
]

If you save it into data.txt, you can convert it into the binary format:


cat data.txt | ./sd > data.bin

The data.bin file now contains the encoded version, which we can examine with the standard xxd tool:


cat data.bin | xxd

00000000: 0000 0000 0000 0000 90c8 0000 0000 0000  ................
00000010: 0500 0000 0000 0000 0700 0000 0000 0000  ................
00000020: 0100 0000 0000 0000 0700 0000 0000 0000  ................
00000030: 0200 0000 0000 0000 0700 0000 0000 0000  ................
00000040: 0300 0000 0000 0000 9968 0000 0000 0000  .........h......
00000050: 4000 0000 1061 0000 9588 0000 0000 0000  @....a..........
00000060: 4000 0000 1061 0000 0600 0000 0000 0000  @....a..........
00000070: 6600 0000 6f00 0000 6f00 0000 6200 0000  f...o...o...b...
00000080: 6100 0000 7200 0000 0100 0000 0000 0000  a...r...........
00000090: 91b0 0000 0000 0000 4000 0000 1061 0000  ........@....a..
000000a0: 91c4 0000 0000 0000 4000 0000 1061 0000  ........@....a..
000000b0: 0300 0000 0000 0000 6600 0000 6f00 0000  ........f...o...
000000c0: 6f00 0000 0300 0000 0000 0000 6200 0000  o...........b...
000000d0: 6100 0000 7200 0000                      a...r...
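
To make the dump a little easier to read, here is a small Python sketch that decodes two slices of it. The binary layout is not described in this post, so the interpretation is only an inference from the bytes: the data looks like 8-byte little-endian words, and the string characters look like 4-byte little-endian code units. The variable names are mine, and the hex strings are copied from offsets 0x10-0x47 and 0x70-0x87 of the dump above.

```python
import struct

# Offsets 0x10-0x47 of the dump: seven 8-byte words.
head = bytes.fromhex(
    "0500000000000000" "0700000000000000"
    "0100000000000000" "0700000000000000"
    "0200000000000000" "0700000000000000"
    "0300000000000000"
)
# "<7Q" = seven unsigned 64-bit integers, little-endian (assumed byte order).
words = struct.unpack("<7Q", head)
print(words)  # (5, 7, 1, 7, 2, 7, 3)

# Offsets 0x70-0x87 of the dump: six 4-byte units, one per character.
chars = bytes.fromhex("660000006f0000006f000000620000006100000072000000")
symbol = chars.decode("utf-32-le")
print(symbol)  # foobar
```

The values 1, 2 and 3 are clearly visible, each preceded by a 7, and the sequence starts with a 5, which matches the number of elements in the array. Whether 7 is really a type tag and 5 a length is a guess on the reader's side, not something the dump alone can confirm.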

And you can also decode the binary data back to the text form:


cat data.bin | ./sd -d

[1 2 3 foobar {"foo" "bar"}]

This is exactly the same data structure we started with, just without the comments.
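
The text side of this round trip can be sketched in a few lines of Python. This is not how sd is implemented; it is a minimal illustration that assumes a grammar guessed from the example above (`;;` comments, `[...]` arrays, `{...}` dicts of alternating keys and values, double-quoted strings, integers, and bare symbols). The names `Symbol`, `tokenize`, `parse` and `dump` are made up for this sketch.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """A bare identifier such as foobar, kept distinct from strings."""
    name: str

# Assumed token grammar: comments, strings, delimiters, bare atoms.
TOKEN = re.compile(r'''
    ;;[^\n]*             # comment up to the end of the line
  | "(?:[^"\\]|\\.)*"    # double-quoted string
  | [\[\]{}]             # array and dict delimiters
  | [^\s\[\]{}";]+       # number or symbol
''', re.VERBOSE)

def tokenize(text):
    # Comments are recognized as tokens and then thrown away.
    return [t for t in TOKEN.findall(text) if not t.startswith(";;")]

def parse(tokens):
    """Consume tokens from the front of the list and build one object."""
    tok = tokens.pop(0)
    if tok == "[":
        items = []
        while tokens[0] != "]":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing "]"
        return items
    if tok == "{":
        pairs = {}
        while tokens[0] != "}":
            key = parse(tokens)
            pairs[key] = parse(tokens)
        tokens.pop(0)  # drop the closing "}"
        return pairs
    if tok.startswith('"'):
        return tok[1:-1]  # escape sequences are not handled in this sketch
    try:
        return int(tok)
    except ValueError:
        return Symbol(tok)

def dump(obj):
    """Print an object back in the compact one-line text form."""
    if isinstance(obj, list):
        return "[" + " ".join(dump(x) for x in obj) + "]"
    if isinstance(obj, dict):
        return "{" + " ".join(f"{dump(k)} {dump(v)}" for k, v in obj.items()) + "}"
    if isinstance(obj, Symbol):
        return obj.name
    if isinstance(obj, str):
        return '"' + obj + '"'
    return str(obj)

SOURCE = """
;; this is an array
[
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key
]
"""

data = parse(tokenize(SOURCE))
print(dump(data))  # [1 2 3 foobar {"foo" "bar"}]
```

Since the comments are dropped during tokenizing and never reach the data structure, the printed form cannot contain them, which is the behaviour seen above.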

You can also do the same trick with bytecode produced by the assembler, since bytecode is also serialized as a frozen data structure:


cat examples/factorial.asm | ./asm | ./sd -d

[[31372u32 3084u32 2956u32 246661u32 50178u32 4294913042u32 779u32 3342u32 7447u32 24u32] ["! is"]]

And we can even convert the bytecode from the text representation back to binary, and run it:


cat examples/factorial.asm | ./asm | ./sd -d | ./sd | ./vm

15 ! is 1307674368000
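
As a quick sanity check of that output, the same value can be computed independently in Python. This only verifies the arithmetic; it says nothing about how the VM computes it.

```python
import math

result = math.factorial(15)
print(result)               # 1307674368000
print(result.bit_length())  # 41
```

The result needs 41 bits, so it would not fit in a 32-bit integer. That is consistent with the 8-byte words visible in the hex dump earlier, though the VM's actual integer width is not stated here.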

This serialization/deserialization mechanism is important because it also serves as a parser. My programming language is based on S-expressions, so a program is already a hierarchical data structure, and the same mechanism can load it directly into the VM's data structures. Of course, I still need to work on a compiler that is smarter than a plain assembler, but it is a good start.