VM progress update: dicts and parsing
This will be a short update, because I don't have much time to write at the moment. Still, the virtual machine is progressing at a steady pace, and I feel that I'm pretty close to having everything in place to write a compiler on top of it.
First, I've added support for proper dictionaries. As I wrote in the previous post, the dictionary implementation is based on red-black trees, which meant that I had to carefully refactor all existing basic data structures so that they all support comparison under a "strict total order". Like the other data structures, dictionaries can be "frozen", in which case they turn into a binary search tree represented as an array (this is OK, since a frozen data structure can never be resized).
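To make the frozen form concrete, here is a minimal Python sketch of one plausible reading of "a binary search tree represented as an array": keys are sorted once under a total order, and lookups do a binary search over the array. The names FrozenDict, order_key and TYPE_RANK are hypothetical, and the type ranking is only an assumption about how values of different types could be made comparable; the VM's actual layout may well differ.

```python
from bisect import bisect_left

# Hypothetical ranking: values of different types compare by rank first,
# so any two supported keys are always comparable (a strict total order).
TYPE_RANK = {type(None): 0, bool: 1, int: 2, str: 3}


def order_key(value):
    """Map a value to a tuple that sorts by type rank, then by the value."""
    return (TYPE_RANK[type(value)], 0 if value is None else value)


class FrozenDict:
    """Immutable dictionary stored as arrays sorted by key.

    Assumes the keys are distinct; duplicates are not merged.
    """

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: order_key(kv[0]))
        self._order = [order_key(k) for k, _ in pairs]
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key, default=None):
        # Binary search walks the implicit tree: each probe halves the range.
        probe = order_key(key)
        i = bisect_left(self._order, probe)
        if i < len(self._order) and self._order[i] == probe:
            return self._values[i]
        return default
```

Because the array never changes after construction, there is no rebalancing to do, which is why a flat sorted layout is enough for the frozen case.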
Next, arrays no longer have a special-case implementation for storing integer types. I thought this would be a good idea, but its implementation details leaked into other parts of the codebase, causing bloat and bugs. I've since refactored the special cases away, and things have become much simpler. It means that an array of 64-bit integers is twice as large as it could have been, but that's fine for now.
Arrays now support in-place resizing. You can take a pointer to an array and increase or decrease its size, and existing pointers to the array remain valid.
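The post doesn't say how resizing keeps pointers valid, so the following Python sketch only illustrates one common technique: an extra level of indirection, where every reference goes through a handle and only the handle knows where the storage currently lives. ArrayHandle and its methods are hypothetical names, not the VM's API.

```python
class ArrayHandle:
    """A stable reference to an array whose backing storage may be replaced."""

    def __init__(self, size, fill=None):
        self._storage = [fill] * size

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        return self._storage[index]

    def __setitem__(self, index, value):
        self._storage[index] = value

    def resize(self, new_size, fill=None):
        """Allocate new storage, copy what fits, and swap it in.

        Holders of the handle never see the old storage again, so
        their references stay valid across the resize.
        """
        old = self._storage
        new = [fill] * new_size
        kept = min(len(old), new_size)
        new[:kept] = old[:kept]
        self._storage = new
```

Anything that held the handle before the resize sees the new size and contents afterwards, at the cost of one extra pointer hop per access.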
Strings are now represented as arrays of 32-bit Unicode code points (effectively UTF-32). Initially I thought that, for the sake of compactness, it would be better to store a variable-length encoding internally, but the lack of random access to characters messed up a few things that depended on strings. So I've rewritten them to use a less compact but more user-friendly form.
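The trade-off is easy to see in a small Python sketch. With a variable-length encoding such as UTF-8, finding the n-th character means scanning from the start; with one fixed-width slot per code point, it is a plain array index. The helper names below are hypothetical and only illustrate the idea.

```python
def utf8_len(first_byte):
    """Length in bytes of a UTF-8 sequence, judged by its first byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    return 4


def utf8_char_at(data, index):
    """Return the index-th character of UTF-8 bytes by scanning: O(n)."""
    pos = 0
    for _ in range(index):
        pos += utf8_len(data[pos])
    return data[pos:pos + utf8_len(data[pos])].decode("utf-8")


def to_code_points(text):
    """Fixed-width form: one integer per code point, so indexing is O(1)."""
    return [ord(ch) for ch in text]


def from_code_points(code_points):
    return "".join(chr(cp) for cp in code_points)
```

The fixed-width form spends four bytes on every character, including plain ASCII, which is the loss of compactness mentioned above.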
There is now support for proper boolean values. Initially I thought that having the "truthy" 't symbol and nil would do. But then I realized that many real-world external apps do care about proper boolean types (e.g. anything that consumes JSON). So I've added the boolean type as a first-class citizen.
And finally, there is now a "writer" and a "reader". The purpose of the "writer" is to serialize data structures into text form. This can be used to print data structures to the screen, or to debug the raw frozen slices taken from memory. The "reader" does the reverse: it takes the textual representation and turns it into the language's data structures.
Because I'm basing the language on S-expressions, the textual representation that the "reader" consumes is almost the AST (abstract syntax tree) that the compiler can use to produce the lower-level bytecode.
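To show why the reader's output is already so close to an AST, here is a minimal Python sketch of a reader and writer pair for a tiny S-expression syntax. Everything about the syntax is an assumption made for illustration: the true, false and nil spellings, the Symbol class, and strings without escape sequences are not taken from the actual language.

```python
import re

# One token: a paren, a double-quoted string (no escapes), or a bare atom.
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


class Symbol(str):
    """A bare identifier, kept distinct from string literals."""


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("bad token at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def atom(token):
    if token.startswith('"'):
        return token[1:-1]
    if token in ("true", "false"):
        return token == "true"
    if token == "nil":
        return None
    try:
        return int(token)
    except ValueError:
        return Symbol(token)


def parse(tokens, pos=0):
    """Parse one form starting at pos; return (value, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise SyntaxError("unexpected ')'")
    if token != "(":
        return atom(token), pos + 1
    items, pos = [], pos + 1
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = parse(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError("missing ')'")
    return items, pos + 1


def read(text):
    """Reader: text in, nested data structures out."""
    tokens = tokenize(text)
    value, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input after first form")
    return value


def write(value):
    """Writer: nested data structures in, text out."""
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Symbol):  # must come before str: Symbol is a str subclass
        return str(value)
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, list):
        return "(" + " ".join(write(item) for item in value) + ")"
    raise TypeError("cannot write %r" % (value,))
```

Reading `(define (square x) (* x x))` yields nested lists of symbols, which is a tree a compiler could walk directly, and writing that tree back gives the same text.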