Writing a simple RISC-V emulator

I decided to spend a few hours and write a simple toy emulator for the RISC-V CPU architecture. This is of course mostly for fun: I just imagined it would be cool to compile a C program with an off-the-shelf compiler and run it on your own virtual CPU.

If you want to look at the code, here it is.

The interesting thing about RISC-V is that it is built around a very small instruction set plus a bunch of optional extensions. The "base" integer set is only about 40 instructions, which is about as few as you can get away with while still being general-purpose. If you want to take a look, open this document and read the first page. Essentially all you need is there!

Emulator loop

If you're wondering what an emulator essentially is, here is the main loop:


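#include <stdbool.h>
#include <stdint.h>

// An enum constant keeps the array size a compile-time constant in both C and C++.
enum { NUM_REGISTERS = 32 };

void eval(uint8_t* memory) {
  uint32_t registers[NUM_REGISTERS] = {0};

  uint32_t pc = 0;

  while (true) {
    // Fetch the next 32-bit instruction (assumes a little-endian host).
    uint32_t instr = *(uint32_t*)&memory[pc];
    pc += 4;

    // The low 7 bits hold the major opcode; funct3 and funct7 (not shown)
    // pick the exact instruction within that group.
    uint32_t opcode = instr & 0x7f;

    switch (opcode) {
      // handle individual opcodes
    }
  }
}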

Yes, it's just a switch with about 40 branches that handle individual opcodes, each of which implements primitive arithmetic, logic, or jumps. For most opcodes, the implementation spans just a few lines.

I'm not going to show the implementation here, but feel free to dive into the repository linked above.
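For a feel of what a single branch involves, here is a rough sketch covering just ADDI and ADD. It is not taken from the repository: bits, imm_i and step_arith are made-up names for illustration, and only the field positions and opcode values come from the RISC-V base ISA encoding.

```c
#include <stdint.h>

// Extracts `len` bits of `instr`, starting at bit `lo`.
static uint32_t bits(uint32_t instr, int lo, int len) {
  return (instr >> lo) & ((1u << len) - 1u);
}

// Sign-extended 12-bit immediate of an I-type instruction (bits 31..20).
static int32_t imm_i(uint32_t instr) {
  uint32_t imm = instr >> 20;
  return (int32_t)(imm ^ 0x800u) - 0x800;
}

// Executes `instr` if it is ADDI or ADD and returns 1; returns 0 otherwise.
static int step_arith(uint32_t instr, uint32_t regs[32]) {
  uint32_t opcode = bits(instr, 0, 7);
  uint32_t rd     = bits(instr, 7, 5);
  uint32_t funct3 = bits(instr, 12, 3);
  uint32_t rs1    = bits(instr, 15, 5);
  uint32_t rs2    = bits(instr, 20, 5);
  uint32_t funct7 = bits(instr, 25, 7);
  int handled = 0;

  switch (opcode) {
    case 0x13:  // OP-IMM group
      if (funct3 == 0x0) {  // ADDI
        regs[rd] = regs[rs1] + (uint32_t)imm_i(instr);
        handled = 1;
      }
      break;
    case 0x33:  // OP group
      if (funct3 == 0x0 && funct7 == 0x00) {  // ADD
        regs[rd] = regs[rs1] + regs[rs2];
        handled = 1;
      }
      break;
  }
  regs[0] = 0;  // x0 is hard-wired to zero
  return handled;
}
```

Doing the arithmetic on uint32_t means overflow simply wraps around, which is the behavior RISC-V specifies for these instructions.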

Program layout

Your entire program and its data go into a flat buffer, called memory here. On a "real" computer you'd of course have an operating system and a lot of other components. But for the sake of simplicity, we treat this virtual CPU more like a microcontroller, where the program is "flashed" directly to a contiguous region and then executed right there.

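To make this possible, the virtual CPU follows a few conventions, all of which show up again in the boot code and linker script below: execution starts at address 0x0000, where the program code is placed; initialized data starts at 0x1000; uninitialized data starts at 0x2000; and the stack pointer is initially set to 0x8000.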

There is nothing special about those addresses; I've just picked them "randomly". On real microcontrollers it matters where you put code and data, because some of the memory is mapped from read-only flash while other regions are actual RAM. Which is which is determined by the chip itself and is often not programmable. For a virtual CPU, we can choose whatever we like.
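The "flashing" step could be sketched roughly like this. The 64 KiB memory size and the load_image name are assumptions made for illustration, not something taken from the repository:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Assumed size of the emulated address space; 64 KiB is an arbitrary choice.
enum { MEMORY_SIZE = 0x10000 };

// Reads a raw image from `f` into `memory`, starting at address 0.
// Returns the number of bytes loaded, or -1 if the image does not fit.
static long load_image(FILE* f, uint8_t* memory, size_t memory_size) {
  memset(memory, 0, memory_size);  // regions the image doesn't cover read as zero
  size_t n = fread(memory, 1, memory_size, f);
  if (n == memory_size && fgetc(f) != EOF) {
    return -1;  // more data than the buffer can hold
  }
  return (long)n;
}
```

After that, the buffer can be handed to eval as is, since execution starts at address 0.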

Simple assembly program

At the beginning, when I built the first iteration, I didn't have a way to compile programs written in a high-level language, so I just took an online RISC-V assembler and wrote a small program that adds the numbers from 0 to 9. After assembling it, I grabbed the hex representation of the machine code and copy-pasted it:


uint32_t program[] = {
  0x00000093, // addi x1, x0, 0
  0x00100113, // addi x2, x0, 1
  0x00a00193, // addi x3, x0, 10
  0x10000213, // addi x4, x0, 256
  0x002080b3, // add x1, x1, x2 <- loop
  0x00110113, // addi x2, x2, 1
  0xfe311ce3, // bne x2, x3, loop
  0x00122023, // sw x1, 0(x4)
};

This actually worked quite well, and resulted in 45 being written to memory address 256 (0x100). Not bad, but we can do better.
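The least obvious word in that listing is the bne, because a branch stores its offset scattered across the instruction. As a sketch (branch_offset is a made-up helper name, the bit layout is the standard B-type encoding), the offset can be reassembled like this:

```c
#include <stdint.h>

// Reassembles the sign-extended offset of a B-type (branch) instruction.
// Layout: imm[12] = bit 31, imm[10:5] = bits 30..25,
//         imm[4:1] = bits 11..8, imm[11] = bit 7; imm[0] is always 0.
static int32_t branch_offset(uint32_t instr) {
  uint32_t imm = ((instr >> 31) & 0x1u)  << 12
               | ((instr >> 7)  & 0x1u)  << 11
               | ((instr >> 25) & 0x3fu) << 5
               | ((instr >> 8)  & 0xfu)  << 1;
  return (int32_t)(imm ^ 0x1000u) - 0x1000;  // sign-extend from 13 bits
}
```

For 0xfe311ce3 this gives -8. The bne is the seventh word, so it sits at byte offset 24, and 24 - 8 = 16 is exactly the add marked as the loop target.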

Compiling C programs for our virtual CPU

Now, this is the real deal. It would be great if we could compile a C program and execute it. This would immediately make the emulator a lot more useful. But this comes with a lot of surprises. If you've ever worked with C, you've probably seen that programs are compiled like this:


gcc main.c -o my_program

Unfortunately, this wouldn't work for our emulator. Any regular program links against libc, which provides operating system abstractions like memory allocation and reading and writing files. On top of that, the normal C compiler on your OS likely can't produce binaries for RISC-V.

Step 1: grabbing the proper toolchain

"Toolchain" here means a bunch of tools, including the compiler, linker, standard headers, and others, which are specific to your architecture. If you're searching on the internet, the proper version is riscv32-none-elf-gcc, which means GCC built with support for the 32-bit RISC-V architecture, producing ELF binaries for an operating system called "none". It is the typical thing you would use for physical microcontrollers.

If you're running on Linux, it could be quite challenging to get the toolchain from package repositories, and you may have to build it from source. But if you're ready to try out the Nix package manager, you can look at the repository that I provided above. It has the necessary toolchain pre-built.

Step 2: a simple program

Because we don't have libc and can't use any OS abstractions, our program will be simple and just demonstrate the use of basic arithmetic:


static int mem = 56;

int main() {
  int a = 42;
  int b = 5;
  mem = mem + a*b + 3;

  return 0;
}

If you compile it for your regular PC, it will store 269 in the variable mem (56 + 42*5 + 3 = 269). We won't be able to print the variable, but we will be able to examine the memory as soon as the program finishes.

Step 3: compiling with proper flags

Now, we need to compile the program. But to do so, we need to pass flags to dissuade the compiler from adding any extra libraries on top:


riscv32-none-elf-gcc -fno-builtin \
                     -fvisibility=hidden \
                     -nostdlib \
                     -nostartfiles \
                     -march=rv32im \
                     -mabi=ilp32 \
                     -c example.c -o example.o

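The important parts here are as follows: -nostdlib and -nostartfiles tell GCC not to pull in libc or the default startup code, -march=rv32im restricts the output to the 32-bit base integer instructions plus the multiplication extension (with no floating-point extensions), and -mabi=ilp32 selects the matching 32-bit ABI without hardware floating point.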

Disabling floating point support is required because we don't have it in our virtual CPU, and we don't want the compiler to produce instructions we can't handle.

OK, now we have the main function. But how does the machine know where to place it?

Step 4: boot and linker scripts

Under normal circumstances, the programs compiled for a usual operating system contain a "prologue" which is executed before the control is passed to the main function. This prologue can set up things that are required for the libc and your operating system (like passing command line arguments). It then executes the main function by making a normal call to it.

But again, we don't have an OS, and so we need to write the prologue ourselves. The best way to do so is to write it in RISC-V assembly. Let's create a boot.s file:


.globl _boot
_boot:
    li x2, 0x8000
    call main
    sbreak
    j .

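I won't go into all the details here, but what's going on is roughly this: li x2, 0x8000 points the stack pointer (x2) at address 0x8000, call main jumps into our C code, sbreak (the older name for ebreak) raises a breakpoint once main returns, and j . spins in place in case execution ever continues past that point.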

We can compile this assembly code into another object file:


riscv32-none-elf-as -march=rv32i \
                    -mabi=ilp32 \
                    boot.s -o boot.o

This is almost the same as how we compiled the C code.

Step 5: linking

Now we get to the tricky part. We have two object files, but we still haven't glued them together or decided which memory locations their contents end up at.

If you have never dealt with embedded development, you may not know about "linker scripts". Essentially, they allow fine-grained control over how the binary code of your program is laid out in memory when the linker composes the resulting executable.

This is what it would look like if we target our virtual CPU:


ENTRY(_boot)

SECTIONS {
    /* Start address of the program */
    . = 0x0000;

    /* Program code starts here */
    .text : {
        *(.text)
    }

    . = 0x1000;

    /* Initialized data starts here */
    .data : {
        *(.data)
    }

    . = 0x2000;

    /* Uninitialized data starts here */
    .bss : {
        *(.bss)
    }

    /* Discard stack section */
    /DISCARD/ : { *(.note.GNU-stack) }
}

Apart from the slightly weird syntax, it should be pretty clear what's happening here. We have a few different sections that the C compiler produces (.text, .data and .bss), and we specify exactly where each one goes.

We can now link our code:


riscv32-none-elf-ld boot.o example.o \
                    -T linker.ld -o example

# remove unnecessary sections
riscv32-none-elf-strip -R .riscv.attributes example
riscv32-none-elf-strip -R .comment example

After we do this, we get the resulting binary called example. This, however, is not the final thing we want: it is an ELF executable that wraps the machine code in headers and metadata meant for an operating system's loader. To extract the raw machine code, we need to use objcopy:


riscv32-none-elf-objcopy -O binary example example.raw

Now this is finally what we need. If we examine this file with xxd (hex dump), we'll see:


cat example.raw | xxd

00000000: 3781 0000 ef00 c000 7300 1000 6f00 0000  7.......s...o...
00000010: 1301 01fe 232e 8100 1304 0102 9307 a002  ....#...........
00000020: 2326 f4fe 9307 5000 2324 f4fe 0327 c4fe  #&....P.#$...'..
00000030: 8327 84fe 3307 f702 b717 0000 83a7 0700  .'..3...........
00000040: b307 f700 1387 3700 b717 0000 23a0 e700  ......7.....#...
00000050: 9307 0000 1385 0700 0324 c101 1301 0102  .........$......
00000060: 6780 0000 0000 0000 0000 0000 0000 0000  g...............
...
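One detail worth noticing in the dump: the bytes 33 07 f7 02 at offset 0x34 form the little-endian word 0x02f70733, which decodes to mul x14, x14, x15. That instruction is not part of the base set; it shows up because of the m in -march=rv32im, so the emulator has to handle it to run this program. A minimal sketch of such a handler, with is_mul and exec_mul being hypothetical names rather than the repository's code:

```c
#include <stdint.h>

// MUL is an R-type instruction: opcode 0x33, funct3 0, funct7 1.
static int is_mul(uint32_t instr) {
  return (instr & 0xfe00707fu) == 0x02000033u;
}

// Executes `instr` if it is MUL and returns 1; returns 0 for anything else.
static int exec_mul(uint32_t instr, uint32_t regs[32]) {
  if (!is_mul(instr)) {
    return 0;
  }
  uint32_t rd  = (instr >> 7)  & 0x1f;
  uint32_t rs1 = (instr >> 15) & 0x1f;
  uint32_t rs2 = (instr >> 20) & 0x1f;
  if (rd != 0) {  // writes to x0 are discarded
    regs[rd] = regs[rs1] * regs[rs2];  // unsigned multiply keeps the low 32 bits
  }
  return 1;
}
```

With x14 = 42 and x15 = 5, executing that word leaves 210 in x14, which is the a*b part of the C expression.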

Wrap-up: running the program in the virtual machine

That was quite a journey, but we can now run our virtual machine and pass the compiled binary:


./rve example.raw

result:269
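How the emulator gets at that number is not shown here. One way to do it, assuming mem is the only object in .data and therefore lands at 0x1000 as the linker script dictates, is to read a 32-bit word from the memory buffer once execution stops. The names MEM_VAR_ADDR and read_u32_le are made up for this sketch:

```c
#include <stddef.h>
#include <stdint.h>

// Assumption: `mem` is the first object in .data, which the linker script
// places at 0x1000.
enum { MEM_VAR_ADDR = 0x1000 };

// Reads a 32-bit little-endian word byte by byte, so the result does not
// depend on the host's alignment rules or endianness.
static uint32_t read_u32_le(const uint8_t* memory, size_t addr) {
  return (uint32_t)memory[addr]
       | (uint32_t)memory[addr + 1] << 8
       | (uint32_t)memory[addr + 2] << 16
       | (uint32_t)memory[addr + 3] << 24;
}
```

Calling read_u32_le(memory, MEM_VAR_ADDR) after the run would then yield the value to print. The same helper could also replace the pointer cast in the fetch step if the emulator ever needs to run on a big-endian host.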

Is it exceptional in any way? Nope. Am I proud of it? You bet!