Why doesn’t V8 fit on my microcontroller? | by Kasper Lund | Jun, 2021

The dreaded Blue Screen of Death was better than a spontaneous reboot.

One of my first experiences with low-level system software was building my own operating system for the x86 from scratch. Putting my code, and nothing but my code, on a floppy disk and booting from it made me incredibly excited. I was slightly less excited whenever I inadvertently introduced a bug that caused a triple fault and made my computer reboot. The trial-and-error approach to addressing such bugs was interesting detective work, but it wasn’t very productive. I felt that the primary purpose of my operating system, and indeed of any other, was to provide meaningful, actionable feedback about misbehaving software, so I spent most of my time fiddling with page and descriptor tables to use the hardware-provided means for catching and reporting such issues.

When the first versions of Linux arrived on the scene, I realized just how much work it would be to build my own truly useful operating system, so when I joined Google in 2006 my interests had swung to the language implementation side of things (Java, Smalltalk) and I happily spent years making JavaScript run fast in your browser. It was all about software and enabling developers to do more. I liked it.

I know now that enabling software developers is definitely my thing, so I have been thrilled to see how the close cousin of the subject of my master’s thesis — the internet of things (IoT) — has been going through a phase where the focus on connectivity and hardware has been matched with a focus on functionality provided by software. Today, lots of developers come to IoT from software backgrounds and they pick tools that make them productive. They choose hardware like the Raspberry Pi instead of a power-efficient microcontroller, because coupled with Linux, it gives them a high-level and robust foundation for their work.

After a week of MCU programming, it is okay to let it all out.

If you want to build for microcontrollers, you’re forced to use a different, much less refined development workflow. Soldering irons, C compilers, and flashing via USB connectors are on the menu if you go that route. It can be a rather painful experience, so brace yourself. In a nutshell, the problem is that on microcontrollers everything is firmware that is compiled, linked, and deployed together using really old-fashioned tools. Changing anything means changing everything. All the different pieces of your codebase also crash together, so if you introduce a bug in one part of your code, chances are that it will ruin the entire functionality of your device, rendering your device useless on a good day and completely bricked on a rainy day.

Modern computers have grown operating systems that support developers by giving them safety rails and fault isolation, but this level of support and security hasn’t yet reached microcontrollers. It is time for a change!

Microcontrollers usually run so-called real-time operating systems (RTOS). These stripped-down operating systems de-emphasize security and robustness because a typical operating system relies on the hardware to provide multiple protection domains and memory isolation, and microcontrollers typically lack that hardware. Without that hardware support, the operating system only deals with simpler things like scheduling, synchronization, and memory allocation.

It is possible to provide fault tolerance and isolation through software, but it requires a software layer that shields the applications that run on top of it from the underlying hardware. This layer is typically called a virtual machine.

You may have heard about two different kinds of virtual machines: the ones that emulate a concrete computing system and the ones that provide a platform-independent managed environment for a specific class of high-level languages to run in. Because the latter kind has significantly lower system requirements, we have focused on it and designed an embedded virtual machine for microcontrollers in the tradition of Java, not Docker. It runs on the ESP32 chips from Espressif Systems, and it augments the primitive FreeRTOS operating system that is bundled in Espressif’s IoT Development Framework (ESP-IDF) with the capabilities for safely running platform-independent software applications side-by-side.

Toit introduces a split between software and firmware for microcontrollers.

The concepts of operating system, virtual machine, and programming language are closely related. From the perspective of developers, we have hidden the operating system and the physical hardware and given them a high-level language and environment to work in. It is just like how modern web browsers let developers run their applications across Windows, Linux, and macOS on different kinds of processors, but this time for microcontrollers. The Toit virtual machine implementation is our secret sauce, and no, it isn’t just another operating system.

“An operating system is a collection of things that don’t fit into a language. There shouldn’t be one.” (Dan Ingalls)

The Toit platform allows you to install independently developed applications side-by-side on a small microcontroller like the ESP32. The virtual machine has built-in support for constructing application images in flash, based on a stream of bits and relocation information. The relocation information is crucial, because it allows the device to freely pick the location in flash where it installs the application. We do not have the luxury of using virtual memory to let the system believe an application always runs from a particular location in memory, so we have to adapt the application image to the actual location in flash that it ends up being stored in.

The image is adapted to the actual location in flash while streaming over the incoming bits.

The Toit platform streams the application images via CoAP over TLS and the device receives 32 words at a time and relocates them before writing them into flash. We have designed it so we never have to keep the full image in RAM, because that is a fairly limited resource. Once we’re done with all the application image bits, we validate them using a checksum mechanism and finally commit the header, turning the application into a valid and runnable piece of functionality.
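To make the streaming step concrete, here is a minimal Python sketch of relocating an image chunk by chunk. It is not the Toit implementation: the chunk encoding (a bitmask marking which words hold image-relative addresses), the use of CRC-32 over the words as sent, the `FakeFlash` class, and the base address are all assumptions made for illustration. Only the 32-word chunk size, the validate-then-commit order, and the idea of never holding the full image in RAM come from the text above.

```python
import struct
import zlib

WORDS_PER_CHUNK = 32  # from the text: the device receives 32 words at a time


class FakeFlash:
    """Hypothetical stand-in for a flash region chosen by the device."""

    def __init__(self, base_address, size_in_words):
        self.base_address = base_address
        self.words = [0xFFFFFFFF] * size_in_words  # erased flash reads as all ones
        self.committed = False  # models the header that is written last


def checksum(words):
    """CRC-32 over the words as sent (assumed checksum, computed by the sender)."""
    crc = 0
    for word in words:
        crc = zlib.crc32(struct.pack("<I", word), crc)
    return crc


def install_image(flash, chunks, expected_crc):
    """Relocate and write an image one chunk at a time, then validate and commit.

    `chunks` yields (relocation_mask, words) pairs. Bit i of the mask being set
    means words[i] is an image-relative address (assumed encoding).
    """
    crc = 0
    position = 0
    for relocation_mask, words in chunks:
        assert len(words) <= WORDS_PER_CHUNK
        for i, word in enumerate(words):
            # Checksum the unrelocated word: the sender cannot know the base.
            crc = zlib.crc32(struct.pack("<I", word), crc)
            if relocation_mask & (1 << i):
                # Turn the image-relative offset into an absolute address.
                word = (word + flash.base_address) & 0xFFFFFFFF
            flash.words[position] = word
            position += 1
    if crc != expected_crc:
        return False  # the header is never committed, so the image stays invalid
    flash.committed = True
    return True


# Usage: words 0 and 2 are addresses, words 1 and 3 are plain data.
image = [0x10, 0xDEADBEEF, 0x4, 0x0]
flash = FakeFlash(base_address=0x3F400000, size_in_words=len(image))
installed = install_image(flash, [(0b0101, image)], checksum(image))
```

Only one chunk is alive at any time, so RAM use stays constant no matter how large the image is.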

Bytecodes for a small method that computes Fibonacci’s nth term.

A typical Toit application image is around 30KB in total. The vast majority of that is the bytecodes that describe the behavior of individual methods in an easily interpretable form. We extract the essential information from the program’s hierarchy, classes, and interfaces, and store it in a compact form. Similarly, we save space by collectively storing methods as one flattened sequence of bytes in something that resembles the .text segment of an ELF file. The only structured objects in the images are the compile-time constants that go with the application.

The Toit virtual machine ends up acting like a flash-based filesystem with a dynamic relocating linker for installing, upgrading, and uninstalling application images that can run directly from flash. The applications are completely separate and only share what is provided by the virtual machine on the device.

For robustness and security reasons, applications need a safe environment on your microcontroller to run within. When an application starts up, the Toit virtual machine allocates a new process structure in memory. The structure includes an object heap that is isolated from the rest of the system, so we have a place to keep all objects allocated by code running in that particular process. The footprint of a minimal process is 4KB of RAM and that includes the object heap. Start small and grow only when necessary!

Once the process has been set up, the virtual machine tells the scheduler that the process is ready to run. The processes are scheduled on top of a fixed number of FreeRTOS threads — one for each core of the CPU — but you can easily have more processes than threads. In most cases, we run with two threads on the ESP32, because it is a dual-core processor. The threads have a small, fixed execution stack associated with them and they service the individual processes one at a time by running the applications associated with them for a while until the threads are preempted by the scheduler and move on to another process. Multiple preemptively scheduled processes with completely isolated address spaces. Check.

Inside the processes, the application consists of multiple light-weight tasks that each have their own execution stack. These stacks are allocated in the object heap and they grow on demand, so you don’t have to preallocate stacks or pick the right stack sizes for your tasks. The system takes care of that for you. The tasks are cooperatively scheduled, so it is only when one task cannot make progress that we pick another task in the same process and continue running its code.

Within a process, the thread cooperatively switches between executing two tasks. When preempted, the thread will pick another process and start executing the current task in the newly scheduled process.
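The two levels of scheduling can be modeled with a toy Python sketch. This is an illustration under simplifying assumptions, not how the Toit scheduler is written: tasks are generators, a `"wait"` item stands for a blocking call, a fixed `time_slice` counted in steps stands for preemption, and all names are hypothetical.

```python
from collections import deque

STEP, BLOCK = "step", "block"


def task(name, log, work):
    """A toy task: each item is one unit of work, and 'wait' models blocking."""
    for item in work:
        if item == "wait":
            yield BLOCK  # cannot make progress, so let a sibling task run
        else:
            log.append(f"{name}:{item}")
            yield STEP


class Process:
    def __init__(self, tasks):
        self.tasks = deque(tasks)  # tasks[0] is the current task

    def run(self, time_slice):
        """Run until the slice is used up or all tasks finish.

        Returns True if the process still has work left.
        """
        steps = 0
        while self.tasks and steps < time_slice:
            try:
                signal = next(self.tasks[0])
            except StopIteration:
                self.tasks.popleft()  # task finished
                continue
            if signal == BLOCK:
                self.tasks.rotate(-1)  # cooperative switch within the process
            else:
                steps += 1
        return bool(self.tasks)


def run_thread(processes, time_slice=2):
    """One thread servicing several processes, preempting after each slice."""
    ready = deque(processes)
    while ready:
        process = ready.popleft()
        if process.run(time_slice):
            ready.append(process)  # preempted: resume its current task later


log = []
p1 = Process([task("p1.a", log, [1, "wait", 2]), task("p1.b", log, [1])])
p2 = Process([task("p2.a", log, [1, 2, 3])])
run_thread([p1, p2])
```

In this model a blocked task costs nothing but a rotation of the task queue, while the thread itself stays busy with other tasks or other processes.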

This setup gives your applications their own isolated memory areas. This is important for security and robustness, but it is also the perfect way to support decomposing your device functionality into meaningful and decoupled modular applications. Internally, applications can be composed of multiple task-based activities and it is extremely cheap to do blocking operations like waiting for events, because you are only blocking a light-weight task, not an entire process or the FreeRTOS thread. This leads to understandable code that is easy to write and it avoids the need for most asynchronous operations:

Use simple blocking operations like sleep instead of complex callback-based event handling.

Another benefit is that the object heaps of the processes can be garbage collected completely independently. We see pause times of less than a millisecond, because the individual heaps are small and manageable. Building scalable and sophisticated garbage collectors for complex systems like we did for JavaScript is fun, but for an embedded device it is better to design the system so the memory management can be kept simple. Really small heaps are a great way to allow that.

Once you have a good handful of processes running on your microcontroller, you want to make sure they don’t unnecessarily consume too much of your most precious resource: RAM. The best way to achieve that is to make sure that the virtual machine itself doesn’t come with too much overhead and stays lean when executing your code.

For dynamic and flexible programming languages, a common source of overhead is actually the optimization techniques used to make method calls fast. When a virtual machine executes your method calls like:

Invoke the append method on whatever object months is and pass the string “April” as the argument.

it needs to determine which append method to invoke based on the runtime type of months. Historically, this has been optimized by having a method lookup table that uses extremely fast hashing to find and validate the right append method, but the hash table is updated at runtime and must be stored in RAM. To avoid too many collisions, it also needs to have a pretty decent size, so this adds quite a bit of overhead for the sole purpose of making method calls fast.
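A minimal Python sketch of such a hash-based lookup cache shows why it has to live in RAM. The class and function names are hypothetical, and Python's own class machinery stands in for the virtual machine's class hierarchy; the point is only that every miss writes to the table.

```python
class LookupCache:
    """A global (class, selector) -> method cache, sketched for illustration."""

    def __init__(self, size=64):
        # Entries are overwritten at runtime, so the table cannot live in flash.
        self.entries = [None] * size
        self.misses = 0

    def lookup(self, cls, selector):
        index = hash((cls, selector)) % len(self.entries)
        entry = self.entries[index]
        if entry is not None and entry[0] is cls and entry[1] == selector:
            return entry[2]  # fast path: hit, validated against the key
        self.misses += 1
        method = slow_lookup(cls, selector)
        self.entries[index] = (cls, selector, method)  # runtime update
        return method


def slow_lookup(cls, selector):
    """Walk the inheritance chain until some class defines the selector."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


class Months(list):
    pass


cache = LookupCache()
months = Months()
append = cache.lookup(type(months), "append")  # miss: walks up to list
append(months, "April")
again = cache.lookup(type(months), "append")   # hit: served from the cache
```

A larger `size` reduces collisions between unrelated call targets, which is exactly the RAM cost the paragraph above describes.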

In Google’s V8 JavaScript engine, we ended up using an even more expensive approach that associated so-called inline caches with all the individual places in your code where you call methods. Among other things, such an inline cache stores the last method called at that point in the code and since that changes over time, the caches must be in RAM. It is fast, but prohibitively expensive in terms of memory.

For Toit, we knew that we needed something better that would allow us to perform efficient method calls directly from flash with no RAM overhead. We started by doing a depth-first numbering of all the Toit classes in the system and noted how that trivially led to consecutive numbers for all subclasses of a given class.

Single inheritance hierarchy with depth-first numbering of subclasses.
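The numbering is easy to reproduce in a few lines of Python. The hierarchy below is an assumption chosen for illustration (the original figure is not available here); it is picked so that class B and its subclasses occupy the numbers 1 to 3.

```python
def number_classes(hierarchy, root):
    """Number classes depth-first (preorder).

    Returns {class_name: (first, last)}, where `first` is the class's own
    number and [first, last] covers the class and all of its subclasses.
    """
    ranges = {}
    counter = 0

    def visit(name):
        nonlocal counter
        first = counter
        counter += 1
        for child in hierarchy.get(name, []):
            visit(child)
        ranges[name] = (first, counter - 1)

    visit(root)
    return ranges


def is_subclass(ranges, candidate, ancestor):
    """Subclass tests become a simple range check on class numbers."""
    first, last = ranges[ancestor]
    return first <= ranges[candidate][0] <= last


# Hypothetical single inheritance hierarchy: parent -> list of children.
HIERARCHY = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F"]}
ranges = number_classes(HIERARCHY, "A")
```

With this hierarchy, `ranges["B"]` is `(1, 3)` and `ranges["F"]` is `(5, 5)`, so a whole subtree is always described by two integers.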

That means methods are inherited by classes within a specific class number range. As an example, if class B defines a method called append, the method B.append is inherited by all classes with numbers in the [1, 3] range. If class F also defines an append method, the F.append method is only applicable for instances of F. You can easily associate a method name with all the class number ranges its different implementations are inherited by. Through that you can form a one-dimensional dispatch table specific for the method name append that looks like this:

So to call append on an object, all you have to do is to find the entry in the table that corresponds to the class number of the object you’re calling append on. It is a cheap constant-time operation and nothing in the table needs to change as a result of the lookup, so the table fits nicely in flash. There are lots of holes in the table, which is wasteful, but if you’re careful you can actually use those holes for other such tables by overlapping them and compressing the overall dispatch data structure you need to store in flash.
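As a sketch of what such a one-dimensional table could look like, the Python below builds the row for append from class number ranges. The six-class numbering (A=0, B=1, C=2, D=3, E=4, F=5) and the `size` example are assumptions for illustration; only the [1, 3] range for B.append and the leaf class F come from the text.

```python
def build_row(num_classes, definitions):
    """Build the dispatch row for one method name.

    `definitions` maps an inclusive class number range to the method that
    classes in that range inherit. Wider ranges are filled first, so an
    override in a subclass (a narrower range) wins.
    """
    row = [None] * num_classes
    widest_first = sorted(definitions.items(), key=lambda d: d[0][0] - d[0][1])
    for (first, last), method in widest_first:
        for class_number in range(first, last + 1):
            row[class_number] = method
    return tuple(row)  # immutable: lookups never write to it


# B.append is inherited by classes 1..3, F.append applies only to class 5.
APPEND_ROW = build_row(6, {(1, 3): "B.append", (5, 5): "F.append"})

# Hypothetical override: C (class 2) replaces a method defined in the root.
SIZE_ROW = build_row(6, {(0, 5): "A.size", (2, 2): "C.size"})


def dispatch(row, class_number):
    """Constant-time lookup; None means the class has no such method."""
    return row[class_number]
```

The `None` entries at class numbers 0 and 4 in `APPEND_ROW` are the holes mentioned above.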

The compression technique is known as selector-based row displacement, and it removes pretty much all the unnecessary holes in the tables if done right. You can read much more about it in Karel Driesen’s book, Efficient Polymorphic Calls.
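The core idea of row displacement can be sketched in Python as a first-fit packing of the per-selector rows into one shared array. This is a simplified illustration, not Toit's or Driesen's exact algorithm: the rows, the fullest-row-first ordering, and the choice to tag each entry with its selector so that a lookup can detect another row's entry are all assumptions.

```python
def fits(table, row, offset):
    """True if no occupied slot in `row` collides with `table` at `offset`."""
    for class_number, method in enumerate(row):
        index = offset + class_number
        if method is not None and index < len(table) and table[index] is not None:
            return False
    return True


def pack_rows(rows):
    """Overlap sparse rows into one table; returns (table, offsets)."""
    table, offsets = [], {}
    # Placing the fullest rows first tends to give tighter packing.
    by_fill = sorted(rows.items(), key=lambda r: -sum(m is not None for m in r[1]))
    for selector, row in by_fill:
        offset = 0
        while not fits(table, row, offset):
            offset += 1
        table.extend([None] * (offset + len(row) - len(table)))
        for class_number, method in enumerate(row):
            if method is not None:
                table[offset + class_number] = (selector, method)
        offsets[selector] = offset
    return table, offsets


def lookup(table, offsets, selector, class_number):
    """Constant-time dispatch; the selector tag rejects other rows' entries."""
    entry = table[offsets[selector] + class_number]
    if entry is None or entry[0] != selector:
        return None
    return entry[1]


# Hypothetical rows for six classes numbered 0..5.
ROWS = {
    "append": [None, "B.append", "B.append", "B.append", None, "F.append"],
    "close": [None, None, None, None, "E.close", "E.close"],
    "flip": [None, None, "C.flip", None, None, None],
}
table, offsets = pack_rows(ROWS)
```

Stored separately, the three rows would need 18 slots; packed this way they share 8, and the result is still read-only data that can sit in flash.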

Some folks claim that the S in IoT stands for security. With the current state of the developer experience for microcontrollers, it could also stand for software or simplicity. They are all missing. Developers deserve a robust foundation for their on-device functionality. Armed with that, they will be empowered to experiment more and innovate faster. If the same foundation allows simple, modular software to replace complex, monolithic firmware and provides memory safety and sandboxing to strengthen security, are we not pretty close to putting the S in IoT? Your C compiler and your primitive real-time operating system will not get you there.

GCC and LLVM are awesome tools, but you need more than that!

Programming languages and virtual machines are here to help make you a better programmer, and through some of the techniques I’ve outlined in this post, Toit gives you those benefits even on microcontrollers. Toit is built by a team of software platform experts from Google and Uber who believe that if you feel uncomfortable updating the software on your devices in production, you are not doing it often enough. The problem isn’t you; it is your platform. Stop worrying about triple faults and start worrying about moving fast enough to stay relevant.

You can read more about our new language or even try it out if you’re curious to see some of this in action.
