Module Mkernel

A simple scheduler for Solo5/Unikraft in OCaml.

A unikernel is a fully-fledged operating system designed to run virtualised on a host system such as Linux (KVM) or FreeBSD (Bhyve). In this sense, a unikernel's interactions with the outside world (with other systems) are standardised through two types of devices: net devices (tap interfaces) and block devices (files).

These interactions respond to events from the host system, which alone has direct access to the physical components. These events are retrieved by a tender, which runs in the host system's user space, and are then forwarded to the unikernel, which runs in its own space.

A request from the unikernel to the tender is called a hypercall, and a request from the tender to the host system is called a syscall. In other words, a hypercall necessarily involves one or more syscalls, and its result necessarily passes through the tender, which serves as a bridge between the unikernel and the host system.

Since these events originate from the outside world, they can occur at any time. They must therefore be handled asynchronously, so that waiting for one event does not prevent the reception of others and so that the unikernel can do something else in the meantime.

This library therefore offers two essential features:

Scheduler.

A unikernel has exclusive use of a CPU and of a memory area separate from the host system. This CPU simply executes the unikernel's code. To date, multiple cores are not supported. A scheduler is nevertheless needed to execute multiple tasks concurrently (cooperatively).

This library is therefore based on the Miou scheduler, a small scheduler that uses OCaml 5 effects to launch and manage tasks asynchronously. Miou focuses on applications that are services (such as a web server), so its task management policy is designed to let the unikernel handle as many hypercalls as possible. This contrasts with a scheduler that optimises task scheduling to complete a computation as quickly as possible (in other words, for a CPU-bound application). Miou is therefore said to be a scheduler for I/O-bound applications.
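To illustrate what cooperative scheduling with OCaml 5 effects looks like, here is a toy scheduler. It is not Miou's implementation or API: the names Yield, yield and run are hypothetical, and the sketch only shows tasks voluntarily handing control back to a single-CPU run queue. It requires OCaml 5.

```ocaml
(* Toy cooperative scheduler (requires OCaml >= 5.0). Not Miou's code. *)
open Effect
open Effect.Deep

(* The only effect of this toy scheduler: give control back voluntarily. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* [run tasks] executes the tasks cooperatively on a single CPU: whenever a
   task performs [Yield], its continuation is pushed to the back of the run
   queue and the next ready task is resumed. *)
let run (tasks : (unit -> unit) list) =
  let ready : (unit -> unit) Queue.t = Queue.create () in
  let spawn f =
    Queue.push
      (fun () ->
        match_with f ()
          {
            retc = (fun () -> ());
            exnc = raise;
            effc =
              (fun (type c) (eff : c Effect.t) ->
                match eff with
                | Yield ->
                    Some
                      (fun (k : (c, unit) continuation) ->
                        (* Suspend the task: re-queue its continuation. *)
                        Queue.push (fun () -> continue k ()) ready)
                | _ -> None);
          })
      ready
  in
  List.iter spawn tasks;
  while not (Queue.is_empty ready) do
    (Queue.pop ready) ()
  done

(* Two tasks interleave at each [yield]. *)
let trace = Buffer.create 16

let task name () =
  Buffer.add_string trace (name ^ "1 ");
  yield ();
  Buffer.add_string trace (name ^ "2 ")

let () =
  run [ task "a"; task "b" ];
  print_endline (Buffer.contents trace)
```

Each task runs until it yields, so the two tasks interleave instead of one running to completion before the other starts.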

For more information about Miou, its task management and its API, please read the project's documentation and tutorial.

To start the Miou scheduler and be able to launch and manage asynchronous tasks, the user must define an entry point that calls run:

  let () =
    Mkernel.(run []) @@ fun () ->
    let prm = Miou.async @@ fun () -> print_endline "Hello World!" in
    Mkernel.sleep 1_000_000_000;
    Miou.await_exn prm

All functions of the Miou module work in a unikernel except Miou.call, which launches a task in parallel. Since a unikernel has only one CPU, Miou.call raises an exception.

Hypercalls.

Hypercalls are the only way for the unikernel to communicate with the outside world. A hypercall signals that the unikernel wants to interact with a specific external resource (such as a network interface or a block interface). This library offers several functions that emit these hypercalls, which are handled first by the tender and then by the host system.

These hypercalls are standardised and concern interactions with two types of resources:

Net interfaces.

A net interface is a TAP interface connecting your unikernel to your host system's network. It is through this interface that you can send packets to your system's network and receive packets from it. A TCP/IP stack can be built on top of this interface.

The user can read and write packets on such an interface. However, you need to understand how reads and writes behave when developing a unikernel with Solo5/Unikraft.

Writing a packet to the net interface is direct and cannot fail from the unikernel's point of view. In other words, there is no need to wait for anything before writing to the net device (if an error occurs on your host system, the tender fails, and by extension so does your unikernel). So, from the scheduler's point of view, writing to the net device is atomic: the scheduler never suspends it in order to execute other tasks.

Reading from the net interface is different. You might expect to read packets, but they might not be available when you try to read them. Mkernel makes a first read attempt and, if it fails, the scheduler "suspends" the reading task (and everything that follows from it) and checks at a later point in the unikernel's life whether a packet has arrived.

Reading from the net interface is currently the only operation that requires suspension. This lets the scheduler perform other tasks if the first read attempt failed. At the next iteration of the scheduler (after it has executed at least one other task), Mkernel asks the tender whether a packet has arrived. If so, the scheduler resumes the reading task; otherwise, it keeps the task suspended until the next opportunity.
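The suspend-and-retry behaviour described above can be modelled with a few queues. This is a hypothetical sketch, not Mkernel's implementation: incoming, waiting, ready, read and step are illustrative names, and a suspended reader is represented as a callback rather than an effect continuation.

```ocaml
(* Hypothetical model of a net device: packets delivered by the host at
   arbitrary times. None of these names belong to Mkernel's API. *)
let incoming : string Queue.t = Queue.create ()

(* Suspended readers, stored as callbacks waiting for a packet. *)
let waiting : (string -> unit) Queue.t = Queue.create ()

(* Other runnable tasks. *)
let ready : (unit -> unit) Queue.t = Queue.create ()

(* [read k] makes a first attempt; if no packet is available, the reader is
   suspended (queued) instead of blocking the whole unikernel. *)
let read k =
  match Queue.take_opt incoming with
  | Some pkt -> k pkt
  | None -> Queue.push k waiting

(* One scheduler iteration: run one other task, then check whether packets
   have arrived and resume as many suspended readers as possible. *)
let step () =
  (match Queue.take_opt ready with Some task -> task () | None -> ());
  while (not (Queue.is_empty waiting)) && not (Queue.is_empty incoming) do
    let k = Queue.pop waiting in
    k (Queue.pop incoming)
  done

let events = ref []

let () =
  read (fun pkt -> events := ("read " ^ pkt) :: !events);
  Queue.push (fun () -> events := "other task" :: !events) ready;
  step () (* no packet yet: only the other task runs *);
  Queue.push "pkt0" incoming (* the host delivers a packet *);
  step () (* the suspended reader is resumed *);
  List.iter print_endline (List.rev !events)
```

The first step only runs the other task, because the reader is still waiting; the reader is resumed at the next iteration, once a packet is available.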

Allocating a net device.

Whether with Solo5 (and its tender) or Unikraft (via qemu or firecracker), the user needs to allocate a network interface, often referred to as a tap interface. It is a virtualised Ethernet port that can be manipulated both by the unikernel (to communicate with the rest of the world) and by the host system (which configures it so that the unikernel is connected to a network).

Here's how to create a tap interface on Linux:

  $ sudo ip tuntap add name tap0 mode tap
  $ sudo ip link set tap0 up

It is common practice to connect the tap interface to a (virtual) bridge so that multiple unikernels (and therefore multiple tap interfaces) can share a network. Here is how to "connect" a tap interface to a bridge:

  $ sudo ip link add name service type bridge
  $ sudo ip addr add 10.0.0.1/24 dev service
  $ sudo ip link set tap0 master service
  $ sudo ip link set service up

Finally, you will often want the unikernel to be able to communicate with the Internet. This generally requires configuring the host system (acting as a gateway) to correctly route packets to and from the unikernel. This last step depends on your network topology, but we recommend learning about iptables, the nat table and the MASQUERADE target.

Block interfaces.

Block interfaces differ in that the data is always there: unlike with a net interface, there is no waiting for data to arrive. A block interface can be seen as a storage area accessed page by page (generally 4096 bytes per page). It can be read and written. However, a read or write operation can take quite a long time, depending on the file system and the hardware of the host system.

There are therefore two kinds of read/write: atomic and scheduled.

An atomic read/write is indivisible: while it is being performed, nothing else can be done until it has finished. Note that once the operation has finished, the scheduler does not take the opportunity to execute another task either; execution continues directly with what you have implemented after the read/write.

This approach is useful when you want to preserve certain invariants (in particular the state of memory) that other tasks must not alter during such an operation. The drawback is that the operation can take a considerable amount of time, during which nothing else can be done.

This is why there is the other kind, the scheduled read/write, which is suspended by default and performed when the scheduler has the best opportunity to do so, in other words, when it has nothing else to do.

This kind of operation is useful when the read/write does not depend on such invariants, that is, when it can be carried out later without the moment of its execution affecting the result. For example, scheduling reads on a read-only block device is probably more appropriate than using atomic reads (whether the read happens at time T0 or T1, the result is exactly the same).
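The difference between atomic and scheduled operations can be sketched as a toy scheduler in which scheduled operations only run once no other task is ready. This is a hypothetical model: atomic, scheduled and run are illustrative names and do not belong to Mkernel's Block API.

```ocaml
(* Toy model, not Mkernel's implementation: [ready] holds runnable tasks,
   [deferred] holds scheduled block operations. *)
let ready : (unit -> unit) Queue.t = Queue.create ()
let deferred : (unit -> unit) Queue.t = Queue.create ()

(* An atomic operation runs immediately, without any task switch. *)
let atomic op = op ()

(* A scheduled operation is only recorded; it runs when the scheduler is idle. *)
let scheduled op = Queue.push op deferred

(* Run every ready task first; only when none is left, perform one deferred
   operation and look again (it may have made new tasks ready). *)
let rec run () =
  match Queue.take_opt ready with
  | Some task ->
      task ();
      run ()
  | None -> (
      match Queue.take_opt deferred with
      | Some op ->
          op ();
          run ()
      | None -> ())

let events = ref []
let say msg () = events := msg :: !events

let () =
  scheduled (say "scheduled read");
  atomic (say "atomic read");
  Queue.push (say "task 1") ready;
  Queue.push (say "task 2") ready;
  run ();
  List.iter print_endline (List.rev !events)
```

Although the scheduled read is requested first, it is performed last, once both tasks have run; the atomic read happens immediately.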

Allocating a block device.

A block device is basically a file. The only constraint is that its size must be aligned: if you want to launch your unikernel with a page size of 512 (the default value for Solo5), the file size must be a multiple of 512. Here is how to create a block device with dd:

  $ dd if=/dev/urandom of=block.img count=1024 bs=512
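Checking the alignment constraint, or computing an aligned size, is plain integer arithmetic. The helpers below are hypothetical and independent of Mkernel; the dd invocation above produces 1024 * 512 = 524288 bytes, which is a multiple of 512.

```ocaml
(* Hypothetical helpers, independent of Mkernel. *)

(* A block image is usable only if its size is a multiple of the page size. *)
let is_aligned ~page_size size = size mod page_size = 0

(* Smallest multiple of [page_size] that is >= [size] (for [size] >= 0). *)
let round_up ~page_size size = (size + page_size - 1) / page_size * page_size

let () =
  let size = 1024 * 512 in
  (* what dd produces with count=1024 bs=512 *)
  Printf.printf "%d bytes, aligned on 512: %b\n" size
    (is_aligned ~page_size:512 size);
  Printf.printf "1000 bytes rounded up: %d\n" (round_up ~page_size:512 1000)
```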

module Net : sig ... end
module Block : sig ... end
module Hook : sig ... end
val clock_monotonic : unit -> int

clock_monotonic () returns the monotonic time elapsed since an unspecified point in the past.

The monotonic clock corresponds to the CPU time spent since boot. It cannot be relied upon to provide accurate results unless great care is taken to correct its possible flaws: if the unikernel is suspended (by the host system), the monotonic clock no longer matches the "real time elapsed" since boot.

This operation is atomic. In other words, it does not give the scheduler the opportunity to execute another task.
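Because the origin of the monotonic clock is unspecified, only the difference between two readings is meaningful. The sketch below measures a duration that way. The clock is passed as a parameter (in a unikernel it could be Mkernel.clock_monotonic, which has type unit -> int) so the example does not depend on Mkernel; elapsed and fake_clock are hypothetical helpers.

```ocaml
(* [elapsed ~now f] runs [f] and returns its result together with the
   duration measured by [now]. Only the difference between two readings is
   meaningful, since the clock's origin is unspecified. *)
let elapsed ~(now : unit -> int) f =
  let t0 = now () in
  let result = f () in
  let t1 = now () in
  (result, t1 - t0)

(* A fake clock that advances by 5 units on every reading, for testing. *)
let fake_clock () =
  let ticks = ref 100 in
  fun () ->
    let t = !ticks in
    ticks := t + 5;
    t

let () =
  let result, dt = elapsed ~now:(fake_clock ()) (fun () -> 6 * 7) in
  Printf.printf "result = %d, elapsed = %d\n" result dt
```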

val clock_wall : unit -> int

clock_wall () returns the wall-clock time in UTC since the UNIX epoch (1970-01-01).

The wall clock corresponds to the host's clock: each time clock_wall () is called, a syscall/hypercall is made to get the host's clock. Compared to reading the monotonic clock, this may take some time.

This operation is atomic. In other words, it does not give the scheduler the opportunity to execute another task.

val sleep : int -> unit

sleep ns blocks (suspends) the current task for ns nanoseconds.

The first entry point of a unikernel.

A unikernel is an application that can require several devices: net devices (tap interfaces) and block devices (files). These devices can be acquired by name and transformed (via map).

For example, a block device can be transformed into a file system, provided that the latter implementation uses the read and write operations associated with block devices (see Block).

  let fs ~name =
    let open Mkernel in
    map (fun blk () -> Fat32.of_solo5_block blk) [ block name ]

Mkernel acquires these devices, performs the transformations requested by the user and returns the results:

  let () =
    Mkernel.(run [ fs ~name:"disk.img" ]) @@ fun fat32 () ->
    let file_txt = Fat32.openfile fat32 "file.txt" in
    let finally () = Fat32.close file_txt in
    Fun.protect ~finally @@ fun () ->
    let line = Fat32.read_line file_txt in
    print_endline line

Finally, it executes the code given by the user. The user can therefore “build up” complex systems (such as a TCP/IP stack from a net device, or a file system from a block device) using the map function.

type 'a arg

'a arg is the type of an argument given to run, which provides a value of type 'a.

type ('k, 'res) devices =
  | [] : (unit -> 'res, 'res) devices
  | :: : 'a arg * ('k, 'res) devices -> ('a -> 'k, 'res) devices

Multiple devices are passed to run using a list-like syntax. For instance:

  let () =
    Mkernel.(run [ block "disk.img" ]) @@ fun _blk () ->
    print_endline "Hello World!"
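To see how the devices type lets run check the user's function against the list of arguments, here is a self-contained model. It assumes that an 'a arg is simply a thunk producing an 'a (the real type is abstract), and apply is a hypothetical function standing in for what run does with the user's function.

```ocaml
(* Model only: the real [Mkernel.arg] is abstract. *)
type 'a arg = unit -> 'a

(* ['k] is the type of the function consuming every argument, ['res] its
   final result. Redefining [] and (::) enables the list-like syntax. *)
type ('k, 'res) devices =
  | [] : (unit -> 'res, 'res) devices
  | ( :: ) : 'a arg * ('k, 'res) devices -> ('a -> 'k, 'res) devices

(* [apply devices fn] produces each argument in order, feeds it to [fn], and
   finally applies the trailing [()]. *)
let rec apply : type k res. (k, res) devices -> k -> res =
 fun devices fn ->
  match devices with
  | [] -> fn ()
  | arg :: rest -> apply rest (fn (arg ()))

let () =
  let name : string arg = fun () -> "service" in
  let size : int arg = fun () -> 512 in
  apply [ name; size ] (fun n s () -> Printf.printf "%s %d\n" n s)
```

The type of the list fixes the type of the function: with two arguments of types string and int, the function must have type string -> int -> unit -> 'res, which is why the examples on this page end with a trailing ().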

val net : string -> (Net.t * Net.cfg) arg

net name is a net device which can be used by the Net module. The given name must correspond to the argument given to the Solo5 tender or the qemu tender. For example, if the invocation of our unikernel with Solo5 corresponds to:

  $ solo5-hvt --net:service=tap0 -- unikernel.hvt

The name of the net device would be: "service".

The user can specify the MAC address of the virtual interface to use; otherwise, Solo5 chooses a random one. This address is provided through the Net.cfg value.

val block : string -> Block.t arg

block name is a block device which can be used by the Block module. The given name must correspond to the argument given to the Solo5 tender or the qemu tender. For example, if the invocation of our unikernel with Solo5 corresponds to:

  $ solo5-hvt --block:disk=file.txt -- unikernel.hvt

The name of the block would be: "disk".

val map : 'f -> ('f, 'a) devices -> 'a arg

map fn devices provides a means of creating devices from other devices. For example, one might build a TCP/IP stack from a net device:

  let tcpip ~name : Tcpip.t Mkernel.arg =
    Mkernel.map
      (fun ((net : Mkernel.Net.t), _cfg) () -> Tcpip.of_net_device net)
      Mkernel.[ net name ]

val const : 'a -> 'a arg

const v is an argument that always yields v.
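Under the same simplifying assumption (an 'a arg modelled as a thunk, the real type being abstract), map and const can be sketched as follows. The model repeats the devices type so it stands alone; block and fs are hypothetical stand-ins mirroring the file-system example earlier on this page.

```ocaml
(* Model only: the real [Mkernel.arg] is abstract. *)
type 'a arg = unit -> 'a

type ('k, 'res) devices =
  | [] : (unit -> 'res, 'res) devices
  | ( :: ) : 'a arg * ('k, 'res) devices -> ('a -> 'k, 'res) devices

let rec apply : type k res. (k, res) devices -> k -> res =
 fun devices fn ->
  match devices with
  | [] -> fn ()
  | arg :: rest -> apply rest (fn (arg ()))

(* [map fn devices] is a new argument: when it is needed, its dependencies
   are produced and [fn] is applied to them. *)
let map (fn : 'f) (devices : ('f, 'a) devices) : 'a arg =
 fun () -> apply devices fn

(* [const v] is an argument that always yields [v]. *)
let const (v : 'a) : 'a arg = fun () -> v

(* Hypothetical stand-ins for a block device and a "file system" on it. *)
let block name : string arg = fun () -> "block:" ^ name
let fs ~name : int arg = map (fun blk () -> String.length blk) [ block name ]

let () =
  apply [ fs ~name:"disk"; const 42 ] (fun len n () ->
      Printf.printf "%d %d\n" len n)
```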

val run : ?now:(unit -> int) -> ?g:Random.State.t -> ('a, 'b) devices -> 'a -> 'b

The first entry point of a unikernel with Solo5 and Miou.