Arkane OS — a specialised operating system in Rust

An operating system built from isolated cells.

Built for three things: security, stability and less overhead. A ~10,000-line kernel that almost never changes — everything else, from the network to the AI model, runs in hardware-isolated cells that update and heal like a server fleet.

01 · Principle

What does not exist cannot be exploited.

Run a language model in a bank or a hospital today and it stands on thirty years of legacy: four hundred syscalls, shells, dynamic linkers — and a kernel that needs patching every month. Arkane starts from the other end — from nothing — and only adds what can be proven. Isolation is not a check the kernel performs on every operation: it is wired into the page tables once, at spawn. Drivers, network and storage live in cells — so when they break or need an update, the kernel doesn't.

That is the whole bargain: a smaller surface to attack, a kernel stable enough to leave alone, and nothing running that you never asked for. Security, stability, less overhead — not features bolted on, but what is left once everything unnecessary is gone.

23 syscalls, the entire API
~10k lines of Rust, the whole kernel
0 lines of C. No POSIX. No shell.

02 · Isolation

Try to break it.

This is the real v0.17 cluster — six cells around a kernel, each with its own page table. Isolation is enforced by the MMU, not by policy. Click a cell and watch what happens. Then try killing gw-admin — the control plane, which is also the ACP healer.

healer online · respawns 0 · drift pending 0 · kernel restarts 0

Crash → HEALER_DRIFT_DEAD → respawn: gw-admin heals the drift within moments. Kill gw-admin and nothing comes back — cells that die in the meantime pile up as pending drift, because the healer itself has no healer. Restoring it takes noticeably longer: the kernel has to rebuild it from the signed boot chain and the healer must reload and verify its state before it may act. Only once gw-admin is back online does it work through the backlog, cell by cell.
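
The healing behaviour described above can be sketched as a small state machine. This is a toy model under stated assumptions, not Arkane's actual code: the `Cluster` type, its fields and its methods are hypothetical names chosen for illustration, and the restore of gw-admin is modelled as a single call rather than a rebuild from the signed boot chain.

```rust
use std::collections::VecDeque;

/// Illustrative cell identifier (hypothetical type, not from the Arkane source).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CellId(pub &'static str);

/// Toy model of the cluster as seen by the healer.
#[derive(Debug)]
pub struct Cluster {
    /// Whether the healer cell (gw-admin) is currently alive.
    pub healer_online: bool,
    /// Dead cells waiting for a respawn, oldest first.
    pub pending_drift: VecDeque<CellId>,
    /// Number of respawns performed so far.
    pub respawns: u32,
}

impl Cluster {
    pub fn new() -> Self {
        Cluster { healer_online: true, pending_drift: VecDeque::new(), respawns: 0 }
    }

    /// A cell crashed. Returns the cells that were respawned as a result.
    pub fn on_crash(&mut self, cell: CellId) -> Vec<CellId> {
        if cell.0 == "gw-admin" {
            // The healer has no healer: it goes offline and nothing is queued
            // for it, because only the kernel can bring it back.
            self.healer_online = false;
            return Vec::new();
        }
        self.pending_drift.push_back(cell);
        self.drain()
    }

    /// The kernel has restored gw-admin; it now works through the backlog.
    pub fn restore_healer(&mut self) -> Vec<CellId> {
        self.healer_online = true;
        self.drain()
    }

    /// Respawn every pending cell in order, but only while the healer is up.
    fn drain(&mut self) -> Vec<CellId> {
        let mut healed = Vec::new();
        if !self.healer_online {
            return healed;
        }
        while let Some(cell) = self.pending_drift.pop_front() {
            self.respawns += 1;
            healed.push(cell);
        }
        healed
    }
}
```

In this model a crash while the healer is online is respawned immediately, while crashes that happen after gw-admin dies only accumulate in `pending_drift` until `restore_healer` is called.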

03 · Anatomy

What it is made of.

Every box is a real binary in the repository — no marketing architecture. Click a component.

mTLS · HTTP/2 · gRPC →
gateway cells
↓ capability-mediated IPC ↓
workload cells
↓ 23 syscalls — every one a capability check ↓

Everything shown here runs today in v0.17 — spawned, signed and isolated on the real kernel.

04 · Law

Two keys, one law.

The developer signs the shape of a cell into its binary. The operator provides the values in the manifest. The kernel enforces both — structurally, default-deny, at every spawn. Access exists only as a capability: an unforgeable token, derivable and transitively revocable.

cells/inference · signed ed25519 ✓
#[arkane_contract(
    needs(memory ≤ 512 MiB, share::read),
    forbid(net_in, net_out, spawn)
)]

What is forbidden here is not monitored — it is never wired up.
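
To make "derivable and transitively revocable" concrete, here is a minimal sketch of a capability table. It is an illustration under assumptions, not the kernel's data structure: `CapTable`, `CapId`, `Rights` and every method name are hypothetical, and the rights are reduced to read and write.

```rust
use std::collections::HashMap;

/// Rights a capability can carry (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rights {
    pub read: bool,
    pub write: bool,
}

/// Opaque handle. The inner value is private, so code outside this module
/// cannot construct one; a real kernel would keep the table in kernel memory.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CapId(u64);

struct Entry {
    rights: Rights,
    children: Vec<CapId>,
}

#[derive(Default)]
pub struct CapTable {
    entries: HashMap<CapId, Entry>,
    next: u64,
}

impl CapTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Create a root capability (in this sketch, anyone holding the table may).
    pub fn mint(&mut self, rights: Rights) -> CapId {
        self.insert(rights)
    }

    /// Derive a child capability. The child may never exceed its parent's rights.
    pub fn derive(&mut self, parent: CapId, rights: Rights) -> Option<CapId> {
        let p = self.entries.get(&parent)?.rights;
        if (rights.read && !p.read) || (rights.write && !p.write) {
            return None;
        }
        let child = self.insert(rights);
        self.entries.get_mut(&parent)?.children.push(child);
        Some(child)
    }

    /// Revoke a capability and, transitively, everything derived from it.
    pub fn revoke(&mut self, cap: CapId) {
        if let Some(entry) = self.entries.remove(&cap) {
            for child in entry.children {
                self.revoke(child);
            }
        }
    }

    /// Default-deny lookup: an unknown or revoked handle yields `None`.
    pub fn check(&self, cap: CapId) -> Option<Rights> {
        self.entries.get(&cap).map(|e| e.rights)
    }

    fn insert(&mut self, rights: Rights) -> CapId {
        let id = CapId(self.next);
        self.next += 1;
        self.entries.insert(id, Entry { rights, children: Vec::new() });
        id
    }
}
```

Revoking a capability in the middle of a derivation chain removes it and all of its descendants, while the ancestors stay valid.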

05 · Cage

The AI that cannot phone home.

A real language model runs inside the inference cell — its own no_std engine, token-identical to the reference implementation. The only thing it can do is answer whoever called it — and the only cell holding a network card is the gateway, where every request is audited.

gw-public :8081 · /v1/chat/completions

Why can't it phone home? Not because a firewall says no, but because the path does not exist. A cell receives exactly what its manifest grants, and only within what its signed contract allows. The inference manifest grants memory and the model weights, and nothing network-shaped. Since no capability for a network card is ever declared, the kernel never wires one up: there is nothing to block, monitor, or misconfigure. This is what the operator's manifest looks like:

manifest.yaml · cell: inference
arkane: v1
kind: Cell
metadata:
  name: inference
spec:
  image: prod.images.inference@blake3:9c41…   # signed — contract forbids net
  resources: { memory: 512MB, cpu_shares: 1.0 }
  capabilities:
    - name: model-weights
      type: blob
      spec: { hash: blake3:7c8a…, rights: [read] }
  # no mmio, no port, no endpoint —
  # a path to a network card is never created
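
The interplay of the signed contract and the operator's manifest can be sketched as a spawn-time check. This is a simplified model under assumptions: `CapKind`, `Contract`, `Manifest`, `SpawnError` and `resolve` are hypothetical names, and treating the model-weights blob as a `share::read` grant is a guess about how the two samples line up, not something the source states.

```rust
use std::collections::BTreeSet;

/// Kinds of capability, loosely following the contract sample (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CapKind {
    Memory,
    ShareRead,
    NetIn,
    NetOut,
    Spawn,
}

/// What the developer signed into the binary: the shape of the cell.
pub struct Contract {
    pub needs: BTreeSet<CapKind>,
    pub forbid: BTreeSet<CapKind>,
}

/// What the operator provides: the concrete grants.
pub struct Manifest {
    pub grants: BTreeSet<CapKind>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SpawnError {
    /// The manifest grants something the contract explicitly forbids.
    Forbidden(CapKind),
    /// The manifest grants something the contract never declared (default-deny).
    NotDeclared(CapKind),
}

/// Decide what gets wired up at spawn. Anything absent from the result
/// simply does not exist for the cell.
pub fn resolve(contract: &Contract, manifest: &Manifest) -> Result<BTreeSet<CapKind>, SpawnError> {
    for &grant in &manifest.grants {
        if contract.forbid.contains(&grant) {
            return Err(SpawnError::Forbidden(grant));
        }
        if !contract.needs.contains(&grant) {
            return Err(SpawnError::NotDeclared(grant));
        }
    }
    Ok(manifest.grants.clone())
}
```

With a contract that needs `Memory` and `ShareRead` and forbids `NetIn`, `NetOut` and `Spawn`, a manifest granting only memory and the weights resolves to exactly those two entries, and a manifest that tries to add `NetOut` is rejected before the cell ever runs.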

06 · Proof

Pull the plug.

Every apply is written to a Blake3-chained write-ahead log; a persistence cell mirrors the cluster state to its own disk, anchored through a double-buffered root pointer the kernel reads at boot. Kill the power mid-flight and the next boot replays: every cell, every capability, every model mapping comes back — without re-applying anything. This page mirrors that mechanism. Cut the power and see what's left.
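
The idea of a hash-chained write-ahead log can be sketched in a few lines. This is a toy model, not Arkane's on-disk format: `Wal`, `Record`, `link` and `toy_hash` are hypothetical names, and the hash is 64-bit FNV-1a used purely as a dependency-free stand-in for Blake3 (it is not cryptographic). The double-buffered root pointer is left out.

```rust
/// Stand-in hash: 64-bit FNV-1a. NOT Blake3 and not cryptographically secure;
/// it only keeps this sketch free of external crates.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hash of a record: covers the previous record's hash and the payload,
/// which is what chains the log together.
fn link(prev: u64, payload: &[u8]) -> u64 {
    let mut buf = prev.to_le_bytes().to_vec();
    buf.extend_from_slice(payload);
    toy_hash(&buf)
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub prev: u64,
    pub payload: Vec<u8>,
    pub hash: u64,
}

#[derive(Default)]
pub struct Wal {
    pub records: Vec<Record>,
}

impl Wal {
    /// Append one apply; the first record chains from a zero hash.
    pub fn append(&mut self, payload: &[u8]) {
        let prev = self.records.last().map_or(0, |r| r.hash);
        self.records.push(Record {
            prev,
            payload: payload.to_vec(),
            hash: link(prev, payload),
        });
    }

    /// Replay after a crash: walk the chain from the start and stop at the
    /// first record that does not verify (for example a torn write).
    pub fn replay(&self) -> Vec<&[u8]> {
        let mut prev = 0u64;
        let mut out = Vec::new();
        for r in &self.records {
            if r.prev != prev || r.hash != link(prev, &r.payload) {
                break;
            }
            out.push(r.payload.as_slice());
            prev = r.hash;
        }
        out
    }
}
```

Because every record's hash covers its predecessor's hash, damaging one record invalidates it and everything after it, so replay recovers exactly the longest intact prefix of the log.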

In the real system this test is called make t4-demo: apply → hard kernel reboot → the cluster is still there. It has been part of the acceptance suite since v0.14.

07 · Chronicle

Every version a proof.

  • v0.13 Persistence: Blake3 write-ahead log, crash recovery
  • v0.14 Kernel minimization: state survives the reboot
  • v0.15 Trust anchors: three-stage Ed25519 signature chain
  • v0.16 Cell contracts: two keys become law
  • v0.17 AI workloads: in-cell inference, chat API, MCP tool cells
  • v0.18 Preemption, multicore, exclusive core pinning
  • v0.19 Observability & hash-chained audit log
  • v0.20 SDK & app lifecycle: signed app bundles
  • v0.21+ ArkDB, RAG: applications as contracted cells
  • v0.23 Hardware trust: IOMMU, TPM. Required before 1.0.