Doom on ONNX
A single self-contained ONNX model (doom.onnx, ~8.4 MB) that, when run on
any ONNX Runtime CPU EP, boots and renders the original 1993 Doom.
No custom operators, no execution-provider plugins, no Python in the loop:
just standard ONNX ops (Add, BitwiseAnd, Where, Gather, ScatterElements,
Loop, If, …) executing inside a single InferenceSession.run call.
The model contains:
- an RV32IM CPU built entirely out of ONNX operators,
- the doom1.wad shareware game data as a read-only initializer,
- the doomgeneric Doom source cross-compiled to bare-metal RV32IM and baked into RAM as another initializer.
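To make "a CPU built out of operators" more concrete, here is a minimal sketch of one fetch/decode/execute step for a single RV32I instruction (ADDI), written with plain integer operations. Each step has an obvious ONNX counterpart (a Gather for the fetch, shifts and BitwiseAnd for the decode, a Where for the conditional write-back), but this is an illustration only, not the graph inside doom.onnx. The function names are hypothetical.

```python
def fetch_word(ram, pc):
    """Little-endian 32-bit fetch from a bytes-like RAM image (a Gather, in ONNX terms)."""
    return ram[pc] | (ram[pc + 1] << 8) | (ram[pc + 2] << 16) | (ram[pc + 3] << 24)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def decode_i_type(inst):
    """Split an I-type instruction word into its fields with shifts and masks."""
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1": (inst >> 15) & 0x1F,
        "imm": sign_extend(inst >> 20, 12),
    }


def step_addi(ram, pc, regs):
    """Execute one instruction, handling only ADDI; anything else is a no-op here."""
    f = decode_i_type(fetch_word(ram, pc))
    is_addi = f["opcode"] == 0x13 and f["funct3"] == 0
    result = (regs[f["rs1"]] + f["imm"]) & 0xFFFFFFFF
    new_regs = list(regs)
    # Conditional write-back (a Where, in ONNX terms); x0 always stays 0.
    write = is_addi and f["rd"] != 0
    new_regs[f["rd"]] = result if write else new_regs[f["rd"]]
    return pc + 4, new_regs


# addi x1, x0, 5 ; addi x2, x1, -1 ; addi x0, x0, 7
program = b"".join(w.to_bytes(4, "little") for w in (0x00500093, 0xFFF08113, 0x00700013))
pc, regs = 0, [0] * 32
for _ in range(3):
    pc, regs = step_addi(program, pc, regs)
```

A full RV32IM core repeats this pattern for every opcode and selects among the candidate results, which is where the Loop and If operators come in.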
Reference render
The doom.gif in this repo was assembled from 74 PNG frames captured during
a single InferenceSession.run invocation:
- Total: 80,000,000 RV32IM instructions, 10.8 hours wall time
- Rate: 1,562 IPS (init code) → 2,053 IPS (in-game rendering)
- Reached: title wipe → menu → DEMO1 load → game logic → 3D BSP rendering of actual gameplay (frames 54–75)
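As a rough cross-check of the figures above, the arithmetic below derives the average instruction rate and the time per captured frame from the stated totals. The inputs are the rounded numbers from this card, so the results are approximate.

```python
TOTAL_INSTRUCTIONS = 80_000_000  # from the run summary above
WALL_HOURS = 10.8                # rounded wall time
FRAMES = 74                      # PNG frames in doom.gif

wall_seconds = WALL_HOURS * 3600
avg_ips = TOTAL_INSTRUCTIONS / wall_seconds           # roughly 2,000 instructions/s
minutes_per_frame = WALL_HOURS * 60 / FRAMES          # just under 9 minutes
instructions_per_frame = TOTAL_INSTRUCTIONS / FRAMES  # a little over a million

print(f"{avg_ips:.0f} IPS, {minutes_per_frame:.1f} min/frame, "
      f"{instructions_per_frame:,.0f} instructions/frame")
```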
Performance
~2,000 simulated RV32IM instructions per second on a modern laptop CPU.
This is not a real-time emulator. One frame every ~9 minutes is the
reality on a single CPU thread. See PERF_INVESTIGATION.md in the source
repo for the full investigation (TL;DR: ORT's MayInplace alias doesn't
apply to Loop-carried state, so the 8 MiB RAM gets fully copied per
ScatterElements).
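The cost model behind that TL;DR can be illustrated with a toy: a functional (copying) store versus an in-place one. This is only a sketch of why a per-store copy scales with RAM size; it does not reproduce ORT's allocator or its MayInplace logic, and the function names are hypothetical.

```python
RAM_SIZE = 8 * 1024 * 1024  # 8 MiB, as in the model


def scatter_functional(ram, index, value):
    """Pure update: returns a new buffer, so every store copies len(ram) bytes."""
    out = bytearray(ram)  # full copy of RAM
    out[index] = value
    return out


def scatter_in_place(ram, index, value):
    """Aliased update: touches one byte and returns the same buffer."""
    ram[index] = value
    return ram


def bytes_copied(num_stores, ram_size=RAM_SIZE):
    """Total bytes moved if every store triggers a full copy of RAM."""
    return num_stores * ram_size


ram = bytearray(RAM_SIZE)
copied = scatter_functional(ram, 0x1000, 0xAB)  # new 8 MiB buffer
same = scatter_in_place(ram, 0x2000, 0xCD)      # no copy
```

Under this model, a thousand stores move about 8.4 GB, which is the kind of overhead an in-place alias would avoid.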
Running
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("doom.onnx", providers=["CPUExecutionProvider"])

RAM_SIZE = 8 * 1024 * 1024
ram = np.load("initial_ram.npy")  # Doom ELF baked into RAM
pc = np.array(0x1000, dtype=np.int32)
regs = np.zeros(32, dtype=np.int32)

MMIO_TICK = RAM_SIZE - 16  # address of the millisecond tick register
sim_ms = 0
for chunk in range(250):
    # Advance the simulated clock: store sim_ms as 4 bytes in host byte order.
    ram[MMIO_TICK:MMIO_TICK + 4] = np.frombuffer(
        np.uint32(sim_ms).tobytes(), dtype=np.uint8)
    sim_ms += 100
    # Execute the next 100,000 instructions and carry the state forward.
    pc, regs, ram = sess.run(None, {
        "pc_in": pc, "regs_in": regs, "ram_in": ram,
        "trip_count": np.array(100_000, dtype=np.int64),
    })
    # framebuffer at RAM_SIZE - 32 - 64000, 320×200 palette indices
The host only writes a millisecond counter into the MMIO tick register
between chunks and reads the framebuffer out of the returned ram_out.
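The two host-side duties described above can be sketched with the standard library alone. The offsets come from this card; the row-major framebuffer layout, the little-endian tick encoding (RV32 is little-endian), and the grayscale PGM output are assumptions made for illustration, and the helper names are hypothetical. Real colors would need Doom's palette, which this sketch does not apply.

```python
import struct

RAM_SIZE = 8 * 1024 * 1024
MMIO_TICK = RAM_SIZE - 16
FB_WIDTH, FB_HEIGHT = 320, 200
FB_OFFSET = RAM_SIZE - 32 - FB_WIDTH * FB_HEIGHT


def write_tick(ram, sim_ms):
    """Store the millisecond counter as a little-endian uint32 in a writable buffer."""
    struct.pack_into("<I", ram, MMIO_TICK, sim_ms & 0xFFFFFFFF)


def framebuffer_rows(ram):
    """Return the framebuffer as 200 rows of 320 palette indices (row-major assumed)."""
    raw = bytes(ram[FB_OFFSET:FB_OFFSET + FB_WIDTH * FB_HEIGHT])
    return [raw[y * FB_WIDTH:(y + 1) * FB_WIDTH] for y in range(FB_HEIGHT)]


def pgm_bytes(ram):
    """Encode the framebuffer as a binary PGM, treating palette indices as gray levels."""
    header = b"P5\n%d %d\n255\n" % (FB_WIDTH, FB_HEIGHT)
    return header + b"".join(framebuffer_rows(ram))


# Stand-in for the returned RAM; a uint8 array exposing the buffer protocol
# should behave the same way.
ram = bytearray(RAM_SIZE)
write_tick(ram, 1500)
ram[FB_OFFSET + 2 * FB_WIDTH + 5] = 77  # pixel at row 2, column 5
rows = framebuffer_rows(ram)
image = pgm_bytes(ram)
```

Writing `image` to a `.pgm` file once per chunk gives a quick way to watch progress without decoding the palette.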
Inputs / outputs
| Name | Type | Shape | Role |
|---|---|---|---|
| pc_in | int32 | scalar | program counter |
| regs_in | int32 | [32] | x0..x31 (x0 forced to 0) |
| ram_in | uint8 | [8 MiB] | full writable memory |
| trip_count | int64 | scalar | how many insts to execute |
| pc_out / regs_out / ram_out | (same types) | (same shapes) | post-state |
rom (the WAD) is a read-only initializer baked into the model.
License
The CPU + DMA glue is MIT. doomgeneric and the original Doom source are GPL-2.0; the shareware doom1.wad ships under id Software's shareware terms. This model card and model file inherit GPL-2.0.
Acknowledgements
id Software for releasing Doom's source. The doomgeneric project for the platform-agnostic port. The ONNX team for an op set this absurdly expressive.
