---
license: apache-2.0
tags:
- llm-inference
- cpu-inference
- memory-bandwidth
- transformer
- quantization
- research
---
# AIOS: A CPU-Native Inference Architecture for Large Language Models
**This is not a model.** This is the framework paper and specification
for AIOS, a memory residency controller for CPU-native LLM inference.
## Paper
**Title:** AIOS: A CPU-Native Inference Architecture for Large Language Models
**Author:** Anand Casavaraju
**Published:** March 2026
**SSRN:** https://ssrn.com/abstract=6467298
**GitHub:** https://github.com/acasavaraju/AIOS
## What AIOS Is
AIOS is a memory residency controller that sits between inference
engines (llama.cpp, Ollama, vLLM) and hardware, managing how weight
data moves from DRAM to CPU. It addresses four resource dimensions:
- **Weight reads** – aliasing + sparsity maps
- **KV cache reads** – MQA/GQA + tiered residency
- **Activation spill** – chunked prefill
- **Attention compute** – sparsity map
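At a very high level, a memory residency controller of this kind can be thought of as a DRAM-budgeted cache manager for weight and KV blocks. The sketch below is a hypothetical illustration only: the names (`ResidencyController`, `Block`, `Tier`) and the LRU eviction policy are assumptions for exposition, not the AIOS API or its actual policy.

```python
# Hypothetical sketch of a memory residency controller. All names and
# the LRU policy are illustrative assumptions, not the AIOS design.
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DRAM = "dram"  # hot: resident in main memory, cheap to read
    DISK = "disk"  # cold: paged out, must be re-read on next use


@dataclass
class Block:
    """A unit of weight or KV-cache data tracked by the controller."""
    name: str
    size_bytes: int
    tier: Tier = Tier.DISK


class ResidencyController:
    """Keeps the working set of blocks within a fixed DRAM budget."""

    def __init__(self, dram_budget_bytes: int):
        self.budget = dram_budget_bytes
        self.resident: list[Block] = []  # ordered oldest-first (LRU)

    def used(self) -> int:
        return sum(b.size_bytes for b in self.resident)

    def touch(self, block: Block) -> Block:
        """Promote a block to DRAM, evicting LRU blocks to stay in budget."""
        if block in self.resident:
            self.resident.remove(block)  # refresh its LRU position
        while self.resident and self.used() + block.size_bytes > self.budget:
            evicted = self.resident.pop(0)  # evict least recently used
            evicted.tier = Tier.DISK
        block.tier = Tier.DRAM
        self.resident.append(block)
        return block
```

For example, with a 100-byte budget, touching two 60-byte blocks in sequence would demote the first back to the cold tier to make room for the second.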
## Current State
The framework and specification are published; the runtime is not yet
implemented. All performance projections are analytical. Empirical
validation is tracked at github.com/acasavaraju/AIOS/issues.
## Citation
```bibtex
@misc{casavaraju2026aios,
  title  = {AIOS: A CPU-Native Inference Architecture for Large Language Models},
  author = {Casavaraju, Anand},
  year   = {2026},
  url    = {https://ssrn.com/abstract=6467298}
}
```