---
license: apache-2.0
tags:
  - llm-inference
  - cpu-inference
  - memory-bandwidth
  - transformer
  - quantization
  - research
---

# AIOS: A CPU-Native Inference Architecture for Large Language Models

**This is not a model.** This is the framework paper and specification for AIOS — a memory residency controller for CPU-native LLM inference.

## Paper

- **Title:** AIOS: A CPU-Native Inference Architecture for Large Language Models
- **Author:** Anand Casavaraju
- **Published:** March 2026
- **SSRN:** https://ssrn.com/abstract=6467298
- **GitHub:** https://github.com/acasavaraju/AIOS

## What AIOS Is

AIOS is a memory residency controller that sits between inference engines (llama.cpp, Ollama, vLLM) and the hardware, managing how weight data moves from DRAM to the CPU. It addresses four resource dimensions:

- **Weight reads** — aliasing + sparsity maps
- **KV cache reads** — MQA/GQA + tiered residency
- **Activation spill** — chunked prefill
- **Attention compute** — sparsity map

## Current State

The framework and specification are published. The runtime is not yet implemented, and all performance projections are analytical. Empirical validation is tracked at https://github.com/acasavaraju/AIOS/issues.

## Citation

```bibtex
@misc{casavaraju2026aios,
  title  = {AIOS: A CPU-Native Inference Architecture for Large Language Models},
  author = {Casavaraju, Anand},
  year   = {2026},
  url    = {https://ssrn.com/abstract=6467298}
}
```
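
## Illustrative Sketch

To make the "memory residency controller" idea concrete, here is a minimal, purely hypothetical Python sketch of what such a controller's bookkeeping interface might look like. None of these names (`ResidencyController`, `Tier`, `admit`, `evict`) come from the AIOS specification — the runtime is not yet implemented, so this is an assumption-laden illustration of the concept, not the actual design.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Tier(Enum):
    """Residency tiers a block of weight or KV-cache data might occupy.
    Hypothetical: the real spec may define tiers differently."""
    DRAM = auto()
    LLC = auto()      # last-level cache residency is advisory on commodity CPUs
    EVICTED = auto()


@dataclass
class ResidencyController:
    """Toy bookkeeping layer: tracks which blocks are resident and where.
    A real controller would also drive prefetch and eviction policy."""
    residency: dict = field(default_factory=dict)  # block_id -> Tier

    def admit(self, block_id: str, tier: Tier = Tier.DRAM) -> None:
        """Mark a block as resident in the given tier."""
        self.residency[block_id] = tier

    def evict(self, block_id: str) -> None:
        """Mark a block as no longer resident."""
        self.residency[block_id] = Tier.EVICTED

    def resident_blocks(self) -> list:
        """Return the IDs of all blocks currently resident in any tier."""
        return [b for b, t in self.residency.items() if t is not Tier.EVICTED]


# Example usage with invented block IDs:
ctrl = ResidencyController()
ctrl.admit("layer0.attn.wq")                  # weight block in DRAM
ctrl.admit("layer0.kv_cache", Tier.LLC)       # KV block pinned hotter
ctrl.evict("layer0.attn.wq")                  # weight block paged out
print(ctrl.resident_blocks())                 # ['layer0.kv_cache']
```

The sketch only models the bookkeeping; the interesting parts of AIOS (aliasing, sparsity maps, chunked prefill) would sit behind decisions about *when* to call `admit` and `evict`.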