---
license: apache-2.0
tags:
- llm-inference
- cpu-inference
- memory-bandwidth
- transformer
- quantization
- research
---

# AIOS: A CPU-Native Inference Architecture for Large Language Models

**This is not a model.** This is the framework paper and specification
for AIOS, a memory residency controller for CPU-native LLM inference.

## Paper

**Title:** AIOS: A CPU-Native Inference Architecture for Large Language Models
**Author:** Anand Casavaraju
**Published:** March 2026
**SSRN:** https://ssrn.com/abstract=6467298
**GitHub:** https://github.com/acasavaraju/AIOS

## What AIOS Is

AIOS is a memory residency controller that sits between inference
engines (llama.cpp, Ollama, vLLM) and the hardware, managing how weight
data moves from DRAM to the CPU. It addresses four resource dimensions:

- **Weight reads**: aliasing + sparsity maps
- **KV cache reads**: MQA/GQA + tiered residency
- **Activation spill**: chunked prefill
- **Attention compute**: sparsity map

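Each of the four dimensions above is ultimately a DRAM-traffic term, which is why CPU decode throughput is typically bandwidth-bound: every generated token must stream the (quantized) weights plus the growing KV cache through memory. The sketch below is a back-of-the-envelope illustration of that ceiling; the model size, per-token KV footprint, and bandwidth figure are assumed example numbers, not AIOS projections.

```python
# Illustrative bandwidth-bound decode model.
# All concrete numbers below are assumptions for the example,
# not measurements or AIOS performance projections.

def decode_tokens_per_sec(weight_bytes: float,
                          kv_bytes_per_token: float,
                          context_len: int,
                          mem_bw_bytes_per_sec: float) -> float:
    """Upper bound on tokens/sec when each decode step must stream
    the full weight set plus the KV cache from DRAM."""
    bytes_per_token = weight_bytes + kv_bytes_per_token * context_len
    return mem_bw_bytes_per_sec / bytes_per_token

# Example: a 7B model at 4-bit (~3.5 GB of weights), an assumed
# 0.5 MB/token KV footprint, 2048 tokens of context, and a desktop
# CPU with roughly 50 GB/s of DRAM bandwidth.
tps = decode_tokens_per_sec(
    weight_bytes=3.5e9,
    kv_bytes_per_token=0.5e6,
    context_len=2048,
    mem_bw_bytes_per_sec=50e9,
)
print(f"bandwidth-bound ceiling: {tps:.1f} tok/s")
```

Under these assumptions the ceiling lands around 11 tokens/sec regardless of compute, which is why reducing weight and KV reads (the first two dimensions above) moves the number more than faster arithmetic does.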
## Current State

Framework and specification published. Runtime not yet implemented.
All performance projections are analytical. Empirical validation is
tracked at github.com/acasavaraju/AIOS/issues.

## Citation
```bibtex
@misc{casavaraju2026aios,
  title  = {AIOS: A CPU-Native Inference Architecture for Large Language Models},
  author = {Casavaraju, Anand},
  year   = {2026},
  url    = {https://ssrn.com/abstract=6467298}
}
```