---
license: apache-2.0
tags:
- llm-inference
- cpu-inference
- memory-bandwidth
- transformer
- quantization
- research
---

# AIOS: A CPU-Native Inference Architecture for Large Language Models

**This is not a model.** This is the framework paper and specification
for AIOS, a memory residency controller for CPU-native LLM inference.

## Paper

**Title:** AIOS: A CPU-Native Inference Architecture for Large Language Models
**Author:** Anand Casavaraju
**Published:** March 2026
**SSRN:** https://ssrn.com/abstract=6467298
**GitHub:** https://github.com/acasavaraju/AIOS

## What AIOS Is

AIOS is a memory residency controller that sits between inference
engines (llama.cpp, Ollama, vLLM) and the hardware, managing how weight
data moves from DRAM to the CPU. It addresses four resource dimensions:

- **Weight reads**: aliasing + sparsity maps
- **KV cache reads**: MQA/GQA + tiered residency
- **Activation spill**: chunked prefill
- **Attention compute**: sparsity map

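Each of the four dimensions above is ultimately a DRAM-traffic term, which is why CPU decode throughput is typically bandwidth-bound: every generated token must stream the (quantized) weights plus the growing KV cache through memory. The sketch below is a back-of-the-envelope illustration of that ceiling; the model size, per-token KV footprint, and bandwidth figure are assumed example numbers, not AIOS projections.

```python
# Illustrative bandwidth-bound decode model.
# All concrete numbers below are assumptions for the example,
# not measurements or AIOS performance projections.

def decode_tokens_per_sec(weight_bytes: float,
                          kv_bytes_per_token: float,
                          context_len: int,
                          mem_bw_bytes_per_sec: float) -> float:
    """Upper bound on tokens/sec when each decode step must stream
    the full weight set plus the KV cache from DRAM."""
    bytes_per_token = weight_bytes + kv_bytes_per_token * context_len
    return mem_bw_bytes_per_sec / bytes_per_token

# Example: a 7B model at 4-bit (~3.5 GB of weights), an assumed
# 0.5 MB/token KV footprint, 2048 tokens of context, and a desktop
# CPU with roughly 50 GB/s of DRAM bandwidth.
tps = decode_tokens_per_sec(
    weight_bytes=3.5e9,
    kv_bytes_per_token=0.5e6,
    context_len=2048,
    mem_bw_bytes_per_sec=50e9,
)
print(f"bandwidth-bound ceiling: {tps:.1f} tok/s")
```

Under these assumptions the ceiling lands around 11 tokens/sec regardless of compute, which is why reducing weight and KV reads (the first two dimensions above) moves the number more than faster arithmetic does.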
## Current State

Framework and specification published. Runtime not yet implemented.
All performance projections are analytical. Empirical validation is
tracked at github.com/acasavaraju/AIOS/issues.

## Citation
```bibtex
@misc{casavaraju2026aios,
  title  = {AIOS: A CPU-Native Inference Architecture for Large Language Models},
  author = {Casavaraju, Anand},
  year   = {2026},
  url    = {https://ssrn.com/abstract=6467298}
}
```