---
license: apache-2.0
---
# Key Result

A learned causal convolution with O(N) cost beats O(N²) softmax attention on both perplexity and throughput, and the throughput advantage grows with sequence length:

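| Model | PPL | PPL change | TPS (128 tokens) | TPS (2048 tokens) | Speedup at 2048 |
|---|---|---|---|---|---|
| Learned Conv O(N) | 8.08 | -3.2% | 378,066 | 1,009,622 | 5.5x |
| Standard QKV O(N²) | 8.34 | baseline | 317,968 | 183,408 | 1.0x |

At 2048 tokens, the O(N) model is 5.5x faster (1,009,622 vs. 183,408 TPS) while also achieving lower perplexity (8.08 vs. 8.34). The gap widens with sequence length because the cost of the convolution grows linearly in N, while the cost of attention grows quadratically.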
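
The scaling argument can be illustrated with a minimal NumPy sketch. This is not the model's actual layer: the depthwise layout, the kernel size `K`, and the names `causal_conv` and `mixing_ops` are assumptions made only for illustration. It shows a causal convolution whose token-mixing cost is N·K per channel, next to the N² pairwise scores that attention computes.

```python
import numpy as np


def causal_conv(x, w):
    """Depthwise causal convolution (illustrative, not the released layer).

    x: (N, D) array, a sequence of N token vectors.
    w: (K, D) array, where w[k] weights the input k steps in the past.
    Returns y with shape (N, D) and y[t] = sum_k w[k] * x[t - k].
    """
    n, d = x.shape
    k_size = w.shape[0]
    # Left-pad with K-1 zero rows so position t never sees inputs after t.
    padded = np.concatenate([np.zeros((k_size - 1, d)), x], axis=0)
    y = np.zeros((n, d))
    for k in range(k_size):
        # padded[start + t] equals x[t - k], or zero when t - k < 0.
        start = k_size - 1 - k
        y += w[k] * padded[start:start + n]
    return y


def mixing_ops(n, k_size):
    """Rough multiply-add counts per channel for token mixing only."""
    return {"conv": n * k_size, "attention": n * n}
```

With a fixed kernel size, doubling N doubles the convolution's mixing cost but quadruples the attention score count, which is consistent with the widening gap in the table. Measured throughput also depends on hardware and implementation, so these counts are only a rough guide.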

https://github.com/MikeyBeez/DifferentialLR
https://medium.com/p/6659a3793322
https://doi.org/10.5281/zenodo.18498944