---
license: apache-2.0
---
Key Result

O(N) learned causal convolution beats O(N²) softmax attention on both perplexity AND throughput, with the advantage growing at longer sequences:

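| Model | PPL | Change | TPS (128) | TPS (2048) | Speedup |
| --- | --- | --- | --- | --- | --- |
| Learned Conv O(N) | 8.08 | -3.2% | 378,066 | 1,009,622 | 5.5x |
| Standard QKV O(N²) | 8.34 | baseline | 317,968 | 183,408 | 1.0x |

PPL is perplexity (lower is better). TPS is tokens per second at the given sequence length, and Speedup is the TPS ratio at 2048 tokens.

At 2048 tokens, the O(N) model is 5.5x faster (1,009,622 vs. 183,408 tokens per second) while also achieving better perplexity. At 128 tokens its throughput lead is only about 1.2x. The gap widens with sequence length because the convolution's cost grows linearly with N while attention's grows quadratically.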
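
To make the scaling argument concrete, here is a minimal pure-Python sketch of a causal 1-D convolution together with rough operation counts for both approaches. It is an illustration only, not the code from the linked repository: the function names, the fixed (untrained) kernel weights, and the kernel length `k=16` are all assumptions chosen for the example.

```python
def causal_conv(x, kernel):
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-1], ...

    x      : list of floats, one value per position (length N)
    kernel : list of floats (length K); kernel[j] weights x[t - j]
    In a learned convolution the kernel weights would be trained;
    here they are fixed, hypothetical values.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:  # positions before the start act as zero padding
                acc += w * x[t - j]
        out.append(acc)
    return out


def conv_ops(n, k):
    """Upper bound on multiply-adds for a length-k kernel over n positions."""
    return n * k


def attention_ops(n):
    """Query-key score pairs under a causal mask: position t sees t + 1 keys."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    # k=16 is an assumed kernel length, not a value taken from the source.
    for n in (128, 2048):
        print(n, conv_ops(n, k=16), attention_ops(n))
```

Going from 128 to 2048 positions multiplies the convolution count by 16 but the attention count by roughly 254, which is the linear-versus-quadratic growth described above. Measured throughput also depends on hardware and kernel implementation, so these counts explain the trend rather than predict the exact 5.5x figure.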

https://github.com/MikeyBeez/DifferentialLR
https://medium.com/p/6659a3793322
https://doi.org/10.5281/zenodo.18498944