---
license: apache-2.0
---
Key Result
An O(N) learned causal convolution beats O(N²) softmax attention on both perplexity and throughput, and its throughput advantage grows with sequence length:

| Model | PPL | PPL change | TPS (128) | TPS (2048) | Speedup (2048) |
|---|---|---|---|---|---|
| Learned Conv O(N) | 8.08 | -3.2% | 378,066 | 1,009,622 | 5.5x |
| Standard QKV O(N²) | 8.34 | baseline | 317,968 | 183,408 | 1.0x |

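At 2048 tokens, the O(N) model is 5.5x faster (1,009,622 vs. 183,408 TPS) while also achieving lower perplexity (8.08 vs. 8.34). The gap widens with sequence length because the cost of the convolution grows linearly with sequence length, while the cost of softmax attention grows quadratically.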
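
To make the scaling argument concrete, here is a minimal, dependency-free sketch of a causal convolution over a single channel, plus a rough operation count for each mixing strategy. This is not the code from the repository: the function names, the kernel length `k_size`, and the assumption that the kernel length stays fixed as the sequence grows are all illustrative. A real model would use a tensor library and learn the kernel weights by gradient descent.

```python
def causal_conv(x, kernel):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k].

    Positions before the start of the sequence are treated as zero,
    so y[t] depends only on x[0..t] (never on future tokens).
    """
    n, k_size = len(x), len(kernel)
    y = []
    for t in range(n):
        acc = 0.0
        # Only look back as far as the sequence start allows.
        for k in range(min(k_size, t + 1)):
            acc += kernel[k] * x[t - k]
        y.append(acc)
    return y


def conv_mix_ops(n, k_size):
    """Upper bound on multiply-adds for the convolution: linear in n
    when k_size is a fixed constant (an assumption of this sketch)."""
    return n * k_size


def attention_mix_ops(n):
    """Number of query-key pairs in a full attention score matrix:
    quadratic in n (causal masking halves it but stays quadratic)."""
    return n * n


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    kernel = [0.5, 0.25]  # hypothetical fixed weights standing in for learned ones
    print(causal_conv(x, kernel))

    k_size = 16  # hypothetical kernel length, not taken from the source
    for n in (128, 2048):
        print(n, conv_mix_ops(n, k_size), attention_mix_ops(n))
```

Going from 128 to 2048 tokens multiplies the sequence length by 16. In this simplified count the convolution's work also grows by 16x, while the attention score matrix grows by 256x, which is the mechanism behind the widening throughput gap. The measured TPS figures also depend on hardware, batch size, and kernel implementation, none of which this sketch models.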
https://github.com/MikeyBeez/DifferentialLR
https://medium.com/p/6659a3793322
https://doi.org/10.5281/zenodo.18498944