MikeyBeez/DiffConv-N


	---
	license: apache-2.0
	---
	Key Result

	O(N) learned causal convolution beats O(N²) softmax attention on both perplexity AND throughput, with the advantage growing at longer sequences:

	| Model | PPL | Change | TPS (128) | TPS (2048) | Speedup (at 2048) |
	|---|---|---|---|---|---|
	| Learned Conv O(N) | 8.08 | -3.2% | 378,066 | 1,009,622 | 5.5x |
	| Standard QKV O(N²) | 8.34 | baseline | 317,968 | 183,408 | 1.0x |

	At 2048 tokens, the O(N) model is 5.5x faster while achieving better perplexity. The gap widens with sequence length because O(N) scales linearly while O(N²) scales quadratically.
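
The scaling argument above can be illustrated with a minimal pure-Python sketch. It is not the implementation used for these results: the kernel values, the kernel size, and the helper names `causal_conv1d`, `conv_mults`, and `attention_pairs` are assumptions made up for illustration. It shows a causal convolution, where each output depends only on the current and earlier inputs, compares how the work grows from 128 to 2048 tokens against causal attention, and recomputes the 2048-token speedup from the throughput figures reported in the table.

```python
def causal_conv1d(x, kernel):
    """Causal 1D convolution: y[t] = sum_k kernel[k] * x[t - k].

    Inputs before position 0 are treated as zero, so y[t] never
    depends on x[t + 1] or anything later.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def conv_mults(n, k):
    """Upper bound on multiplies for one channel: linear in n for fixed k."""
    return n * k


def attention_pairs(n):
    """Query-key pairs scored under a causal mask: quadratic in n."""
    return n * (n + 1) // 2


# Toy input and a hypothetical 2-tap kernel (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25]
y = causal_conv1d(x, kernel)

# Growth in work when the sequence grows 16x, from 128 to 2048 tokens.
conv_growth = conv_mults(2048, len(kernel)) / conv_mults(128, len(kernel))
attn_growth = attention_pairs(2048) / attention_pairs(128)

# Speedup at 2048 tokens, recomputed from the reported throughput.
speedup_2048 = 1_009_622 / 183_408

print(y)
print(conv_growth, round(attn_growth, 1), round(speedup_2048, 1))
```

With a fixed kernel size, a 16x longer sequence costs the convolution 16x more multiplies, while the number of causal attention pairs grows by roughly 254x. That is the linear versus quadratic gap the paragraph above describes.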

	https://github.com/MikeyBeez/DifferentialLR
	https://medium.com/p/6659a3793322
	https://doi.org/10.5281/zenodo.18498944