gmongaras/medium_8192sl_gpu_64bs__softmax


Weights for the `softmax` model from the paper [2Mamba2Furious: Linear in Complexity, Competitive in Accuracy](https://huggingface.co/papers/2602.17363).
This variant uses standard softmax attention (via FlashAttention) and was used for the needle-in-a-haystack (NIAH) experiment. It was trained for 400K steps with a batch size of 32.
More details of the setup, along with instructions on how to use this model, can be found in the GitHub repo: [https://github.com/gmongaras/2Mamba2Furious](https://github.com/gmongaras/2Mamba2Furious)
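
Before following the repo's instructions, the weight files need to be available locally. One way to fetch them is sketched below, assuming the `huggingface_hub` package is installed. `download_weights` is a hypothetical helper name, not something the repo provides, and loading the files into the model is left to the GitHub repo's own code.

```python
# Repo ID of this model card on the Hugging Face Hub.
REPO_ID = "gmongaras/medium_8192sl_gpu_64bs__softmax"


def download_weights(local_dir=None):
    """Download every file in the model repo and return the local directory path.

    Hypothetical helper for illustration; requires `pip install huggingface_hub`.
    """
    # Imported here so the dependency is only needed when a download is requested.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files go into the default Hugging Face cache.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_weights())
```

The returned path can then be passed to whatever loading code the GitHub repo describes.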