Weights for the softmax model from the paper 2Mamba2Furious: Linear in Complexity, Competitive in Accuracy. This variant uses standard softmax attention (via FlashAttention) and was used for the NIAH (needle-in-a-haystack) experiment. It was trained for 400K steps with a batch size of 32. More details on the setup can be found in the GitHub repo.
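
For readers unfamiliar with the term, the following is a minimal sketch of what causal softmax attention computes for a single head. It is an illustration only and not the code used to train these weights: the function names are hypothetical, it runs in pure Python, and FlashAttention produces the same mathematical result with a fused, memory-efficient GPU kernel.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_softmax_attention(q, k, v):
    """Single-head causal attention (illustrative, hypothetical helper).

    q, k, v are lists of T vectors (lists of floats).
    Returns a list of T output vectors.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t in range(len(q)):
        # Position t only attends to positions 0..t (causal mask).
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    for row in causal_softmax_attention(q, k, v):
        print(row)
```

Because every position scores against all earlier positions, the cost of this computation grows quadratically with sequence length.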

Instructions for using this model can be found at https://github.com/gmongaras/2Mamba2Furious