Update README.md
README.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
Weights for the `softmax` model from the paper [2Mamba2Furious: Linear in Complexity, Competitive in Accuracy](https://huggingface.co/papers/2602.17363).
|
| 2 |
-
This model variant
|
| 3 |
More details of the setup can be found in the GitHub repo.
|
| 4 |
|
| 5 |
Instructions on how to use this model can be found in [https://github.com/gmongaras/2Mamba2Furious](https://github.com/gmongaras/2Mamba2Furious)
|
|
|
|
| 1 |
Weights for the `softmax` model from the paper [2Mamba2Furious: Linear in Complexity, Competitive in Accuracy](https://huggingface.co/papers/2602.17363).
|
| 2 |
+
This model variant uses standard softmax attention (FlashAttention) and was used for the NIAH experiment. It was trained for 400K steps with a batch size of 32.
|
| 3 |
More details of the setup can be found in the GitHub repo.
|
| 4 |
|
| 5 |
Instructions on how to use this model can be found in [https://github.com/gmongaras/2Mamba2Furious](https://github.com/gmongaras/2Mamba2Furious)
|