DeciMamba Checkpoint (Baseline)

The official checkpoint of Mamba-130m fine-tuned for language modeling on the PG-19 dataset, as presented in DeciMamba: Exploring the Length Extrapolation Potential of Mamba.

See our GitHub repo for evaluation and training scripts.
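
To work with the checkpoint locally, the files first need to be fetched from the Hub. The snippet below is a minimal sketch of that step only, assuming the `huggingface_hub` package is installed and that the repository ID is `assafbk/mamba-130m-pg19`, as listed on this page. The helper name `download_checkpoint` is hypothetical and is not part of the DeciMamba code. Loading the weights into a model is not shown here; use the scripts in the GitHub repo for that.

```python
from pathlib import Path

# Repository ID as listed on this model card.
REPO_ID = "assafbk/mamba-130m-pg19"


def download_checkpoint(repo_id: str = REPO_ID) -> Path:
    """Download every file in the model repo and return the local directory.

    Hypothetical helper for illustration; not part of the DeciMamba code.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    # snapshot_download caches the files and returns the local snapshot path.
    return Path(snapshot_download(repo_id=repo_id))
```

Calling `download_checkpoint()` returns the directory holding the downloaded files, which can then be passed to the evaluation or training scripts.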

BibTeX:

@misc{benkish2024decimambaexploringlengthextrapolation,
      title={DeciMamba: Exploring the Length Extrapolation Potential of Mamba}, 
      author={Assaf Ben-Kish and Itamar Zimerman and Shady Abu-Hussein and Nadav Cohen and Amir Globerson and Lior Wolf and Raja Giryes},
      year={2024},
      eprint={2406.14528},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.14528}, 
}


Paper for assafbk/mamba-130m-pg19: DeciMamba: Exploring the Length Extrapolation Potential of Mamba (arXiv:2406.14528, published Jun 20, 2024)