Description

A 2-layer Mamba2 model distilled from JunxiongWang/Llama3.2-Mamba2-3B-distill, with training stopped early at 48,000 steps.

Used as a draft model for speculative decoding of hybrid models in STree: Speculative Tree Decoding for Hybrid State-Space Models.
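To illustrate the role of a draft model, here is a minimal sketch of vanilla (non-tree) speculative decoding with hypothetical toy stand-ins: `draft_next` and `target_next` are deterministic next-token rules over integer token ids, not the actual distilled Mamba2 or Llama models.

```python
def draft_next(ctx):
    # Hypothetical cheap draft model: next token is (last + 1) mod 10.
    return (ctx[-1] + 1) % 10


def target_next(ctx):
    # Hypothetical expensive target model: same rule, but emits 0 after a 7.
    if ctx[-1] == 7:
        return 0
    return (ctx[-1] + 1) % 10


def speculative_step(ctx, k=4):
    """Draft proposes k tokens; the target verifies them left to right.
    The accepted prefix, plus one target token at the first mismatch
    (or a bonus token if all k are accepted), is appended to ctx."""
    # Draft pass: propose k tokens autoregressively.
    proposal, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        proposal.append(t)
        tmp.append(t)

    # Verification pass: accept while the target agrees.
    verified = list(ctx)
    for t in proposal:
        expected = target_next(verified)
        verified.append(expected)
        if t != expected:
            break  # target's token replaces the rejected draft token
    else:
        # All k drafts accepted; take one free bonus token from the target.
        verified.append(target_next(verified))
    return verified


seq = speculative_step([5])
print(seq)  # draft proposes 6,7,8,9; target accepts 6,7 then corrects 8 -> 0
```

Each step costs one (batched) target verification instead of one target call per token, which is where the speedup comes from; STree extends this idea from a single draft sequence to a tree of candidate continuations.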

For more details on installation, training, and evaluation, please refer to the GitHub repository.
