STree: Speculative Tree Decoding for Hybrid State-Space Models
Paper: arXiv:2505.14969
A 2-layer Mamba2 model distilled from JunxiongWang/Llama3.2-Mamba2-3B-distill. Training was stopped early at step 48,000.
Used in STree: Speculative Tree Decoding for Hybrid State-Space Models as the draft model for speculative decoding with hybrid models.
For more details on installation, training, and evaluation, please refer to the GitHub repository.
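To illustrate the draft-model role described above, here is a minimal sketch of generic greedy speculative decoding with a linear chain of draft tokens. It is not the tree-based STree algorithm and not code from the repository. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and each "model" is assumed to be a plain function that maps a token prefix to its greedy next token.

```python
from typing import Callable, List

# Assumption for this sketch: a "model" is any function that maps a
# token prefix to its greedy next token. Real models return logits.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the accepted tokens."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed token. A real system
    #    scores all k positions in one forward pass; this sketch calls
    #    the target sequentially for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token matched, so the target adds one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Decode until max_new_tokens tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


# Toy stand-ins: the target counts upward modulo 10, and the draft
# agrees with it except right after the token 5.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [0], max_new_tokens=12))
```

Because a draft token is kept only when it equals the target's own greedy choice, the output matches plain greedy decoding with the target model. The draft model affects only how many tokens are accepted per verification step.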
Base model
JunxiongWang/Llama3.2-Mamba2-3B-distill