---
base_model:
- JunxiongWang/Llama3.2-Mamba2-3B-distill
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Description
A 2-layer Mamba2 model distilled from JunxiongWang/Llama3.2-Mamba2-3B-distill, with early stopping at 48,000 training steps.
It is used in *STree: Speculative Tree Decoding for Hybrid State-Space Models* as a draft model for speculative decoding with hybrid models.
For details on installation, training, and evaluation, please refer to the GitHub repository.