---
license: apache-2.0
datasets:
- zwhe99/DeepMath-103K
language:
- en
---

## Quick Start

This repository contains remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), distilled on mathematical reasoning data. Our parameter naming scheme refers to the **parameter count of the teacher model**.

### Installation

To use this model, please ensure the following dependencies are installed:

#### Install the required Native Sparse Attention library from our custom fork:

```bash
pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
```

#### Install standard dependencies:

```bash
pip install transformers torch ...
```

Note: We recommend using the latest stable release of PyTorch (currently 2.7.0) with CUDA 12.6 and the latest available version of Transformers.

### Example Usage

A `quick_start.py` script is included to help you get started with inference:

```bash
python quick_start.py
```

This will load the model and generate text from a predefined prompt (`"What is 1 + 1?"`) using our Native Sparse Attention-enabled reasoning model.
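If you prefer to call the model directly rather than running the script, the sketch below shows the same flow using the standard `transformers` API. The repo id placeholder and generation settings are illustrative assumptions, not values taken from `quick_start.py`; `trust_remote_code=True` is required because this repository ships remote code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(model_id: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model (its NSA attention module is remote code) and return a completion."""
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,  # required: the attention implementation ships as remote code
        torch_dtype="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage (downloads weights on first call; substitute this repository's id):
# print(generate("<repo-id>", "What is 1 + 1?"))
```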