---
license: apache-2.0
datasets:
- zwhe99/DeepMath-103K
language:
- en
---

## Quick Start

This repository contains remote code and weights for a **Native Sparse Attention** model distilled from [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on mathematical reasoning data. The parameter count in the model name refers to the **parameter count of the teacher model**.

### Installation

To use this model, please ensure the following dependencies are installed:

#### Install the required Native Sparse Attention library from our custom fork:
```bash
pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
```

#### Install standard dependencies:
```bash
pip install transformers torch ...
```

Note: We recommend using the latest stable release of PyTorch (currently 2.7.0) with CUDA 12.6, along with the latest available version of Transformers.

### Example Usage

A `quick_start.py` script is included to help you get started with inference:

```bash
python quick_start.py
```

This will load the model and generate text from a predefined prompt (`"What is 1 + 1?"`) using our Native Sparse Attention-enabled reasoning model.
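If you prefer to integrate the model into your own code rather than run the bundled script, the sketch below shows the typical `transformers` loading pattern. Because the attention implementation ships as remote code in this repository, `trust_remote_code=True` is required. The `MODEL_ID` below is a placeholder, not the actual repository id, and the exact generation settings are assumptions; consult `quick_start.py` for the authoritative version.

```python
PROMPT = "What is 1 + 1?"  # the predefined prompt used by quick_start.py
MODEL_ID = "your-org/your-nsa-model"  # placeholder: replace with this repository's id

def generate(prompt: str = PROMPT, max_new_tokens: int = 256) -> str:
    """Load the model with its remote NSA code and generate a completion."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True lets transformers import the Native Sparse
    # Attention modules bundled with the repository.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate())
```

Downloading the weights happens on first call; subsequent runs use the local Hugging Face cache.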