doubleblind committed on
Commit 6dabb43 · verified · 1 Parent(s): 0d5c998

Update README.md

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -4,4 +4,37 @@ datasets:
  - zwhe99/DeepMath-103K
  language:
  - en
- ---
+ ---
+
+ ## Quick Start
+
+ This repository contains the remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), distilled on mathematical reasoning data.
+
+ ### Installation
+
+ Install the following dependencies before loading the model:
+
+ #### 1. Install the required sparse attention library from our custom fork:
+ ```bash
+ pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
+ ```
+
+ #### 2. Install the other standard dependencies:
+ Install Transformers and PyTorch. Transformers pulls in its own remaining dependencies automatically, so only these two packages need to be named:
+ ```bash
+ pip install transformers torch
+ ```
+
+ Note: We recommend Python 3.8+ and PyTorch 2.0+ for compatibility.
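The version recommendation in the note above can be checked before loading the model. The following is a minimal sketch, not part of the repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the thresholds are simply the Python 3.8 and PyTorch 2.0 minimums stated in the README.

```python
import re
import sys


def version_tuple(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if `version` is at least `minimum`, padding short versions with zeros."""
    parsed = version_tuple(version)
    parsed = parsed + (0,) * max(0, len(minimum) - len(parsed))
    return parsed >= minimum


def check_environment() -> list:
    """Collect human-readable problems; an empty list means the README's minimums are met."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is recommended")
    try:
        import torch  # imported lazily so the check itself works without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (2, 0)):
            problems.append(f"PyTorch 2.0+ is recommended, found {torch.__version__}")
    return problems
```

Calling `check_environment()` after installation returns a list of warnings to print; the list is empty when both recommendations are satisfied.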
+
+ ### Example Usage
+
+ A `quick_start.py` script is included to help you get started with inference:
+
+ ```bash
+ python quick_start.py
+ ```
+
+ The script loads the model and generates text from a predefined prompt using Native Sparse Attention.
+
+
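The contents of `quick_start.py` are not shown in this commit, so the following is only a sketch of what loading and generating with a remote-code model typically looks like in Transformers. Everything specific here is an assumption: the repository ID must be supplied by the caller, the default prompt and token budget are hypothetical, and `trust_remote_code=True` is assumed because the README states that the repository ships remote code. The sparse attention kernels may also require a CUDA device, which this sketch does not verify.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; all defaults are illustrative, not the repository's."""
    parser = argparse.ArgumentParser(description="Generate text with a remote-code model.")
    parser.add_argument("--repo-id", required=True,
                        help="Hugging Face repository ID of this model (not given in the diff)")
    parser.add_argument("--prompt", default="What is the sum of the first 10 positive integers?",
                        help="hypothetical example prompt")
    parser.add_argument("--max-new-tokens", type=int, default=256)
    return parser.parse_args(argv)


def generate(repo_id: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Load tokenizer and model from `repo_id` and return the decoded generation."""
    # Heavy dependencies are imported lazily so the module imports without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True lets Transformers run the modeling code shipped in the repo.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def main(argv=None) -> None:
    """Entry point: parse options, run generation, print the result."""
    args = parse_args(argv)
    print(generate(args.repo_id, args.prompt, args.max_new_tokens))
```

To run this as a script, call `main()` under an `if __name__ == "__main__":` guard and pass the repository ID with `--repo-id`. Only enable `trust_remote_code` for repositories whose code you have reviewed, since it executes Python from the model repository.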