doubleblind committed on
Commit bf9cdc3 (verified) · 1 Parent(s): 99c33d6

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ language:
 
 ## Quick Start
 
-This repository contains remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-RL-Distill-Qwen-1.5B), distilled on mathematical reasoning data.
+This repository contains remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-RL-Distill-Qwen-1.5B), distilled on mathematical reasoning data. Our parameter naming scheme refers to the **parameter count of the teacher model**
 
 ### Installation
 