---
license: mit
---
## Achieving Superior Performance over QwQ-32B Using Only 965 Strategically Curated Samples

### Model description
Most existing methods focus on distilling DeepSeek-R1 to improve reasoning ability. However, to the best of our knowledge, no distilled model has surpassed DeepSeek-R1 or QwQ-32B. We introduce NTele-R1-32B-DS, a state-of-the-art mathematical reasoning model that outperforms QwQ-32B across common reasoning benchmarks, including AIME2024/2025, MATH500, and GPQA-Diamond.
Notably, NTele-R1-32B-DS is the first to score **more than 80/70 on the challenging AIME2024/2025**.
| Model | Trained From | Release Date | AIME2024 (ours/reported) | AIME2025 (ours/reported) | MATH500 (ours/reported) | GPQA-Diamond (ours/reported) |
|-------|-------|-------|-------|-------|-------|-------|
| QwQ-32B | - | 25.3.6 | 76.25 / 79.5 | 67.30 / - | 94.6 / - | 63.6 / - |
| DeepSeek-32B-Distill | Qwen2.5-32B-Instruct | 25.1.20 | 64.17 / 72.6 | 55.21 / - | 89.8 / 94.3 | 62.1 / 62.1 |
| Light-R1-32B-DS | DeepSeek-R1-Distill-Qwen-32B | 25.3.12 | 74.79 / 78.1 | 68.54 / 65.9 | 92 / - | **69.19 / 68.0** |
| AReal-boba-SFT-32B | DeepSeek-R1-Distill-Qwen-32B | 25.3.30 | 70.63 / 78.8 | 63.54 / 62.1 | 88.8 / - | 64.65 / 60.1 |
| NTele-R1-32B-DS | DeepSeek-R1-Distill-Qwen-32B | 25.4.17 | **80.42** / - | **73.54** / - | **95.4** / - | 66.16 / - |

### Data Curation
We start from the S1 dataset and apply the following procedure:
1. QwQ-32B as a Better Teacher:
   - We find that QwQ-32B, with its smoother flow in CoT reasoning, serves as a better teacher than DeepSeek-R1. For each question in the S1 dataset, we sampled 50 responses from QwQ-32B.
2. Focusing on Harder Questions:
   - We evaluated the correctness of the responses to each question, then filtered out the easier questions whose pass rate exceeded 0.6.
3. Diverse Reasoning Paths Break the Limitation of Distillation:
   - To maximize the diversity of reasoning paths, we computed the Levenshtein distance between all answers to each question and selected up to 5 answers per question with the greatest distances, yielding a final dataset of 965 samples.
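The filtering and selection steps above can be sketched as follows. This is an illustrative reconstruction, not our released code: the function names, the 0.6 threshold handling, and the greedy farthest-point selection strategy are assumptions; the card only states that pairwise Levenshtein distances guide the choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def filter_hard(questions, pass_rate, threshold=0.6):
    """Step 2: keep only questions whose pass rate does not exceed the threshold."""
    return [q for q in questions if pass_rate[q] <= threshold]


def select_diverse(answers, k=5):
    """Step 3 (assumed greedy heuristic): keep up to k answers that are
    maximally far apart in edit distance (farthest-point selection)."""
    if len(answers) <= k:
        return list(answers)
    # Seed with the longest answer, then repeatedly add the answer with
    # the greatest minimum distance to those already selected.
    selected = [max(answers, key=len)]
    pool = [a for a in answers if a is not selected[0]]
    while len(selected) < k and pool:
        best = max(pool, key=lambda a: min(levenshtein(a, s) for s in selected))
        selected.append(best)
        pool.remove(best)
    return selected
```

In practice a library such as `python-Levenshtein` would replace the pure-Python distance for speed; the greedy heuristic avoids the combinatorial cost of searching all answer subsets.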

You can access our [dataset](https://huggingface.co/datasets/ZTE-AIM/NTele-R1-Data) to get the 965 training samples.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/67ff7f05a93c489f94a58c74/pOg0t34yxTmrL158xsX1Y.png)

### Evaluation
We evaluate models with [SkyThought](https://github.com/NovaSky-AI/SkyThought).

### Training Details
NTele-R1-32B-DS was trained from DeepSeek-32B-Distill on 8×H800 GPUs.

#### Training hyperparameters
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 48
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
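Note that the total batch sizes are derived rather than set independently: per-device batch size × number of devices × gradient accumulation steps. A minimal sketch of that relation (the dictionary keys are illustrative; the card does not state which training framework was used):

```python
# Hyperparameters as listed above, collected for reference.
hparams = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 1,
    "num_devices": 8,
    "gradient_accumulation_steps": 6,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_epochs": 10.0,
    "seed": 42,
}

# Effective (total) train batch size: 1 * 8 * 6 = 48,
# matching the total_train_batch_size reported above.
total_train_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["num_devices"]
    * hparams["gradient_accumulation_steps"]
)
```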