Update README.md
---

## 📦 Dataset: GRPO-LEAD-SFTData

We release [**GRPO-LEAD-SFTData**](https://huggingface.co/datasets/PlanePaper/GRPO-LEAD-SFTData), a curated collection of **12,153** high-quality mathematical reasoning samples for supervised fine-tuning, generated with [**QwQ-32B**](https://huggingface.co/Qwen/QwQ-32B).

Derived primarily from the [**DeepScaler**](https://github.com/agentica-project/rllm) dataset, it retains only examples with **difficulty > 1**, targeting challenging problem-solving scenarios. All entries are structured for seamless integration with [**LLaMA Factory**](https://github.com/hiyouga/LLaMA-Factory) and follow a standardized SFT-ready format.
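
The difficulty filter and SFT reshaping described above can be sketched as follows. This is a minimal illustration, not the released pipeline: the record field names (`problem`, `solution`, `difficulty`) and the Alpaca-style target fields are assumptions for demonstration, not the dataset's documented schema.

```python
import json

def to_sft_format(records, min_difficulty=1.0):
    """Keep records with difficulty > min_difficulty and reshape them into
    an Alpaca-style instruction/input/output layout (hypothetical fields)."""
    return [
        {
            "instruction": r["problem"],
            "input": "",
            "output": r["solution"],
        }
        for r in records
        if r["difficulty"] > min_difficulty
    ]

# Toy records standing in for DeepScaler-style entries.
records = [
    {"problem": "Compute 2 + 2.", "solution": "4", "difficulty": 0.5},
    {"problem": "Prove that sqrt(2) is irrational.", "solution": "...", "difficulty": 2.3},
]

sft_data = to_sft_format(records)
print(json.dumps(sft_data, indent=2))  # only the difficulty-2.3 example survives the filter
```

In the actual dataset, each retained example is already in this SFT-ready shape, so it can be pointed at from a LLaMA Factory dataset config without further conversion.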

Used as the training data for GRPO-LEAD's supervised fine-tuning stage, this dataset strengthens the base model's ability to solve mathematical problems.

---

## 📖 Citation

If you find our work useful, please cite it as: