PlanePaper committed on
Commit 2872238 · verified · 1 Parent(s): e5d3d75

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -40,6 +40,14 @@ For complete details, codebase, and usage examples, please visit our GitHub repo
 
  ---
 
+ ## 📦 Dataset: GRPO-LEAD-SFTData
+
+ We release [**GRPO-LEAD-SFTData**](https://huggingface.co/datasets/PlanePaper/GRPO-LEAD-SFTData), a curated collection of **12,153** high-quality mathematical reasoning samples for supervised fine-tuning, generated with [**QwQ-32B**](https://huggingface.co/Qwen/QwQ-32B).
+ The samples are derived primarily from the [**DeepScaler**](https://github.com/agentica-project/rllm) dataset; we retain only examples with **difficulty > 1**, targeting challenging problem-solving scenarios. All entries follow a standardized SFT-ready format for seamless integration with [**LLaMA Factory**](https://github.com/hiyouga/LLaMA-Factory).
+
+ Used as the training data for GRPO-LEAD’s supervised fine-tuning stage, this dataset improves the model's base ability to solve mathematical problems.
+
+ ---
  ## 📖 Citation
 
  If you find our work useful, please cite it as: