Update README.md

README.md CHANGED

@@ -21,26 +21,21 @@ library_name: transformers

</div>

- **Confucius3-Math** is a
<p></p>
<p align="center">
<img width="85%" src="figures/benchmark.png">
</p>

- ##
-
- **Selection of the Base Model**: We selected the open-source [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) model as the starting point for training Confucius3-Math, because it already exhibits robust chain-of-thought capabilities and holds a greater initial edge in mathematics than other models of comparable scale. Moreover, the responses this model produces in the answer section align with our expectations for an educational model.
-
- **Reinforcement Learning**: In particular, we introduce Recent Sample Recovery and Policy-Specific Hardness Weighting, a novel data scheduling policy and an improved group-relative advantage estimator, respectively, which significantly improve data efficiency, stabilize the RL training process, and boost performance. By integrating the DAPO algorithm with these two new techniques, we achieved SOTA results on several Chinese K-12 mathematics test sets.
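A group-relative advantage of the kind mentioned above can be sketched as follows. This is a generic GRPO-style baseline, not the exact Confucius3-Math estimator, and the `hardness_weight` hook standing in for Policy-Specific Hardness Weighting is purely illustrative; the real scheme is described in the technical report.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, hardness_weight=1.0, eps=1e-6):
    """Normalize each rollout's reward against its group's statistics.

    rewards: scalar rewards for k rollouts of the same question.
    hardness_weight: illustrative stand-in for Policy-Specific Hardness
    Weighting (hypothetical parameter, not from the source).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Standard group-relative advantage: (r - mean) / std, scaled by the
    # (hypothetical) per-question hardness weight.
    return [hardness_weight * (r - mu) / (sigma + eps) for r in rewards]
```

Rollouts that beat their group's mean get positive advantages and the rest negative, so the advantages of each group sum to roughly zero by construction.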
-
- **Data Formatting**: We standardize the model's output format as follows: the chain-of-thought is emitted in the `<think></think>` block, and the step-by-step solution is then summarized in the `<answer></answer>` block.
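A response in that format can be split with a few lines of Python; a minimal sketch (the tag names come from the README, everything else — function name, fallback behavior — is illustrative):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def split_response(text):
    """Return (chain_of_thought, answer) from a formatted model response.

    Missing blocks yield empty strings rather than raising.
    """
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )
```

`re.DOTALL` lets the patterns span the multi-line reasoning traces these models typically produce.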
-
- **Composition of Training Data**: The training data comes from two major sources: open-source and proprietary. To enhance the model's mathematical capabilities, we collected a large number of open-source English mathematics datasets. For our proprietary data, we collected Chinese K-12 math questions and their solutions accumulated during the operation of our business. They cover mathematics problems across the domestic K-12 stages (primary, middle, and high school) and a rich variety of question types, such as single-choice, multiple-choice, true/false, fill-in-the-blank, calculation, proof, and mixtures of multiple types.
-
- **More Stringent Data Filtering**: To ensure the quality and diversity of the training data, we implemented a rigorous preprocessing procedure. For open-source data, we execute the following workflow in sequence: exact deduplication, fuzzy deduplication, semantic deduplication, and question-type selection. For our proprietary data, we additionally apply a cleaning stage, because it originates from mass-scale automated entry with manual correction and therefore contains significant noise. After filtering, we retained approximately 540,000 examples for training: 210,000 from open-source data and 330,000 from proprietary data.
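The first two deduplication stages can be sketched as below. This is an illustrative toy pipeline only — hashing of normalized text for exact matches, character-shingle Jaccard similarity for fuzzy matches — not the actual production workflow, whose thresholds and similarity measures the diff does not specify.

```python
import hashlib

def shingles(text, n=3):
    """Character n-grams of a whitespace-normalized, lowercased question."""
    t = "".join(text.lower().split())
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def dedup(questions, threshold=0.8):
    """Drop exact duplicates by hash, then near-duplicates by similarity."""
    seen_hashes, kept, kept_shingles = set(), [], []
    for q in questions:
        h = hashlib.md5("".join(q.lower().split()).encode()).hexdigest()
        if h in seen_hashes:
            continue  # exact duplicate of a kept question
        sh = shingles(q)
        if any(jaccard(sh, s) >= threshold for s in kept_shingles):
            continue  # fuzzy duplicate
        seen_hashes.add(h)
        kept.append(q)
        kept_shingles.append(sh)
    return kept
```

At corpus scale the pairwise fuzzy check is usually replaced by MinHash/LSH; the quadratic scan here is only for clarity.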
-
- ## Evaluation and Results
- For each model, we used the official system prompt provided with that model and took the test-set question as the user content. For R1, we did not use a system prompt, because our evaluation found that this yields higher-quality responses. The maximum response length was uniformly set to 32,768 tokens. We sampled k responses per question and report the pass@1 results. Specifically, for our model we used a sampling temperature of 1.0 and a top-p of 0.7, while for other models we used the officially recommended sampling parameters. The value of k varies by test set: for MATH500, AIME24, and AIME25 we followed DeepSeek's setting of k = 64, and for other test sets k is set to approximately 2,000/N, where N is the number of samples in the set.
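The pass@1 estimate described above amounts to averaging, over questions, the fraction of the k sampled responses that are correct; a minimal sketch (function names are illustrative, not from the evaluation code):

```python
def pass_at_1(correct_flags_per_question):
    """pass@1 averaged over questions, estimated from k samples each.

    correct_flags_per_question: one inner list of booleans per question,
    one flag per sampled response.
    """
    per_question = [sum(f) / len(f) for f in correct_flags_per_question]
    return sum(per_question) / len(per_question)

def choose_k(n_questions, budget=2000, minimum=1):
    """The k ~ 2,000/N rule used for the smaller test sets."""
    return max(budget // n_questions, minimum)
```

Sampling k > 1 responses per question reduces the variance of the pass@1 estimate without changing its expected value.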

<div align="center">


</div>

+ **Confucius3-Math** is a 14B-parameter open-source reasoning LLM developed by the NetEase Youdao AI Team, specifically optimized for K-12 mathematics education. Unlike general-purpose models, Confucius3-Math:
+
+ - ✅ SOTA Performance on Math Tasks
+   Outperforms larger models on Chinese K-12 math problems through specialized RL training
+ - ✅ Cost-Effective Deployment
+   Runs efficiently on a single consumer-grade GPU (e.g., RTX 4090D)
+ - ✅ Cultural & Curriculum Alignment
+   Optimized for China's national mathematics standards and problem-solving methodologies
+
+ Confucius3-Math was developed through an RL-only post-training process with a novel data scheduling policy and an improved group-relative advantage estimator. Please refer to our technical report for details.
<p></p>
<p align="center">
<img width="85%" src="figures/benchmark.png">
</p>

+ ## Evaluation Results

<div align="center">