---
license: apache-2.0
datasets:
- Jasaxion/MathSmith-Hard-Problems
language:
- en
base_model:
- Qwen/Qwen3-8B
tags:
- verl
---

**MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy**

[arXiv:2508.05592](https://arxiv.org/abs/2508.05592) · [License](LICENSE) · [GitHub: Jasaxion/MathSmith](https://github.com/Jasaxion/MathSmith)

## Overview

**MathSmith-Hard** is a problem-forging model built on Qwen3-8B with the MathSmith pipeline. The model generates `<rationale>`–`<problem>` pairs, where:

- `<rationale>`: structured reasoning describing concept integration and difficulty design.
- `<problem>`: a single Olympiad-level mathematical question that admits a verifiable numeric or symbolic answer.

Compared with **MathSmith-HC** (complexity + consistency reward), **MathSmith-Hard** removes the consistency term to emphasize *maximum reasoning depth and difficulty*.
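A minimal generation sketch with `transformers` is shown below. The model id is a placeholder for this repository, and the chat-style prompt is only an assumed format rather than a documented interface.

```python
# Minimal usage sketch. Assumptions: the model id below is a placeholder for this
# repository, and the chat-style prompt is illustrative rather than an official format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jasaxion/MathSmith-Hard"  # placeholder: replace with this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": (
    "Forge one extremely hard competition problem that combines modular arithmetic "
    "and generating functions, with a verifiable final answer.")}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
completion = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(completion)  # expected to contain <rationale>...</rationale> and <problem>...</problem>
```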

---

## MathSmith Pipeline

The MathSmith framework consists of four main stages:

1. **Concept Collection**: Randomly sample concept–explanation pairs from [PlanetMath](https://planetmath.org/) to ensure data independence.

2. **Supervised Fine-tuning (SFT)**: Train the model on the collected concept–explanation pairs to establish foundational understanding.

3. **Reinforcement Learning (RL)**: Optimize the model using GRPO with rewards based on the following (an illustrative reward sketch follows this list):
   - Structural validity
   - Reasoning complexity
   - Answer consistency

4. **Weakness-Focused Self-Improvement**: Iteratively identify and address model weaknesses by generating targeted problem variants.
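The sketch below is only a toy illustration of how the three reward terms could be combined into the scalar reward used by GRPO; the helper functions, the length-based complexity proxy, and the weights are assumptions, not the authors' implementation. It also reflects that MathSmith-HC keeps the consistency term while MathSmith-Hard trains without it.

```python
import re

def structural_validity(sample: str) -> float:
    """1.0 only if the output contains exactly one <rationale> and one <problem> block."""
    one_rationale = len(re.findall(r"<rationale>.*?</rationale>", sample, re.S)) == 1
    one_problem = len(re.findall(r"<problem>.*?</problem>", sample, re.S)) == 1
    return float(one_rationale and one_problem)

def reasoning_complexity(sample: str) -> float:
    """Toy proxy based on rationale length; a stand-in for the actual complexity reward."""
    m = re.search(r"<rationale>(.*?)</rationale>", sample, re.S)
    return min(len(m.group(1).split()) / 500.0, 1.0) if m else 0.0

def answer_consistency(solver_answer: str, reference_answer: str) -> float:
    """1.0 if an external solver reproduces the intended answer."""
    return float(solver_answer.strip() == reference_answer.strip())

def composite_reward(sample: str, solver_answer: str = "", reference_answer: str = "",
                     use_consistency: bool = False, weights=(0.3, 0.5, 0.2)) -> float:
    # MathSmith-HC would set use_consistency=True; MathSmith-Hard drops the term.
    reward = weights[0] * structural_validity(sample) + weights[1] * reasoning_complexity(sample)
    if use_consistency:
        reward += weights[2] * answer_consistency(solver_answer, reference_answer)
    return reward
```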

## Dependencies
- Transformers 4.52.4
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
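As a quick check that the pinned stack works, the snippet below loads the companion dataset; the pip line and the `train` split name are assumptions, so consult the dataset card if they differ.

```python
# pip install "transformers==4.52.4" "datasets==3.6.0" "tokenizers==0.21.1"
from datasets import load_dataset

# Load the synthesized hard-problem set that accompanies this model.
ds = load_dataset("Jasaxion/MathSmith-Hard-Problems", split="train")  # split name assumed

print(ds)     # prints the actual column names and row count
print(ds[0])  # first record of the dataset
```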

## Citation

If you find this work useful, please cite:

```bibtex
@article{zhan2025mathsmith,
  title={MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy},
  author={Zhan, Shaoxiong and Lai, Yanlin and Lu, Ziyu and Lin, Dahua and Yang, Ziqing and Tan, Fei},
  journal={arXiv preprint arXiv:2508.05592},
  year={2025}
}
```