Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B can be run with Transformers or served with vLLM, SGLang, or Docker Model Runner.
Transformers
How to use Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Rationale-problem outputs are long; a small token budget cuts them off
outputs = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
vLLM
How to use Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
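You can call the OpenAI-compatible endpoint with any HTTP client, not just curl. The sketch below uses only the Python standard library. The helper names `build_chat_request` and `chat` are hypothetical. It assumes the server is running locally and returns the standard `choices[0].message.content` response shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the first choice's message content."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B", "What is the capital of France?")` targets the vLLM server started above. To reach a server on a different port, change only `base_url`.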
SGLang
How to use Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Docker Model Runner
How to use Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B with Docker Model Runner:
```shell
docker model run hf.co/Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B
```
MathSmith-Hard-Problem-Synthesizer-Qwen3-8B
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
Overview
MathSmith is a framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath.
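The concept-sampling step can be sketched as uniform random sampling without replacement. The sketch makes several assumptions. The tiny `CONCEPTS` list is a hypothetical stand-in for the PlanetMath collection. The number of concepts per problem (`k`) is an assumption. The prompt wording in `build_prompt` is illustrative, not the prompt MathSmith actually uses.

```python
import random

# Hypothetical stand-in for concept-explanation pairs collected from PlanetMath.
CONCEPTS = [
    ("Euler's totient function", "phi(n) counts the integers in [1, n] coprime to n."),
    ("Cauchy-Schwarz inequality", "(sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2)."),
    ("Pell equation", "x^2 - d*y^2 = 1 with d a positive non-square integer."),
    ("Jensen's inequality", "f(E[X]) <= E[f(X)] for convex f."),
]


def sample_concepts(pool, k, seed=None):
    """Draw k distinct concept-explanation pairs uniformly at random."""
    rng = random.Random(seed)  # seeded RNG makes a draw reproducible
    return rng.sample(pool, k)


def build_prompt(pairs):
    """Format sampled pairs into a hypothetical synthesis prompt."""
    lines = [f"- {name}: {expl}" for name, expl in pairs]
    return (
        "Combine the following concepts into one hard problem.\n"
        + "\n".join(lines)
        + "\nRespond with <rationale>...</rationale> then <problem>...</problem>."
    )


if __name__ == "__main__":
    print(build_prompt(sample_concepts(CONCEPTS, k=2, seed=0)))
```

Sampling concepts, rather than existing problems, means each prompt is assembled fresh from the concept pool.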
The model generates `<rationale>`–`<problem>` pairs, where:
- `<rationale>`: structured reasoning describing how the concepts are integrated and how the difficulty is designed.
- `<problem>`: a single Olympiad-level mathematical question that admits a verifiable numeric or symbolic answer.
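Downstream use usually needs the two parts separated. The helper below is a minimal sketch. It assumes the generated text contains literal `<rationale>...</rationale>` and `<problem>...</problem>` tags in that order, possibly after other text such as a thinking block. `parse_pair` is a hypothetical name, not part of any released tooling.

```python
import re
from typing import Optional, Tuple

# Assumption: each part is wrapped in literal tags, rationale first.
_PAIR_RE = re.compile(
    r"<rationale>(.*?)</rationale>\s*<problem>(.*?)</problem>",
    re.DOTALL,  # let '.' span newlines inside multi-line rationales
)


def parse_pair(text: str) -> Optional[Tuple[str, str]]:
    """Return (rationale, problem) with surrounding whitespace stripped, or None if malformed."""
    match = _PAIR_RE.search(text)
    if match is None:
        return None
    rationale, problem = (part.strip() for part in match.groups())
    if not rationale or not problem:
        return None  # treat empty sections as structurally invalid
    return rationale, problem
```

Returning `None` for missing or empty sections gives a simple structural-validity check that can filter generations before they are used.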
MathSmith-HC is trained with both a complexity reward and a consistency reward. MathSmith-Hard drops the consistency term so that training maximizes reasoning depth and difficulty.
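The difference between the two variants can be illustrated by toggling one reward term. This is a hypothetical sketch. The additive form, the equal weights, the [0, 1] score range, and zero reward for malformed outputs are all assumptions; the card does not give the actual reward formula.

```python
from dataclasses import dataclass


@dataclass
class RewardScores:
    """Hypothetical per-sample scores, each assumed to lie in [0, 1]."""
    structure_valid: bool  # output parsed into a rationale/problem pair
    complexity: float      # reasoning-complexity score
    consistency: float     # answer-consistency score (measurement not specified here)


def reward(scores: RewardScores, use_consistency: bool) -> float:
    """Illustrative reward: HC-style if use_consistency is True, Hard-style otherwise.

    The additive form and equal weights are assumptions, not the paper's formula.
    """
    if not scores.structure_valid:
        return 0.0  # assumption: malformed outputs earn nothing
    total = scores.complexity
    if use_consistency:
        total += scores.consistency
    return total


s = RewardScores(structure_valid=True, complexity=0.8, consistency=0.3)
hc_reward = reward(s, use_consistency=True)     # complexity + consistency
hard_reward = reward(s, use_consistency=False)  # complexity only
```

With the consistency term removed, a sample's reward depends only on its complexity score. Under these assumptions, optimization pushes purely toward harder problems.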
MathSmith Pipeline
The MathSmith framework consists of four main stages:
1. Concept Collection: Randomly sample concept–explanation pairs from PlanetMath to ensure data independence.
2. Supervised Fine-tuning (SFT): Train the model on the collected concept–explanation pairs to establish foundational understanding.
3. Reinforcement Learning (RL): Optimize the model using GRPO with rewards based on:
   - Structural validity
   - Reasoning complexity
   - Answer consistency (omitted for MathSmith-Hard)
4. Weakness-Focused Self-Improvement: Iteratively identify and address model weaknesses by generating targeted problem variants.
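GRPO's defining step replaces a learned value function with a group baseline. Several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The sketch below shows only that advantage computation. Using the population standard deviation and `eps` are common choices, not confirmed MathSmith settings. The clipped policy-gradient objective and any KL penalty are omitted.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampling group.

    All completions in the group share one prompt, so the group mean serves as
    the baseline and no critic network is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards an all-equal group
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four completions sampled for one concept prompt.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions rewarded above the group average receive positive advantages and are reinforced. Those below average are discouraged, and a group with identical rewards yields no learning signal.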
Dependencies
- Transformers 4.52.4
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
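To reproduce this environment, you can compare installed versions against the list above. The script below is a minimal sketch using `importlib.metadata` from the standard library. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. It ignores local build tags such as `+cu126`. `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the dependency list, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.52.4",
    "torch": "2.7.0",  # listed as 2.7.0+cu126; the local "+cu126" tag is ignored below
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(pinned=PINNED):
    """Map each package whose installed version differs (or is missing) to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = version(name).split("+", 1)[0]  # drop local tag, e.g. "+cu126"
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```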
Citation
If you find this work useful, please cite:
@article{zhan2025mathsmith,
title={MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy},
author={Zhan, Shaoxiong and Lai, Yanlin and Lu, Ziyu and Lin, Dahua and Yang, Ziqing and Tan, Fei},
journal={arXiv preprint arXiv:2508.05592},
year={2025}
}