---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
---
# Qwen2.5-Math-7B-Oat-Zero
## Links
- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)
## Introduction
This model was trained with the minimalist R1-Zero recipe introduced in our paper:
- **Algorithm**: Dr. GRPO
- **Data**: level 3-5 questions from the MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- **Template**: Qwen-Math
Evaluation results on widely used math benchmarks are shown below:
<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>
## Usage
```python
import vllm


def apply_qwen_math_template(question: str):
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )


def apply_r1_template(question: str):
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: <think>"
    )


model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# The R1 template is used for the Llama-based Oat-Zero model;
# Qwen-based models (like this one) use the Qwen-Math template.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?"
]
prompts = list(map(apply_template, prompts))

outputs = model.generate(prompts, sampling_params)
print(outputs)
```
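Since the system prompt instructs the model to put its final answer within `\boxed{}`, you will typically want to parse that answer out of the completion. Below is a minimal, hedged sketch (pure Python, no vLLM required) of such a parser; `extract_boxed_answer` is a hypothetical helper written for illustration, not part of this repository:

```python
def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Tracks brace depth so nested braces (e.g. \\frac{1}{2}) are kept intact.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)


completion = "196 = 2^2 * 7^2, so it has (2+1)(2+1) = \\boxed{9} divisors."
print(extract_boxed_answer(completion))  # 9
```

In practice you would apply this to `outputs[i].outputs[0].text` from the `generate` call above; the brace-depth loop is only needed because a plain regex would truncate answers containing nested braces.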
## Citation
```bibtex
@article{liu2025understanding,
title={Understanding r1-zero-like training: A critical perspective},
author={Liu, Zichen and Chen, Changyu and Li, Wenjun and Qi, Penghui and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
journal={arXiv preprint arXiv:2503.20783},
year={2025}
}
```