---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
---

# Qwen2.5-Math-7B-Oat-Zero

## Links

- 📄 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)

## Introduction

This model was trained with the minimalist R1-Zero recipe introduced in our paper:
- **Algorithm**: Dr. GRPO
- **Data**: Level 3-5 questions from the MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- **Template**: Qwen-Math

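As a rough illustration of what the algorithm choice in the recipe changes, the sketch below contrasts a GRPO-style advantage with a Dr. GRPO-style one, following our reading of the paper: Dr. GRPO keeps the group-mean baseline but drops the division by the group's reward standard deviation (it also drops the per-response length normalization in the loss, which is not shown here). The function names, the `eps` constant, the use of the population standard deviation, and the sample rewards are assumptions made for illustration; the actual training code is in the GitHub repository.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style: center by the group mean, then divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards: List[float]) -> List[float]:
    """Dr. GRPO-style: center by the group mean only, no std normalization."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


# Hypothetical 0/1 correctness rewards for four responses sampled for one question.
rewards = [1.0, 0.0, 0.0, 1.0]
print(dr_grpo_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]
print(grpo_advantages(rewards))
```

Both variants assign positive advantages to correct responses and negative ones to incorrect responses; they differ only in scale, which depends on how spread out the rewards within a group are.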
Evaluation results on widely used math benchmarks are shown below:

<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width="100%"/>

## Usage

```python
import vllm


def apply_qwen_math_template(question: str) -> str:
    """Wrap a question in the Qwen-Math chat template."""
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )


def apply_r1_template(question: str) -> str:
    """Wrap a question in the R1 template (<think>/<answer> tags)."""
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: <think>"
    )


model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

# temperature=0 gives greedy (deterministic) decoding.
sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# Pick the template that matches the model being loaded.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?"
]
prompts = list(map(apply_template, prompts))
outputs = model.generate(prompts, sampling_params)

# Each item is a RequestOutput; the generated text is in outputs[i].outputs[0].text.
print(outputs)
```

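Because the Qwen-Math template asks the model to put its final answer within `\boxed{}`, a small post-processing step can pull that answer out of the generated text. The helper below is a minimal sketch, not part of the released code: `extract_boxed_answer` is a hypothetical name, and the sample string stands in for a real model response.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed, e.g. the generation was truncated


# Hypothetical response text, used here instead of a real generation.
sample = "196 = 2^2 * 7^2, so it has (2+1)(2+1) = \\boxed{9} divisors."
print(extract_boxed_answer(sample))  # 9
```

With the usage snippet above, the same helper could be applied to `outputs[0].outputs[0].text`. Brace counting is used rather than a simple regular expression so that nested answers such as `\boxed{\frac{1}{2}}` are returned whole.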
## Citation

```latex
@article{liu2025understanding,
  title={Understanding r1-zero-like training: A critical perspective},
  author={Liu, Zichen and Chen, Changyu and Li, Wenjun and Qi, Penghui and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
  journal={arXiv preprint arXiv:2503.20783},
  year={2025}
}
```