sail
/

Qwen2.5-Math-7B-Oat-Zero

Text Generation

text-generation-inference

Model card Files Files and versions

Qwen2.5-Math-7B-Oat-Zero / README.md

lkevinzc's picture

Update README.md

9830f56 verified 8 months ago

|

history blame contribute delete

2.68 kB

	---
	library_name: transformers
	license: apache-2.0
	base_model:
	- Qwen/Qwen2.5-Math-7B
	---

	# Qwen2.5-Math-7B-Oat-Zero

	## Links

	- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
	- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
	- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)

	## Introduction

	This model is trained by the minimalist R1-Zero recipe introduced in our paper:
	- Algorithm: Dr. DRPO
	- Data: level 3-5 questions from MATH dataset
	- Base model: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
	- Template: Qwen-Math

	Evaluation results on widely used math benchmarks are shown below:

	<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>

	## Usage

	```python
	import vllm


	def apply_qwen_math_template(question: str):
	return (
	"<\|im_start\|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<\|im_end\|>\n<\|im_start\|>user\n"
	+ question
	+ "<\|im_end\|>\n<\|im_start\|>assistant\n"
	)

	def apply_r1_template(question: str):
	return (
	"A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
	"The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
	+ question
	+ "\nAssistant: <think>"
	)

	model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

	sampling_params = vllm.SamplingParams(
	n=1,
	temperature=0,
	top_p=1,
	max_tokens=3000,
	)

	model = vllm.LLM(
	model_name,
	max_model_len=4096,
	dtype="bfloat16",
	enable_prefix_caching=True,
	)

	if "Llama-3.2-3B-Oat-Zero" in model_name:
	apply_template = apply_r1_template
	else:
	apply_template = apply_qwen_math_template

	prompts = [
	"How many positive whole-number divisors does 196 have?"
	]
	prompts = list(map(apply_template, prompts))
	outputs = model.generate(prompts, sampling_params)

	print(outputs)
	```

	## Citation

	```latex
	@article{liu2025understanding,
	title={Understanding r1-zero-like training: A critical perspective},
	author={Liu, Zichen and Chen, Changyu and Li, Wenjun and Qi, Penghui and Pang, Tianyu and Du, Chao and Lee, Wee Sun and Lin, Min},
	journal={arXiv preprint arXiv:2503.20783},
	year={2025}
	}
	```