Improve model card: Add pipeline tag, library name, paper and code links

#1
by nielsr HF Staff - opened
Files changed (1): README.md (+81 −4)
---
base_model:
- Qwen/Qwen2.5-7B
datasets:
- Open-Reasoner-Zero/orz_math_57k_collection
license: mit
pipeline_tag: text-generation
library_name: transformers
tags:
- code-generation
- tool-use
- mathematical-reasoning
- rlhf
---

# Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

This repository contains the **ZeroTIR** model, a large language model fine-tuned for mathematical problem solving through spontaneous Python code generation and execution. The model was introduced in the paper:

📚 [**Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving**](https://huggingface.co/papers/2505.07773)

## Model Description

Large Language Models (LLMs) often struggle with mathematical reasoning tasks that require precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, how agents autonomously learn to leverage external tools such as code execution remains an open question. This work investigates RL from outcome-based rewards for Tool-Integrated Reasoning (ZeroTIR): training base LLMs to spontaneously generate and execute Python code for mathematical problems, without supervised tool-use examples.

The central contribution is demonstrating that key metrics scale predictably as RL training progresses: more training steps correlate strongly with higher spontaneous code-execution frequency, longer average responses, and, critically, higher final task accuracy. This suggests a quantifiable relationship between the computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks.

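The card does not spell out how the outcome-based reward is computed. As a minimal sketch, assuming a binary reward scored by matching the final number in a generation against a reference answer (the extraction rule here is an illustrative assumption, not the paper's exact procedure):

```python
import re

def outcome_reward(generation: str, reference: str) -> float:
    """Binary outcome-based reward: 1.0 if the last number in the
    generation matches the reference answer, else 0.0. Matching on the
    last number is an illustrative assumption, not the paper's rule."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation)
    return 1.0 if numbers and numbers[-1] == reference else 0.0

print(outcome_reward("The rectangle's area is 75.", "75"))  # 1.0
print(outcome_reward("The rectangle's area is 80.", "75"))  # 0.0
```

Only the final outcome is rewarded; no supervision is given on intermediate tool calls, which is what allows code-execution behavior to emerge on its own.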
## Usage

The model can be loaded with the `transformers` library for text generation. For mathematical problem solving, it is trained to emit Python code blocks whose execution by an external tool verifies its reasoning.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assuming the model ID on the Hugging Face Hub is named as below:
model_id = "Open-Reasoner-Zero/Agent-RL-Scaling-Law-ZeroTIR-Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example: mathematical problem solving with spontaneous code execution.
# The model is trained to generate code blocks to solve problems, so the
# prompt ends with an opening code fence to elicit one.
prompt = (
    "A rectangle has a perimeter of 40 units. Its length is 3 times its width. "
    "What is the area of the rectangle? Provide your reasoning and use Python "
    "code to verify your answer.\n"
    "```python\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
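In a tool-integrated setup, the generated text is post-processed by an external executor that runs the emitted code and, in a multi-turn loop, feeds the printed output back to the model as an observation. The repository's actual executor is not shown in this card; the following is a minimal, unsandboxed sketch (`run_first_code_block` is a hypothetical helper) that extracts and runs the first fenced Python block from a generation:

```python
import contextlib
import io
import re

def run_first_code_block(generation: str) -> str:
    """Extract the first fenced Python block from a model generation and
    execute it, returning captured stdout as the tool observation.
    WARNING: exec() runs arbitrary code; a real executor must sandbox it.
    (Hypothetical helper -- not the repository's actual executor.)"""
    match = re.search(r"```python\n(.*?)```", generation, re.DOTALL)
    if match is None:
        return ""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(match.group(1), {})
    return buffer.getvalue()

# Simulated generation with a spontaneous code block (width w: 2*(w+3w)=40).
generation = (
    "Let w be the width, so 2 * (w + 3 * w) = 40 and the area is 3 * w * w.\n"
    "```python\n"
    "w = 40 / (2 * 4)\n"
    "print(int(3 * w * w))\n"
    "```\n"
)
print(run_first_code_block(generation), end="")  # prints 75
```

A production loop would append the captured output to the conversation and let the model continue generating until it produces a final answer.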

## Code

The official code and further details are available in the GitHub repository:

🔗 [**https://github.com/yyht/openrlhf_async_pipline**](https://github.com/yyht/openrlhf_async_pipline)

## Citation

If you use this model or the associated research, please cite the paper:

```bibtex
@article{agentrlscalinglaw2025,
  title={Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving},
  journal={arXiv preprint arXiv:2505.07773},
  year={2025}
}
```