Improve model card: add tags, paper link, code link, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +113 -3
README.md CHANGED
@@ -1,3 +1,113 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ pipeline_tag: text-generation
5
+ ---
6
+
7
+ # RLFactory-Qwen3-8B-GRPO
8
+
9
+ This repository contains `RLFactory-Qwen3-8B-GRPO`, an agentic large language model developed with the RLFactory framework, introduced in [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://huggingface.co/papers/2509.06980).
10
+
11
+ RLFactory is an easy-to-use, efficient RL post-training framework for agentic learning. It decouples the environment from RL post-training, so training requires only a tool config and a reward function, and it supports asynchronous tool calling to make RL post-training faster.
12
+
13
+ <div align="center">
14
+ <img src="https://github.com/user-attachments/assets/9793f779-c80e-48e6-813a-1c8f377cf5d1" alt="Description" style="width:300px; height:auto;"/>
15
+ </div>
16
+
17
+ **Paper**: [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://huggingface.co/papers/2509.06980)
18
+ **Code**: https://github.com/Simple-Efficient/RL-Factory
19
+
20
+ ## Overview of RLFactory Framework
21
+
22
+ RLFactory keeps the environment, defined by a tool config and a reward function, separate from the RL post-training loop, and runs tool calls asynchronously. The framework design is shown below.
23
+
24
+ <div align="center">
25
+ <img src="https://github.com/user-attachments/assets/883fd8c0-afa9-4ed2-95be-333a79ce7e36" alt="Framework Design" style="width:750px; height:auto;"/>
26
+ </div>
27
+
28
+ ## Quickstart
29
+
30
+ This section demonstrates how to load and use the `RLFactory-Qwen3-8B-GRPO` model for inference.
31
+ Ensure you have the necessary dependencies installed as specified in the [GitHub repository](https://github.com/Simple-Efficient/RL-Factory).
32
+
33
+ ### Inference with Code
34
+
35
+ The example below wraps the model in `ToolModel` and calls its `generate` method, similar to using `generate` from Hugging Face Transformers:
36
+
37
+ ```python
38
+ from transformers import AutoTokenizer, AutoModelForCausalLM
39
+ import torch
40
+ from mcp.models.tool_model import ToolModel
41
+
42
+ # Define your model path and the tools for the agent
43
+ MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"
44
+ # Note: You'll need to define your tool configuration or replace this with a dummy setup
45
+ # For actual tool use, refer to the official RLFactory GitHub for tool definition
46
+ tools_config = {
47
+     "calculator": {
48
+         "description": "A calculator tool to perform arithmetic operations.",
49
+         "schema": {
50
+             "name": "calculator",
51
+             "description": "A calculator tool to perform arithmetic operations.",
52
+             "parameters": {
53
+                 "type": "object",
54
+                 "properties": {
55
+                     "expression": {"type": "string", "description": "The arithmetic expression to evaluate."},
56
+                 },
57
+                 "required": ["expression"],
58
+             },
59
+         },
60
+     },
61
+ }
62
+
63
+ # Initialize tokenizer and model
64
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
65
+ model = AutoModelForCausalLM.from_pretrained(
66
+     MODEL_PATH,
67
+     torch_dtype=torch.bfloat16,  # or torch.float16 depending on your setup
68
+     device_map="auto",
69
+     trust_remote_code=True
70
+ ).eval()
71
+
72
+ # Wrap the model with ToolModel for agentic capabilities
73
+ agent_model = ToolModel(model=model, tokenizer=tokenizer, tools_info=tools_config)
74
+
75
+ # Example conversation prompt
76
+ prompt = (
77
+     "<|im_start|>user\n"
78
+     "What is the sum of 123 and 456?\n"
79
+     "<|im_end|>\n"
80
+     "<|im_start|>assistant\n"
81
+ )
86
+
87
+ # Generate response
88
+ input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
89
+ output_ids = agent_model.generate(
90
+     input_ids,
91
+     max_new_tokens=512,
92
+     do_sample=False,  # Greedy decoding; set to True to sample
93
+     temperature=0.1,  # Only takes effect when do_sample=True
94
+     pad_token_id=tokenizer.eos_token_id,
95
+ )
96
+
97
+ generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
98
+ print(generated_text)
99
+ ```
100
+ **Note**: This `ToolModel` wrapping is a simplified example. For a complete understanding and proper integration with tools, please refer to the [official RLFactory documentation](https://github.com/Simple-Efficient/RL-Factory/blob/main/docs/rl_factory/en/main_tutorial.md).
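
Review note on the Quickstart prompt: it is assembled by hand from ChatML markers (`<|im_start|>`, `<|im_end|>`). A small helper can build the same layout from a list of messages, which avoids quoting mistakes. The sketch below is only illustrative: `build_chatml_prompt` is a hypothetical helper, not part of RLFactory or Transformers, and it assumes the role/content/end-marker layout used in the snippet.

```python
from typing import Dict, List


def build_chatml_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Render messages in the same layout as the hand-written Quickstart prompt.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        # Layout per message: role line, content line, end-marker line.
        parts.append(
            f"<|im_start|>{message['role']}\n"
            f"{message['content']}\n"
            "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [{"role": "user", "content": "What is the sum of 123 and 456?"}]
)
print(prompt)
```

With a loaded tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the usual way to obtain the model's own template, which may differ in whitespace from the layout above.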
101
+
102
+ ## Citation
103
+
104
+ If you find our work useful for your research, please cite our paper:
105
+
106
+ ```bibtex
107
+ @article{chen2025rlfactory,
108
+ title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use},
109
+ author={Chen, Chaoyu and Liu, Bingchang and Liao, Cong and Gong, Zi and Lei, Zhichao and Yu, Hang and Li, Jianguo},
110
+ journal={arXiv preprint arXiv:2509.06980},
111
+ year={2025}
112
+ }
113
+ ```
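
Review note on the `calculator` entry in the Quickstart's `tools_config`: the schema declares only the interface (one required `expression` string), so the tool logic has to live elsewhere. As a rough sketch of what could sit behind such a schema, and assuming nothing about RLFactory's actual tool API, the function below evaluates basic arithmetic by walking the parsed expression tree instead of calling `eval`. The names `evaluate_expression` and `calculator` are hypothetical.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_expression(expression: str):
    """Evaluate +, -, *, / over numeric literals without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


def calculator(arguments: dict):
    """Hypothetical tool entry point taking schema-shaped arguments."""
    if "expression" not in arguments:
        # "expression" is the only required field in the schema.
        raise KeyError("Missing required argument: expression")
    return evaluate_expression(arguments["expression"])


print(calculator({"expression": "123 + 456"}))  # 579
```

Function calls, names, and attribute access all raise `ValueError` here, which matters when the expression string comes from model output.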