---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# RLFactory-Qwen3-8B-GRPO
This repository contains **RLFactory-Qwen3-8B-GRPO**, an agentic large language model post-trained with GRPO using [RLFactory](https://github.com/Simple-Efficient/RL-Factory), a plug-and-play reinforcement learning post-training framework for LLM multi-turn tool use.

RLFactory is an easy and efficient RL post-training framework for agentic learning. It decouples the environment from RL post-training, so training requires only a tool configuration and a reward function, and it supports asynchronous tool calling to make RL post-training faster.

- **Paper:** [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://arxiv.org/abs/2509.06980)
- **Code:** https://github.com/Simple-Efficient/RL-Factory
## Overview of RLFactory Framework
RLFactory decouples the tool environment from the RL training loop: you supply a tool configuration and a reward function, and the framework handles multi-turn rollouts, asynchronous tool calling, and policy optimization (GRPO, in the case of this model). This separation makes agentic RL post-training both simple to set up and efficient to run.
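To make the "tool config plus reward function" workflow concrete, here is a minimal Python sketch of the two user-supplied pieces. The function names and signatures below are illustrative assumptions, not the actual RLFactory interfaces; see the repository for the real ones.

```python
# Hypothetical sketch of the two pieces a user supplies to RLFactory:
# a tool implementation and a reward function. Names and signatures are
# illustrative assumptions, NOT the actual RLFactory API.

def calculator(expression: str) -> str:
    """Toy tool: evaluate an arithmetic expression and return the result."""
    # eval() is unsafe on untrusted input; acceptable only in a sketch.
    return str(eval(expression))

def compute_reward(response: str, ground_truth: str) -> float:
    """Binary exact-match reward: 1.0 if the answer appears in the response.

    Simple outcome-based rewards like this are a common starting point
    for RL post-training (e.g., GRPO) on tool-use tasks.
    """
    return 1.0 if ground_truth in response else 0.0
```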
## Quickstart
This section demonstrates how to load and use the RLFactory-Qwen3-8B-GRPO model for inference.
Ensure you have the necessary dependencies installed as specified in the GitHub repository.
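For a minimal inference run, a recent `transformers` (Qwen3 support landed in v4.51) and `torch` are enough; the exact pinned versions for training live in the repository's requirements. For example:

```bash
pip install "transformers>=4.51.0" torch
# For the full RLFactory training stack:
git clone https://github.com/Simple-Efficient/RL-Factory
```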
### Inference with Code
You can run inference with the familiar Hugging Face `generate` interface. The example below defines a toy tool configuration, wraps the model in a `ToolModel` for agentic tool calling, and generates a single response:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

from mcp.models.tool_model import ToolModel
# Define your model path and the tools for the agent
MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"

# Note: you'll need to define your own tool configuration or replace this
# with a dummy setup. For actual tool use, refer to the official RLFactory
# GitHub repository for tool definitions.
tools_config = {
    "calculator": {
        "description": "A calculator tool to perform arithmetic operations.",
        "schema": {
            "name": "calculator",
            "description": "A calculator tool to perform arithmetic operations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The arithmetic expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
}
# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
    device_map="auto",
    trust_remote_code=True,
).eval()

# Wrap the model with ToolModel for agentic capabilities
agent_model = ToolModel(model=model, tokenizer=tokenizer, tools_info=tools_config)
# Example conversation prompt (ChatML format used by Qwen3)
prompt = (
    "<|im_start|>user\n"
    "What is the sum of 123 and 456?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Generate response
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output_ids = agent_model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; set do_sample=True for sampled responses
    # temperature=0.1,  # only takes effect when do_sample=True
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
**Note:** the `ToolModel` wrapping above is a simplified example. For a complete understanding and proper integration with tools, please refer to the official RLFactory documentation.
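If you just want to chat with the model without the tool wrapper, the checkpoint also works with plain `transformers` generation. The sketch below uses only the standard API (`apply_chat_template` plus `generate`):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

messages = [{"role": "user", "content": "What is the sum of 123 and 456?"}]
# Build the prompt from the model's chat template instead of hand-writing ChatML
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```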
## Citation
If you find our work useful or helpful for your research, please cite our paper:
```bibtex
@article{chen2025rlfactory,
  title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use},
  author={Chen, Chaoyu and Liu, Bingchang and Liao, Cong and Gong, Zi and Lei, Zhichao and Yu, Hang and Li, Jianguo},
  journal={arXiv preprint arXiv:2509.06980},
  year={2025}
}
```