---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# RLFactory-Qwen3-8B-GRPO

This repository contains **RLFactory-Qwen3-8B-GRPO**, an agentic large language model trained with [RLFactory](https://github.com/Simple-Efficient/RL-Factory), a plug-and-play reinforcement learning post-training framework for LLM multi-turn tool use.

RLFactory is an easy and efficient RL post-training framework for agentic learning. It decouples the environment from RL post-training, so training requires only a tool config and a reward function, and it supports asynchronous tool-calling to make RL post-training faster.
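To make the "tool config plus reward function" workflow concrete, here is a minimal sketch of what a user-defined reward function could look like. The function name, signature, and `<answer>` tag convention are illustrative assumptions, not RLFactory's actual API; see the GitHub repository for the real interface.

```python
# Hypothetical reward function sketch. The name, signature, and <answer> tag
# convention are illustrative; RLFactory only requires that you supply some
# reward signal over model rollouts (see the repo for the actual interface).
import re


def compute_reward(response: str, ground_truth: str) -> float:
    """Reward 1.0 if the response's <answer>...</answer> matches the ground truth."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0  # no parsable answer -> zero reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0


print(compute_reward("I computed it. <answer>579</answer>", "579"))  # 1.0
print(compute_reward("No answer given.", "579"))                     # 0.0
```

A rule-based reward like this, paired with a tool config, is all the user-facing code the framework's design asks for; the rollout and GRPO optimization loops are handled by RLFactory itself.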

## Description

- Paper: [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://arxiv.org/abs/2509.06980)
- Code: https://github.com/Simple-Efficient/RL-Factory

*Figure: overview of the RLFactory framework.*

RLFactory separates environment interaction from RL post-training: tools are declared through a config, the training signal is specified by a reward function, and the framework handles multi-turn rollouts with asynchronous tool-calling to accelerate training.

*Figure: RLFactory framework design.*

## Quickstart

This section demonstrates how to load and use the RLFactory-Qwen3-8B-GRPO model for inference. Ensure you have the necessary dependencies installed as specified in the GitHub repository.

### Inference with Code

You can load the model with the standard Hugging Face `generate` API. The example below additionally wraps the model for tool use; note that the `ToolModel` import comes from the RLFactory codebase, so install it from the GitHub repository first. Here is an example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ToolModel is part of the RLFactory codebase; install it from the
# GitHub repository linked above before running this example.
from mcp.models.tool_model import ToolModel

# Define your model path and the tools for the agent
MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"

# Note: you'll need to define your own tool configuration or replace this with
# a dummy setup. For actual tool use, refer to the official RLFactory GitHub
# for tool definitions.
tools_config = {
    "calculator": {
        "description": "A calculator tool to perform arithmetic operations.",
        "schema": {
            "name": "calculator",
            "description": "A calculator tool to perform arithmetic operations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The arithmetic expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
}

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # or torch.float16 depending on your setup
    device_map="auto",
    trust_remote_code=True,
).eval()

# Wrap the model with ToolModel for agentic capabilities
agent_model = ToolModel(model=model, tokenizer=tokenizer, tools_info=tools_config)

# Example conversation prompt in the Qwen chat format
prompt = (
    "<|im_start|>user\n"
    "What is the sum of 123 and 456?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Generate a response (greedy decoding; set do_sample=True and a temperature
# if you want more varied output)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output_ids = agent_model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```

Note: This ToolModel wrapping is a simplified example. For a complete understanding and proper integration with tools, please refer to the official RLFactory documentation.
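When the model decides to invoke a tool, agentic Qwen-style models typically emit the call as a JSON payload inside `<tool_call>...</tool_call>` tags. As an illustration only (the exact emission format depends on the chat template, and RLFactory's own parsing logic lives in its repository), here is a minimal parser for such blocks:

```python
# Illustrative only: a minimal parser for Qwen-style <tool_call> blocks.
# The exact format the model emits depends on its chat template; consult
# the RLFactory repository for the canonical tool-call handling.
import json
import re


def extract_tool_calls(text: str):
    """Return a list of (name, arguments) parsed from <tool_call> JSON blocks."""
    calls = []
    for block in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL):
        try:
            payload = json.loads(block)
            calls.append((payload.get("name"), payload.get("arguments", {})))
        except json.JSONDecodeError:
            continue  # skip malformed blocks
    return calls


sample = '<tool_call>{"name": "calculator", "arguments": {"expression": "123+456"}}</tool_call>'
print(extract_tool_calls(sample))  # [('calculator', {'expression': '123+456'})]
```

In a full agent loop, each parsed call would be dispatched to the matching tool from `tools_config` and the tool's result appended to the conversation before generation resumes.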

## Citation

If you find our work useful or helpful for your research, please cite our paper:

```bibtex
@article{chen2025rlfactory,
  title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use},
  author={Chen, Chaoyu and Liu, Bingchang and Liao, Cong and Gong, Zi and Lei, Zhichao and Yu, Hang and Li, Jianguo},
  journal={arXiv preprint arXiv:2509.06980},
  year={2025}
}
```