Improve model card: add tags, paper link, code link, and sample usage
#1
by nielsr (HF Staff) - opened

README.md CHANGED

@@ -1,3 +1,113 @@

The original minimal front matter (just `license: apache-2.0` between `---` delimiters) is replaced by the expanded metadata and full model card below.

---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# RLFactory-Qwen3-8B-GRPO

This repository contains the `RLFactory-Qwen3-8B-GRPO` model, an agentic Large Language Model developed within the [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://huggingface.co/papers/2509.06980) framework.

RLFactory is an easy-to-use, efficient RL post-training framework for agentic learning. It decouples the environment from RL post-training, so training needs only a tool config and a reward function, and it supports asynchronous tool calling to make post-training faster; both ideas are sketched below.

<div align="center">
<img src="https://github.com/user-attachments/assets/9793f779-c80e-48e6-813a-1c8f377cf5d1" alt="RLFactory logo" style="width:300px; height:auto;"/>
</div>

**Paper**: [RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use](https://huggingface.co/papers/2509.06980)

**Code**: https://github.com/Simple-Efficient/RL-Factory
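
The speedup from asynchronous tool calling comes from overlapping tool latency across rollouts instead of paying it sequentially. The sketch below illustrates the idea with plain `asyncio`; it is a toy model of the concept, not RLFactory's actual implementation, and the `call_tool` helper and one-second latency are made up for illustration.

```python
import asyncio
import time


async def call_tool(query: str) -> str:
    """Stand-in for a slow external tool (search, code runner, ...)."""
    await asyncio.sleep(1.0)  # simulated network / execution latency
    return f"result for {query!r}"


async def main() -> None:
    queries = [f"rollout-{i}" for i in range(8)]
    start = time.perf_counter()
    # All eight tool calls wait concurrently, so total wall time is ~1 s
    # instead of ~8 s with sequential calls.
    results = await asyncio.gather(*(call_tool(q) for q in queries))
    print(f"{len(results)} tool calls in {time.perf_counter() - start:.1f}s")


asyncio.run(main())
```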

## Overview of RLFactory Framework

RLFactory maximizes the utility of labeled data through a bi-level knowledge *propagation-and-selection* framework, while leveraging collaborative learning among multiple LLMs to exploit unlabeled data, unleashing the full potential of the data.

<div align="center">
<img src="https://github.com/user-attachments/assets/883fd8c0-afa9-4ed2-95be-333a79ce7e36" alt="Framework Design" style="width:750px; height:auto;"/>
</div>
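
To make "just a tool config and a reward function" concrete, here is a minimal sketch of what a user-supplied reward function for multi-turn tool use could look like. The `compute_score` name, its signature, and the scoring rules are assumptions for this illustration, not RLFactory's actual interface; see the GitHub repository for the real one.

```python
# Illustrative sketch of a user-defined reward function for multi-turn
# tool-use RL. Name and signature are assumptions for this example; see
# https://github.com/Simple-Efficient/RL-Factory for the real interface.

def compute_score(solution_str: str, ground_truth: str) -> float:
    """Score one rollout: a small shaping bonus for a well-formed tool
    call, plus the main reward for a correct final answer."""
    score = 0.0
    # Shaping reward if the model emitted a well-formed tool call
    # (Qwen3-style <tool_call>...</tool_call> tags are assumed here).
    if "<tool_call>" in solution_str and "</tool_call>" in solution_str:
        score += 0.1
    # Main reward: the ground-truth answer appears after the last tool call.
    final_segment = solution_str.split("</tool_call>")[-1]
    if ground_truth in final_segment:
        score += 0.9
    return score


# Example: a rollout that called the calculator and answered correctly.
rollout = (
    '<tool_call>{"name": "calculator", "arguments": {"expression": "123 + 456"}}</tool_call>'
    "The sum of 123 and 456 is 579."
)
print(compute_score(rollout, "579"))  # 1.0
```

Because the environment is decoupled from training, switching tasks usually means editing only a function like this plus the tool config, leaving the training loop untouched.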

## Quickstart

This section demonstrates how to load and use the `RLFactory-Qwen3-8B-GRPO` model for inference. Ensure you have the necessary dependencies installed, as specified in the [GitHub repository](https://github.com/Simple-Efficient/RL-Factory).

### Inference with Code

The example below loads the model with `transformers` and wraps it in a tool-aware helper; generation then works much like a standard Hugging Face `generate` call:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# ToolModel is part of the RL-Factory codebase (not a PyPI package); clone
# https://github.com/Simple-Efficient/RL-Factory so this import resolves.
from mcp.models.tool_model import ToolModel

# Define your model path and the tools available to the agent
MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"

# Note: you'll need to define your own tool configuration or replace this
# with a dummy setup. For actual tool use, refer to the official RLFactory
# GitHub repository for tool definitions.
tools_config = {
    "calculator": {
        "description": "A calculator tool to perform arithmetic operations.",
        "schema": {
            "name": "calculator",
            "description": "A calculator tool to perform arithmetic operations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The arithmetic expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
}

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # or torch.float16, depending on your hardware
    device_map="auto",
    trust_remote_code=True,
).eval()

# Wrap the model with ToolModel for agentic capabilities
agent_model = ToolModel(model=model, tokenizer=tokenizer, tools_info=tools_config)

# Example conversation prompt in the Qwen3 chat format
prompt = (
    "<|im_start|>user\n"
    "What is the sum of 123 and 456?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Generate a response
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output_ids = agent_model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; set to True and pass a temperature to sample
    pad_token_id=tokenizer.eos_token_id,
)

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```

**Note**: This `ToolModel` wrapping is a simplified example. For a complete understanding and proper integration with tools, please refer to the [official RLFactory documentation](https://github.com/Simple-Efficient/RL-Factory/blob/main/docs/rl_factory/en/main_tutorial.md).
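
If you just want to try the checkpoint without the `ToolModel` wrapper, standard `transformers` generation should also work, assuming the model inherits the Qwen3 chat template from its base model. A minimal sketch under that assumption:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "Simple-Efficient/RLFactory-Qwen3-8B-GRPO"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

messages = [{"role": "user", "content": "What is the sum of 123 and 456?"}]

# Build the prompt with the model's own chat template (assumed to be
# inherited from Qwen3).
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Recent `transformers` versions also accept a `tools=` list in `apply_chat_template` for tool-augmented prompts; whether this checkpoint's template handles it is worth verifying against the RLFactory docs.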

## Citation

If you find our work useful or helpful for your research, please cite our paper:

```bibtex
@article{chen2025rlfactory,
  title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use},
  author={Chen, Chaoyu and Liu, Bingchang and Liao, Cong and Gong, Zi and Lei, Zhichao and Yu, Hang and Li, Jianguo},
  journal={arXiv preprint arXiv:2509.06980},
  year={2025}
}
```