Instructions to use RuishanFang/Qwen3-4B-RODS with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RuishanFang/Qwen3-4B-RODS with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RuishanFang/Qwen3-4B-RODS")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RuishanFang/Qwen3-4B-RODS")
model = AutoModelForCausalLM.from_pretrained("RuishanFang/Qwen3-4B-RODS")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a short reply and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RuishanFang/Qwen3-4B-RODS with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RuishanFang/Qwen3-4B-RODS"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RuishanFang/Qwen3-4B-RODS",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
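
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: port taken from the curl example
MODEL = "RuishanFang/Qwen3-4B-RODS"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000/v1"` should target a server on another port.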

Use Docker

docker model run hf.co/RuishanFang/Qwen3-4B-RODS

SGLang

How to use RuishanFang/Qwen3-4B-RODS with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RuishanFang/Qwen3-4B-RODS" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RuishanFang/Qwen3-4B-RODS",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RuishanFang/Qwen3-4B-RODS" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RuishanFang/Qwen3-4B-RODS",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RuishanFang/Qwen3-4B-RODS with Docker Model Runner:
```
docker model run hf.co/RuishanFang/Qwen3-4B-RODS
```

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Model Overview

Qwen3-4B-RODS is a Large Language Model (LLM) fine-tuned for complex, multi-turn Function Calling (FC) and agentic tool use. Built on Qwen3-4B-Instruct, it was trained with the RODS (Reward-Driven Online Data Synthesis) framework combined with GRPO reinforcement learning.

RODS closes the loop between RL training and data generation. It repurposes the variance of the progress reward as a zero-cost detector of the agent's capability boundary, continuously synthesizes structurally isomorphic training data at that learning frontier, and manages a dynamic replay buffer that co-evolves with the policy. Starting from only 400 human-annotated seeds, RODS raises BFCLv3 multi-turn overall accuracy from 22.13 (Qwen3-4B-Instruct) to 56.00.

Base Model: Qwen3-4B-Instruct
Size: 4 Billion parameters
Key Capability: Advanced Multi-Turn Function Calling and Agentic Tool-Use

Evaluation Results

The model was evaluated on the multi-turn category of the Berkeley Function-Calling Leaderboard (BFCL) v3. All scores are accuracy (%).

BFCLv3 Multi-Turn Performance

Model	Size	Multi-Turn (Overall)	Base	Miss Func	Miss Param	Long Context
Qwen3-4B-Instruct (Base)	4B	22.13	26.50	21.00	15.50	25.50
Qwen3-4B + RODS (ours)	4B	56.00	68.00	59.00	44.00	53.00
Claude-Sonnet-4-5-20250929	-	61.38	69.00	65.00	52.50	59.00
Grok-4-1-fast-reasoning	-	58.88	70.50	59.50	43.00	62.50
Kimi-K2-Instruct	1043B	50.63	62.00	41.00	44.50	55.00
Qwen3-32B	32B	47.88	56.00	52.50	40.00	43.00
DeepSeek-V3.2-Exp	671B	44.88	55.00	49.00	27.00	48.50
GPT-4o-2024-11-20	-	42.50	55.50	34.50	29.00	51.00

Training Data and Framework

RODS Framework

RODS is a closed-loop RL-data synthesis framework with three co-evolving modules:

1. Reward-Based Boundary Detection: Uses GRPO rollout reward variance as a zero-cost probe to identify tasks at the agent's capability boundary, where gradient signal is richest.
2. Skill-Aligned Synthesis Pipeline: A multi-agent pipeline (Planner → Executor → Rewriter → Critic) generates structurally isomorphic variants that preserve API topology and dependency depth while introducing novel narratives and environment states.
3. Dynamic Replay Buffer Management: A dual-control lifecycle with staged injection and multi-layer retirement keeps the training pool anchored at the shifting capability boundary.
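
To illustrate the boundary-detection idea, here is a minimal sketch that ranks tasks by the variance of their rollout rewards. It is not the authors' implementation: the function name, the `min_variance` threshold of 0.05, and the assumption that rewards lie in [0, 1] are all illustrative choices that this card does not specify.

```python
from statistics import pvariance


def select_boundary_tasks(rollout_rewards, min_variance=0.05):
    """Return task ids whose rollout rewards vary enough to be informative.

    rollout_rewards maps a task id to the list of rewards from its K rollouts.
    A task the policy always solves (or always fails) has zero variance and is
    skipped; mixed outcomes suggest the task sits near the capability boundary.
    The threshold value is an assumption made for this sketch.
    """
    scored = {task: pvariance(rewards) for task, rewards in rollout_rewards.items()}
    boundary = [task for task, var in scored.items() if var >= min_variance]
    # Highest-variance tasks first
    return sorted(boundary, key=lambda task: scored[task], reverse=True)


rollouts = {
    "mastered": [1.0, 1.0, 1.0, 1.0],   # always solved: variance 0
    "too_hard": [0.0, 0.0, 0.0, 0.0],   # never solved: variance 0
    "frontier": [1.0, 0.0, 1.0, 0.0],   # mixed outcomes: variance 0.25
    "nearly":   [1.0, 1.0, 1.0, 0.5],   # small spread: variance below threshold
}
print(select_boundary_tasks(rollouts))  # ['frontier']
```

Because the rewards are already computed during GRPO rollouts, a probe of this kind needs no extra model calls, which is the sense in which the card calls it zero-cost.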

Training Details

Method: GRPO (Group Relative Policy Optimization)
Rollouts: K=16 per prompt
Training stages:
1. Format training (100 Base samples, format reward)
2. Base reasoning (100 Base samples, progress reward)
3. Full expansion (400 samples + dynamic synthesis, progress reward)
Synthesis backbone: Qwen3-32B via vLLM
Hardware: 8x A100 (training) + 8x A100 (synthesis)
Active training pool: ~800 samples (400 seeds + up to 400 generated)
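
For readers unfamiliar with GRPO, the sketch below shows the usual group-relative advantage: each rollout's reward is normalized by the mean and standard deviation of its own group of K rollouts. This follows the commonly published formulation rather than the authors' training code; the use of the population standard deviation and the `eps` value are assumptions, and implementations differ on both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group: (r - mean) / (std + eps).

    eps (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt with K=16 rollouts: half succeed, half fail
mixed = [1.0] * 8 + [0.0] * 8
adv = group_relative_advantages(mixed)
print(round(adv[0], 3), round(adv[-1], 3))  # 1.0 -1.0

# A prompt whose rollouts all receive the same reward yields zero advantages
flat = group_relative_advantages([1.0] * 16)
print(flat[0])  # 0.0
```

The second case shows why prompts with zero reward variance contribute no learning signal under this normalization, whereas prompts with mixed outcomes do.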

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RuishanFang/Qwen3-4B-RODS"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

For tool-use inference, follow the Qwen3 function calling format. The model expects tools to be provided in the system prompt and generates structured <tool_call> responses.
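
A minimal sketch of that flow follows. It assumes the model's chat template accepts a `tools` argument (as Qwen3 templates in Transformers generally do) and that each call is emitted as a JSON object with `name` and `arguments` keys inside `<tool_call>` tags. The `get_weather` tool and the helper names are hypothetical.

```python
import json
import re

# Hypothetical tool definition in the JSON-schema style used by chat templates
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def render_prompt(tokenizer, messages, tools):
    """Render the prompt text; the chat template places the tools in the system prompt."""
    return tokenizer.apply_chat_template(
        messages, tools=tools, add_generation_prompt=True, tokenize=False
    )


def parse_tool_calls(text):
    """Extract tool calls from generated text, skipping malformed JSON."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Example model output (written by hand for illustration, not a real generation)
sample = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
print(parse_tool_calls(sample))
# [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

In a multi-turn loop, each parsed call would be executed and its result appended to `messages` as a `tool` role message before generating again.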

Related Projects and Citation

This work is part of AWorld, an open-source project from InclusionAI.

If you use RODS in your research, please cite:

@article{fang2026rods,
  title={RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents},
  author={Fang, Ruishan and Lu, Siyuan and Zhuang, Chenyi and Lin, Tao},
  journal={arXiv preprint arXiv:2606.19047},
  year={2026}
}

Contact

For inquiries, please contact:

fangruishan@westlake.edu.cn

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for RuishanFang/Qwen3-4B-RODS

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Finetuned: this model

Evaluation results

Overall Accuracy on BFCL V3 Multi-Turn (self-reported): 56.000