Spaces:

JSCPPProgrammer
/

gensearcher-firered

Paused

App Files Files Community

gensearcher-firered / vendor /rllm /docs /getting-started /quick-start.md

JSCPPProgrammer

Initial: GenSearcher workflow + FireRed /generate adapter + Gradio

80b7188 verified 2 months ago

preview code

raw

history blame contribute delete

5.45 kB

Quick Start with rLLM

This guide walks you through using rLLM to build AI agents with tool usage capabilities. We'll use the math tool agent example to demonstrate the complete workflow from dataset preparation through model training.

Overview

In this tutorial, you'll create a math reasoning agent that can:

Access a Python interpreter to solve mathematical problems
Perform step-by-step reasoning with interleaved tool usage
Learn and improve its math problem solving ability through reinforcement learning

The example uses:

Base Model: Qwen3-4B
Training Data: DeepScaleR-Preview-Math dataset
Evaluation Data: AIME 2024 mathematics competition problems
Tools: Python interpreter for mathematical computations

Prerequisites

Before starting, ensure you have:

rLLM Installation: Follow the installation guide
GPU Requirements: At least 1 GPU with 16GB+ memory for inference, 8+ GPUs for training
Model Server: We'll use vLLM or SGLang to serve the base model

Step 1: Dataset Preparation

rLLM's DatasetRegistry provides a centralized way to manage datasets. Let's prepare the math datasets:

--8<-- "examples/math_tool/prepare_math_data.py"

This registers the training dataset deepscaler_math and the testing dataset aime2024. Under the hood, rLLM stores the processed data as parquet files in a format suitable for both inference and training. Later, you can easily load the registered datasets using DatasetRegistry.load_dataset.

Run the preparation script:

cd examples/math_tool
python prepare_math_data.py

Step 2: Model Server Setup

rLLM requires a model server for inference. Choose one of these options:

Option A: vLLM Server

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16

Option B: SGLang Server

python -m sglang_router.launch_server \
    --model-path Qwen/Qwen3-4B \
    --dp-size 1 \
    --dtype bfloat16

The server provides an OpenAI-compatible API at http://localhost:30000/v1.

Step 3: Model Inference

Now let's run inference to see how agents solve math problems using tools:

--8<-- "examples/math_tool/run_math_with_tool.py"

Run the inference script:

cd examples/math_tool
python run_math_with_tool.py

The script above configures a ToolAgent from rLLM with access to the python tool for solving math problems in AIME2024, and a ToolEnvironment for handling Python tool calls and returning results.

The AgentExecutionEngine orchestrates the interaction between the ToolAgent and ToolEnvironment. The execute_tasks function launches 64 agent-environment pairs in parallel (n_parallel_agents=64) for rollout generation and returns results after all problems from the AIME2024 dataset are processed. Finally, the Pass@1 and Pass@K metrics for AIME are computed and printed.

Step 4: Agent Training with GRPO

Training improves the agent's ability to use tools effectively. rLLM uses verl as its training backend, which supports training language models with GRPO and various other RL algorithms.

--8<-- "examples/math_tool/train_math_with_tool.py"

Run the training script:

cd examples/math_tool
bash train_math_with_tool.sh

The script above launches an RL training job for our ToolAgent, using deepscaler_math as the training set and aime2024 as the test set. Under the hood, rLLM handles agent trajectory generation using our AgentExecutionEngine and transforms the trajectories into verl's format for model training using FSDP or Megatron. The training process works as follows:

Rollout Generation: A batch of data is passed to AgentExecutionEngine, which launches multiple agent-environment pairs in parallel to process the batch. The engine returns all trajectories along with rewards computed by the environment.
Transform Trajectories: Agent trajectories are transformed into the corresponding format for our training backend verl.
Advantage Calculation with GRPO: verl uses GRPO for advantage calculation.
Model Update: verl updates the model parameters to increase the probability of successful actions. The updated model is then used to generate trajectories for the next batch of data.

Key rLLM Components in This Example

Component	Purpose	Example Usage
`ToolAgent`	Agent with tool usage capabilities	Reasoning + Python execution
`ToolEnvironment`	Safe tool execution environment	Sandboxed Python interpreter
`DatasetRegistry`	Centralized dataset management	Load/register math datasets
`AgentExecutionEngine`	Parallel agent execution	Efficient batch inference
`AgentTrainer`	RL training orchestration	PPO-based agent improvement

Next Steps

Congratulations! You've successfully used rLLM to run and train a ToolAgent for math problem solving. For a deeper dive into rLLM's main components, check out Core Concepts in rLLM.