Spaces:

JSCPPProgrammer
/

gensearcher-firered

Paused

App Files Files Community

gensearcher-firered / vendor /rllm /examples /math_tool /README.md

JSCPPProgrammer

Initial: GenSearcher workflow + FireRed /generate adapter + Gradio

80b7188 verified 2 months ago

preview code

raw

history blame contribute delete

2.4 kB

Math Tool Agent Examples

This directory contains examples for training and running math reasoning agents with tool usage capabilities using the RLLM framework. The math tool agent has access to a Python interepreter to solve mathematical problems through step-by-step reasoning and tool-use.

Our examples uses the following:

Qwen3-4B as the base model
DeepScaleR-Math dataset for training
AIME2024 dataset for evaluation

Model Hosting

Option 1: Using vLLM

Start a vLLM server with OpenAI-compatible API:

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16

Option 2: Using SGLang

python -m sglang_router.launch_server \
    --model-path Qwen/Qwen3-4B \ 
    --dp-size 1 \
    --dtype bfloat16
# increase dp_size to enable data-parallel processing on multi-GPU

The server should be accessible at http://localhost:30000/v1

Dataset Preparation

Prepare the required datasets (AIME 2024 for testing, DeepScaleR for training):

cd examples/math_tool
python prepare_math_data.py

This will:

Download AIME 2024 dataset from HuggingFace
Download DeepScaleR math dataset for training
Register both datasets with the RLLM DatasetRegistry

Running Inference

Once your model server is running and datasets are prepared, you can run inference:

cd examples/math_tool
python run_math_with_tool.py

Configuration Options

You can modify the inference script parameters:

n_parallel_agents: Number of parallel agents (default: 64)
model_name: Model to use (default: "Qwen/Qwen3-4B")
base_url: API server URL (default: "http://localhost:30000/v1")
max_response_length: Maximum response length (default: 16384)
max_prompt_length: Maximum prompt length (default: 2048)
temperature: Sampling temperature (default: 0.6)
top_p: Top-p sampling (default: 0.95)

The script will:

Load the AIME 2024 test dataset
Repeat each problem 4 times for Pass@K evaluation
Run parallel inference using the async agent execution engine
Evaluate results and report Pass@1 and Pass@K accuracy

Training

Basic Training

To train a math reasoning agent with tool usage:

bash examples/math_tool/train_math_with_tool.sh