Multi-Agent RL
This directory provides an example of running multi-agent reinforcement learning (RL) with slime.
Environment Setup
The environment setup is identical to the standard RL setup used in slime.
Running the Script
You can either define your own multi-agent system or use the provided default configuration.
MULTI_AGENT_CONFIGS = {
    "custom_multi_agent_function_path": "examples.multi_agent.agent_system.run_agent_system",
    "num_parallel": 5,
    "incorrect_reward_weight": 0.8,
    "correct_reward_weight": 1.2,
}
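The `custom_multi_agent_function_path` value is a dotted Python path to a function. As a minimal sketch of how such a path can be resolved to a callable, the standard library's `importlib` is enough. This is an assumption about the general mechanism, not slime's actual loader, and `load_function` is a hypothetical helper name.

```python
import importlib


def load_function(dotted_path: str):
    """Resolve 'package.module.func' to the callable it names (hypothetical helper)."""
    # Split on the last dot: everything before it is the module, the rest is the attribute.
    module_path, _, func_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# Demonstrated with a standard-library function so the snippet runs anywhere.
# Inside the slime repository, the same call could take the path stored in
# MULTI_AGENT_CONFIGS["custom_multi_agent_function_path"].
join = load_function("os.path.join")
print(join("examples", "multi_agent"))
```

To plug in your own agent system under this assumption, point the path at a function that is importable from the directory you launch the run from.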
To start a run, execute:
cd slime/
bash examples/multi_agent/run-qwen3-30B-A3B-multi-agent.sh
New Arguments
- Specify the agent rollout function with the --custom-generate-function-path argument.
- Set the --rollout-max-context-len argument according to your model’s context window.
ROLLOUT_ARGS=(
   --custom-generate-function-path examples.multi_agent.rollout_with_multi_agents.generate_with_multi_agents
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-context-len 16384
   --rollout-max-response-len 8192
   --rollout-temperature 0.8
   --global-batch-size 256
   --balance-data
)
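As a quick sanity check on the values above, the sketch below works out how many samples one rollout produces and how that compares with the global batch size. It assumes that each rollout generates rollout-batch-size × n-samples-per-prompt samples, and that --rollout-max-context-len bounds the prompt and response tokens together. Both are assumptions about slime's semantics, and the variable names are illustrative only.

```python
# Values copied from ROLLOUT_ARGS above.
rollout_batch_size = 32          # prompts sampled per rollout
n_samples_per_prompt = 8         # responses generated per prompt
global_batch_size = 256          # samples consumed per optimizer step
rollout_max_context_len = 16384
rollout_max_response_len = 8192

# Assumption: one rollout yields prompts * samples-per-prompt training samples.
samples_per_rollout = rollout_batch_size * n_samples_per_prompt

# How many full global batches fit in one rollout, and what is left over.
steps_per_rollout, remainder = divmod(samples_per_rollout, global_batch_size)

# Assumption: the context limit covers prompt and response together, so this
# is the token budget left for everything that precedes a full-length response.
prompt_budget = rollout_max_context_len - rollout_max_response_len

print(f"samples per rollout: {samples_per_rollout}")
print(f"optimizer steps per rollout: {steps_per_rollout} (remainder {remainder})")
print(f"tokens left for the prompt side: {prompt_budget}")
```

If you change --rollout-batch-size or --n-samples-per-prompt, rerunning this check shows whether the product still divides evenly by --global-batch-size.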