# Multi-Agent RL

This directory provides an example of running multi-agent reinforcement learning (RL) with slime.

## Environment Setup

The environment setup is identical to the standard RL setup used in slime.

## Running the Script

You can either define your own multi-agent system or use the provided default configuration.

```python
MULTI_AGENT_CONFIGS = {
    "custom_multi_agent_function_path": "examples.multi_agent.agent_system.run_agent_system",
    "num_parallel": 5,
    "incorrect_reward_weight": 0.8,
    "correct_reward_weight": 1.2,
}
```
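
The `custom_multi_agent_function_path` value is a dotted import path rather than a file path. As a rough illustration of how such a string can be turned into a callable, here is a minimal sketch using `importlib`. The helper name `load_function` is hypothetical and is not taken from slime, whose own loader may behave differently; the example resolves a standard-library function so that it runs without slime installed.

```python
import importlib
from typing import Any, Callable


def load_function(dotted_path: str) -> Callable[..., Any]:
    """Resolve a 'package.module.function' string to the object it names.

    Hypothetical helper for illustration only; not part of slime.
    """
    # Split on the last dot: everything before it is the module path,
    # the final component is the attribute to fetch from that module.
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with a standard-library function so the snippet runs anywhere.
join = load_function("os.path.join")
print(join("examples", "multi_agent"))
```

Under this reading, pointing the config at your own agent system would mean supplying the dotted path of a function that is importable from the directory where the run is launched.
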
To start a run, execute:

```bash
cd slime/
bash examples/multi_agent/run-qwen3-30B-A3B-multi-agent.sh
```

## New Arguments

- Specify the agent rollout function with the `--custom-generate-function-path` argument.
- Set the `--rollout-max-context-len` argument according to your model’s context window.

```bash
ROLLOUT_ARGS=(
   --custom-generate-function-path examples.multi_agent.rollout_with_multi_agents.generate_with_multi_agents
   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
   --input-key prompt
   --label-key label
   --apply-chat-template
   --rollout-shuffle
   --rm-type deepscaler
   --num-rollout 3000
   --rollout-batch-size 32
   --n-samples-per-prompt 8
   --rollout-max-context-len 16384
   --rollout-max-response-len 8192
   --rollout-temperature 0.8
   --global-batch-size 256
   --balance-data
)
```
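
When adapting these values, a quick arithmetic check can catch mismatched sizes before a long run. The sketch below copies the numbers from the block above and assumes two things that this README does not state: that each rollout step produces `rollout-batch-size × n-samples-per-prompt` samples, and that training consumes `global-batch-size` samples per optimizer step. The variable names are illustrative only.

```python
# Values copied from ROLLOUT_ARGS above.
rollout_batch_size = 32          # prompts sampled per rollout step
n_samples_per_prompt = 8         # responses generated per prompt
global_batch_size = 256          # samples per optimizer step (assumed meaning)
rollout_max_context_len = 16384
rollout_max_response_len = 8192

# Assumption: one rollout step yields prompts * samples-per-prompt samples.
samples_per_rollout = rollout_batch_size * n_samples_per_prompt

# How many full optimizer steps one rollout can feed, and what is left over.
steps_per_rollout, leftover = divmod(samples_per_rollout, global_batch_size)

print(f"samples per rollout: {samples_per_rollout}")
print(f"optimizer steps per rollout: {steps_per_rollout}, leftover samples: {leftover}")

# A single response should not be allowed to exceed the total context budget.
assert rollout_max_response_len <= rollout_max_context_len
```

With the values shown, 32 × 8 = 256 samples match the global batch size exactly, which under the assumptions above corresponds to one optimizer step per rollout with nothing left over.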