--- license: apache-2.0 language: - en library_name: transformers base_model: Qwen/Qwen3-4B tags: - mapf - reinforcement-learning - grpo - qwen3 - arxiv:2606.17682 pipeline_tag: text-generation --- This is the checkpoint of the [From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning](https://arxiv.org/abs/2606.17682)