---
license: mit
language:
- en
pipeline_tag: reinforcement-learning
tags:
- robotics
- reinforcement_learning
- humanoid
- soccer
- sai
- mujoco
---

## Model Details

### Model Description

This repository hosts the **Booster Soccer Controller Suite** — a collection of reinforcement learning policies and controllers powering humanoid agents in the [**Booster Soccer Showdown**](https://competesai.com/competitions/cmp_xnSCxcJXQclQ).

It contains:

1. **Low-Level Controller (`robot/`):**
   A proprioceptive policy for the **Lower T1** humanoid that converts high-level commands (forward, lateral, and yaw velocities) into joint angle targets.
2. **Competition Policies (`model/`):**
   High-level agents trained in SAI’s soccer environments that output those high-level commands for match-time play.

- **Developed by:** ArenaX Labs
- **License:** MIT
- **Frameworks:** PyTorch · MuJoCo · Stable-Baselines3
- **Environments:** Booster Gym / SAI Soccer tasks
|
| |
|
| | ## Testing Instructions |
| |
|

1. **Clone the repo**

   ```bash
   git clone https://github.com/ArenaX-Labs/booster_soccer_showdown.git
   cd booster_soccer_showdown
   ```

2. **Create & activate a Python 3.10+ environment**

   ```bash
   # any env manager is fine; here are a few options
   # --- venv ---
   python3 -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate

   # --- conda ---
   # conda create -n booster-ssl python=3.11 -y && conda activate booster-ssl
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

---

### Teleoperation

Booster Soccer Showdown supports keyboard teleoperation out of the box.

```bash
python booster_control/teleoperate.py \
    --env LowerT1GoaliePenaltyKick-v0
```

**Default bindings (example):**

* `W/S`: move forward/backward
* `A/D`: move left/right
* `Q/E`: rotate left/right
* `L`: reset commands
* `P`: reset environment
|
| | --- |
| |
|
| | ⚠️ **Note for macOS and Windows users** |
| | Because different renderers are used on macOS and Windows, you may need to adjust the **position** and **rotation** sensitivity for smooth teleoperation. |
| | Run the following command with the sensitivity flags set explicitly: |
| |
|

```bash
python booster_control/teleoperate.py \
    --env LowerT1GoaliePenaltyKick-v0 \
    --pos_sensitivity 1.5 \
    --rot_sensitivity 1.5
```

(Tune `--pos_sensitivity` and `--rot_sensitivity` as needed for your setup.)

---

### Training

We provide a minimal reinforcement learning pipeline for training agents with **Deep Deterministic Policy Gradient (DDPG)** in the Booster Soccer Showdown environments; the scripts live in the `training_scripts/` folder. The training stack consists of three scripts:

#### 1) `ddpg.py`

Defines the **DDPG_FF model**, including:

* Actor and critic neural networks with configurable hidden layers and activation functions.
* Target networks and a soft-update mechanism for stability.
* The training step (MSE critic loss, deterministic policy gradient actor loss), as sketched below.
* Utility functions for forward passes, action selection, and backpropagation.
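
For orientation, here is a minimal, self-contained sketch of those pieces in plain PyTorch. The names (`Actor`, `Critic`, `soft_update`, `ddpg_step`) and hyperparameters are illustrative assumptions, not the repository's actual `DDPG_FF` API:

```python
# Illustrative DDPG components (hypothetical names; see ddpg.py for the real model).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(sizes, activation=nn.ReLU, out_activation=nn.Identity):
    """Feed-forward stack with configurable hidden layers and activations."""
    layers = []
    for i in range(len(sizes) - 1):
        act = activation if i < len(sizes) - 2 else out_activation
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=(256, 256)):
        super().__init__()
        # tanh keeps raw actions in [-1, 1]; rescaling to env bounds happens outside
        self.net = mlp((obs_dim, *hidden, act_dim), out_activation=nn.Tanh)

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=(256, 256)):
        super().__init__()
        self.net = mlp((obs_dim + act_dim, *hidden, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def soft_update(target, source, tau=0.005):
    """Polyak-average online parameters into the target network for stability."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)

def ddpg_step(batch, actor, critic, actor_targ, critic_targ, a_opt, c_opt, gamma=0.99):
    """One DDPG update: MSE critic loss, then deterministic policy gradient."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        target_q = rew + gamma * (1.0 - done) * critic_targ(next_obs, actor_targ(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    c_opt.zero_grad(); critic_loss.backward(); c_opt.step()
    actor_loss = -critic(obs, actor(obs)).mean()   # maximize Q under the current policy
    a_opt.zero_grad(); actor_loss.backward(); a_opt.step()
    soft_update(actor_targ, actor); soft_update(critic_targ, critic)
    return critic_loss.item(), actor_loss.item()
```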

---

#### 2) `training.py`

Provides the **training loop** and supporting components (a sketch of the buffer and noise follows this list):

* **ReplayBuffer** for experience storage and sampling.
* **Exploration noise** injection to encourage policy exploration.
* An iterative training loop that:
  * interacts with the environment,
  * stores experiences,
  * periodically samples minibatches to update the actor/critic networks, and
  * tracks and logs progress (episode rewards, critic/actor loss) with `tqdm`.
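
A minimal sketch of the two supporting components, assuming flat NumPy observations (the actual `ReplayBuffer` and noise schedule in `training.py` may differ):

```python
# Illustrative replay buffer and exploration noise (hypothetical; see training.py).
import numpy as np

class ReplayBuffer:
    """Fixed-capacity ring buffer for (obs, act, rew, next_obs, done) tuples."""
    def __init__(self, obs_dim, act_dim, capacity=1_000_000):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.ptr, self.size, self.capacity = 0, 0, capacity

    def store(self, o, a, r, o2, d):
        i = self.ptr
        self.obs[i], self.act[i], self.rew[i] = o, a, r
        self.next_obs[i], self.done[i] = o2, float(d)
        self.ptr = (i + 1) % self.capacity            # overwrite oldest when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size=256):
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])

def noisy_action(raw_action, sigma=0.1):
    """Zero-mean Gaussian exploration noise, clipped to the raw action range."""
    return np.clip(raw_action + sigma * np.random.randn(*raw_action.shape), -1.0, 1.0)
```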

---

#### 3) `main.py`

Serves as the **entry point** to run training (a sketch of the preprocessing and rescaling follows this list):

* Initializes the Booster Soccer Showdown environment via the **SAI client**.
* Defines a **Preprocessor** that normalizes and concatenates robot state, ball state, and environment info into a training-ready observation vector.
* Instantiates a **DDPG_FF model** with a custom architecture.
* Defines an **action function** that rescales raw policy outputs to environment-specific action bounds.
* Calls the training loop and, after training, supports:
  * `sai.watch(...)` for visualizing learned behavior.
  * `sai.benchmark(...)` for local benchmarking.
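
The preprocessing and action rescaling might look roughly like this (the function names and normalization scheme are assumptions for illustration; the `Preprocessor` in `main.py` is the reference):

```python
# Illustrative preprocessing and action rescaling (hypothetical; see main.py).
import numpy as np

def preprocess(robot_state, ball_state, env_info, scale):
    """Normalize and concatenate the state pieces into one observation vector."""
    obs = np.concatenate([np.asarray(robot_state, dtype=np.float32),
                          np.asarray(ball_state, dtype=np.float32),
                          np.asarray(env_info, dtype=np.float32)])
    return obs / scale  # per-dimension scale vector chosen for the environment

def rescale_action(raw_action, low, high):
    """Map a policy output in [-1, 1] to the environment's action bounds."""
    return low + 0.5 * (raw_action + 1.0) * (high - low)
```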

---

#### Example: Run Training

```bash
python training_scripts/main.py
```

This will:

1. Build the environment.
2. Initialize the model.
3. Run the training loop with replay buffer and DDPG updates.
4. Launch visualization and benchmarking after training.

#### Example: Test a pretrained model

```bash
python training_scripts/test.py --env LowerT1KickToTarget-v0
```