	---
	license: mit
	language:
	- en
	pipeline_tag: reinforcement-learning
	tags:
	- robotics
	- reinforcement_learning
	- humanoid
	- soccer
	- sai
	- mujoco
	---

	## Model Details

	### Model Description

	This repository hosts the Booster Soccer Controller Suite, a collection of reinforcement learning policies and controllers that power humanoid agents in the [Booster Soccer Showdown](https://competesai.com/competitions/cmp_xnSCxcJXQclQ).

	It contains:
	1. Low-Level Controller (`robot/`): a proprioceptive policy for the Lower T1 humanoid that converts high-level commands (forward, lateral, and yaw velocities) into joint angle targets.
	2. Competition Policies (`model/`): high-level agents trained in SAI's soccer environments that output those high-level commands for match-time play.

	- Developed by: ArenaX Labs
	- License: MIT
	- Frameworks: PyTorch · MuJoCo · Stable-Baselines3
	- Environments: Booster Gym / SAI Soccer tasks


	## Testing Instructions

	1. Clone the repo

	```bash
	git clone https://github.com/ArenaX-Labs/booster_soccer_showdown.git
	cd booster_soccer_showdown
	```

	2. Create & activate a Python 3.10+ environment

	```bash
	# any env manager is fine; here are a few options
	# --- venv ---
	python3 -m venv .venv
	source .venv/bin/activate # Windows: .venv\Scripts\activate

	# --- conda ---
	# conda create -n booster-ssl python=3.11 -y && conda activate booster-ssl
	```

	3. Install dependencies

	```bash
	pip install -r requirements.txt
	```

	---

	### Teleoperation

	Booster Soccer Showdown supports keyboard teleop out of the box.

	```bash
	python booster_control/teleoperate.py \
	--env LowerT1GoaliePenaltyKick-v0
	```

	Default bindings (example):

	* `W/S`: move forward/backward
	* `A/D`: move left/right
	* `Q/E`: rotate left/right
	* `L`: reset commands
	* `P`: reset environment
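
As a rough illustration of how bindings like these could map onto the high-level command (forward, lateral, and yaw velocities), here is a minimal sketch. The `Command` class, the `KEY_DELTAS` table, the `apply_key` function, the sign conventions, and the `step` size are all hypothetical and are not taken from `teleoperate.py`. Resetting the environment (`P`) is left out because it acts on the simulator rather than on the command.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Command:
    """Hypothetical high-level command: three target velocities."""
    forward: float = 0.0
    lateral: float = 0.0
    yaw: float = 0.0


# Key -> (command field, sign). The signs are assumptions for illustration.
KEY_DELTAS = {
    "w": ("forward", +1.0),
    "s": ("forward", -1.0),
    "a": ("lateral", +1.0),
    "d": ("lateral", -1.0),
    "q": ("yaw", +1.0),
    "e": ("yaw", -1.0),
}


def apply_key(cmd, key, step=0.1):
    """Return a new Command after one key press."""
    key = key.lower()
    if key == "l":
        # "L" resets all commanded velocities to zero.
        return Command()
    if key not in KEY_DELTAS:
        # Unbound keys leave the command unchanged.
        return cmd
    field, sign = KEY_DELTAS[key]
    return replace(cmd, **{field: getattr(cmd, field) + sign * step})
```

Each key press nudges one velocity by `step`, so holding a key would accumulate speed in that direction under this assumed scheme.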

	---

	⚠️ Note for macOS and Windows users
	Because different renderers are used on macOS and Windows, you may need to adjust the position and rotation sensitivity for smooth teleoperation.
	Run the following command with the sensitivity flags set explicitly:

	```bash
	python booster_control/teleoperate.py \
	--env LowerT1GoaliePenaltyKick-v0 \
	--pos_sensitivity 1.5 \
	--rot_sensitivity 1.5
	```

	(Tune `--pos_sensitivity` and `--rot_sensitivity` as needed for your setup.)

	---

	### Training

	The `training_scripts/` folder provides a minimal reinforcement learning pipeline that trains agents with Deep Deterministic Policy Gradient (DDPG) in the Booster Soccer Showdown environments. The training stack consists of three scripts:

	#### 1) `ddpg.py`

	Defines the `DDPG_FF` model, including:

	* Actor and Critic neural networks with configurable hidden layers and activation functions.
	* Target networks and soft-update mechanism for stability.
	* Training step implementation (critic loss with MSE, actor loss with policy gradient).
	* Utility functions for forward passes, action selection, and backpropagation.
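
The pieces listed above can be sketched in dependency-free Python. This illustrates the standard DDPG update rules under stated assumptions and is not the contents of `ddpg.py`: the function names, the default `tau` and `gamma` values, and the use of plain lists of floats in place of PyTorch tensors are all assumptions made for the example.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


def td_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped critic target; next_q comes from the target networks."""
    return reward + gamma * (1.0 - float(done)) * next_q


def critic_loss(q_values, targets):
    """Mean squared error between predicted Q-values and TD targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def actor_loss(q_of_policy_actions):
    """Deterministic policy gradient objective: maximize Q(s, actor(s)),
    written as a loss by negating the batch mean."""
    return -sum(q_of_policy_actions) / len(q_of_policy_actions)
```

A small `tau` makes the target networks trail the online networks slowly, which is the stabilizing effect the soft-update bullet refers to.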

	---

	#### 2) `training.py`

	Provides the training loop and supporting components:

	* `ReplayBuffer` for experience storage and sampling.
	* Exploration noise injection to encourage policy exploration.
	* Iterative training loop that:

	  * Interacts with the environment.
	  * Stores experiences.
	  * Periodically samples minibatches to update actor/critic networks.
	  * Tracks and logs progress (episode rewards, critic/actor loss) with `tqdm`.
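
To make the first two components concrete, here is a minimal sketch of a replay buffer and Gaussian exploration noise using only the standard library. It is an assumption-laden illustration, not the code in `training.py`: the method names, the FIFO eviction policy, the Gaussian noise model, and the default `sigma` and action bounds are all hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; oldest entries are evicted first."""

    def __init__(self, capacity):
        self._storage = deque(maxlen=capacity)

    def __len__(self):
        return len(self._storage)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random minibatch without replacement and return
        five tuples: states, actions, rewards, next_states, dones."""
        batch = random.sample(list(self._storage), batch_size)
        return tuple(zip(*batch))


def add_exploration_noise(action, sigma=0.1, low=-1.0, high=1.0):
    """Perturb each action dimension with Gaussian noise, then clip to bounds."""
    return [min(high, max(low, a + random.gauss(0.0, sigma))) for a in action]
```

In a loop of this shape, the agent would call `add_exploration_noise` on the actor's output before stepping the environment, `add` the resulting transition, and call `sample` once the buffer holds at least one minibatch.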

	---

	#### 3) `main.py`

	Serves as the entry point to run training:

	* Initializes the Booster Soccer Showdown environment via the SAI client.
	* Defines a `Preprocessor` to normalize and concatenate robot state, ball state, and environment info into a training-ready observation vector.
	* Instantiates a `DDPG_FF` model with custom architecture.
	* Defines an action function that rescales raw policy outputs to environment-specific action bounds.
	* Calls the training loop, and after training, supports:

	  * `sai.watch(...)` for visualizing learned behavior.
	  * `sai.benchmark(...)` for local benchmarking.
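
The preprocessing and action-rescaling steps above could look roughly like the following. This is a sketch under stated assumptions rather than the code in `main.py`: the function names, the per-group `scales` used for normalization, and the assumption that the raw policy output lies in `[-1, 1]` (for example after a `tanh` layer) are all hypothetical.

```python
def preprocess(robot_state, ball_state, env_info, scales=(1.0, 1.0, 1.0)):
    """Divide each group by its (assumed) scale and concatenate the results
    into one flat observation vector."""
    groups = (robot_state, ball_state, env_info)
    observation = []
    for values, scale in zip(groups, scales):
        observation.extend(v / scale for v in values)
    return observation


def rescale_action(raw, low, high):
    """Linearly map each raw output from [-1, 1] to [low_i, high_i]."""
    scaled = []
    for r, lo, hi in zip(raw, low, high):
        r = max(-1.0, min(1.0, r))  # guard against out-of-range outputs
        scaled.append(lo + 0.5 * (r + 1.0) * (hi - lo))
    return scaled
```

With this mapping, a raw output of `-1` lands on the lower bound, `0` on the midpoint, and `1` on the upper bound of each action dimension.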

	---

	#### Example: Run Training

	```bash
	python training_scripts/main.py
	```

	This will:

	1. Build the environment.
	2. Initialize the model.
	3. Run the training loop with replay buffer and DDPG updates.
	4. Launch visualization and benchmarking after training.


	#### Example: Test a pretrained model

	```bash
	python training_scripts/test.py --env LowerT1KickToTarget-v0
	```