Commit 2a517dc (verified) · SaiResearch · Parent: d9501fc · Update README.md (+154 -1)
---
pipeline_tag: reinforcement-learning
tags:
- robotics
- reinforcement_learning
- humanoid
- soccer
- sai
- mujoco
---

## Model Details

### Model Description

This repository hosts the **Booster Soccer Controller Suite**, a collection of reinforcement learning policies and controllers powering humanoid agents in the [**Booster Soccer Showdown**](https://competesai.com/competitions/cmp_xnSCxcJXQclQ).

It contains:

1. **Low-Level Controller (`robot/`):** A proprioceptive policy for the **Lower T1** humanoid that converts high-level commands (forward, lateral, and yaw velocities) into joint-angle targets.
2. **Competition Policies (`model/`):** High-level agents trained in SAI's soccer environments that output those high-level commands for match-time play.

- **Developed by:** ArenaX Labs
- **License:** MIT
- **Frameworks:** PyTorch · MuJoCo · Stable-Baselines3
- **Environments:** Booster Gym / SAI Soccer tasks
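The split between the two layers can be sketched as a minimal interface. Everything below is hypothetical (function names, observation sizes, and joint count are illustrative, not the repository's actual API): the high-level policy emits a velocity command, and the low-level controller turns it into joint-angle targets.

```python
import numpy as np

# Hypothetical sketch of the two-level control interface.
# The real policies are learned networks; these stubs only show the data flow.

def high_level_policy(observation: np.ndarray) -> np.ndarray:
    """Map an observation to a command: (forward vel, lateral vel, yaw rate)."""
    # A trained policy would go here; we return a fixed "walk forward" command.
    return np.array([0.5, 0.0, 0.0])

def low_level_controller(command: np.ndarray, proprio: np.ndarray,
                         n_joints: int = 12) -> np.ndarray:
    """Convert a velocity command plus proprioception into joint-angle targets."""
    # Placeholder: a learned proprioceptive policy would produce these targets.
    features = np.concatenate([command, proprio])
    return np.tanh(features[:n_joints])

obs = np.zeros(48)       # assumed observation size, for illustration only
proprio = np.zeros(24)   # assumed proprioceptive state size
cmd = high_level_policy(obs)
targets = low_level_controller(cmd, proprio)
print(cmd.shape, targets.shape)  # (3,) (12,)
```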

## Testing Instructions

1. **Clone the repo**

   ```bash
   git clone https://github.com/ArenaX-Labs/booster_soccer_showdown.git
   cd booster_soccer_showdown
   ```

2. **Create & activate a Python 3.10+ environment**

   ```bash
   # any env manager is fine; here are a few options
   # --- venv ---
   python3 -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate

   # --- conda ---
   # conda create -n booster-ssl python=3.11 -y && conda activate booster-ssl
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

---

### Teleoperation

Booster Soccer Showdown supports keyboard teleoperation out of the box.

```bash
python booster_control/teleoperate.py \
    --env LowerT1GoaliePenaltyKick-v0
```

**Default bindings (example):**

* `W/S`: move forward/backward
* `A/D`: move left/right
* `Q/E`: rotate left/right
* `L`: reset commands
* `P`: reset environment
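Under the hood, bindings like these amount to a key-to-command-delta table. A minimal sketch (the key names mirror the list above, but the increment values are hypothetical, not the script's actual settings):

```python
# Hypothetical key -> (d_forward, d_lateral, d_yaw) increments, mirroring the
# default bindings above. 'L' zeroes the command; 'P' would reset the env.
KEY_DELTAS = {
    "W": (+0.1, 0.0, 0.0),  # forward
    "S": (-0.1, 0.0, 0.0),  # backward
    "A": (0.0, +0.1, 0.0),  # left
    "D": (0.0, -0.1, 0.0),  # right
    "Q": (0.0, 0.0, +0.1),  # rotate left
    "E": (0.0, 0.0, -0.1),  # rotate right
}

def apply_key(command, key):
    """Update the (forward, lateral, yaw) command for one key press."""
    if key == "L":  # reset commands
        return (0.0, 0.0, 0.0)
    dx, dy, dyaw = KEY_DELTAS.get(key, (0.0, 0.0, 0.0))
    return (command[0] + dx, command[1] + dy, command[2] + dyaw)

cmd = (0.0, 0.0, 0.0)
for key in "WWQ":
    cmd = apply_key(cmd, key)
print(cmd)  # (0.2, 0.0, 0.1)
```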

---

⚠️ **Note for macOS and Windows users**
Because different renderers are used on macOS and Windows, you may need to adjust the **position** and **rotation** sensitivity for smooth teleoperation. Run the following command with the sensitivity flags set explicitly:

```bash
python booster_control/teleoperate.py \
    --env LowerT1GoaliePenaltyKick-v0 \
    --pos_sensitivity 1.5 \
    --rot_sensitivity 1.5
```

(Tune `--pos_sensitivity` and `--rot_sensitivity` as needed for your setup.)

---

### Training

We provide a minimal reinforcement learning pipeline for training agents in the Booster Soccer Showdown environments with **Deep Deterministic Policy Gradient (DDPG)**; the scripts live in the `training_scripts/` folder. The training stack consists of three scripts:

#### 1) `ddpg.py`

Defines the **DDPG_FF model**, including:

* Actor and Critic neural networks with configurable hidden layers and activation functions.
* Target networks and a soft-update mechanism for stability.
* Training-step implementation (critic loss with MSE, actor loss with the deterministic policy gradient).
* Utility functions for forward passes, action selection, and backpropagation.
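The core of such a model condenses to a few lines. The following is a generic DDPG sketch, not `DDPG_FF`'s actual definition (layer sizes and names are illustrative): an MLP builder, actor/critic pairs with targets, Polyak soft updates, and the two losses.

```python
import torch
import torch.nn as nn

def mlp(sizes, act=nn.ReLU):
    """Small feed-forward builder with configurable hidden layers/activations."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(act())
    return nn.Sequential(*layers)

obs_dim, act_dim = 8, 2                        # illustrative sizes
actor = nn.Sequential(mlp([obs_dim, 64, act_dim]), nn.Tanh())
critic = mlp([obs_dim + act_dim, 64, 1])       # Q(s, a)
target_actor = nn.Sequential(mlp([obs_dim, 64, act_dim]), nn.Tanh())
target_critic = mlp([obs_dim + act_dim, 64, 1])

def soft_update(target, source, tau=0.005):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)

def ddpg_losses(batch, gamma=0.99):
    """Critic: MSE against the bootstrapped target; actor: maximize Q."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_q = target_critic(torch.cat([next_obs, target_actor(next_obs)], -1))
        y = rew + gamma * (1.0 - done) * next_q
    critic_loss = nn.functional.mse_loss(critic(torch.cat([obs, act], -1)), y)
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    return critic_loss, actor_loss
```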

---

#### 2) `training.py`

Provides the **training loop** and supporting components:

* **ReplayBuffer** for experience storage and sampling.
* **Exploration noise** injection to encourage policy exploration.
* An iterative training loop that:
  * Interacts with the environment.
  * Stores experiences.
  * Periodically samples minibatches to update the actor/critic networks.
  * Tracks and logs progress (episode rewards, critic/actor loss) with `tqdm`.
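A replay buffer and exploration noise of the kind described above can be sketched as follows. This is a generic implementation under assumed interfaces, not the repository's exact classes:

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO store of (obs, action, reward, next_obs, done) tuples."""
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        batch = random.sample(self.storage, batch_size)
        # Transpose the list of tuples into a tuple of stacked arrays.
        return tuple(np.stack(col) for col in zip(*batch))

    def __len__(self):
        return len(self.storage)

def noisy_action(action, sigma=0.1, low=-1.0, high=1.0):
    """Add Gaussian exploration noise, clipped to the action bounds."""
    return np.clip(action + np.random.normal(0.0, sigma, size=action.shape),
                   low, high)
```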

---

#### 3) `main.py`

Serves as the **entry point** to run training:

* Initializes the Booster Soccer Showdown environment via the **SAI client**.
* Defines a **Preprocessor** to normalize and concatenate robot state, ball state, and environment info into a training-ready observation vector.
* Instantiates a **DDPG_FF model** with a custom architecture.
* Defines an **action function** that rescales raw policy outputs to environment-specific action bounds.
* Calls the training loop and, after training, supports:
  * `sai.watch(...)` for visualizing learned behavior.
  * `sai.benchmark(...)` for local benchmarking.
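The action-rescaling step is an affine map from the policy's tanh range [-1, 1] onto the environment's bounds. A minimal sketch (the bounds shown are illustrative, not the actual environment limits):

```python
import numpy as np

def rescale_action(raw, low, high):
    """Affine map from a tanh-squashed output in [-1, 1] to [low, high]."""
    raw = np.clip(raw, -1.0, 1.0)
    return low + 0.5 * (raw + 1.0) * (high - low)

# Illustrative command bounds: forward vel, lateral vel, yaw rate.
low = np.array([-1.0, -0.5, -1.5])
high = np.array([1.0, 0.5, 1.5])
print(rescale_action(np.array([0.0, 1.0, -1.0]), low, high))  # [ 0.   0.5 -1.5]
```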

---

#### Example: Run Training

```bash
python training_scripts/main.py
```

This will:

1. Build the environment.
2. Initialize the model.
3. Run the training loop with replay-buffer sampling and DDPG updates.
4. Launch visualization and benchmarking after training.

#### Example: Test pretrained model

```bash
python training_scripts/test.py --env LowerT1KickToTarget-v0
```