SAC G1 balancing policy - 1.91M steps, learning to balance
- .gitattributes +1 -0
- README.md +102 -0
- best/best_model.zip +3 -0
- checkpoints/sac_g1_1000000_steps.zip +3 -0
- checkpoints/sac_g1_100000_steps.zip +3 -0
- checkpoints/sac_g1_1100000_steps.zip +3 -0
- checkpoints/sac_g1_1200000_steps.zip +3 -0
- checkpoints/sac_g1_1300000_steps.zip +3 -0
- checkpoints/sac_g1_1400000_steps.zip +3 -0
- checkpoints/sac_g1_1500000_steps.zip +3 -0
- checkpoints/sac_g1_1600000_steps.zip +3 -0
- checkpoints/sac_g1_1700000_steps.zip +3 -0
- checkpoints/sac_g1_1800000_steps.zip +3 -0
- checkpoints/sac_g1_1900000_steps.zip +3 -0
- checkpoints/sac_g1_200000_steps.zip +3 -0
- checkpoints/sac_g1_300000_steps.zip +3 -0
- checkpoints/sac_g1_400000_steps.zip +3 -0
- checkpoints/sac_g1_500000_steps.zip +3 -0
- checkpoints/sac_g1_600000_steps.zip +3 -0
- checkpoints/sac_g1_700000_steps.zip +3 -0
- checkpoints/sac_g1_800000_steps.zip +3 -0
- checkpoints/sac_g1_900000_steps.zip +3 -0
- g1_balancing.mp4 +3 -0
- logs/evaluations.npz +3 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
g1_balancing.mp4 filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,102 @@
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- reinforcement-learning
|
| 4 |
+
- robotics
|
| 5 |
+
- mujoco
|
| 6 |
+
- locomotion
|
| 7 |
+
- unitree
|
| 8 |
+
- g1
|
| 9 |
+
- humanoid
|
| 10 |
+
- sac
|
| 11 |
+
- stable-baselines3
|
| 12 |
+
- strands-robots
|
| 13 |
+
library_name: stable-baselines3
|
| 14 |
+
model-index:
|
| 15 |
+
  - name: SAC-Unitree-G1-MuJoCo
|
| 16 |
+
    results:
|
| 17 |
+
      - task:
|
| 18 |
+
          type: reinforcement-learning
|
| 19 |
+
          name: Humanoid Locomotion
|
| 20 |
+
        dataset:
|
| 21 |
+
          type: custom
|
| 22 |
+
          name: MuJoCo LocomotionEnv
|
| 23 |
+
        metrics:
|
| 24 |
+
          - type: mean_reward
|
| 25 |
+
            value: 530
|
| 26 |
+
            name: Best Mean Reward
|
| 27 |
+
          - type: mean_distance
|
| 28 |
+
            value: 2.65
|
| 29 |
+
            name: Mean Forward Distance (m)
|
| 30 |
+
---
|
| 31 |
+
|
| 32 |
+
# SAC Unitree G1 — MuJoCo Locomotion Policy
|
| 33 |
+
|
| 34 |
+
A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. The policy is still **learning to balance**: it stays upright for about 4 seconds (~200 of 1,000 steps per episode) and stumbles forward.
|
| 35 |
+
|
| 36 |
+
Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia).
|
| 37 |
+
|
| 38 |
+
## Results
|
| 39 |
+
|
| 40 |
+
| Metric | Value |
|
| 41 |
+
|--------|-------|
|
| 42 |
+
| Algorithm | SAC (Soft Actor-Critic) |
|
| 43 |
+
| Training steps | 1.91M |
|
| 44 |
+
| Training time | ~60 min (MacBook M-series, CPU) |
|
| 45 |
+
| Parallel envs | 8 |
|
| 46 |
+
| Network | MLP [256, 256] |
|
| 47 |
+
| Best reward | **530** |
|
| 48 |
+
| Mean distance | **2.65m** |
|
| 49 |
+
| Episode length | ~200/1,000 (~4 seconds upright) |
|
| 50 |
+
| Status | Balancing + stumbling forward |
|
| 51 |
+
|
| 52 |
+
## Demo Video
|
| 53 |
+
|
| 54 |
+
See `g1_balancing.mp4` — the G1 attempting to balance and walk in MuJoCo.
|
| 55 |
+
|
| 56 |
+
## Why It's Hard
|
| 57 |
+
|
| 58 |
+
The G1 has **29 DOF** vs Go2's 12. Bipedal balance is fundamentally harder — the robot must coordinate hip, knee, ankle, and torso simultaneously while maintaining a tiny support polygon.
|
| 59 |
+
|
| 60 |
+
With more training (~5-10M steps, roughly 3-5 hours at the current throughput of about 1.91M steps per hour), it should learn to walk.
|
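The time estimate above can be checked with a little arithmetic. The sketch below assumes throughput stays at the rate reported in the Results table (1.91M steps in about 60 minutes) and that the step targets are totals trained from scratch; the function name `estimated_hours` is illustrative and not part of the repository.

```python
# Figures taken from the Results table in this README.
STEPS_DONE = 1_910_000
MINUTES_TAKEN = 60.0


def estimated_hours(target_steps, steps_done=STEPS_DONE, minutes_taken=MINUTES_TAKEN):
    """Extrapolate wall-clock hours for target_steps at the observed throughput."""
    steps_per_minute = steps_done / minutes_taken
    return target_steps / steps_per_minute / 60.0


for target in (5_000_000, 10_000_000):
    print(f"{target:,} steps: ~{estimated_hours(target):.1f} h")
```

Under these assumptions the loop prints about 2.6 h for 5M steps and about 5.2 h for 10M steps. Real throughput may drift as the replay buffer fills or as episodes get longer.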
| 61 |
+
|
| 62 |
+
## Usage
|
| 63 |
+
|
| 64 |
+
```python
|
| 65 |
+
from stable_baselines3 import SAC
|
| 66 |
+
|
| 67 |
+
model = SAC.load("best/best_model")
|
| 68 |
+
|
| 69 |
+
obs, _ = env.reset()  # env: a Gymnasium-style instance of the training environment (not constructed here)
|
| 70 |
+
for _ in range(1000):
|
| 71 |
+
    action, _ = model.predict(obs, deterministic=True)
|
| 72 |
+
    obs, reward, done, truncated, info = env.step(action)
|
| 73 |
+
```
|
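The usage snippet above neither constructs `env` nor stops when the robot falls. A minimal sketch of a rollout helper that ends the episode on termination or truncation is shown here. It assumes only a Gymnasium-style `reset()`/`step()` interface and a model exposing `predict()`; `run_episode` is a hypothetical helper name, and how the environment is built is left to the training code.

```python
def run_episode(model, env, max_steps=1000, deterministic=True):
    """Roll out one episode and return (total_reward, steps_taken).

    `model` needs a predict(obs, deterministic=...) method, such as a loaded
    stable-baselines3 SAC model; `env` needs Gymnasium-style reset() and step().
    """
    obs, _ = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action, _ = model.predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        steps += 1
        # Stop as soon as the episode ends (e.g. the robot falls) or hits a time limit.
        if terminated or truncated:
            break
    return total_reward, steps
```

With the checkpoint from this commit, a call might look like `run_episode(SAC.load("best/best_model"), env)`, where `env` is the same environment class used during training.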
| 74 |
+
|
| 75 |
+
## Reward Function
|
| 76 |
+
|
| 77 |
+
```
|
| 78 |
+
reward = forward_vel × 5.0         # primary: move forward
|
| 79 |
+
       + alive_bonus × 1.0         # stay upright
|
| 80 |
+
       + upright_reward × 0.3      # orientation bonus
|
| 81 |
+
       - ctrl_cost × 0.001         # minimize energy
|
| 82 |
+
       - lateral_penalty × 0.3     # don't drift sideways
|
| 83 |
+
       - smoothness × 0.0001       # discourage jerky motion
|
| 84 |
+
```
|
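The weighted sum above can be sketched as a small Python function. The weights come from the block above; how each term (for example `upright_reward` or `smoothness`) is computed from simulator state is not shown in this README, so the sketch takes them as precomputed scalars. The names `RewardWeights` and `total_reward` are illustrative, not the environment's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Coefficients copied from the reward description in this README."""
    forward_vel: float = 5.0
    alive_bonus: float = 1.0
    upright_reward: float = 0.3
    ctrl_cost: float = 0.001
    lateral_penalty: float = 0.3
    smoothness: float = 0.0001


def total_reward(forward_vel, alive_bonus, upright_reward,
                 ctrl_cost, lateral_penalty, smoothness,
                 w=RewardWeights()):
    """Combine per-step reward terms; the inputs are assumed to be precomputed scalars."""
    bonuses = (w.forward_vel * forward_vel
               + w.alive_bonus * alive_bonus
               + w.upright_reward * upright_reward)
    penalties = (w.ctrl_cost * ctrl_cost
                 + w.lateral_penalty * lateral_penalty
                 + w.smoothness * smoothness)
    return bonuses - penalties
```

Keeping the weights in one frozen dataclass makes it easy to try a different balance between the forward-velocity term and the alive bonus without touching the combining logic.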
| 85 |
+
|
| 86 |
+
## Files
|
| 87 |
+
|
| 88 |
+
- `best/best_model.zip` — Best checkpoint
|
| 89 |
+
- `checkpoints/` — All 100K-step checkpoints
|
| 90 |
+
- `logs/evaluations.npz` — Evaluation metrics
|
| 91 |
+
- `g1_balancing.mp4` — Demo video
|
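To inspect `logs/evaluations.npz`, a sketch along the following lines may help. It assumes the file was written by stable-baselines3's `EvalCallback`, which stores the arrays `timesteps`, `results`, and `ep_lengths`; the README does not state this, so treat the key names as an assumption. The file is tracked with Git LFS, so the real file has to be fetched (for example with `git lfs pull`) before loading. `summarize_evaluations` is a hypothetical helper name.

```python
import numpy as np


def summarize_evaluations(path="logs/evaluations.npz"):
    """Return the best evaluation point from an EvalCallback-style .npz file."""
    data = np.load(path)
    timesteps = data["timesteps"]                    # shape: (n_evals,)
    mean_rewards = data["results"].mean(axis=1)      # average over eval episodes
    mean_lengths = data["ep_lengths"].mean(axis=1)
    best = int(mean_rewards.argmax())
    return {
        "best_timestep": int(timesteps[best]),
        "best_mean_reward": float(mean_rewards[best]),
        "best_mean_ep_length": float(mean_lengths[best]),
    }
```

Calling `summarize_evaluations()` from the repository root returns a small dictionary that can be compared with the figures in the Results table.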
| 92 |
+
|
| 93 |
+
## Environment
|
| 94 |
+
|
| 95 |
+
- **Simulator**: MuJoCo (via the `mujoco` Python bindings)
|
| 96 |
+
- **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie
|
| 97 |
+
- **Observation**: joint positions, velocities, torso orientation, height (87-dim)
|
| 98 |
+
- **Action**: joint torques (29-dim, continuous)
|
| 99 |
+
|
| 100 |
+
## License
|
| 101 |
+
|
| 102 |
+
Apache-2.0
|
best/best_model.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9cb15b7292a646e6006013c0f5a4690d1ad5027083a77891607b123e56946d95
|
| 3 |
+
size 4149867
|
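The three-line entries shown for each binary file in this commit are Git LFS pointers (`version`, `oid`, `size`), not the files themselves. As a hedged sketch, a downloaded file can be checked against its pointer as follows; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, and the code assumes the pointer uses the `sha256` hash shown above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]
```

For example, reading `best/best_model.zip` in binary mode and passing the bytes to `matches_pointer` together with the parsed pointer above should return `True` once the real file has been fetched.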
checkpoints/sac_g1_1000000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8ab5b8bfceab619c0db98044ad448cce0c0989b0e7e762622d18c643a03dbb6d
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_100000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9e83947df6948d3ba27c73adda5085f211ed7860016b6a25874ae29db20b4dc0
|
| 3 |
+
size 4149855
|
checkpoints/sac_g1_1100000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d3f1de8747ddfea37f610a9c0ee69cb16c123a365a1fe75a85d0a4f35d0cc315
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1200000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0a88d00ed69afb4ee86e351645de936b8355ce92720e51a9c5782c34fa8fbed4
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1300000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:be930413a97e485059f7aa7450f7ee0cd540c51c14447139d2bd90e60931e993
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1400000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2511b3164819d1fd317a31628911cfbbd905f41681c7644631346612abce2daa
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1500000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d29e8b66d4fea3c6387bbf269e770e508991888b1f998384c7ac3dcd78adac01
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1600000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2cef573f5af407e987cf156b18334bc910e09c7b1e9ffd05f44757ae2874dd2e
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1700000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a4b6d1081f3fc8d7e0d1211baf3046a0608b0286f8927330cbff2b57d4f91bf6
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1800000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:036be3dfd0bebdd11877c3317efbfc7a642e78f65c135d430d9e7232d32c7e42
|
| 3 |
+
size 4149858
|
checkpoints/sac_g1_1900000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a51f5d225483aca91b3756396a1240db6223538eefe9ef5cf137390e48003927
|
| 3 |
+
size 4149867
|
checkpoints/sac_g1_200000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:697011f06fc560071e62ad698367474c9826628b10edcef70f3c54d7d1340cac
|
| 3 |
+
size 4149855
|
checkpoints/sac_g1_300000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4280e1c4d69f76147992f78ce7435563e45f48df9bb9ef42e9c340cf249b5750
|
| 3 |
+
size 4149856
|
checkpoints/sac_g1_400000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0812dc1a31801935b29fe168a51ab6e4777011c94c2f3f9fcd36069bc92d86b1
|
| 3 |
+
size 4149856
|
checkpoints/sac_g1_500000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1b4a8321660b6772888b08e7efba0cd814b4961d2ff1e609844e3487fa53c6f9
|
| 3 |
+
size 4149856
|
checkpoints/sac_g1_600000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b0c0c81e54c14bff79f44c526101618abae0495f33108faff0f2b2f83bc37993
|
| 3 |
+
size 4149865
|
checkpoints/sac_g1_700000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a2dad3e970cb109a6882d5699d9ec66e4a6bd37286750ec5628f03451cc10975
|
| 3 |
+
size 4149856
|
checkpoints/sac_g1_800000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e8f750b460f93cbf527c4a2c837fe80dfb1b435f45fd4e2759b5b7833af6da5a
|
| 3 |
+
size 4149856
|
checkpoints/sac_g1_900000_steps.zip
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8b1926c26473d97870cfe4551ddd9219dbfc28be55bc326dad4c4c7d482c5218
|
| 3 |
+
size 4149857
|
g1_balancing.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f182d7a40f3f33391ad1e1d3a627d4f713ab90625afb0c6bf0caa36c27d59d80
|
| 3 |
+
size 1532731
|
logs/evaluations.npz
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b70027a19be27c7a415624c0c1605547588dfeaa27feb1c1201b5c3ff40c3ac2
|
| 3 |
+
size 17578
|