Commit 94c56bc (verified) by cagataydev · 1 parent: 8c0f9bf

SAC G1 balancing policy - 1.91M steps, learning to balance

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ g1_balancing.mp4 filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,102 @@
+ ---
+ tags:
+ - reinforcement-learning
+ - robotics
+ - mujoco
+ - locomotion
+ - unitree
+ - g1
+ - humanoid
+ - sac
+ - stable-baselines3
+ - strands-robots
+ library_name: stable-baselines3
+ model-index:
+ - name: SAC-Unitree-G1-MuJoCo
+   results:
+   - task:
+       type: reinforcement-learning
+       name: Humanoid Locomotion
+     dataset:
+       type: custom
+       name: MuJoCo LocomotionEnv
+     metrics:
+     - type: mean_reward
+       value: 530
+       name: Best Mean Reward
+     - type: mean_distance
+       value: 2.65
+       name: Mean Forward Distance (m)
+ ---
+
+ # SAC Unitree G1: MuJoCo Locomotion Policy
+
+ A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. It is currently **learning to balance**: it stays upright for about 4 seconds and stumbles forward.
+
+ Trained entirely on a MacBook (CPU only, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia).
+
+ ## Results
+
+ | Metric | Value |
+ |--------|-------|
+ | Algorithm | SAC (Soft Actor-Critic) |
+ | Training steps | 1.91M |
+ | Training time | ~60 min (MacBook M-series, CPU) |
+ | Parallel envs | 8 |
+ | Network | MLP [256, 256] |
+ | Best mean reward | **530** |
+ | Mean forward distance | **2.65 m** |
+ | Episode length | ~200 of 1,000 steps (~4 seconds upright) |
+ | Status | Balancing + stumbling forward |
+
+ ## Demo Video
+
+ See `g1_balancing.mp4`, which shows the G1 attempting to balance and walk in MuJoCo.
+
+ ## Why It's Hard
+
+ The G1 has **29 DOF**, versus 12 for the Unitree Go2 quadruped. Bipedal balance is fundamentally harder: the robot must coordinate its hip, knee, ankle, and torso joints simultaneously while keeping its center of mass over a small support polygon.
+
+ With more training (roughly 5-10M steps, an estimated ~3 hours), it should learn to walk.
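The training figures quoted in this README can be sanity-checked with a little arithmetic. The sketch below assumes that 1.91M counts environment steps summed over all 8 parallel envs and that throughput stays constant as training continues. Both are assumptions rather than statements from the author, and the function names are illustrative.

```python
# Figures taken from the Results table of the README.
TRAIN_STEPS = 1_910_000   # total environment steps so far
TRAIN_MINUTES = 60.0      # approximate wall-clock training time
EPISODE_STEPS = 200       # typical episode length before the robot falls
EPISODE_SECONDS = 4.0     # time upright reported for that episode length


def steps_per_second(total_steps: float, minutes: float) -> float:
    """Average training throughput in environment steps per second."""
    return total_steps / (minutes * 60.0)


def hours_needed(target_steps: float, rate: float) -> float:
    """Wall-clock hours to collect `target_steps` at a constant `rate`."""
    return target_steps / rate / 3600.0


rate = steps_per_second(TRAIN_STEPS, TRAIN_MINUTES)
control_dt = EPISODE_SECONDS / EPISODE_STEPS  # seconds per control step

if __name__ == "__main__":
    print(f"throughput: {rate:.0f} steps/s")
    print(f"control step: {control_dt:.3f} s")
    for target in (5_000_000, 10_000_000):
        print(f"{target:,} steps: {hours_needed(target, rate):.1f} h")
```

Under these assumptions the observed rate is about 530 steps per second, one control step lasts 0.02 s (50 Hz), and 5M to 10M steps would take roughly 2.6 to 5.2 hours, so the ~3 hour estimate corresponds to the lower end of that range.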
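+
+ ## Usage
+
+ ```python
+ from stable_baselines3 import SAC
+
+ # `env` must be created first: a Gymnasium-style MuJoCo locomotion env for
+ # the G1 (its constructor is not shown in this README).
+ model = SAC.load("best/best_model")
+
+ obs, _ = env.reset()
+ for _ in range(1000):
+     # deterministic=True disables action sampling for evaluation.
+     action, _ = model.predict(obs, deterministic=True)
+     obs, reward, terminated, truncated, info = env.step(action)
+     if terminated or truncated:
+         # Start a new episode once the robot falls or the time limit hits.
+         obs, _ = env.reset()
+ ```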
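Since the environment constructor is not part of this README, the rollout loop can be exercised with stand-ins. The sketch below is only an illustration: `StubEnv` and `StubPolicy` are hypothetical classes that mimic the Gymnasium `reset`/`step` signature and the `predict` call used above, and the 200-step fall is hard-coded to mirror the reported episode length. With the real model and environment, `run_episodes` would be called the same way.

```python
class StubEnv:
    """Hypothetical stand-in env: 87-dim observations, falls after 200 steps."""

    def __init__(self, fall_after: int = 200, obs_dim: int = 87):
        self.fall_after = fall_after
        self.obs_dim = obs_dim
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * self.obs_dim, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.fall_after  # pretend the robot fell
        return [0.0] * self.obs_dim, 1.0, terminated, False, {}


class StubPolicy:
    """Hypothetical stand-in policy that always outputs zero torques."""

    def __init__(self, act_dim: int = 29):
        self.act_dim = act_dim

    def predict(self, obs, deterministic: bool = True):
        return [0.0] * self.act_dim, None


def run_episodes(model, env, n_steps: int = 1000):
    """Roll out `n_steps` control steps and return completed episode lengths."""
    lengths, steps = [], 0
    obs, _ = env.reset()
    for _ in range(n_steps):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        steps += 1
        if terminated or truncated:
            lengths.append(steps)
            steps = 0
            obs, _ = env.reset()
    return lengths


if __name__ == "__main__":
    print(run_episodes(StubPolicy(), StubEnv()))
```

Recording episode lengths this way gives a direct check on how long the policy stays upright.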
+
+ ## Reward Function
+
+ ```
+ reward = forward_vel     × 5.0     # primary: move forward
+        + alive_bonus     × 1.0     # stay upright
+        + upright_reward  × 0.3     # orientation bonus
+        - ctrl_cost       × 0.001   # minimize energy
+        - lateral_penalty × 0.3     # don't drift sideways
+        - smoothness      × 0.0001  # discourage jerky motion
+ ```
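The weighted sum above can be written as a small Python function. This is a minimal sketch, not the environment's actual code: it assumes each term has already been computed as a scalar for the current step, and the `RewardWeights` and `compute_reward` names are illustrative. How the raw terms themselves are measured (for example, how `ctrl_cost` is derived from the torques) is not specified in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Coefficients copied from the reward formula in the README."""
    forward_vel: float = 5.0
    alive_bonus: float = 1.0
    upright_reward: float = 0.3
    ctrl_cost: float = 0.001
    lateral_penalty: float = 0.3
    smoothness: float = 0.0001


def compute_reward(
    forward_vel: float,
    alive_bonus: float,
    upright_reward: float,
    ctrl_cost: float,
    lateral_penalty: float,
    smoothness: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Combine precomputed per-step terms into one scalar reward."""
    bonuses = (
        w.forward_vel * forward_vel
        + w.alive_bonus * alive_bonus
        + w.upright_reward * upright_reward
    )
    penalties = (
        w.ctrl_cost * ctrl_cost
        + w.lateral_penalty * lateral_penalty
        + w.smoothness * smoothness
    )
    return bonuses - penalties


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(compute_reward(0.5, 1.0, 1.0, 50.0, 0.1, 100.0))
```

With a weight of 5.0 on forward velocity against 1.0 on the alive bonus, even slow forward motion quickly outweighs the reward for simply staying upright, which is consistent with the stumbling-forward behavior described above.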
+
+ ## Files
+
+ - `best/best_model.zip`: best checkpoint
+ - `checkpoints/`: checkpoints saved every 100K steps (100K to 1.9M)
+ - `logs/evaluations.npz`: evaluation metrics
+ - `g1_balancing.mp4`: demo video
+
+ ## Environment
+
+ - **Simulator**: MuJoCo (via its Python bindings)
+ - **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie
+ - **Observation**: joint positions, joint velocities, torso orientation, and height (87-dim)
+ - **Action**: joint torques (29-dim, continuous)
+
+ ## License
+
+ Apache-2.0
best/best_model.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cb15b7292a646e6006013c0f5a4690d1ad5027083a77891607b123e56946d95
+ size 4149867
checkpoints/sac_g1_1000000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ab5b8bfceab619c0db98044ad448cce0c0989b0e7e762622d18c643a03dbb6d
+ size 4149858
checkpoints/sac_g1_100000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e83947df6948d3ba27c73adda5085f211ed7860016b6a25874ae29db20b4dc0
+ size 4149855
checkpoints/sac_g1_1100000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3f1de8747ddfea37f610a9c0ee69cb16c123a365a1fe75a85d0a4f35d0cc315
+ size 4149858
checkpoints/sac_g1_1200000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a88d00ed69afb4ee86e351645de936b8355ce92720e51a9c5782c34fa8fbed4
+ size 4149858
checkpoints/sac_g1_1300000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be930413a97e485059f7aa7450f7ee0cd540c51c14447139d2bd90e60931e993
+ size 4149858
checkpoints/sac_g1_1400000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2511b3164819d1fd317a31628911cfbbd905f41681c7644631346612abce2daa
+ size 4149858
checkpoints/sac_g1_1500000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d29e8b66d4fea3c6387bbf269e770e508991888b1f998384c7ac3dcd78adac01
+ size 4149858
checkpoints/sac_g1_1600000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cef573f5af407e987cf156b18334bc910e09c7b1e9ffd05f44757ae2874dd2e
+ size 4149858
checkpoints/sac_g1_1700000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4b6d1081f3fc8d7e0d1211baf3046a0608b0286f8927330cbff2b57d4f91bf6
+ size 4149858
checkpoints/sac_g1_1800000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:036be3dfd0bebdd11877c3317efbfc7a642e78f65c135d430d9e7232d32c7e42
+ size 4149858
checkpoints/sac_g1_1900000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a51f5d225483aca91b3756396a1240db6223538eefe9ef5cf137390e48003927
+ size 4149867
checkpoints/sac_g1_200000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:697011f06fc560071e62ad698367474c9826628b10edcef70f3c54d7d1340cac
+ size 4149855
checkpoints/sac_g1_300000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4280e1c4d69f76147992f78ce7435563e45f48df9bb9ef42e9c340cf249b5750
+ size 4149856
checkpoints/sac_g1_400000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0812dc1a31801935b29fe168a51ab6e4777011c94c2f3f9fcd36069bc92d86b1
+ size 4149856
checkpoints/sac_g1_500000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b4a8321660b6772888b08e7efba0cd814b4961d2ff1e609844e3487fa53c6f9
+ size 4149856
checkpoints/sac_g1_600000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0c0c81e54c14bff79f44c526101618abae0495f33108faff0f2b2f83bc37993
+ size 4149865
checkpoints/sac_g1_700000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2dad3e970cb109a6882d5699d9ec66e4a6bd37286750ec5628f03451cc10975
+ size 4149856
checkpoints/sac_g1_800000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8f750b460f93cbf527c4a2c837fe80dfb1b435f45fd4e2759b5b7833af6da5a
+ size 4149856
checkpoints/sac_g1_900000_steps.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b1926c26473d97870cfe4551ddd9219dbfc28be55bc326dad4c4c7d482c5218
+ size 4149857
g1_balancing.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f182d7a40f3f33391ad1e1d3a627d4f713ab90625afb0c6bf0caa36c27d59d80
+ size 1532731
logs/evaluations.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b70027a19be27c7a415624c0c1605547588dfeaa27feb1c1201b5c3ff40c3ac2
+ size 17578