| | --- |
| | tags: |
| | - reinforcement-learning |
| | - robotics |
| | - mujoco |
| | - locomotion |
| | - unitree |
| | - g1 |
| | - humanoid |
| | - sac |
| | - stable-baselines3 |
| | - strands-robots |
| | library_name: stable-baselines3 |
| | model-index: |
| | - name: SAC-Unitree-G1-MuJoCo |
| | results: |
| | - task: |
| | type: reinforcement-learning |
| | name: Humanoid Locomotion |
| | dataset: |
| | type: custom |
| | name: MuJoCo LocomotionEnv |
| | metrics: |
| | - type: mean_reward |
| | value: 530 |
| | name: Best Mean Reward |
| | - type: mean_distance |
| | value: 2.65 |
| | name: Mean Forward Distance (m) |
| | --- |
| | |
| | # SAC Unitree G1: MuJoCo Locomotion Policy |
| |
|
| | A **Soft Actor-Critic (SAC)** policy trained for the Unitree G1 humanoid in MuJoCo simulation. It is currently **learning to balance**: it stays upright for about 4 seconds and stumbles forward. |
| |
|
| | Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using [strands-robots](https://github.com/cagataycali/strands-gtc-nvidia). |
| |
|
| | ## Results |
| |
|
| | | Metric | Value | |
| | |--------|-------| |
| | | Algorithm | SAC (Soft Actor-Critic) | |
| | | Training steps | 1.91M | |
| | | Training time | ~60 min (MacBook M-series, CPU) | |
| | | Parallel envs | 8 | |
| | | Network | MLP [256, 256] | |
| | | Best reward | **530** | |
| | | Mean distance | **2.65 m** | |
| | | Episode length | ~200/1,000 (~4 seconds upright) | |
| | | Status | Balancing + stumbling forward | |
| |
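The table's figures imply a training throughput and a control-step duration that are not stated directly. The sketch below derives both, assuming the 1.91M step count is the total across all 8 parallel environments and that "~200 steps" and "~4 seconds" describe the same typical episode. The function names are illustrative, not part of any library.

```python
# Figures copied from the results table; all are approximate.
TRAINING_STEPS = 1.91e6    # total environment steps (assumed summed over all envs)
TRAINING_MINUTES = 60.0    # wall-clock training time
EPISODE_STEPS = 200        # typical steps survived per episode
EPISODE_SECONDS = 4.0      # typical time upright
MAX_EPISODE_STEPS = 1000   # episode step limit


def training_throughput(steps: float, minutes: float) -> float:
    """Environment steps processed per wall-clock second."""
    return steps / (minutes * 60.0)


def implied_step_seconds(episode_steps: int, episode_seconds: float) -> float:
    """Simulated seconds covered by one policy step, inferred from episode length."""
    return episode_seconds / episode_steps


if __name__ == "__main__":
    rate = training_throughput(TRAINING_STEPS, TRAINING_MINUTES)
    dt = implied_step_seconds(EPISODE_STEPS, EPISODE_SECONDS)
    print(f"throughput: {rate:.0f} steps/s")
    print(f"implied control step: {dt:.3f} s ({1.0 / dt:.0f} Hz)")
    print(f"fraction of episode survived: {EPISODE_STEPS / MAX_EPISODE_STEPS:.0%}")
```

Under these assumptions the policy acts at roughly 50 Hz and survives about a fifth of the maximum episode length.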
|
| | ## Demo Video |
| |
|
| | <video src="https://huggingface.co/cagataydev/sac-unitree-g1-mujoco/resolve/main/g1_balancing.mp4" controls autoplay loop muted></video> |
| |
|
| | ## Why It's Hard |
| |
|
| | The G1 has **29 DOF** versus the Go2's 12. Bipedal balance is fundamentally harder: the robot must coordinate hips, knees, ankles, and torso simultaneously while balancing on a tiny support polygon. |
| |
|
| | With more training (~5-10M steps, ~3 hours), it should learn to walk. |
| |
|
| | ## Usage |
| |
|
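| | ```python |
| | from stable_baselines3 import SAC |
| | |
| | # Load the best checkpoint (SB3 appends ".zip" if it is missing). |
| | model = SAC.load("best/best_model") |
| | |
| | # `env` must be a Gymnasium-style instance of the G1 locomotion |
| | # environment used for training; creating it is not shown here. |
| | obs, _ = env.reset() |
| | for _ in range(1000): |
| |     action, _ = model.predict(obs, deterministic=True) |
| |     obs, reward, terminated, truncated, info = env.step(action) |
| |     if terminated or truncated:  # the robot fell or hit the step limit |
| |         obs, _ = env.reset() |
| | ``` |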
| |
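To reproduce numbers like the reward and episode length reported in the results, the rollout loop can be wrapped in a helper that records both. This is a minimal sketch, not the author's evaluation code. `run_episode` is a hypothetical name, and it only assumes that `model` exposes `predict` like an SB3 model and that `env` follows the Gymnasium `reset`/`step` convention.

```python
def run_episode(model, env, max_steps=1000):
    """Roll out one episode and return (total_reward, steps_survived)."""
    obs, _ = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        # Deterministic actions give a repeatable evaluation of the policy.
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            # Episode ended early (e.g. the robot fell): report how far it got.
            return total_reward, step
    return total_reward, max_steps
```

Averaging the returned values over several episodes gives a mean reward and mean episode length comparable to those in the table, provided the environment is configured as it was during training.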
|
| | ## Reward Function |
| |
|
| | ``` |
| | reward = forward_vel × 5.0          # primary: move forward |
| |        + alive_bonus × 1.0          # stay upright |
| |        + upright_reward × 0.3       # orientation bonus |
| |        - ctrl_cost × 0.001          # minimize energy |
| |        - lateral_penalty × 0.3      # don't drift sideways |
| |        - smoothness × 0.0001        # discourage jerky motion |
| | ``` |
| |
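The weighted sum above can be written as a small Python function. This is a sketch under stated assumptions: the weights come from the formula, but how each term is computed from the simulator state is not given here, so every term is taken as an already-computed scalar. `RewardWeights` and `compute_reward` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Coefficients taken from the reward formula in this section."""
    forward_vel: float = 5.0
    alive_bonus: float = 1.0
    upright_reward: float = 0.3
    ctrl_cost: float = 0.001
    lateral_penalty: float = 0.3
    smoothness: float = 0.0001


def compute_reward(forward_vel, alive_bonus, upright_reward,
                   ctrl_cost, lateral_penalty, smoothness,
                   w=RewardWeights()):
    """Combine precomputed scalar terms into the total step reward."""
    return (w.forward_vel * forward_vel            # primary: move forward
            + w.alive_bonus * alive_bonus          # stay upright
            + w.upright_reward * upright_reward    # orientation bonus
            - w.ctrl_cost * ctrl_cost              # minimize energy
            - w.lateral_penalty * lateral_penalty  # don't drift sideways
            - w.smoothness * smoothness)           # discourage jerky motion
```

Because the forward-velocity weight is much larger than the others, a policy that moves forward at any meaningful speed earns far more from that term than from the alive and upright bonuses.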
|
| | ## Files |
| |
|
| | - `best/best_model.zip`: Best checkpoint |
| | - `checkpoints/`: All 100K-step checkpoints |
| | - `logs/evaluations.npz`: Evaluation metrics |
| | - `g1_balancing.mp4`: Demo video |
| |
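The evaluation log can be inspected with NumPy. The sketch below assumes `logs/evaluations.npz` was written by Stable-Baselines3's `EvalCallback`, which stores the arrays `timesteps`, `results`, and `ep_lengths`, the latter two with one row per evaluation and one column per episode. If the file was produced another way, the keys may differ. `summarize_evaluations` is a hypothetical helper name.

```python
import numpy as np


def summarize_evaluations(path):
    """Return timestep, mean reward, and mean episode length of the best evaluation."""
    data = np.load(path)
    timesteps = data["timesteps"]                    # shape: (n_evals,)
    mean_rewards = data["results"].mean(axis=1)      # average over eval episodes
    mean_lengths = data["ep_lengths"].mean(axis=1)   # average over eval episodes
    best = int(mean_rewards.argmax())                # evaluation with highest mean reward
    return {
        "timestep": int(timesteps[best]),
        "mean_reward": float(mean_rewards[best]),
        "mean_ep_length": float(mean_lengths[best]),
    }
```

Calling `summarize_evaluations("logs/evaluations.npz")` from the repository root should report the evaluation that produced the best checkpoint, if the assumption about the file's origin holds.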
|
| | ## Environment |
| |
|
| | - **Simulator**: MuJoCo (via mujoco-python) |
| | - **Robot**: Unitree G1 (29 DOF) from MuJoCo Menagerie |
| | - **Observation**: joint positions, velocities, torso orientation, height (87-dim) |
| | - **Action**: joint torques (29-dim, continuous) |
| |
|
| | ## License |
| |
|
| | Apache-2.0 |
| |
|