tags:
  - reinforcement-learning
  - robotics
  - mujoco
  - locomotion
  - unitree
  - g1
  - humanoid
  - sac
  - stable-baselines3
  - strands-robots
library_name: stable-baselines3
model-index:
  - name: SAC-Unitree-G1-MuJoCo
    results:
      - task:
          type: reinforcement-learning
          name: Humanoid Locomotion
        dataset:
          type: custom
          name: MuJoCo LocomotionEnv
        metrics:
          - type: mean_reward
            value: 530
            name: Best Mean Reward
          - type: mean_distance
            value: 2.65
            name: Mean Forward Distance (m)

SAC Unitree G1 — MuJoCo Locomotion Policy

A Soft Actor-Critic (SAC) policy trained for the Unitree G1 humanoid in MuJoCo simulation. The policy is still learning to balance: it stays upright for about 4 seconds and stumbles forward.

Trained entirely on a MacBook (CPU, no GPU, no Isaac Gym) using strands-robots.

Results

Metric            Value
Algorithm         SAC (Soft Actor-Critic)
Training steps    1.91M
Training time     ~60 min (MacBook M-series, CPU)
Parallel envs     8
Network           MLP [256, 256]
Best reward       530
Mean distance     2.65 m
Episode length    200 of 1,000 steps (about 4 seconds upright)
Status            Balancing + stumbling forward

Demo Video

The demo clip is included in this repository as g1_balancing.mp4.

Why It's Hard

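The G1 has 29 degrees of freedom (DOF), compared with 12 on the Unitree Go2 quadruped. Bipedal balance is also harder than quadruped balance: the robot must coordinate its hip, knee, ankle, and torso joints simultaneously while keeping its center of mass over a small support polygon.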

With more training (~5-10M steps, ~3 hours), it should learn to walk.
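As a rough sanity check on that estimate, the throughput from the Results table (1.91M steps in about 60 minutes) can be extrapolated linearly. This is only a sketch: it assumes throughput stays constant as training continues, which may not hold, and the function name `estimated_hours` is illustrative rather than part of any library.

```python
def estimated_hours(target_steps: float,
                    measured_steps: float = 1.91e6,
                    measured_minutes: float = 60.0) -> float:
    """Extrapolate wall-clock training time, assuming constant throughput."""
    steps_per_minute = measured_steps / measured_minutes
    return target_steps / steps_per_minute / 60.0


for target in (5e6, 10e6):
    print(f"{target / 1e6:.0f}M steps: ~{estimated_hours(target):.1f} h")
```

Under this assumption, 5M steps lands close to the quoted ~3 hours, while 10M steps would take closer to 5 hours on the same machine.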

Usage

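from stable_baselines3 import SAC

# Load the best checkpoint (the ".zip" extension is added automatically).
model = SAC.load("best/best_model")

# `env` must be a Gymnasium-style G1 locomotion environment matching the
# training setup (87-dim observation, 29-dim action); create it before this point.
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # episode ended (e.g. a fall or the step limit)
        obs, _ = env.reset()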
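To measure the return and length of a single episode, the loop above can be wrapped in a small helper. The sketch below assumes a Gymnasium-style environment (`reset` returning two values, `step` returning five) and a `predict` method shaped like the SB3 model's. `run_episode`, `ToyEnv`, and `ZeroPolicy` are hypothetical names; the two toy classes only stand in for the real G1 environment and the loaded SAC model so the example runs on its own.

```python
class ToyEnv:
    """Hypothetical stand-in for the G1 env: 'falls' after a fixed step count."""

    def __init__(self, fall_step=200):
        self.fall_step = fall_step
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 87, {}          # 87-dim observation, empty info dict

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.fall_step
        return [0.0] * 87, 1.0, terminated, False, {}


class ZeroPolicy:
    """Hypothetical stand-in for the loaded SAC model."""

    def predict(self, obs, deterministic=True):
        return [0.0] * 29, None        # 29-dim action, no recurrent state


def run_episode(model, env, max_steps=1000):
    """Roll out one episode; return (total_reward, episode_length)."""
    obs, _ = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


print(run_episode(ZeroPolicy(), ToyEnv()))
```

With the real model and environment, `run_episode(model, env)` replaces the toy objects.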

Reward Function

reward = forward_vel × 5.0       # primary: move forward
       + alive_bonus × 1.0       # stay upright
       + upright_reward × 0.3    # orientation bonus
       - ctrl_cost × 0.001       # minimize energy
       - lateral_penalty × 0.3   # don't drift sideways
       - smoothness × 0.0001     # discourage jerky motion
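The weighted sum above can be sketched in Python as follows. The weights come from the formula; how each term is computed from the simulator state is not specified here, so the function takes already-computed term values as inputs. The names `RewardWeights` and `compute_reward`, and the sample values in the usage line, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Coefficients taken from the reward formula above."""
    forward_vel: float = 5.0
    alive_bonus: float = 1.0
    upright_reward: float = 0.3
    ctrl_cost: float = 0.001
    lateral_penalty: float = 0.3
    smoothness: float = 0.0001


def compute_reward(forward_vel, alive_bonus, upright_reward,
                   ctrl_cost, lateral_penalty, smoothness,
                   w=RewardWeights()):
    """Weighted sum of reward terms; the three penalty terms are subtracted."""
    return (w.forward_vel * forward_vel
            + w.alive_bonus * alive_bonus
            + w.upright_reward * upright_reward
            - w.ctrl_cost * ctrl_cost
            - w.lateral_penalty * lateral_penalty
            - w.smoothness * smoothness)


# Made-up term values, for illustration only.
r = compute_reward(forward_vel=0.5, alive_bonus=1.0, upright_reward=1.0,
                   ctrl_cost=10.0, lateral_penalty=0.1, smoothness=100.0)
print(f"{r:.2f}")
```

Keeping the weights in one frozen dataclass makes it easy to try a different balance between forward progress and stability without touching the sum itself.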

Files

  • best/best_model.zip — Best checkpoint
  • checkpoints/ — All 100K-step checkpoints
  • logs/evaluations.npz — Evaluation metrics
  • g1_balancing.mp4 — Demo video
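The evaluation log can be inspected with NumPy. The sketch below assumes `logs/evaluations.npz` was written by Stable-Baselines3's `EvalCallback`, which stores the arrays `timesteps`, `results`, and `ep_lengths` (one row per evaluation round, one column per evaluation episode). If the file was produced another way, the keys may differ. The helper names are illustrative.

```python
import numpy as np


def summarize_evaluations(path="logs/evaluations.npz"):
    """Return (timesteps, mean_reward, mean_ep_length) per evaluation round."""
    data = np.load(path)
    timesteps = data["timesteps"]
    mean_reward = data["results"].mean(axis=1)      # average over eval episodes
    mean_length = data["ep_lengths"].mean(axis=1)
    return timesteps, mean_reward, mean_length


def best_round(path="logs/evaluations.npz"):
    """Return (timestep, mean_reward, mean_ep_length) of the best round."""
    steps, rewards, lengths = summarize_evaluations(path)
    i = int(rewards.argmax())
    return int(steps[i]), float(rewards[i]), float(lengths[i])
```

Calling `best_round()` from the repository root reports the evaluation round with the highest mean reward.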

Environment

  • Simulator: MuJoCo (via the official mujoco Python bindings)
  • Robot: Unitree G1 (29 DOF) from MuJoCo Menagerie
  • Observation: joint positions, velocities, torso orientation, height (87-dim)
  • Action: joint torques (29-dim, continuous)

License

Apache-2.0