Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)

PPO policy that lets the Unitree Go2 + Z1 composite robot recover from arbitrary fallen poses by using its legs and Z1 arm together as a self-righting kinematic chain, then automatically folds the Z1 back to the carry pose once the trunk is upright.

Behaviour

Reset spawns the robot at trunk_z = 0.18 m with random orientation (full quaternion sampled by reset_root_state_with_random_orientation)
Policy commands all 18 joints (12 leg + 6 arm) for up to 5 s
Reward favours: trunk lifted, projected gravity aligned with -Z, Z1 close to its Z1_FOLDED_DEFAULT pose, sparse +10 success bonus when all three are satisfied
Episode ends if trunk_z < 0.05 m (collapsed) or on time-out (5 s)
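The reset and termination rules above can be sketched in plain Python. This is a minimal illustration, not the Isaac Lab event or termination code. It assumes reset_root_state_with_random_orientation draws orientations uniformly over SO(3), which is emulated here with Shoemake's method, and it uses Isaac Lab's (w, x, y, z) quaternion order. The helper names (sample_uniform_quaternion, reset_root_pose, episode_done) are hypothetical.

```python
import math
import random

SPAWN_TRUNK_Z = 0.18     # m, spawn height from the reset description
COLLAPSE_Z = 0.05        # m, trunk_collapsed threshold
EPISODE_LENGTH_S = 5.0   # s, time-out


def sample_uniform_quaternion(rng: random.Random) -> tuple:
    """Uniform random unit quaternion (w, x, y, z) via Shoemake's method."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    x = a * math.sin(2.0 * math.pi * u2)
    y = a * math.cos(2.0 * math.pi * u2)
    z = b * math.sin(2.0 * math.pi * u3)
    w = b * math.cos(2.0 * math.pi * u3)
    return (w, x, y, z)


def reset_root_pose(rng: random.Random) -> tuple:
    """Hypothetical reset: fixed spawn height, fully random orientation."""
    return (0.0, 0.0, SPAWN_TRUNK_Z), sample_uniform_quaternion(rng)


def episode_done(trunk_z: float, elapsed_s: float) -> tuple:
    """Return (done, reason) following the two termination rules."""
    if trunk_z < COLLAPSE_Z:
        return True, "trunk_collapsed"
    if elapsed_s >= EPISODE_LENGTH_S:
        return True, "time_out"
    return False, ""
```

Sampling the full quaternion (rather than, say, only roll and pitch) is what exposes the policy to every fallen pose, including fully inverted ones.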

Highlights

4096 parallel envs × 3000 PPO iters
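Mean episode reward climbs from 0.6 (untrained policy) to 89+ (final iteration)
standup_success bonus reaches 8.6 per episode out of a possible 10 (the success criterion holds for ≈86 % of episode time)
trunk_collapsed termination rate ≈ 0: the trunk almost never drops below 0.05 m, so the policy keeps trying until it stands up or times out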
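The arithmetic behind these figures can be sketched as follows. Two assumptions are involved. First, num_steps_per_env = 24 per PPO iteration is hypothetical; it is a common rsl-rl setting in Isaac Lab locomotion configs, but the real value lives in the training config. Second, the sketch assumes Isaac Lab's reward-logging convention: each term is multiplied by its weight and the step dt, and the episodic sum is divided by the maximum episode length in seconds. Under that convention a 0/1 term with weight 10 that is active for 86 % of the episode logs as 8.6. The dt = 0.02 s below is also hypothetical.

```python
def transitions_collected(num_envs: int, steps_per_env: int, iterations: int) -> int:
    """Total environment transitions PPO consumes over a training run."""
    return num_envs * steps_per_env * iterations


def simulate_logged_term(weight: float, active_steps: int, total_steps: int, dt: float) -> float:
    """Isaac Lab-style logged value of a 0/1 reward term for one episode.

    Each step contributes weight * value * dt; the episodic sum is then
    divided by the episode length in seconds.
    """
    episode_sum = sum(weight * 1.0 * dt for _ in range(active_steps))
    return episode_sum / (total_steps * dt)


def success_time_fraction(logged_term: float, weight: float) -> float:
    """Invert the convention above: fraction of episode time the term was active."""
    return logged_term / weight


# 4096 envs x 24 steps (assumed) x 3000 iterations
print(transitions_collected(4096, 24, 3000))
# 5 s episode at dt = 0.02 s (250 steps), success active for 215 of them
print(simulate_logged_term(10.0, active_steps=215, total_steps=250, dt=0.02))
print(success_time_fraction(8.6, 10.0))
```

With 24 steps per env this comes to roughly 2.9e8 transitions, and the 8.6 logged value maps back to about 0.86 of episode time.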

Architecture

Same rsl-rl actor-critic shape as our walking policies: an MLP with three hidden layers (512-256-128) and ELU activations. Action dim = 18 (12 leg + 6 arm joints).
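To check a checkpoint against the stated architecture, the expected weight shapes can be listed up front. This sketch assumes the mlp.N key layout used in the Usage snippet (linear layers at even indices because an ELU follows each hidden layer). The observation size is not given in this card, so obs_dim = 48 in the check below is purely hypothetical; in practice read it from sd["mlp.0.weight"].shape[1].

```python
HIDDEN_DIMS = (512, 256, 128)
ACTION_DIM = 18


def expected_actor_shapes(obs_dim: int) -> dict:
    """Weight shapes (out_features, in_features) of the actor MLP.

    Linear layers sit at indices 0, 2, 4, 6 because an ELU module
    occupies each odd index in the nn.Sequential.
    """
    dims = (obs_dim, *HIDDEN_DIMS, ACTION_DIM)
    return {
        f"mlp.{2 * i}.weight": (dims[i + 1], dims[i])
        for i in range(len(dims) - 1)
    }


def actor_param_count(obs_dim: int) -> int:
    """Weights plus biases across the four linear layers."""
    return sum(out * inp + out for out, inp in expected_actor_shapes(obs_dim).values())
```

Comparing expected_actor_shapes(obs_dim) with {k: tuple(v.shape) for k, v in sd.items() if k.endswith(".weight")} catches a mismatched checkpoint before load_state_dict raises.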

Reward composition

trunk_height_reward      weight  +5.0   # clamp(z / 0.32, 0, 1)
upright_alignment        weight  +3.0   # clamp(-projected_gravity_b[2], 0, 1)
z1_fold                  weight  +2.0   # exp(-||z1_pos - Z1_FOLDED_DEFAULT||)
standup_success (sparse) weight +10.0   # 1 if z>0.28 ∧ upright>0.92 ∧ fold_err<0.3
action_rate_l2           weight -0.005
joint_acc_l2             weight -2.5e-7
joint_torques_l2         weight -1e-5
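The four task terms above can be sketched per step in plain Python. This is an illustration of the listed formulas, not the code in standup_env_cfg.py: fold_err is taken as the Euclidean norm over the six Z1 joint positions, the upright term is derived from the trunk quaternion in (w, x, y, z) order, the regularisation terms (action_rate_l2, joint_acc_l2, joint_torques_l2) are omitted because they need action and joint history, and values are shown before any per-step dt scaling the training framework may apply. The actual Z1_FOLDED_DEFAULT values are not listed in this card, so the test uses a hypothetical all-zeros pose.

```python
import math

Z_REF = 0.32             # m, trunk height at which the height reward saturates
SUCCESS_Z = 0.28         # m
SUCCESS_UPRIGHT = 0.92
SUCCESS_FOLD_ERR = 0.3

WEIGHTS = {
    "trunk_height_reward": 5.0,
    "upright_alignment": 3.0,
    "z1_fold": 2.0,
    "standup_success": 10.0,
}


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def projected_gravity_z(quat_wxyz) -> float:
    """z-component of world gravity (0, 0, -1) rotated into the body frame."""
    _, x, y, _ = quat_wxyz
    return -(1.0 - 2.0 * (x * x + y * y))


def task_reward_terms(trunk_z, quat_wxyz, z1_pos, z1_folded) -> dict:
    """Unweighted per-step values of the four task terms."""
    fold_err = math.sqrt(sum((p - d) ** 2 for p, d in zip(z1_pos, z1_folded)))
    upright = clamp(-projected_gravity_z(quat_wxyz), 0.0, 1.0)
    success = (trunk_z > SUCCESS_Z and upright > SUCCESS_UPRIGHT
               and fold_err < SUCCESS_FOLD_ERR)
    return {
        "trunk_height_reward": clamp(trunk_z / Z_REF, 0.0, 1.0),
        "upright_alignment": upright,
        "z1_fold": math.exp(-fold_err),
        "standup_success": float(success),
    }


def weighted_task_reward(terms: dict) -> float:
    return sum(WEIGHTS[k] * v for k, v in terms.items())
```

Because upright_alignment is clamped at 0, every inverted pose scores the same; the gradient toward recovery then comes mainly from trunk height until the body rolls past horizontal.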

Files

standup_v1.pt — rsl-rl OnPolicyRunner checkpoint

Usage

import torch
import torch.nn as nn

state = torch.load("standup_v1.pt", map_location="cuda:0", weights_only=False)
sd = state["actor_state_dict"]
# Linear layers sit at mlp.0 / mlp.2 / mlp.4 / mlp.6 (ELUs in between); read every width from the checkpoint
obs_dim = sd["mlp.0.weight"].shape[1]
h1, h2, h3 = (sd[f"mlp.{i}.weight"].shape[0] for i in (0, 2, 4))   # 512, 256, 128
act_dim = sd["mlp.6.weight"].shape[0]   # 18 for full-body recovery
actor = nn.Sequential(
    nn.Linear(obs_dim, h1), nn.ELU(),
    nn.Linear(h1, h2), nn.ELU(),
    nn.Linear(h2, h3), nn.ELU(),
    nn.Linear(h3, act_dim),
).cuda().eval()
# Strip the "mlp." prefix so keys match the nn.Sequential indices
actor.load_state_dict({k.replace("mlp.", "", 1): v for k, v in sd.items() if k.startswith("mlp.")})
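Once the actor is loaded, its 18 outputs still have to be turned into joint targets. A minimal sketch, assuming an Isaac Lab-style joint-position action term (target = default joint position + scale * action) and assuming the first 12 outputs are leg joints and the last 6 are Z1 joints. Both the joint ordering and ACTION_SCALE = 0.25 are hypothetical; the real values are defined in standup_env_cfg.py.

```python
NUM_LEG_JOINTS = 12
NUM_ARM_JOINTS = 6
ACTION_SCALE = 0.25  # hypothetical; read the real value from standup_env_cfg.py


def actions_to_joint_targets(actions, default_joint_pos, scale=ACTION_SCALE):
    """Map raw policy outputs to joint position targets (rad).

    Assumes target = default + scale * action for all 18 joints, with the
    leg joints first and the six Z1 joints last (hypothetical ordering).
    """
    n = NUM_LEG_JOINTS + NUM_ARM_JOINTS
    if len(actions) != n or len(default_joint_pos) != n:
        raise ValueError(f"expected {n} actions and default positions")
    targets = [d + scale * a for a, d in zip(actions, default_joint_pos)]
    return targets[:NUM_LEG_JOINTS], targets[NUM_LEG_JOINTS:]
```

With the actor above, the raw actions for a single observation would come from something like actions = actor(obs).squeeze(0).tolist() inside torch.no_grad(), where obs must be assembled in the same term order as during training.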

For a full integration example (dropping a fallen robot into the warehouse, running the standup policy, and verifying that the Z1 auto-folds), see stage4_joint_eval/standup_recovery.py.
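One way to summarise such a rollout is the time until the robot is stably recovered. This sketch is not the contents of standup_recovery.py: it assumes a per-step success flag computed with the same criterion as the sparse bonus (z > 0.28, upright > 0.92, fold_err < 0.3), and the hold_s = 0.5 s stability window and the function name are hypothetical.

```python
def time_to_recovery(success_flags, dt, hold_s=0.5):
    """Seconds until the success criterion first holds for hold_s continuously.

    Returns the start time of the first stable window, or None if the
    criterion is never held for long enough.
    """
    need = max(1, round(hold_s / dt))
    run = 0
    for i, ok in enumerate(success_flags):
        run = run + 1 if ok else 0   # reset the streak on any failure
        if run == need:
            return (i - need + 1) * dt
    return None
```

Requiring a continuous window rather than a single passing step filters out brief moments where the trunk passes through the success region while still tipping over.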

Training data

On-policy RL; no offline dataset is used. The Isaac Lab task is registered as Isaac-Standup-Go2Z1-v0 and lives at:

Repo: https://github.com/aws300/go2_z1_warehouse
Task config: go2_z1_warehouse/stage5_standup/standup_env_cfg.py
Training launcher: go2_z1_warehouse/stage5_standup/train_launcher.py

Citation

@misc{go2z1-standup-v1,
  title  = {Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)},
  author = {m3},
  year   = {2026},
  url    = {https://huggingface.co/m3/go2z1-standup-rl-v1}
}
