Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)
A PPO policy that lets the Unitree Go2 + Z1 composite robot recover from arbitrary fallen poses by using its legs and Z1 arm together as a self-righting kinematic chain. Once the trunk is upright, the policy automatically folds the Z1 back to the carry pose.
Behaviour
- Reset spawns the robot at trunk_z = 0.18 m with random orientation (full quaternion sampled by reset_root_state_with_random_orientation)
- Policy commands all 18 joints (12 leg + 6 arm) for up to 5 s
- Reward favours: trunk lifted, projected gravity aligned with -Z, Z1 close to its Z1_FOLDED_DEFAULT pose, sparse +10 success bonus when all three are satisfied
- Episode ends if trunk_z < 0.05 m (collapsed) or time-out
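The reset and termination rules above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: `random_unit_quaternion`, `projected_gravity_body` and `episode_done` are hypothetical helper names, Shoemake's method is just one common way to draw a uniform rotation (the actual reset_root_state_with_random_orientation may work differently), and the (w, x, y, z) quaternion order is an assumption.

```python
import math
import random

COLLAPSE_HEIGHT_M = 0.05   # from the card: episode ends if trunk_z < 0.05 m
EPISODE_LENGTH_S = 5.0     # from the card: policy runs for up to 5 s


def random_unit_quaternion(rng: random.Random) -> tuple:
    """Sample a uniformly distributed rotation as (w, x, y, z) (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        b * math.cos(2.0 * math.pi * u3),
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
    )


def projected_gravity_body(q: tuple) -> tuple:
    """Rotate the world gravity direction (0, 0, -1) into the body frame."""
    w, x, y, z = q
    # Equivalent to R(q)^T @ (0, 0, -1), i.e. minus the third row of R(q).
    return (
        -2.0 * (x * z - w * y),
        -2.0 * (y * z + w * x),
        -(1.0 - 2.0 * (x * x + y * y)),
    )


def episode_done(trunk_z: float, elapsed_s: float) -> bool:
    """Terminate on collapse (trunk too low) or on time-out."""
    return trunk_z < COLLAPSE_HEIGHT_M or elapsed_s >= EPISODE_LENGTH_S
```

With this convention an upright trunk has a body-frame gravity z-component of -1 and a trunk lying on its back has +1, which is the quantity the "projected gravity aligned with -Z" reward term looks at.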
Highlights
- 4096 parallel envs × 3000 PPO iters
- Mean reward climbs from 0.6 (random) → 89+ (final)
- standup_success sparse bonus reaches 8.6 / episode (≈86 % of timesteps satisfy the success criterion)
- trunk_collapsed termination rate ≈ 0: the robot does not give up
Architecture
Same rsl-rl actor-critic shape as our walking policies (3-layer MLP 512-256-128 ELU). Action dim = 18.
Reward composition
trunk_height_reward        weight +5.0      # clamp(z / 0.32, 0, 1)
upright_alignment          weight +3.0      # clamp(-projected_gravity_b[2], 0, 1)
z1_fold                    weight +2.0      # exp(-||z1_pos - Z1_FOLDED_DEFAULT||)
standup_success (sparse)   weight +10.0     # 1 if z>0.28 ∧ upright>0.92 ∧ fold_err<0.3
action_rate_l2             weight -0.005
joint_acc_l2               weight -2.5e-7
joint_torques_l2           weight -1e-5
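As a rough illustration of how the four positive terms combine, the sketch below evaluates them for a single robot state using only the standard library. Several things here are assumptions rather than facts from the task config: the function names are hypothetical, `fold_err` is taken to be the same Euclidean norm used inside z1_fold, `upright` in the success test is taken to be the clamped upright_alignment value, and the folded pose is a placeholder of six zeros. The three L2 penalties and any per-step scaling applied by the environment are left out.

```python
import math

# Weights copied from the reward table above.
WEIGHTS = {
    "trunk_height_reward": 5.0,
    "upright_alignment": 3.0,
    "z1_fold": 2.0,
    "standup_success": 10.0,
}


def clamp01(value: float) -> float:
    """Clamp a scalar to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def shaping_terms(trunk_z, gravity_b_z, z1_pos, z1_folded_default):
    """Return the unweighted positive reward terms for one robot state."""
    fold_err = math.dist(z1_pos, z1_folded_default)  # assumed definition of fold_err
    upright = clamp01(-gravity_b_z)
    success = trunk_z > 0.28 and upright > 0.92 and fold_err < 0.3
    return {
        "trunk_height_reward": clamp01(trunk_z / 0.32),
        "upright_alignment": upright,
        "z1_fold": math.exp(-fold_err),
        "standup_success": 1.0 if success else 0.0,
    }


def weighted_reward(terms: dict) -> float:
    """Weighted sum of the positive terms (penalties omitted)."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())


# Placeholder folded pose: the real Z1_FOLDED_DEFAULT values are not given here.
folded = [0.0] * 6
standing = shaping_terms(0.32, -1.0, folded, folded)
fallen = shaping_terms(0.18, 1.0, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], folded)
```

Under these assumptions a fully upright, fully folded state scores 5 + 3 + 2 + 10 = 20 from the positive terms, while a freshly reset robot lying on its back gets only partial height and fold credit and no success bonus.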
Files
- standup_v1.pt: rsl-rl OnPolicyRunner checkpoint
Usage
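import torch
import torch.nn as nn

# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
state = torch.load("standup_v1.pt", map_location="cuda:0", weights_only=False)
sd = state["actor_state_dict"]

# Read every layer width from the checkpoint; the hidden layers differ in size
# (512-256-128), so a single shared hidden width would fail to load.
obs_dim = sd["mlp.0.weight"].shape[1]
h1, h2, h3 = (sd[f"mlp.{i}.weight"].shape[0] for i in (0, 2, 4))
act_dim = sd["mlp.6.weight"].shape[0]  # 18 for full-body recovery

actor = nn.Sequential(
    nn.Linear(obs_dim, h1), nn.ELU(),
    nn.Linear(h1, h2), nn.ELU(),
    nn.Linear(h2, h3), nn.ELU(),
    nn.Linear(h3, act_dim),
).cuda().eval()

# Strip the "mlp." prefix so the keys match nn.Sequential's "0.weight", "2.weight", ...
actor.load_state_dict({k.replace("mlp.", ""): v for k, v in sd.items() if k.startswith("mlp.")})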
For a full integration example (drop fallen robot into warehouse, run standup, verify Z1 auto-folds), see stage4_joint_eval/standup_recovery.py.
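Before building the actor it can help to confirm that a checkpoint really has the documented 512-256-128 shape and 18 outputs. The helper below is a hypothetical, standard-library-only sketch: `infer_mlp_dims` is not part of rsl-rl, it assumes the `mlp.<i>.weight` key naming used in the Usage snippet, and the observation width of 66 in the example is a placeholder rather than a value from this card.

```python
def infer_mlp_dims(shapes: dict) -> list:
    """Return [obs_dim, hidden..., act_dim] from 'mlp.<i>.weight' -> (out, in) shapes."""
    layers = sorted(
        (int(key.split(".")[1]), shape)
        for key, shape in shapes.items()
        if key.startswith("mlp.") and key.endswith(".weight")
    )
    if not layers:
        raise ValueError("no 'mlp.<i>.weight' entries found")
    dims = [layers[0][1][1]]  # input width of the first linear layer
    for _, (out_features, in_features) in layers:
        if in_features != dims[-1]:
            raise ValueError("layer input width does not match previous output")
        dims.append(out_features)
    return dims


# Hypothetical shapes: obs_dim = 66 is a placeholder, not a value from this card.
example_shapes = {
    "mlp.0.weight": (512, 66),
    "mlp.0.bias": (512,),
    "mlp.2.weight": (256, 512),
    "mlp.4.weight": (128, 256),
    "mlp.6.weight": (18, 128),
}
dims = infer_mlp_dims(example_shapes)
```

With a real checkpoint, the same check would be called as `infer_mlp_dims({k: tuple(v.shape) for k, v in sd.items()})`, and the last entry of the result should be 18.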
Training data
On-policy RL — no offline dataset. The Isaac Lab task is registered as Isaac-Standup-Go2Z1-v0 and lives at:
- Repo: https://github.com/aws300/go2_z1_warehouse
- Task config: go2_z1_warehouse/stage5_standup/standup_env_cfg.py
- Training launcher: go2_z1_warehouse/stage5_standup/train_launcher.py
Citation
@misc{go2z1-standup-v1,
title = {Go2+Z1 Standup Recovery Policy (RL, full 18-DOF)},
author = {m3},
year = {2026},
url = {https://huggingface.co/m3/go2z1-standup-rl-v1}
}