HumanoidBench-TD-MPC2 · Self-trained passing checkpoints

Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks.

🛠 Training source: https://github.com/vitorcen/humanoid-training (the full training scripts, patches, eval harness, and analysis docs are all in this companion GitHub repo).

TD-MPC2 is a model-based RL algorithm that combines a world model with sample-based MPC planning. This repo hosts checkpoints trained from scratch on HumanoidBench tasks.
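
To make "world model + sample-based MPC planning" concrete, here is a minimal NumPy sketch of a sampling-based planner. It is not TD-MPC2's planner: TD-MPC2 plans in a learned latent space and also uses a learned value function and policy prior, whereas this toy uses a hand-written 1D point-mass model and a simple elite-refit loop. The names `toy_dynamics`, `toy_reward`, and `plan`, and all hyperparameters, are assumptions made for illustration.

```python
import numpy as np

def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned world model: a 1D point mass."""
    pos, vel = state[..., 0], state[..., 1]
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.stack([pos, vel], axis=-1)

def toy_reward(state, action):
    """Hypothetical reward: be close to pos=1 while using small actions."""
    return -(state[..., 0] - 1.0) ** 2 - 0.01 * action ** 2

def plan(state, horizon=10, num_samples=256, num_elites=32, iters=4, rng=None):
    """Return the first action of the refined action-sequence distribution."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        noise = rng.standard_normal((num_samples, horizon))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll every candidate through the model and accumulate rewards.
        s = np.repeat(state[None, :], num_samples, axis=0)
        returns = np.zeros(num_samples)
        for t in range(horizon):
            returns += toy_reward(s, actions[:, t])
            s = toy_dynamics(s, actions[:, t])
        # Refit the sampling distribution to the top-scoring sequences.
        elites = actions[np.argsort(returns)[-num_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return float(mean[0])

# Plan from rest at the origin; only the first action would be executed,
# and planning would be repeated at the next step (receding horizon).
first_action = plan(np.array([0.0, 0.0]))
```

Only the first planned action is applied before replanning, which is the receding-horizon pattern that MPC-style controllers generally follow.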

📊 Performance

Task	success_rate	mean_return	N	mean_steps	Notes
`h1-walk-v0`	100%	816.7	3	1000/1000	Stable throughout training; success=100% from step 800k onward
`g1-walk-v0`	50%	601.7 ± 271.1	6	755/1000	High variance; 1 of 6 episodes fell early

success_bar = 700 (HumanoidBench locomotion threshold). Success = episode return ≥ success_bar.
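
The success criterion above can be sketched as a small helper. The returns below are made up for illustration and are not the evaluation data behind the table; using the sample standard deviation for the ± figure is also an assumption, since the card does not say which estimator was used.

```python
import statistics

SUCCESS_BAR = 700.0  # HumanoidBench locomotion threshold quoted above

def summarize(returns, success_bar=SUCCESS_BAR):
    """Return (success_rate, mean_return, std_return) for episode returns."""
    successes = sum(r >= success_bar for r in returns)
    # ASSUMPTION: sample standard deviation; the card does not specify.
    std = statistics.stdev(returns) if len(returns) > 1 else 0.0
    return successes / len(returns), statistics.fmean(returns), std

# Made-up returns for illustration only.
example_returns = [820.0, 710.0, 705.0, 650.0, 400.0, 300.0]
rate, mean_ret, std_ret = summarize(example_returns)
```

With these example values, three of six episodes reach the bar, so a single early fall can pull the mean return well below 700 even when half the episodes succeed.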

🎬 Demos

H1-walk-v0 (Unitree H1, 19 DoF)

Full walking cycle; the robot stays upright for all 1000 steps.

G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)

The 14 finger dimensions of the 37D action space are masked, leaving 23D, and the remaining joints use PD position control. The robot stumbles occasionally, but 50% of episodes reach the success bar.
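
The 37→23 action masking can be sketched as follows. This is not the BlockedHands wrapper from the patch; it only shows the idea of expanding a 23D policy action into the 37D vector the simulator expects. The assumption that the finger DoFs sit at the last 14 indices, and the name `expand_action`, are hypothetical.

```python
import numpy as np

FULL_ACT_DIM = 37      # full G1 action dimension, per the text above
NUM_FINGER_DOFS = 14   # finger DoFs that are blocked

# ASSUMPTION: the finger DoFs occupy the last 14 action indices.
# The real layout depends on the G1 model file and may differ.
FINGER_IDX = np.arange(FULL_ACT_DIM - NUM_FINGER_DOFS, FULL_ACT_DIM)
BODY_IDX = np.setdiff1d(np.arange(FULL_ACT_DIM), FINGER_IDX)

def expand_action(body_action, finger_value=0.0):
    """Map a 23D policy action to a 37D action with fingers held fixed."""
    body_action = np.asarray(body_action, dtype=np.float64)
    if body_action.shape != (BODY_IDX.size,):
        raise ValueError(
            f"expected shape ({BODY_IDX.size},), got {body_action.shape}"
        )
    full = np.full(FULL_ACT_DIM, finger_value, dtype=np.float64)
    full[BODY_IDX] = body_action
    return full

full_action = expand_action(np.ones(23))
```

The policy then only ever sees and outputs 23 dimensions, which removes the finger joints from exploration entirely.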

Comparison with self-trained DR.Q on the same tasks (wsagi/HumanoidBench-DR.Q)

Task	Algo	Final step	mean_return	success_rate
h1-walk-v0	DR.Q	500k	801	90% (N=10)
h1-walk-v0	TD-MPC2 (this)	950k	817	100% (N=3) ⭐
g1-walk-v0	DR.Q PDBH	500k	711	70% (N=10)
g1-walk-v0	TD-MPC2 PDBH (this)	950k	602	50% (N=6)

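Conclusion: TD-MPC2 slightly outperforms DR.Q on h1-walk (817 vs. 801 mean return, 100% vs. 90% success) and is more stable, although the two were evaluated at different step counts (950k vs. 500k) and with different N (3 vs. 10). On the harder g1-walk task (37D action space with PDBH wrappers) it falls behind DR.Q (602 vs. 711 mean return, 50% vs. 70% success) but still meets the ≥30% pass threshold.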

🔧 Training config

Task	Robot	act_dim	Wrappers	Steps	Hardware	Wall time
`h1-walk-v0`	Unitree H1	19	none	1M	4090 24GB	~24h
`g1-walk-v0`	Unitree G1	23	PD + BlockedHands	1M	AutoDL 4080S 32GB	~22h (3-seed parallel)

Algorithm: TD-MPC2 model_size=5 (small, ~16M params)
Seed: 0 for h1-walk; 0 for g1-walk (best of 3 seeds 0/10/20, multi-seed parallel on same GPU)
Multi-seed parallel pattern: see feedback_tdmpc2_multiseed.md. Three seeds time-slice one GPU, which raises GPU utilization from 15% to 98% and gives 2.7× total throughput.
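
The multi-seed pattern can be sketched as a launcher that starts one training process per seed on the same GPU. The entry point `train.py` and the Hydra-style `task=` / `seed=` overrides are assumptions, not the repo's verified command line; the actual recipe is in feedback_tdmpc2_multiseed.md.

```python
import subprocess

def build_commands(base_cmd, seeds):
    """One command per seed; `seed=<n>` is an assumed Hydra-style override."""
    return [list(base_cmd) + [f"seed={seed}"] for seed in seeds]

def launch_all(commands):
    """Start every run at once on the same GPU, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]

# ASSUMPTION: placeholder entry point and task override.
base = ["python", "train.py", "task=humanoid_g1-walk-v0"]
commands = build_commands(base, seeds=[0, 10, 20])

# Throughput figure from the text: 3 concurrent seeds at 2.7x total
# means each run proceeds at about 0.9x its solo speed.
per_run_speed = 2.7 / 3
```

Calling `launch_all(commands)` would start the three runs concurrently; it is left uncalled here so the snippet has no side effects.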

Patches applied to upstream submodules

The first two patches are required for G1-walk, since a torque-only G1 does not learn to walk; the third is the only one h1-walk needs:

patches/g1-pos-control.patch — replaces torque actuators with PD position actuators
patches/humanoid-bench-g1-and-lazy.patch — BlockedHands wrapper to freeze 14 finger DoFs (irrelevant noise for walk task)
patches/tdmpc2-save-agent.patch — fixes upstream TD-MPC2 to actually save weights every eval (the only patch required for h1-walk)

Apply with bash patches/apply.sh from the training repo.
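
For readers unfamiliar with PD position control, the control law that a PD position actuator implements in general can be sketched as below. The patch itself is not reproduced here, and the gains and torque limit are hypothetical values chosen only to make the example concrete.

```python
import numpy as np

def pd_torque(q_target, q, qd, kp, kd, torque_limit):
    """PD position control: kp * (q_target - q) - kd * qd, then clipped."""
    q_target, q, qd = (np.asarray(x, dtype=np.float64) for x in (q_target, q, qd))
    torque = kp * (q_target - q) - kd * qd
    return np.clip(torque, -torque_limit, torque_limit)

# Hypothetical gains and limit for a single joint.
tau = pd_torque(q_target=[0.5], q=[0.0], qd=[1.0],
                kp=100.0, kd=5.0, torque_limit=40.0)
```

With position targets as the action, the policy chooses where each joint should go and the low-level loop handles the torques, which is a common way to ease exploration compared with raw torque control.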

🚀 Inference

Full deterministic eval and GUI viewer scripts:

scripts/tdmpc2_eval.py — N-ep JSONL eval (headless)
scripts/tdmpc2_viewer.py — GUI viewer (GLFW)

Both are in the companion GitHub repo.

# headless N=10 eval
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
    --task humanoid_g1-walk-v0 \
    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
    --seed 0 --eval 10 --out g1_eval.jsonl

# GUI replay
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
    --task humanoid_g1-walk-v0 \
    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
    --seed 0 --fps 50

📁 Repo layout

HumanoidBench-TD-MPC2/
├── README.md                                  (this file)
├── assets/
│   ├── tdmpc2-h1-walk.mp4                     (515 KB — H1-walk GUI recording)
│   └── tdmpc2-g1-walk.mp4                     (257 KB — G1-walk GUI recording)
├── TDMPC2+HBench-h1-walk-v0+0/
│   ├── step_950000.pt                         (32 MB — agent + world model + critic)
│   ├── train.log                              (~370 KB — full training log)
│   └── ckpt_eval.csv                          (auto-eval per ckpt, N=3 quick)
└── TDMPC2+HBench-g1-walk-v0+0/
    ├── step_950000.pt                         (32 MB)
    └── train.log                              (~700 KB)

The +0 suffix means seed=0. If other seeds are released later, they will be named +10 / +20.
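
The directory naming convention can be read programmatically. The helper below is a hypothetical convenience, assuming every run directory follows the `<algo>+<benchmark>-<task>+<seed>` pattern seen in the layout above.

```python
def parse_run_dir(name):
    """Split '<algo>+<benchmark>-<task>+<seed>' into its parts."""
    algo, middle, seed = name.split("+")
    benchmark, task = middle.split("-", 1)
    return {"algo": algo, "benchmark": benchmark, "task": task, "seed": int(seed)}

info = parse_run_dir("TDMPC2+HBench-g1-walk-v0+0")
```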

📜 License & Attribution

Code: MIT (consistent with TD-MPC2 and HumanoidBench upstream)
Algorithm: TD-MPC2 (Hansen et al., 2024)
Benchmark: HumanoidBench (Sferrazza et al., 2024)
Trained by: https://github.com/vitorcen on AutoDL infrastructure

Paper for wsagi/HumanoidBench-TD-MPC2

HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

Paper • 2403.10506 • Published Jun 18, 2024