HumanoidBench-TD-MPC2 · Self-trained passing checkpoints
Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks.
🛠 Training source: https://github.com/vitorcen/humanoid-training Full training scripts, patches, eval harness, and analysis docs are in the companion GitHub repo.
TD-MPC2 is a model-based RL algorithm combining a world model with sample-based MPC planning. This repo hosts checkpoints trained from scratch on HumanoidBench tasks.
📊 Performance
| Task | success_rate | mean_return | N | mean_steps | Notes |
|---|---|---|---|---|---|
| h1-walk-v0 | 100% | 816.7 | 3 | 1000/1000 | Training stable throughout; success=100% from step 800k onward |
| g1-walk-v0 | 50% | 601.7 ± 271.1 | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |
success_bar = 700 (HumanoidBench locomotion threshold). Success = episode return ≥ success_bar.
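The success criterion above can be sketched in a few lines. This is a minimal illustration, not the eval harness from the companion repo: the function name `summarize` and the example returns are placeholders, and the use of the sample standard deviation is an assumption, since the table does not say which estimator it reports.

```python
from statistics import mean, stdev

SUCCESS_BAR = 700.0  # locomotion threshold quoted above


def summarize(returns, success_bar=SUCCESS_BAR):
    """Return (success_rate, mean_return, std_return) for a list of episode returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    # An episode counts as a success when its return reaches the bar.
    successes = sum(1 for r in returns if r >= success_bar)
    # Sample standard deviation (assumption: the table's estimator is not stated).
    spread = stdev(returns) if len(returns) > 1 else 0.0
    return successes / len(returns), mean(returns), spread


# Placeholder returns, NOT the logged episodes behind the table.
example_returns = [800.0, 650.0, 720.0, 300.0]
rate, avg, spread = summarize(example_returns)
print(f"success_rate={rate:.0%} mean_return={avg:.1f} ± {spread:.1f}")
```

With small N (3 and 6 episodes here), a single episode moves the success rate by 33 or 17 percentage points, so the figures are best read as rough estimates.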
🎬 Demos
H1-walk-v0 (Unitree H1, 19 DoF)
Full walking cycle; runs through all 1000 steps without falling.
G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)
With the 14 finger dimensions masked (action dim 37 → 23) and PD position control; occasional stumbles, but 50% of episodes clear the success bar.
Comparison with self-trained DR.Q on the same tasks (wsagi/HumanoidBench-DR.Q)
| Task | Algo | Final step | mean_return | success_rate |
|---|---|---|---|---|
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
| h1-walk-v0 | TD-MPC2 (this) | 950k | 817 | 100% (N=3) ⭐ |
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
| g1-walk-v0 | TD-MPC2 PDBH (this) | 950k | 602 | 50% (N=6) |
Conclusion: TD-MPC2 slightly outperforms DR.Q on h1-walk (more stable); it falls behind DR.Q on the harder g1-walk task (37D + PDBH wrappers) but still clears the ≥30% success-rate pass threshold.
🔧 Training config
| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|---|---|---|---|---|---|---|
| h1-walk-v0 | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
| g1-walk-v0 | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |
- Algorithm: TD-MPC2, model_size=5 (small, ~16M params)
- Seed: 0 for h1-walk; 0 for g1-walk (best of 3 seeds 0/10/20, multi-seed parallel on same GPU)
- Multi-seed parallel pattern: see feedback_tdmpc2_multiseed.md. 3 seeds time-slice one GPU, util 15% → 98%, total throughput 2.7×
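The multi-seed pattern above amounts to starting one training process per seed on the same GPU and letting them interleave. The sketch below only illustrates that idea. The entry point `python -m tdmpc2.train` and the `task=`/`model_size=`/`seed=` overrides are assumptions about the training repo's launcher, and the helper names are hypothetical; the actual scripts live in the companion repo.

```python
import shlex
import subprocess

SEEDS = (0, 10, 20)  # seeds listed above


def build_commands(task, seeds=SEEDS, model_size=5):
    """One training command per seed (entry point and override names are assumed)."""
    return [
        ["python", "-m", "tdmpc2.train",
         f"task={task}", f"model_size={model_size}", f"seed={seed}"]
        for seed in seeds
    ]


def launch_all(commands, dry_run=True):
    """Start every command at once so the runs share one GPU, then wait for all."""
    if dry_run:
        # Only print what would be run; nothing is started.
        for cmd in commands:
            print(shlex.join(cmd))
        return []
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


commands = build_commands("humanoid_g1-walk-v0")
launch_all(commands, dry_run=True)
```

If the 2.7× figure is total throughput relative to a single run, each of the three seeds progresses at roughly 0.9× the speed it would reach alone (2.7 / 3), which is the trade-off that makes sharing the GPU worthwhile.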
Patches applied to upstream submodules
The first two patches are both required for G1-walk, since a torque-only G1 will not learn to walk (memory record); the third is the only patch required for h1-walk:
- patches/g1-pos-control.patch: replaces torque actuators with PD position actuators
- patches/humanoid-bench-g1-and-lazy.patch: BlockedHands wrapper to freeze 14 finger DoFs (irrelevant noise for the walk task)
- patches/tdmpc2-save-agent.patch: fixes upstream TD-MPC2 to actually save weights every eval
Apply with bash patches/apply.sh from the training repo.
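To make the two G1 changes concrete, here is a minimal sketch of what they do to an action; it is not the code in the patches. The finger indices (assumed to be the last 14 slots), the hold value, and the PD gains are placeholders, and the function names are hypothetical.

```python
FULL_ACT_DIM = 37
N_FINGER = 14
# Hypothetical layout: fingers occupy the last 14 slots. The real indices are in the patch.
FINGER_IDX = frozenset(range(FULL_ACT_DIM - N_FINGER, FULL_ACT_DIM))


def expand_action(reduced, finger_idx=FINGER_IDX, full_dim=FULL_ACT_DIM, hold=0.0):
    """Scatter a 23D policy action into the 37D robot action, holding fingers fixed."""
    if len(reduced) != full_dim - len(finger_idx):
        raise ValueError("reduced action has the wrong length")
    it = iter(reduced)
    return [hold if i in finger_idx else next(it) for i in range(full_dim)]


def pd_torque(q_target, q, qd, kp, kd):
    """Per-joint PD law: torque = kp * (q_target - q) - kd * qd."""
    return [kp * (t - p) - kd * v for t, p, v in zip(q_target, q, qd)]


full = expand_action([0.1] * 23)
print(len(full), full[:2], full[-2:])
print(pd_torque([0.5], [0.2], [1.0], kp=100.0, kd=2.0))
```

The policy then only has to choose 23 joint position targets, while the PD layer turns them into torques, which is one plausible reason the position-controlled G1 learns where the torque-only one does not.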
🚀 Inference
Full deterministic eval and GUI viewer scripts:
- scripts/tdmpc2_eval.py: N-episode JSONL eval (headless)
- scripts/tdmpc2_viewer.py: GUI viewer (GLFW)
Both are in the companion GitHub repo.
# headless N=10 eval
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
--task humanoid_g1-walk-v0 \
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
--seed 0 --eval 10 --out g1_eval.jsonl
# GUI replay
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
--task humanoid_g1-walk-v0 \
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
--seed 0 --fps 50
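The eval script writes one JSON record per episode, so the mean_steps column can be reproduced by reading the file line by line. The schema is not documented here: the `steps` and `return` keys below are assumptions, the sample records are placeholders rather than real output, and the helper names are hypothetical. Check the actual field names in g1_eval.jsonl before reusing this.

```python
import io
import json

MAX_STEPS = 1000  # episode length used in the tables above


def load_episodes(fp):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in fp if line.strip()]


def step_summary(episodes, steps_key="steps", max_steps=MAX_STEPS):
    """Mean episode length and the indices of episodes that ended early."""
    steps = [ep[steps_key] for ep in episodes]
    early = [i for i, s in enumerate(steps) if s < max_steps]
    return sum(steps) / len(steps), early


# Placeholder records with an assumed schema, not real eval output.
sample = io.StringIO(
    '{"episode": 0, "return": 810.0, "steps": 1000}\n'
    '{"episode": 1, "return": 240.0, "steps": 310}\n'
)
mean_steps, early = step_summary(load_episodes(sample))
print(f"mean_steps={mean_steps:.0f}/{MAX_STEPS}, early terminations: {early}")
```

For a real run, replace the `io.StringIO` sample with `open("g1_eval.jsonl")`.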
📁 Repo layout
HumanoidBench-TD-MPC2/
├── README.md (this file)
├── assets/
│ ├── tdmpc2-h1-walk.mp4 (515 KB — H1-walk GUI recording)
│ └── tdmpc2-g1-walk.mp4 (257 KB — G1-walk GUI recording)
├── TDMPC2+HBench-h1-walk-v0+0/
│ ├── step_950000.pt (32 MB — agent + world model + critic)
│ ├── train.log (~370 KB — full training log)
│ └── ckpt_eval.csv (auto-eval per ckpt, N=3 quick)
└── TDMPC2+HBench-g1-walk-v0+0/
├── step_950000.pt (32 MB)
└── train.log (~700 KB)
The +0 suffix denotes seed=0. If other seeds are released later, they will be named with +10 / +20.
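Because the directory names encode algorithm, task, and seed, they can be split mechanically when iterating over several checkpoints. A minimal sketch, assuming every run directory follows the `<algo>+HBench-<task>+<seed>` pattern seen in the layout above (`RunName` and `parse_run_dir` are hypothetical helper names):

```python
from typing import NamedTuple


class RunName(NamedTuple):
    algo: str
    task: str
    seed: int


def parse_run_dir(name):
    """Split '<algo>+HBench-<task>+<seed>' into its parts."""
    algo, middle, seed = name.split("+")
    prefix = "HBench-"
    if not middle.startswith(prefix):
        raise ValueError(f"unexpected run directory name: {name!r}")
    return RunName(algo, middle[len(prefix):], int(seed))


print(parse_run_dir("TDMPC2+HBench-g1-walk-v0+0"))
```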
📜 License & Attribution
- Code: MIT (consistent with TD-MPC2 and HumanoidBench upstream)
- Algorithm: TD-MPC2 (Hansen et al., 2024)
- Benchmark: HumanoidBench (Sferrazza et al., 2024)
- Trained by: https://github.com/vitorcen on AutoDL infrastructure