---
library_name: tdmpc2
tags:
- reinforcement-learning
- humanoid
- mujoco
- humanoid-bench
- locomotion
- unitree-h1
- unitree-g1
- model-based-rl
- mpc
datasets:
- carlosferrazza/humanoid-bench
license: mit
---

# HumanoidBench-TD-MPC2 · Self-trained, task-clearing checkpoints

_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._

> 🛠 **Training source**: <https://github.com/vitorcen/humanoid-training>
> The full training scripts, patches, eval harness, and analysis docs live in the companion GitHub repo.

TD-MPC2 is a model-based RL algorithm that combines a learned world model with sample-based MPC planning. This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) tasks.
|
---

## 📊 Performance

| Task | success_rate | mean_return | N (episodes) | mean_steps | Notes |
|---|---|---|---|---|---|
| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Stable throughout training; success = 100% from step 800k onward |
| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1/6 episodes fell early |

`success_bar = 700` (the HumanoidBench locomotion threshold). _An episode counts as a success when its return ≥ success_bar._
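
The `success_rate` and `mean_return` columns can be reproduced from per-episode returns roughly as follows. This is a minimal sketch, not the repo's eval code. It assumes that `success_rate` is the fraction of episodes with return ≥ 700 and that the `±` figure is a population standard deviation; the README does not say which kind it is. The returns in the demo line are made up for illustration.

```python
from statistics import fmean, pstdev

SUCCESS_BAR = 700.0  # HumanoidBench locomotion threshold quoted in this README


def summarize(returns: list[float], success_bar: float = SUCCESS_BAR) -> dict:
    """Success rate, mean return and (population) std over eval episodes."""
    if not returns:
        raise ValueError("need at least one episode return")
    successes = sum(r >= success_bar for r in returns)
    return {
        "n": len(returns),
        "success_rate": successes / len(returns),
        "mean_return": fmean(returns),
        # Assumption: the table's "±" is a population std (pstdev);
        # switch to statistics.stdev if it is really the sample std.
        "std_return": pstdev(returns),
    }


if __name__ == "__main__":
    # Hypothetical returns for illustration only, not real eval data.
    print(summarize([820.0, 790.0, 310.0, 705.0]))
```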

---

## 🎬 Demos

### H1-walk-v0 (Unitree H1, 19 DoF)

<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-h1-walk.mp4"></video>

A full walking cycle that runs all 1000 steps without falling.

### G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)

<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-g1-walk.mp4"></video>

The 14 finger DoFs are masked out of the 37-D action space, which leaves 23 action dims. Joints are driven by PD position control. The robot occasionally stumbles, but 50% of episodes clear the success bar.
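
The finger masking can be pictured as an adapter. The policy outputs only the 23 non-finger actions, and the adapter fills the 14 finger slots with a fixed value before the command reaches the 37-D environment. This is a hypothetical sketch, not the code in `patches/humanoid-bench-g1-and-lazy.patch`. The finger indices (assumed here to be the last 14 slots) and the hold value of 0.0 are placeholders.

```python
FULL_ACT_DIM = 37
N_FINGER = 14
# Hypothetical layout: assume the 14 finger actuators occupy the last 14 slots.
FINGER_IDX = tuple(range(FULL_ACT_DIM - N_FINGER, FULL_ACT_DIM))
BODY_IDX = tuple(i for i in range(FULL_ACT_DIM) if i not in FINGER_IDX)


def expand_action(body_action, finger_hold=0.0):
    """Map a 23-D policy action to the 37-D action the G1 env expects."""
    if len(body_action) != len(BODY_IDX):
        raise ValueError(f"expected {len(BODY_IDX)} dims, got {len(body_action)}")
    full = [finger_hold] * FULL_ACT_DIM  # fingers stay frozen at finger_hold
    for src, dst in enumerate(BODY_IDX):
        full[dst] = body_action[src]  # copy each body action into its slot
    return full
```

With this design, the planner searches a 23-D space while the environment still receives a full 37-D command.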

### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))

| Task | Algo | Final step | mean_return | success_rate |
|---|---|---|---|---|
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |

PDBH = PD position control + BlockedHands wrappers.

**Takeaway**: on **h1-walk**, TD-MPC2 edges out DR.Q (817 vs 801 mean return, 100% vs 90% success) and is more stable. However, it was evaluated at a later checkpoint (950k vs 500k steps) and over fewer episodes (N=3 vs N=10). On the harder **g1-walk** task (37-D action space plus PDBH wrappers), it falls behind DR.Q but still clears the ≥30% success-rate pass threshold.
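
With only 3 to 10 eval episodes per row, the success-rate gaps in this table sit within sampling noise. A Wilson score interval shows this concretely. It is a standard binomial confidence interval and not something the original eval reports. For example, 3/3 successes and 9/10 successes give overlapping 95% intervals of roughly [0.44, 1.00] and [0.60, 0.98].

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the edges.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Success counts implied by the percentages and N in the table.
    for label, k, n in [("TD-MPC2 h1", 3, 3), ("DR.Q h1", 9, 10),
                        ("TD-MPC2 g1", 3, 6), ("DR.Q g1", 7, 10)]:
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k}/{n} -> [{lo:.2f}, {hi:.2f}]")
```

The g1-walk rows (3/6 vs 7/10) also give overlapping intervals.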

---

## 🔧 Training config

| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|---|---|---|---|---|---|---|
| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | RTX 4090 24GB | ~24h |
| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL RTX 4080S 32GB | ~22h (3 seeds in parallel) |

- **Algorithm**: TD-MPC2 with `model_size=5` (the ~5M-parameter configuration)
- **Seed**: 0 for h1-walk; 0 for g1-walk (the best of seeds 0/10/20, trained in parallel on the same GPU)
- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md). Three seeds time-slice one GPU. This raises utilization from 15% to 98% and multiplies total throughput by 2.7×.
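
The multi-seed pattern amounts to starting several independent training processes on one GPU and letting them share it. The following is a hypothetical Python sketch. It assumes upstream TD-MPC2's Hydra-style `train.py` overrides (`task=`, `seed=`, `steps=`, `model_size=`). The task string (borrowed from the eval commands in this README) and the working directory are placeholders, and the companion repo's actual launcher may differ.

```python
import os
import subprocess

SEEDS = (0, 10, 20)


def build_commands(task: str, seeds=SEEDS, steps: int = 1_000_000, model_size: int = 5):
    """One TD-MPC2 training command per seed (Hydra-style overrides)."""
    return [
        ["python", "train.py", f"task={task}", f"seed={s}",
         f"steps={steps}", f"model_size={model_size}"]
        for s in seeds
    ]


def launch_parallel(commands, gpu: str = "0", cwd: str = "."):
    """Start every command at once on the same GPU and wait for all of them."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    procs = [subprocess.Popen(cmd, cwd=cwd, env=env) for cmd in commands]
    return [p.wait() for p in procs]  # exit codes, one per seed


if __name__ == "__main__":
    # Hypothetical task name; check the training repo for the real one.
    for cmd in build_commands("humanoid_g1-walk-v0"):
        print(" ".join(cmd))
```

With 2.7× total throughput, each of the three seeds runs at about 0.9× its solo speed. All three therefore finish in roughly 3 / 2.7 ≈ 1.1× the wall time of a single seed, instead of 3× if run sequentially.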

### Patches applied to upstream submodules

The first two patches are **required** for G1-walk, because a torque-controlled G1 does not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)). The third is needed for both tasks:

- `patches/g1-pos-control.patch`: replaces the torque actuators with PD position actuators
- `patches/humanoid-bench-g1-and-lazy.patch`: adds the BlockedHands wrapper, which freezes the 14 finger DoFs (irrelevant noise for the walk task)
- `patches/tdmpc2-save-agent.patch`: makes upstream TD-MPC2 actually save weights at every eval (the only patch required for h1-walk)

Apply them with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
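
With position control, the policy outputs target joint positions instead of commanding joint torques directly, and a PD law converts those targets into torques. The following sketch shows the textbook form τ = kp·(q_target − q) − kd·q̇ with torque saturation. The gains and torque limits are hypothetical placeholders, and this is not the MuJoCo actuator definition in `patches/g1-pos-control.patch`.

```python
def pd_torque(q_target, q, qdot, kp, kd, tau_limit):
    """Textbook PD position control per joint: tau = kp*(q_target - q) - kd*qdot, clipped."""
    n = len(q)
    if not all(len(v) == n for v in (q_target, qdot, kp, kd, tau_limit)):
        raise ValueError("all per-joint sequences must have the same length")
    taus = []
    for qt, qi, vi, p, d, lim in zip(q_target, q, qdot, kp, kd, tau_limit):
        tau = p * (qt - qi) - d * vi  # spring toward target, damp joint velocity
        taus.append(max(-lim, min(lim, tau)))  # respect actuator torque limits
    return taus
```

Here the policy only picks targets, while the PD loop supplies stiffness and damping. This often makes position control an easier action space to learn in than raw torques, which is consistent with the G1 observation in the patch notes.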

---

## 🚀 Inference

The repo provides scripts for deterministic eval and for a GUI viewer:

- `scripts/tdmpc2_eval.py`: headless N-episode eval that writes JSONL
- `scripts/tdmpc2_viewer.py`: GUI viewer (GLFW)

Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).

```bash
# headless eval, N=10 episodes, results written as JSONL
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
  --task humanoid_g1-walk-v0 \
  --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
  --seed 0 --eval 10 --out g1_eval.jsonl

# GUI replay at 50 fps
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
  --task humanoid_g1-walk-v0 \
  --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
  --seed 0 --fps 50
```
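
The eval writes one JSON object per line to `--out`. This README does not document the schema, so the reader below assumes a hypothetical per-episode field `steps`; change `steps_key` to whatever `tdmpc2_eval.py` actually emits. The sketch computes the `mean_steps` column from the Performance table and counts episodes that ended before the 1000-step horizon.

```python
import json
from pathlib import Path

HORIZON = 1000  # episode length cap for the walk tasks in this README


def load_episodes(path):
    """Parse a JSONL eval file: one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def episode_length_stats(episodes, steps_key="steps", horizon=HORIZON):
    """mean_steps and the number of episodes that terminated before the horizon.

    steps_key is a hypothetical field name; check the real JSONL schema.
    """
    lengths = [int(ep[steps_key]) for ep in episodes]
    if not lengths:
        raise ValueError("no episodes found")
    return {
        "n": len(lengths),
        "mean_steps": sum(lengths) / len(lengths),
        "early_terminations": sum(n < horizon for n in lengths),
    }
```

Usage: `episode_length_stats(load_episodes("g1_eval.jsonl"))`.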

---

## 📁 Repo layout

```
HumanoidBench-TD-MPC2/
├── README.md                  (this file)
├── assets/
│   ├── tdmpc2-h1-walk.mp4     (515 KB, H1-walk GUI recording)
│   └── tdmpc2-g1-walk.mp4     (257 KB, G1-walk GUI recording)
├── TDMPC2+HBench-h1-walk-v0+0/
│   ├── step_950000.pt         (32 MB, agent + world model + critic)
│   ├── train.log              (~370 KB, full training log)
│   └── ckpt_eval.csv          (quick auto-eval of each checkpoint, N=3)
└── TDMPC2+HBench-g1-walk-v0+0/
    ├── step_950000.pt         (32 MB)
    └── train.log              (~700 KB)
```

`+0` denotes seed=0. Checkpoints for other seeds, if released later, will be named `+10` / `+20`.
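
The checkpoint paths encode the run as `TDMPC2+HBench-<task>+<seed>/step_<env steps>.pt`. The hypothetical helper below recovers those fields. Its pattern is inferred only from the names in the layout tree, not from any naming spec in the training repo.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Pattern inferred from this repo's names, e.g.
# TDMPC2+HBench-g1-walk-v0+0/step_950000.pt
_DIR_RE = re.compile(r"^TDMPC2\+HBench-(?P<task>.+)\+(?P<seed>\d+)$")
_FILE_RE = re.compile(r"^step_(?P<step>\d+)\.pt$")


class CkptInfo(NamedTuple):
    task: str
    seed: int
    step: int


def parse_ckpt_path(path: str) -> CkptInfo:
    """Split a checkpoint path into (task, seed, training step)."""
    p = PurePosixPath(path)
    d = _DIR_RE.match(p.parent.name)  # greedy task group stops at the last "+"
    f = _FILE_RE.match(p.name)
    if d is None or f is None:
        raise ValueError(f"unrecognised checkpoint path: {path}")
    return CkptInfo(d["task"], int(d["seed"]), int(f["step"]))
```

The eval commands pass the task with a `humanoid_` prefix (`humanoid_g1-walk-v0`), but the directory names omit it. A launcher built on this helper would need to add that prefix.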

---

## 📜 License & Attribution

- **Code**: MIT (consistent with the [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream repos)
- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure
|