---
library_name: tdmpc2
tags:
- reinforcement-learning
- humanoid
- mujoco
- humanoid-bench
- locomotion
- unitree-h1
- unitree-g1
- model-based-rl
- mpc
datasets:
- carlosferrazza/humanoid-bench
license: mit
---
# HumanoidBench-TD-MPC2 · Self-trained passing checkpoints
_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._
> 🛠 **Training source**: <https://github.com/vitorcen/humanoid-training>
> Full training scripts, patches, eval harness, and analysis docs are in the companion GitHub repo.
TD-MPC2 is a model-based RL algorithm that combines a learned world model with sample-based MPC planning. This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) tasks.
---
## 📊 Performance
| Task | success_rate | mean_return | N | mean_steps | Notes |
|---|---|---|---|---|---|
| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Stable throughout training; success = 100% from step 800k onward |
| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |
`success_bar = 700` (HumanoidBench locomotion threshold). _Success = episode return ≥ success_bar._
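The success criterion above can be expressed as a small aggregation helper. This is a minimal sketch, not the companion repo's eval code: `summarize_returns` is a hypothetical name, and it assumes the ± figure in the table is a population standard deviation (the table does not say which convention is used).

```python
import statistics

SUCCESS_BAR = 700.0  # HumanoidBench locomotion threshold quoted above


def summarize_returns(returns: list[float], success_bar: float = SUCCESS_BAR) -> dict:
    """Aggregate per-episode returns into the columns used in the Performance table."""
    if not returns:
        raise ValueError("need at least one episode return")
    successes = sum(r >= success_bar for r in returns)  # success = return >= success_bar
    return {
        "N": len(returns),
        "success_rate": successes / len(returns),
        "mean_return": statistics.fmean(returns),
        # Assumption: population std; use statistics.stdev for the sample (N-1) version.
        "std_return": statistics.pstdev(returns),
    }


# Illustrative returns only (not real eval data):
print(summarize_returns([820.0, 650.0, 710.0, 300.0]))
```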
---
## 🎬 Demos
### H1-walk-v0 (Unitree H1, 19 DoF)
<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-h1-walk.mp4"></video>
Full walking cycle; runs all 1000 steps without falling.
### G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)
<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-g1-walk.mp4"></video>
The 37-D action space has its 14 finger DoFs masked out, leaving 23 action dims, combined with PD position control. There are occasional stumbles, but 50% of episodes clear the success bar.
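The 37→23 masking can be pictured as a scatter from the policy's 23-D action into the full 37-D action vector, with the 14 finger entries pinned to a constant. A minimal sketch assuming NumPy; the finger indices and the hold value are hypothetical placeholders, not the actual joint layout used by the BlockedHands patch.

```python
import numpy as np

FULL_ACT_DIM = 37
# Hypothetical layout: pretend the 14 finger DoFs are the last 14 action entries.
FINGER_IDX = np.arange(23, 37)
BODY_IDX = np.setdiff1d(np.arange(FULL_ACT_DIM), FINGER_IDX)  # the 23 DoFs the policy controls


def expand_action(body_action: np.ndarray, finger_hold: float = 0.0) -> np.ndarray:
    """Map a 23-D policy action to the 37-D env action, freezing the finger entries."""
    body_action = np.asarray(body_action, dtype=np.float64)
    if body_action.shape != (BODY_IDX.size,):
        raise ValueError(f"expected shape ({BODY_IDX.size},), got {body_action.shape}")
    full = np.full(FULL_ACT_DIM, finger_hold, dtype=np.float64)
    full[BODY_IDX] = body_action
    return full


a = expand_action(np.linspace(-1.0, 1.0, 23))
print(a.shape)
```

With this kind of wrapper the agent's action space is 23-D, so the world model and planner never see the finger dimensions at all.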
### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))
| Task | Algo | Final step | mean_return | success_rate |
|---|---|---|---|---|
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |
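**Conclusion**: On **h1-walk**, TD-MPC2 (950k steps) slightly outperforms DR.Q (500k steps) in both mean return and success rate and trains more stably, although its 100% success rate comes from only N=3 episodes versus N=10 for DR.Q. On the harder **g1-walk** task (37-D action space with PD + BlockedHands wrappers, "PDBH"), TD-MPC2 falls behind DR.Q but still meets the ≥30% success threshold.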
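Because the success rates in this comparison come from small evaluation sets (N=3 to N=10), a binomial confidence interval helps put the gaps in perspective. A minimal sketch using the standard Wilson score interval; the choice of method is ours, not something the source reports, and the counts (3/3 and 9/10) are simply those implied by the table.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# h1-walk: TD-MPC2 3/3 vs DR.Q 9/10
for name, k, n in [("TD-MPC2", 3, 3), ("DR.Q", 9, 10)]:
    low, high = wilson_interval(k, n)
    print(f"{name}: {k}/{n} -> [{low:.2f}, {high:.2f}]")
```

With these counts the two intervals overlap heavily (roughly [0.44, 1.00] vs [0.60, 0.98]), so more evaluation episodes would be needed to separate the two algorithms on h1-walk.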
---
## 🔧 Training config
| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|---|---|---|---|---|---|---|
| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |
- **Algorithm**: TD-MPC2 `model_size=5` (small, ~16M params)
- **Seed**: 0 for h1-walk; 0 for g1-walk (the best of three seeds, 0/10/20, trained in parallel on the same GPU)
- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md). Running 3 seeds time-sliced on one GPU raised GPU utilization from 15% to 98% and total throughput by 2.7×.
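The multi-seed pattern amounts to starting several independent training processes at once so they share one GPU. A minimal sketch, assuming the Hydra-style `task=` / `model_size=` / `steps=` / `seed=` overrides that upstream TD-MPC2 accepts and the `humanoid_`-prefixed task names used in the inference commands; the `tdmpc2/train.py` path and both helpers are hypothetical, not the companion repo's launcher, and any wrapper-selection flags added by the patches are not shown.

```python
import subprocess
from typing import Sequence

# Hypothetical entry point; adjust to wherever train.py lives in your checkout.
TRAIN_CMD = ("python", "tdmpc2/train.py")


def seed_commands(task: str, seeds: Sequence[int], model_size: int = 5,
                  steps: int = 1_000_000) -> list[list[str]]:
    """Build one Hydra-override command line per seed."""
    return [
        [*TRAIN_CMD, f"task={task}", f"model_size={model_size}",
         f"steps={steps}", f"seed={s}"]
        for s in seeds
    ]


def launch_parallel(cmds: list[list[str]]) -> list[int]:
    """Start every run at once so they time-slice the same GPU, then wait for all."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    cmds = seed_commands("humanoid_g1-walk-v0", seeds=[0, 10, 20])
    for c in cmds:
        print(" ".join(c))
    # launch_parallel(cmds)  # uncomment to actually start training
```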
### Patches applied to upstream submodules
The first two are **required** for G1-walk; torque-only G1 does not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)):
- `patches/g1-pos-control.patch`: replaces the torque actuators with PD position actuators
- `patches/humanoid-bench-g1-and-lazy.patch`: adds a BlockedHands wrapper that freezes the 14 finger DoFs (irrelevant noise for the walk task)
- `patches/tdmpc2-save-agent.patch`: fixes upstream TD-MPC2 so that it actually saves the weights at every eval (the only patch required for h1-walk)
Apply with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
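For intuition on what the PD-position patch changes: instead of emitting joint torques directly, the policy emits target joint positions, and a PD law converts them into torques. A minimal sketch of that law assuming NumPy; the gains and torque limit are hypothetical placeholders, not the values in `g1-pos-control.patch` (MuJoCo's `position` actuators apply a similar rule internally).

```python
import numpy as np


def pd_torque(q_target, q, qd, kp, kd, tau_limit):
    """PD position control: tau = kp * (q_target - q) - kd * qd, clipped to +/- tau_limit."""
    q_target, q, qd = (np.asarray(x, dtype=np.float64) for x in (q_target, q, qd))
    tau = kp * (q_target - q) - kd * qd
    return np.clip(tau, -tau_limit, tau_limit)


# Hypothetical gains for a 3-joint toy example (not the G1 values).
tau = pd_torque(q_target=[0.1, -0.2, 0.0], q=[0.0, 0.0, 0.05], qd=[0.0, 1.0, 0.0],
                kp=100.0, kd=5.0, tau_limit=15.0)
print(tau)
```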
---
## 🚀 Inference
Scripts for full deterministic evaluation and GUI replay:
- `scripts/tdmpc2_eval.py`: N-episode eval written to JSONL (headless)
- `scripts/tdmpc2_viewer.py`: GUI viewer (GLFW)
Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).
```bash
# headless N=10 eval
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
--task humanoid_g1-walk-v0 \
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
--seed 0 --eval 10 --out g1_eval.jsonl
# GUI replay
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
--task humanoid_g1-walk-v0 \
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
--seed 0 --fps 50
```
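`tdmpc2_eval.py` itself lives in the companion repo. To illustrate the general shape of an N-episode JSONL eval, here is a minimal sketch against the Gymnasium `reset`/`step` API. The `run_eval` name, the `policy` callable (e.g. a wrapper around the trained agent's planner in eval mode) and the record fields (`episode`, `return`, `steps`, `success`) are hypothetical and need not match the real script's output.

```python
import json
from typing import Any, Callable


def run_eval(env: Any, policy: Callable[[Any], Any], n_episodes: int,
             out_path: str, seed: int = 0, success_bar: float = 700.0) -> list[dict]:
    """Roll out n_episodes with a fixed seed schedule; write one JSON line per episode."""
    records = []
    with open(out_path, "w", encoding="utf-8") as f:
        for ep in range(n_episodes):
            obs, _info = env.reset(seed=seed + ep)  # Gymnasium API returns (obs, info)
            ep_return, steps, done = 0.0, 0, False
            while not done:
                obs, reward, terminated, truncated, _info = env.step(policy(obs))
                ep_return += float(reward)
                steps += 1
                # Episode ends on termination (e.g. a fall) or time-limit truncation.
                done = terminated or truncated
            rec = {"episode": ep, "return": ep_return, "steps": steps,
                   "success": ep_return >= success_bar}
            f.write(json.dumps(rec) + "\n")
            records.append(rec)
    return records
```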
---
## 📁 Repo layout
```
HumanoidBench-TD-MPC2/
├── README.md (this file)
├── assets/
│   ├── tdmpc2-h1-walk.mp4 (515 KB, H1-walk GUI recording)
│   └── tdmpc2-g1-walk.mp4 (257 KB, G1-walk GUI recording)
├── TDMPC2+HBench-h1-walk-v0+0/
│   ├── step_950000.pt (32 MB: agent + world model + critic)
│   ├── train.log (~370 KB, full training log)
│   └── ckpt_eval.csv (auto-eval per ckpt, N=3 quick)
└── TDMPC2+HBench-g1-walk-v0+0/
    ├── step_950000.pt (32 MB)
    └── train.log (~700 KB)
```
`+0` denotes seed 0. If other seeds are released later, they will be named `+10` / `+20`.
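Given this naming scheme, a checkpoint path can be built from task, seed, and step, and fetched with `huggingface_hub`. A minimal sketch; `ckpt_path` and `fetch_checkpoint` are hypothetical helpers, and only the seed-0 folders listed above exist in this repo.

```python
def ckpt_path(task: str, seed: int = 0, step: int = 950_000) -> str:
    """Repo-relative path following the TDMPC2+HBench-<task>+<seed>/step_<step>.pt layout."""
    return f"TDMPC2+HBench-{task}+{seed}/step_{step}.pt"


def fetch_checkpoint(task: str, seed: int = 0, step: int = 950_000) -> str:
    """Download one checkpoint from the Hub and return its local cache path."""
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency

    return hf_hub_download(repo_id="wsagi/HumanoidBench-TD-MPC2",
                           filename=ckpt_path(task, seed, step))


print(ckpt_path("g1-walk-v0"))  # -> TDMPC2+HBench-g1-walk-v0+0/step_950000.pt
```

The path returned by `fetch_checkpoint` points into the local Hugging Face cache, so it can be passed directly as `--ckpt` to the eval and viewer scripts.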
---
## 📜 License & Attribution
- **Code**: MIT (consistent with [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream)
- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure