---
library_name: tdmpc2
tags:
  - reinforcement-learning
  - humanoid
  - mujoco
  - humanoid-bench
  - locomotion
  - unitree-h1
  - unitree-g1
  - model-based-rl
  - mpc
datasets:
  - carlosferrazza/humanoid-bench
license: mit
---

# HumanoidBench-TD-MPC2 · Self-trained checkpoints

_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._

> 🛠 **Training source**: <https://github.com/vitorcen/humanoid-training>
> The full training scripts, patches, eval harness, and analysis docs are in the companion GitHub repo.

TD-MPC2 is a model-based RL algorithm that combines a learned world model with sample-based MPC planning. This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) locomotion tasks.

---

## 📊 Performance

| Task | success_rate | mean_return | N | mean_steps | Notes |
|---|---|---|---|---|---|
| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Stable throughout training; success=100% from step 800k onward |
| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |

`success_bar = 700` (HumanoidBench locomotion threshold). _Success = episode return ≥ success_bar._
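
The success criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's eval harness: the `summarize` helper and the example returns are made up, and the sample standard deviation is an assumption (the `±` in the table may have been computed differently).

```python
from statistics import mean, stdev

SUCCESS_BAR = 700.0  # HumanoidBench locomotion threshold quoted above


def summarize(returns, success_bar=SUCCESS_BAR):
    """Return (success_rate, mean_return, std_return) for a list of episode returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    # An episode counts as a success when its return reaches the bar.
    successes = sum(1 for r in returns if r >= success_bar)
    # ASSUMPTION: sample standard deviation; a single episode has no spread.
    std = stdev(returns) if len(returns) > 1 else 0.0
    return successes / len(returns), mean(returns), std


# Made-up returns for illustration only; not taken from the evaluation logs.
example_returns = [820.0, 700.0, 410.0, 905.0]
rate, avg, std = summarize(example_returns)
print(f"success_rate={rate:.0%} mean_return={avg:.1f} std={std:.1f}")
```

With small N (3 or 6 episodes here), a single episode moves the success rate by 17 to 33 percentage points, which is worth keeping in mind when reading the table.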

---

## 🎬 Demos

### H1-walk-v0 (Unitree H1, 19 DoF)

<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-h1-walk.mp4"></video>

Full walking cycle; the robot completes all 1000 steps without falling.

### G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)

<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-g1-walk.mp4"></video>

The 14 finger dimensions are masked (action dim 37 → 23) and the remaining joints use PD position control. The policy stumbles occasionally, but 50% of episodes reach the success bar.
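
The 37 → 23 masking can be pictured as expanding the policy's 23-D action back to the 37-D vector the simulator expects, with the finger entries held constant. The sketch below is a plain-Python illustration, not the BlockedHands wrapper from the patch; in particular, placing the finger actuators in the last 14 slots is an assumption.

```python
FULL_ACT_DIM = 37  # G1 action dimension before masking (from the text above)
N_FINGER = 14      # finger dimensions that are blocked

# ASSUMPTION: the finger actuators occupy the last 14 entries of the action
# vector. The real indices depend on the G1 model and may differ.
FINGER_IDX = set(range(FULL_ACT_DIM - N_FINGER, FULL_ACT_DIM))
BODY_IDX = [i for i in range(FULL_ACT_DIM) if i not in FINGER_IDX]


def expand_action(body_action, finger_value=0.0):
    """Map a 23-D policy action to the 37-D action the simulator expects."""
    body_action = list(body_action)
    if len(body_action) != len(BODY_IDX):
        raise ValueError(f"expected {len(BODY_IDX)} values, got {len(body_action)}")
    # Start with every entry at the constant finger value, then overwrite
    # the body joints with what the policy produced.
    full = [finger_value] * FULL_ACT_DIM
    for idx, value in zip(BODY_IDX, body_action):
        full[idx] = value
    return full


full_action = expand_action([0.1] * 23)
print(len(full_action))
```

The policy then only has to explore 23 dimensions, which is the motivation the text gives for blocking the hands on a walking task.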

### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))

| Task | Algo | Final step | mean_return | success_rate |
|---|---|---|---|---|
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |

**Conclusion**: TD-MPC2 slightly outperforms DR.Q on **h1-walk** (higher return and success rate, more stable). On the harder **g1-walk** task (37D actions with PDBH wrappers) it falls behind DR.Q but still clears the ≥30% pass threshold. The table compares checkpoints at different training steps (950k vs. 500k) and with different episode counts.

---

## 🔧 Training config

| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|---|---|---|---|---|---|---|
| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |

- **Algorithm**: TD-MPC2 `model_size=5` (small, ~16M params)
- **Seed**: 0 for both tasks; for g1-walk, seed 0 was the best of 3 seeds (0/10/20) trained in parallel on the same GPU
- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md) — 3 seeds time-slice one GPU, util 15% → 98%, total throughput 2.7×
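
A minimal sketch of the multi-seed pattern, assuming the trainer is a command-line program that accepts a `seed=<int>` argument. The `build_commands` and `run_parallel` helpers are hypothetical, and the demo uses a stand-in command rather than the real training entry point, which lives in the training repo.

```python
import subprocess
import sys

SEEDS = (0, 10, 20)  # the three seeds named above


def build_commands(base_cmd, seeds=SEEDS):
    """Append a per-seed argument to a base training command."""
    # ASSUMPTION: the trainer accepts a `seed=<int>` argument.
    return [list(base_cmd) + [f"seed={s}"] for s in seeds]


def run_parallel(commands):
    """Start every command at once, then wait for all; returns the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Stand-in command that only echoes its arguments; replace it with the
    # real training entry point from the training repo.
    demo = [sys.executable, "-c", "import sys; print('would train with', sys.argv[1:])"]
    print(run_parallel(build_commands(demo)))
```

The processes share the GPU by time-slicing, so each one runs somewhat slower than it would alone. If the 2.7× figure is aggregate throughput over three seeds, each seed proceeds at roughly 0.9× the single-run speed.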

### Patches applied to upstream submodules

The first two patches are both **required** for G1-walk, since a torque-only G1 will not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)). The third is the only patch required for h1-walk:

- `patches/g1-pos-control.patch`: replaces torque actuators with PD position actuators
- `patches/humanoid-bench-g1-and-lazy.patch`: adds the BlockedHands wrapper, which freezes the 14 finger DoFs (irrelevant noise for the walk task)
- `patches/tdmpc2-save-agent.patch`: fixes upstream TD-MPC2 so that it actually saves weights at every eval

Apply with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
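
To make the torque versus PD position distinction concrete, here is a textbook PD law in plain Python. It is only an illustration of the idea: the gains, the joint limits, and the `[-1, 1]` action scaling are assumptions, and the actual patch changes the actuator definitions in the robot model rather than adding code like this.

```python
def action_to_target(action, lower, upper):
    """Scale a normalized action in [-1, 1] to a joint-angle target.

    ASSUMPTION: the policy outputs normalized actions that are mapped
    linearly onto each joint's range.
    """
    return [lo + 0.5 * (a + 1.0) * (hi - lo) for a, lo, hi in zip(action, lower, upper)]


def pd_torque(q_target, q, qd, kp, kd):
    """Per-joint PD law: tau = kp * (q_target - q) - kd * qd."""
    return [
        kp_i * (qt_i - q_i) - kd_i * qd_i
        for qt_i, q_i, qd_i, kp_i, kd_i in zip(q_target, q, qd, kp, kd)
    ]


# Hypothetical single-joint example: gains and limits are made up.
target = action_to_target([0.0], lower=[-1.0], upper=[3.0])
tau = pd_torque(target, q=[0.0], qd=[0.0], kp=[10.0], kd=[1.0])
print(target, tau)
```

With position control the policy chooses where a joint should go and the PD loop works out the torque, which is generally an easier action space to explore than raw torques.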

---

## 🚀 Inference

Complete deterministic eval and GUI viewer scripts:

- `scripts/tdmpc2_eval.py`: N-episode JSONL eval (headless)
- `scripts/tdmpc2_viewer.py`: GUI viewer (GLFW)

Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).

```bash
# headless N=10 eval
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
    --task humanoid_g1-walk-v0 \
    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
    --seed 0 --eval 10 --out g1_eval.jsonl

# GUI replay
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
    --task humanoid_g1-walk-v0 \
    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
    --seed 0 --fps 50
```

---

## 📁 Repo layout

```
HumanoidBench-TD-MPC2/
├── README.md                                  (this file)
├── assets/
│   ├── tdmpc2-h1-walk.mp4                     (515 KB — H1-walk GUI recording)
│   └── tdmpc2-g1-walk.mp4                     (257 KB — G1-walk GUI recording)
├── TDMPC2+HBench-h1-walk-v0+0/
│   ├── step_950000.pt                         (32 MB — agent + world model + critic)
│   ├── train.log                              (~370 KB — full training log)
│   └── ckpt_eval.csv                          (auto-eval per ckpt, N=3 quick)
└── TDMPC2+HBench-g1-walk-v0+0/
    ├── step_950000.pt                         (32 MB)
    └── train.log                              (~700 KB)
```

`+0` denotes seed=0. Checkpoints from other seeds, if released later, will be named `+10` / `+20`.
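
The naming scheme can be turned into a small helper for locating a checkpoint. This is a sketch, not part of the repo: `ckpt_path` and `download` are hypothetical names, the repo id is taken from the demo video URLs, and only the seed-0 `step_950000.pt` files listed in the layout are known to exist.

```python
def ckpt_path(task, seed=0, step=950000):
    """Build the in-repo checkpoint path, e.g. for task 'g1-walk-v0'."""
    return f"TDMPC2+HBench-{task}+{seed}/step_{step}.pt"


def download(task, seed=0, step=950000, repo_id="wsagi/HumanoidBench-TD-MPC2"):
    """Fetch one checkpoint from the Hub and return its local file path.

    Requires the `huggingface_hub` package; it is imported lazily so that
    `ckpt_path` stays usable without it.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=ckpt_path(task, seed, step))


print(ckpt_path("g1-walk-v0"))  # TDMPC2+HBench-g1-walk-v0+0/step_950000.pt
```

The returned local path can then be passed as `--ckpt` to the eval or viewer script.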

---

## 📜 License & Attribution

- **Code**: MIT (consistent with [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream)
- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure