	---
	library_name: tdmpc2
	tags:
	- reinforcement-learning
	- humanoid
	- mujoco
	- humanoid-bench
	- locomotion
	- unitree-h1
	- unitree-g1
	- model-based-rl
	- mpc
	datasets:
	- carlosferrazza/humanoid-bench
	license: mit
	---

	# HumanoidBench-TD-MPC2 · Self-trained checkpoints

	_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._

	> 🛠 Training source: <https://github.com/vitorcen/humanoid-training>
	> Full training scripts, patches, eval harness, and analysis docs are in the companion GitHub repo.

	TD-MPC2 is a model-based RL algorithm that combines a learned world model with sample-based MPC planning. This repo hosts checkpoints trained from scratch on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) tasks.

	---

	## 📊 Performance

	| Task | success_rate | mean_return | N | mean_steps | Notes |
	|---|---|---|---|---|---|
	| `h1-walk-v0` | 100% | 816.7 | 3 | 1000/1000 | Stable throughout training; success=100% from step 800k onward |
	| `g1-walk-v0` | 50% | 601.7 ± 271.1 | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |

	`success_bar = 700` (HumanoidBench locomotion threshold). _Success = episode return ≥ success_bar._
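
The success criterion above can be sketched in a few lines. This is a minimal illustration, not the repo's eval harness: the function name `summarize` is hypothetical, the returns are made up, and using the sample standard deviation for the `±` column is an assumption.

```python
from statistics import mean, stdev

SUCCESS_BAR = 700.0  # HumanoidBench locomotion threshold quoted above


def summarize(returns, success_bar=SUCCESS_BAR):
    """Return (success_rate, mean_return, std_return) for a list of episode returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    # An episode counts as a success when its return reaches the bar.
    successes = sum(1 for r in returns if r >= success_bar)
    # Sample standard deviation (an assumption); undefined for a single episode.
    spread = stdev(returns) if len(returns) > 1 else 0.0
    return successes / len(returns), mean(returns), spread


# Made-up returns for illustration only; NOT the data behind the table.
example_returns = [820.0, 650.0, 710.0, 300.0]
rate, avg, spread = summarize(example_returns)
print(f"success_rate={rate:.0%} mean_return={avg:.1f} std={spread:.1f}")
```

With small N, a single early fall moves the success rate a lot, which is worth keeping in mind when reading the table.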

	---

	## 🎬 Demos

	### H1-walk-v0 (Unitree H1, 19 DoF)

	<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-h1-walk.mp4"></video>

	Full walking cycle; the robot runs through all 1000 steps without falling.

	### G1-walk-v0 (Unitree G1, 23 DoF with PD + BlockedHands wrappers)

	<video controls width="720" src="https://huggingface.co/wsagi/HumanoidBench-TD-MPC2/resolve/main/assets/tdmpc2-g1-walk.mp4"></video>

	The 14 finger dimensions are masked (37 → 23 action dims) and the remaining joints use PD position control. The policy stumbles occasionally, but 50% of episodes clear the success bar.
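
The 37 → 23 masking can be pictured as scattering the policy's reduced action back into the full actuator vector. The sketch below is an assumption-laden illustration, not the BlockedHands wrapper itself: `expand_action` is a hypothetical helper, and placing the finger actuators at the last 14 indices is a guess, since the real order depends on the G1 model.

```python
FULL_ACT_DIM = 37      # full G1 action dimension (from the text above)
NUM_FINGER_DIMS = 14   # finger dimensions that are blocked

# HYPOTHETICAL index layout: assumes the fingers occupy the last 14 slots.
FINGER_INDICES = frozenset(range(FULL_ACT_DIM - NUM_FINGER_DIMS, FULL_ACT_DIM))


def expand_action(reduced_action, finger_indices=FINGER_INDICES,
                  full_dim=FULL_ACT_DIM, hold_value=0.0):
    """Scatter a reduced (body-only) action into the full action vector.

    Blocked finger entries are filled with hold_value; all other entries
    take the next value from reduced_action, in order.
    """
    expected = full_dim - len(finger_indices)
    if len(reduced_action) != expected:
        raise ValueError(f"expected {expected} values, got {len(reduced_action)}")
    body_values = iter(reduced_action)
    return [hold_value if i in finger_indices else next(body_values)
            for i in range(full_dim)]


full_action = expand_action([1.0] * 23)
print(len(full_action), sum(1 for a in full_action if a == 0.0))
```

The policy then only ever sees and emits the 23 body dimensions, which removes the finger joints from exploration.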

	### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))

	| Task | Algo | Final step | mean_return | success_rate |
	|---|---|---|---|---|
	| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
	| h1-walk-v0 | TD-MPC2 (this) | 950k | 817 | 100% (N=3) ⭐ |
	| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
	| g1-walk-v0 | TD-MPC2 PDBH (this) | 950k | 602 | 50% (N=6) |

	Conclusion: TD-MPC2 slightly outperforms DR.Q on h1-walk (817 vs. 801 mean return) and is more stable. On the harder g1-walk task (37D action space with PD + BlockedHands wrappers) it falls behind DR.Q but still passes the ≥30% success threshold. Note that the two algorithms are compared at different final steps (950k vs. 500k) and with different N.

	---

	## 🔧 Training config

	| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
	|---|---|---|---|---|---|---|
	| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
	| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |

	- Algorithm: TD-MPC2 `model_size=5` (small, ~16M params)
	- Seed: 0 for both tasks. For g1-walk, seed 0 was the best of three seeds (0/10/20) trained in parallel on the same GPU
	- Multi-seed parallel pattern: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md) — 3 seeds time-slice one GPU, util 15% → 98%, total throughput 2.7×
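
The multi-seed pattern above amounts to starting one training process per seed and letting them share the GPU. The sketch below shows the idea only. The module path `tdmpc2.train` and the `task=` / `seed=` / `model_size=` overrides are assumptions about the training entry point, and `build_commands` / `launch_all` are hypothetical helpers, not part of the companion repo.

```python
import subprocess
import sys


def build_commands(task, seeds, model_size=5):
    """Build one training command per seed (entry point and overrides are assumed)."""
    return [
        [sys.executable, "-m", "tdmpc2.train",
         f"task={task}", f"seed={seed}", f"model_size={model_size}"]
        for seed in seeds
    ]


def launch_all(commands):
    """Start every run at once so they time-slice one GPU, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


commands = build_commands("humanoid_g1-walk-v0", seeds=[0, 10, 20])
for cmd in commands:
    print(" ".join(cmd))
# launch_all(commands)  # uncomment to actually start the three runs
```

If the 2.7× figure is total throughput relative to a single run, each of the three seeds progresses at roughly 0.9× the speed of a solo run.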

	### Patches applied to upstream submodules

	The first two patches are both required for G1-walk, since a torque-only G1 will not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)); the third is the only one needed for h1-walk:

	- `patches/g1-pos-control.patch` — replaces torque actuators with PD position actuators
	- `patches/humanoid-bench-g1-and-lazy.patch` — BlockedHands wrapper to freeze 14 finger DoFs (irrelevant noise for walk task)
	- `patches/tdmpc2-save-agent.patch` — fixes upstream TD-MPC2 to actually save weights every eval (the only patch required for h1-walk)

	Apply with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
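
For readers unfamiliar with PD position control, the switch from torque actuators means the policy outputs target joint positions and a PD law turns them into torques. The sketch below shows the standard per-joint PD law only; the gains and joint values are hypothetical, and the patch itself may implement this through the simulator's actuator definitions rather than in Python.

```python
def pd_torque(q_target, q, q_vel, kp, kd):
    """Per-joint PD law: tau = kp * (q_target - q) - kd * q_vel."""
    return [
        kp_i * (target_i - pos_i) - kd_i * vel_i
        for target_i, pos_i, vel_i, kp_i, kd_i in zip(q_target, q, q_vel, kp, kd)
    ]


# Hypothetical two-joint example; the gains are illustrative, not the patch's values.
tau = pd_torque(
    q_target=[0.5, -0.2],   # desired joint positions (the policy's action)
    q=[0.4, -0.2],          # current joint positions
    q_vel=[0.0, 1.0],       # current joint velocities
    kp=[100.0, 100.0],      # proportional gains
    kd=[2.0, 2.0],          # damping gains
)
print(tau)
```

The first joint is pulled toward its target by the position error, while the second, already on target, is only damped against its velocity.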

	---

	## 🚀 Inference

	Full deterministic eval and GUI viewer scripts:

	- `scripts/tdmpc2_eval.py` — N-ep JSONL eval (headless)
	- `scripts/tdmpc2_viewer.py` — GUI viewer (GLFW)

	Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).

	```bash
	# headless eval over N=10 episodes, results written as JSONL
	DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
	    --task humanoid_g1-walk-v0 \
	    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
	    --seed 0 --eval 10 --out g1_eval.jsonl

	# GUI replay of the same checkpoint
	DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
	    --task humanoid_g1-walk-v0 \
	    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
	    --seed 0 --fps 50
	```

	---

	## 📁 Repo layout

	```
	HumanoidBench-TD-MPC2/
	├── README.md                      (this file)
	├── assets/
	│   ├── tdmpc2-h1-walk.mp4         (515 KB, H1-walk GUI recording)
	│   └── tdmpc2-g1-walk.mp4         (257 KB, G1-walk GUI recording)
	├── TDMPC2+HBench-h1-walk-v0+0/
	│   ├── step_950000.pt             (32 MB, agent + world model + critic)
	│   ├── train.log                  (~370 KB, full training log)
	│   └── ckpt_eval.csv              (auto-eval per ckpt, N=3 quick)
	└── TDMPC2+HBench-g1-walk-v0+0/
	    ├── step_950000.pt             (32 MB)
	    └── train.log                  (~700 KB)
	```

	`+0` denotes seed=0. Any other seeds released later will be named `+10` / `+20`.
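
The naming convention makes checkpoint paths easy to build programmatically. The sketch below is a hedged example: `checkpoint_path` and `download_checkpoint` are hypothetical helpers, the repo id is taken from the video URLs in this card, and the download step assumes the `huggingface_hub` package is installed. Only seed 0 at step 950000 is listed in the layout above.

```python
def checkpoint_path(task, seed, step):
    """Build the in-repo path following the TDMPC2+HBench-<task>+<seed> convention."""
    return f"TDMPC2+HBench-{task}+{seed}/step_{step}.pt"


def download_checkpoint(task, seed, step, repo_id="wsagi/HumanoidBench-TD-MPC2"):
    """Fetch one checkpoint from the Hub and return its local file path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=checkpoint_path(task, seed, step))


print(checkpoint_path("g1-walk-v0", 0, 950000))
# local_file = download_checkpoint("g1-walk-v0", 0, 950000)  # needs network access
```

The returned local path can then be passed to the `--ckpt` argument of the eval or viewer script.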

	---

	## 📜 License & Attribution

	- Code: MIT (consistent with [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream)
	- Algorithm: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
	- Benchmark: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
	- Trained by: <https://github.com/vitorcen> on AutoDL infrastructure