Add files using upload-large-folder tool

Files changed (9)

.gitattributes +2 -0
README.md +136 -0
TDMPC2+HBench-g1-walk-v0+0/preview.mp4 +3 -0
TDMPC2+HBench-g1-walk-v0+0/step_950000.pt +3 -0
TDMPC2+HBench-g1-walk-v0+0/train.log +0 -0
TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv +23 -0
TDMPC2+HBench-h1-walk-v0+0/preview.mp4 +3 -0
TDMPC2+HBench-h1-walk-v0+0/step_950000.pt +3 -0
TDMPC2+HBench-h1-walk-v0+0/train.log +0 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+TDMPC2+HBench-h1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text
+TDMPC2+HBench-g1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,136 @@

+---
+library_name: tdmpc2
+tags:
+  - reinforcement-learning
+  - humanoid
+  - mujoco
+  - humanoid-bench
+  - locomotion
+  - unitree-h1
+  - unitree-g1
+  - model-based-rl
+  - mpc
+datasets:
+  - carlosferrazza/humanoid-bench
+license: mit
+---
+# HumanoidBench-TD-MPC2 · Self-trained passing checkpoints
+_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._
+> 🛠 **Training source**: <https://github.com/vitorcen/humanoid-training>
+> The full training scripts, patches, eval harness, and analysis docs are all in the companion GitHub repo.
+TD-MPC2 is a model-based RL algorithm that combines a world model with sample-based MPC planning. This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) tasks.
+---
+## 📊 Performance
+| Task | success_rate | mean_return | N | mean_steps | Notes |
+|---|---|---|---|---|---|
+| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Stable throughout training; success=100% from step 800k onward |
+| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |
+`success_bar = 700` (HumanoidBench locomotion threshold). _Success = episode return ≥ success_bar._
+### Video preview
+Each task subdirectory contains a `preview.mp4` showing a deterministic eval (best seed, screen-recorded from the GUI viewer):
+- **`TDMPC2+HBench-h1-walk-v0+0/preview.mp4`**: H1 humanoid, full walking cycle, stays upright for all 1000 steps
+- **`TDMPC2+HBench-g1-walk-v0+0/preview.mp4`**: G1 humanoid walking, with occasional stumbles
+### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))
+| Task | Algo | Final step | mean_return | success_rate |
+|---|---|---|---|---|
+| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
+| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
+| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
+| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |
+**Conclusion**: TD-MPC2 slightly outperforms DR.Q on **h1-walk** (within the same step range, and more stable). On the harder **g1-walk** task (37D + PDBH wrappers) it falls behind DR.Q, but still meets the ≥30% pass threshold.
+---
+## 🔧 Training config
+| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
+|---|---|---|---|---|---|---|
+| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
+| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |
+- **Algorithm**: TD-MPC2 `model_size=5` (small, ~16M params)
+- **Seed**: 0 for h1-walk; 0 for g1-walk (best of 3 seeds 0/10/20, multi-seed parallel on same GPU)
+- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md) — 3 seeds time-slice one GPU, util 15% → 98%, total throughput 2.7×
+### Patches applied to upstream submodules
+The first two patches are **required** for G1-walk, since a torque-only G1 will not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)):
+- `patches/g1-pos-control.patch` — replaces torque actuators with PD position actuators
+- `patches/humanoid-bench-g1-and-lazy.patch` — BlockedHands wrapper to freeze 14 finger DoFs (irrelevant noise for walk task)
+- `patches/tdmpc2-save-agent.patch` — fixes upstream TD-MPC2 to actually save weights every eval (the only patch required for h1-walk)
+Apply with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
+---
+## 🚀 Inference
+Full deterministic eval and GUI viewer scripts:
+- `scripts/tdmpc2_eval.py` — N-ep JSONL eval (headless)
+- `scripts/tdmpc2_viewer.py` — GUI viewer (GLFW)
+Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).
+```bash
+# headless N=10 eval
+DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
+    --task humanoid_g1-walk-v0 \
+    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
+    --seed 0 --eval 10 --out g1_eval.jsonl
+# GUI replay
+DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
+    --task humanoid_g1-walk-v0 \
+    --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
+    --seed 0 --fps 50
+```
+---
+## 📁 Repo layout
+```
+TDMPC2+HBench-h1-walk-v0+0/
+├── step_950000.pt          (32 MB — agent + world model + critic)
+├── train.log               (~370 KB — full training log)
+├── ckpt_eval.csv           (auto-eval per ckpt, N=3 quick)
+└── preview.mp4             (515 KB — GUI viewer recording)
+TDMPC2+HBench-g1-walk-v0+0/
+├── step_950000.pt          (32 MB)
+├── train.log               (~700 KB)
+└── preview.mp4             (257 KB)
+```
+`+0` denotes seed=0. If other seeds are published later, they will be named `+10` / `+20`.
+---
+## 📜 License & Attribution
+- **Code**: MIT (consistent with [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream)
+- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
+- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
+- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure
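
The README's Performance section defines success as an episode return at or above `success_bar = 700`. A minimal sketch of how the reported columns could be derived from per-episode returns follows. The function name `summarize` and the example returns are hypothetical (they are not taken from the actual eval output), and using the population standard deviation is an assumption, since the README does not say how its `±` value was computed.

```python
from statistics import mean, pstdev

SUCCESS_BAR = 700  # locomotion threshold stated in the README


def summarize(returns, success_bar=SUCCESS_BAR):
    """Aggregate per-episode returns into success rate, mean, and spread."""
    successes = sum(r >= success_bar for r in returns)
    return {
        "n": len(returns),
        "success_rate": successes / len(returns),
        "mean_return": mean(returns),
        # Assumption: population std; the README does not specify which one.
        "std_return": pstdev(returns),
    }


# Made-up returns for illustration only, not real eval results.
example_returns = [750.0, 720.0, 300.0, 710.0]
stats = summarize(example_returns)
print(stats["success_rate"], stats["mean_return"])
```

With a threshold-based metric like this, a single early fall lowers both the success rate and the mean return, which is why a small N can give a noisy picture.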

TDMPC2+HBench-g1-walk-v0+0/preview.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47b47c47d802d399552e6506d5faf781e8fab36395df8406b97704eef04b90bf
+size 256928

TDMPC2+HBench-g1-walk-v0+0/step_950000.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a95587aaf9315a046668036b9224342e7227519d0e2902862b99f2bb30da77c2
+size 32059682

TDMPC2+HBench-g1-walk-v0+0/train.log ADDED

The diff for this file is too large to render. See raw diff

TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv ADDED

	@@ -0,0 +1,23 @@

+timestamp,ckpt_mtime,agent_train_step,success_rate,mean_return,mean_steps,timeout_rate,n_ep,note,ckpt_file
+1779813130,1779813085,0,0.0000,4.92,56.0,0.0000,3,ok,step_00000000.pt
+1779814213,1779814198,50022,,,,,3,"error: 2_1779814210.jsonl
+ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00050022.pt
+1779815392,1779815356,100052,0.0000,86.95,154.7,0.0000,3,ok,step_00100052.pt
+1779816573,1779816516,150005,0.0000,85.98,211.3,0.0000,3,ok,step_00150005.pt
+1779817757,1779817679,200101,0.0000,200.00,305.0,0.0000,3,ok,step_00200101.pt
+1779818941,1779818841,250153,0.0000,131.87,286.0,0.0000,3,ok,step_00250153.pt
+1779820041,1779820004,300185,0.0000,290.06,438.0,0.0000,3,ok,step_00300185.pt
+1779821236,1779821168,350149,0.0000,426.35,590.7,0.0000,3,ok,step_00350149.pt
+1779822443,1779822357,400925,0.6667,678.70,913.0,0.6667,3,ok,step_00400925.pt
+1779823563,1779823515,450310,1.0000,784.77,1000.0,1.0000,3,ok,step_00450310.pt
+1779824773,1779824695,500555,1.0000,815.97,1000.0,1.0000,3,ok,step_00500555.pt
+1779825981,1779825866,550378,0.6667,721.70,953.0,0.3333,3,ok,step_00550378.pt
+1779827101,1779827050,600760,1.0000,813.39,1000.0,1.0000,3,ok,step_00600760.pt
+1779828307,1779828221,650589,0.3333,663.86,897.0,0.6667,3,ok,step_00650589.pt
+1779829517,1779829401,700870,1.0000,810.15,1000.0,1.0000,3,ok,step_00700870.pt
+1779830637,1779830563,750356,1.0000,804.41,1000.0,1.0000,3,ok,step_00750356.pt
+1779831848,1779831732,800098,1.0000,817.04,1000.0,1.0000,3,ok,step_00800098.pt
+1779832930,1779832919,850572,,,,,3,"error: 2_1779832928.jsonl
+ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00850572.pt
+1779834140,1779834083,900572,1.0000,779.23,986.0,0.3333,3,ok,step_00900572.pt
+1779835350,1779835256,950572,1.0000,816.66,1000.0,1.0000,3,ok,step_00950572.pt
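
The eval log above mixes normal rows with error rows whose `note` field is a quoted, multi-line string, so line-by-line splitting would misparse it. The sketch below reads a few rows copied from the file (without the leading diff `+`) using Python's standard `csv` module, skips any row whose note is not `ok`, and picks the checkpoint with the highest `mean_return`. The helper name `load_ok_rows` is hypothetical, and selecting by mean return is only one possible criterion.

```python
import csv
import io

# Rows copied from ckpt_eval.csv; the error row keeps its quoted,
# multi-line note (including the truncated path) exactly as logged.
SAMPLE = '''timestamp,ckpt_mtime,agent_train_step,success_rate,mean_return,mean_steps,timeout_rate,n_ep,note,ckpt_file
1779829517,1779829401,700870,1.0000,810.15,1000.0,1.0000,3,ok,step_00700870.pt
1779831848,1779831732,800098,1.0000,817.04,1000.0,1.0000,3,ok,step_00800098.pt
1779832930,1779832919,850572,,,,,3,"error: 2_1779832928.jsonl
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00850572.pt
1779834140,1779834083,900572,1.0000,779.23,986.0,0.3333,3,ok,step_00900572.pt
1779835350,1779835256,950572,1.0000,816.66,1000.0,1.0000,3,ok,step_00950572.pt
'''


def load_ok_rows(text):
    """Return only successfully evaluated rows, with numeric fields parsed."""
    rows = []
    # newline="" lets the csv module handle newlines inside quoted fields.
    for row in csv.DictReader(io.StringIO(text, newline="")):
        if row["note"] != "ok":
            continue  # failed evals have empty metric columns
        rows.append({
            "step": int(row["agent_train_step"]),
            "success_rate": float(row["success_rate"]),
            "mean_return": float(row["mean_return"]),
            "ckpt_file": row["ckpt_file"],
        })
    return rows


rows = load_ok_rows(SAMPLE)
best = max(rows, key=lambda r: r["mean_return"])
print(len(rows), best["ckpt_file"], best["mean_return"])
```

On this excerpt the errored step 850572 eval is dropped rather than counted as a failure, which matters when judging how stable the late checkpoints are.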

TDMPC2+HBench-h1-walk-v0+0/preview.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2bad368d1a3dad93006d1d0a56f9ead19fbbd8e9f2ec62f1b199ca96e1b5c70f
+size 515283

TDMPC2+HBench-h1-walk-v0+0/step_950000.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21f44ac488018f64f9852b3534c41c823f50f2dc55b2cffa06cc475a921ff366
+size 31908130
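
The three-line entries shown for the `.pt` and `.mp4` files are Git LFS pointers: the repository stores only a SHA-256 digest and a byte size, while the binary itself lives in LFS storage. As a rough sketch, a downloaded file can be checked against its pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the example reads the whole file into memory, which is fine for a ~32 MB checkpoint but would need chunked hashing for much larger files.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into (sha256_hex_digest, size_in_bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(data, pointer_text):
    """Check that raw bytes match both the size and digest in the pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


# Pointer for the h1-walk checkpoint, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:21f44ac488018f64f9852b3534c41c823f50f2dc55b2cffa06cc475a921ff366
size 31908130
"""

digest, size = parse_lfs_pointer(POINTER)
print(f"{size / 1e6:.1f} MB", digest[:12])
```

To verify a real download, one would pass the file's bytes (for example from `open(path, "rb").read()`) as `data`.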

TDMPC2+HBench-h1-walk-v0+0/train.log ADDED

The diff for this file is too large to render. See raw diff