Add files using upload-large-folder tool
- .gitattributes +2 -0
- README.md +136 -0
- TDMPC2+HBench-g1-walk-v0+0/preview.mp4 +3 -0
- TDMPC2+HBench-g1-walk-v0+0/step_950000.pt +3 -0
- TDMPC2+HBench-g1-walk-v0+0/train.log +0 -0
- TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv +23 -0
- TDMPC2+HBench-h1-walk-v0+0/preview.mp4 +3 -0
- TDMPC2+HBench-h1-walk-v0+0/step_950000.pt +3 -0
- TDMPC2+HBench-h1-walk-v0+0/train.log +0 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
TDMPC2+HBench-h1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
TDMPC2+HBench-g1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,136 @@
| 1 |
+
---
|
| 2 |
+
library_name: tdmpc2
|
| 3 |
+
tags:
|
| 4 |
+
- reinforcement-learning
|
| 5 |
+
- humanoid
|
| 6 |
+
- mujoco
|
| 7 |
+
- humanoid-bench
|
| 8 |
+
- locomotion
|
| 9 |
+
- unitree-h1
|
| 10 |
+
- unitree-g1
|
| 11 |
+
- model-based-rl
|
| 12 |
+
- mpc
|
| 13 |
+
datasets:
|
| 14 |
+
- carlosferrazza/humanoid-bench
|
| 15 |
+
license: mit
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# HumanoidBench-TD-MPC2 · Self-trained checkpoints
|
| 19 |
+
|
| 20 |
+
_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._
|
| 21 |
+
|
| 22 |
+
> **Training source**: <https://github.com/vitorcen/humanoid-training>
|
| 23 |
+
>
|
| 24 |
+
> _Full training scripts, patches, eval harness, and analysis docs in the companion GitHub repo._
|
| 25 |
+
|
| 26 |
+
TD-MPC2 is a model-based RL algorithm that combines a world model with sample-based MPC planning.
|
| 27 |
+
This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) locomotion tasks.
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
---
|
| 32 |
+
|
| 33 |
+
## Performance
|
| 34 |
+
|
| 35 |
+
| Task | success_rate | mean_return | N | mean_steps | Notes |
|
| 36 |
+
|---|---|---|---|---|---|
|
| 37 |
+
| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Training stable throughout; success=100% from step 800k onward |
|
| 38 |
+
| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1 of 6 episodes ended in an early fall |
|
| 39 |
+
|
| 40 |
+
`success_bar = 700` (HumanoidBench locomotion threshold). _Success = episode return ≥ success_bar._
|
| 41 |
+
|
| 42 |
+
### Video preview
|
| 43 |
+
|
| 44 |
+
Each task subdirectory contains a `preview.mp4` showing a deterministic eval (best seed, GUI viewer screen recording):
|
| 45 |
+
|
| 46 |
+
- **`TDMPC2+HBench-h1-walk-v0+0/preview.mp4`**: H1 humanoid, full walking cycle, 1000 steps without falling
|
| 47 |
+
- **`TDMPC2+HBench-g1-walk-v0+0/preview.mp4`**: G1 humanoid walking, with occasional stumbles
|
| 48 |
+
|
| 49 |
+
### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))
|
| 50 |
+
|
| 51 |
+
| Task | Algo | Final step | mean_return | success_rate |
|
| 52 |
+
|---|---|---|---|---|
|
| 53 |
+
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
|
| 54 |
+
| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
|
| 55 |
+
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
|
| 56 |
+
| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |
|
| 57 |
+
|
| 58 |
+
**Conclusion**: TD-MPC2 slightly outperforms DR.Q on **h1-walk** and is more stable, though the table compares TD-MPC2 at 950k steps against DR.Q at 500k. On the harder **g1-walk** task (37D + PDBH wrappers) it falls behind DR.Q but still meets the ≥30% success-rate pass threshold.
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
---
|
| 63 |
+
|
| 64 |
+
## Training config
|
| 65 |
+
|
| 66 |
+
| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|
| 67 |
+
|---|---|---|---|---|---|---|
|
| 68 |
+
| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
|
| 69 |
+
| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |
|
| 70 |
+
|
| 71 |
+
- **Algorithm**: TD-MPC2 `model_size=5` (small, ~16M params)
|
| 72 |
+
- **Seed**: 0 for h1-walk; 0 for g1-walk (best of 3 seeds 0/10/20, multi-seed parallel on same GPU)
|
| 73 |
+
- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md). Three seeds time-slice one GPU, raising utilization from 15% to 98% for 2.7× total throughput
|
| 74 |
+
|
| 75 |
+
### Patches applied to upstream submodules
|
| 76 |
+
|
| 77 |
+
The first two patches below are **required** for G1-walk; a torque-only G1 will not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)):
|
| 78 |
+
|
| 79 |
+
- `patches/g1-pos-control.patch`: replaces torque actuators with PD position actuators
|
| 80 |
+
- `patches/humanoid-bench-g1-and-lazy.patch`: BlockedHands wrapper to freeze 14 finger DoFs (irrelevant noise for walk task)
|
| 81 |
+
- `patches/tdmpc2-save-agent.patch`: fixes upstream TD-MPC2 to actually save weights every eval (the only patch required for h1-walk)
|
| 82 |
+
|
| 83 |
+
Apply with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## Inference
|
| 88 |
+
|
| 89 |
+
Full deterministic eval and GUI viewer scripts:
|
| 90 |
+
|
| 91 |
+
- `scripts/tdmpc2_eval.py`: N-episode JSONL eval (headless)
|
| 92 |
+
- `scripts/tdmpc2_viewer.py`: GUI viewer (GLFW)
|
| 93 |
+
|
| 94 |
+
Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).
|
| 95 |
+
|
| 96 |
+
```bash
|
| 97 |
+
# headless N=10 eval
|
| 98 |
+
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
|
| 99 |
+
--task humanoid_g1-walk-v0 \
|
| 100 |
+
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
|
| 101 |
+
--seed 0 --eval 10 --out g1_eval.jsonl
|
| 102 |
+
|
| 103 |
+
# GUI replay
|
| 104 |
+
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
|
| 105 |
+
--task humanoid_g1-walk-v0 \
|
| 106 |
+
--ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
|
| 107 |
+
--seed 0 --fps 50
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
---
|
| 111 |
+
|
| 112 |
+
## Repo layout
|
| 113 |
+
|
| 114 |
+
```
|
| 115 |
+
TDMPC2+HBench-h1-walk-v0+0/
|
| 116 |
+
├── step_950000.pt (32 MB: agent + world model + critic)
|
| 117 |
+
├── train.log (~370 KB: full training log)
|
| 118 |
+
├── ckpt_eval.csv (auto-eval per ckpt, N=3 quick)
|
| 119 |
+
└── preview.mp4 (515 KB: GUI viewer recording)
|
| 120 |
+
|
| 121 |
+
TDMPC2+HBench-g1-walk-v0+0/
|
| 122 |
+
├── step_950000.pt (32 MB)
|
| 123 |
+
├── train.log (~700 KB)
|
| 124 |
+
└── preview.mp4 (257 KB)
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
The `+0` suffix denotes seed=0. Checkpoints for other seeds, if released later, will be named with `+10` / `+20`.
|
| 128 |
+
|
| 129 |
+
---
|
| 130 |
+
|
| 131 |
+
## License & Attribution
|
| 132 |
+
|
| 133 |
+
- **Code**: MIT (consistent with [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) upstream)
|
| 134 |
+
- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
|
| 135 |
+
- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
|
| 136 |
+
- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure
|
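The README above defines success as an episode return at or above `success_bar = 700` and reports a mean return with a spread. A minimal sketch of that bookkeeping follows. The helper name `summarize` and the sample returns are hypothetical, and the use of the sample standard deviation is an assumption, since the README does not say which estimator produced its ± value.

```python
from statistics import mean, stdev

SUCCESS_BAR = 700.0  # threshold quoted in the README


def summarize(returns, success_bar=SUCCESS_BAR):
    """Return (success_rate, mean_return, std_return) for a list of episode returns.

    Hypothetical helper, not part of the training repo. An episode counts as a
    success when its return is >= success_bar. The spread is the sample standard
    deviation (an assumption; the README does not name the estimator).
    """
    if not returns:
        raise ValueError("need at least one episode return")
    successes = sum(1 for r in returns if r >= success_bar)
    spread = stdev(returns) if len(returns) > 1 else 0.0
    return successes / len(returns), mean(returns), spread


# Made-up returns for illustration only; these are NOT the repo's eval results.
example_returns = [820.0, 790.0, 310.0, 705.0]
rate, avg, spread = summarize(example_returns)
print(f"success_rate={rate:.2f} mean_return={avg:.2f} std={spread:.2f}")
```

With a small N (3 or 6 episodes, as in the tables), one early fall moves both the success rate and the spread substantially, so the two rows are best read as rough indicators.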
TDMPC2+HBench-g1-walk-v0+0/preview.mp4
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:47b47c47d802d399552e6506d5faf781e8fab36395df8406b97704eef04b90bf
|
| 3 |
+
size 256928
|
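The three added lines above are a Git LFS pointer, not the video itself: a spec version, a SHA-256 object id, and the byte size of the real file. A small sketch of reading such a pointer and checking downloaded bytes against it is shown below. The function names `parse_lfs_pointer` and `verify_bytes` are hypothetical helpers written for illustration, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_bytes(data, info):
    """Check that raw file bytes match the size and SHA-256 digest in the pointer."""
    if info["hash_algo"] != "sha256" or len(data) != info["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == info["digest"]


# Pointer text copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:47b47c47d802d399552e6506d5faf781e8fab36395df8406b97704eef04b90bf
size 256928
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After downloading `preview.mp4`, passing its bytes to `verify_bytes` would confirm that the file matches this pointer.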
TDMPC2+HBench-g1-walk-v0+0/step_950000.pt
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a95587aaf9315a046668036b9224342e7227519d0e2902862b99f2bb30da77c2
|
| 3 |
+
size 32059682
|
TDMPC2+HBench-g1-walk-v0+0/train.log
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv
ADDED
|
@@ -0,0 +1,23 @@
| 1 |
+
timestamp,ckpt_mtime,agent_train_step,success_rate,mean_return,mean_steps,timeout_rate,n_ep,note,ckpt_file
|
| 2 |
+
1779813130,1779813085,0,0.0000,4.92,56.0,0.0000,3,ok,step_00000000.pt
|
| 3 |
+
1779814213,1779814198,50022,,,,,3,"error: 2_1779814210.jsonl
|
| 4 |
+
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00050022.pt
|
| 5 |
+
1779815392,1779815356,100052,0.0000,86.95,154.7,0.0000,3,ok,step_00100052.pt
|
| 6 |
+
1779816573,1779816516,150005,0.0000,85.98,211.3,0.0000,3,ok,step_00150005.pt
|
| 7 |
+
1779817757,1779817679,200101,0.0000,200.00,305.0,0.0000,3,ok,step_00200101.pt
|
| 8 |
+
1779818941,1779818841,250153,0.0000,131.87,286.0,0.0000,3,ok,step_00250153.pt
|
| 9 |
+
1779820041,1779820004,300185,0.0000,290.06,438.0,0.0000,3,ok,step_00300185.pt
|
| 10 |
+
1779821236,1779821168,350149,0.0000,426.35,590.7,0.0000,3,ok,step_00350149.pt
|
| 11 |
+
1779822443,1779822357,400925,0.6667,678.70,913.0,0.6667,3,ok,step_00400925.pt
|
| 12 |
+
1779823563,1779823515,450310,1.0000,784.77,1000.0,1.0000,3,ok,step_00450310.pt
|
| 13 |
+
1779824773,1779824695,500555,1.0000,815.97,1000.0,1.0000,3,ok,step_00500555.pt
|
| 14 |
+
1779825981,1779825866,550378,0.6667,721.70,953.0,0.3333,3,ok,step_00550378.pt
|
| 15 |
+
1779827101,1779827050,600760,1.0000,813.39,1000.0,1.0000,3,ok,step_00600760.pt
|
| 16 |
+
1779828307,1779828221,650589,0.3333,663.86,897.0,0.6667,3,ok,step_00650589.pt
|
| 17 |
+
1779829517,1779829401,700870,1.0000,810.15,1000.0,1.0000,3,ok,step_00700870.pt
|
| 18 |
+
1779830637,1779830563,750356,1.0000,804.41,1000.0,1.0000,3,ok,step_00750356.pt
|
| 19 |
+
1779831848,1779831732,800098,1.0000,817.04,1000.0,1.0000,3,ok,step_00800098.pt
|
| 20 |
+
1779832930,1779832919,850572,,,,,3,"error: 2_1779832928.jsonl
|
| 21 |
+
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00850572.pt
|
| 22 |
+
1779834140,1779834083,900572,1.0000,779.23,986.0,0.3333,3,ok,step_00900572.pt
|
| 23 |
+
1779835350,1779835256,950572,1.0000,816.66,1000.0,1.0000,3,ok,step_00950572.pt
|
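Two rows of this CSV carry a multi-line quoted error message instead of metrics, so a naive line-by-line split would misread them. The sketch below shows one way to load the file with Python's `csv` module, skip the failed evaluations, and pick the strongest checkpoint. The helper name `usable_rows` and the ranking rule (success rate first, then mean return) are assumptions for illustration; the sample text is a subset of the rows above.

```python
import csv
import io


def usable_rows(csv_text):
    """Return parsed rows whose evaluation finished (note == 'ok').

    Hypothetical helper. csv.DictReader handles the quoted error messages
    that span two physical lines.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["note"] != "ok" or not row["success_rate"]:
            continue  # failed eval: metrics columns are empty
        rows.append({
            "step": int(row["agent_train_step"]),
            "success_rate": float(row["success_rate"]),
            "mean_return": float(row["mean_return"]),
            "ckpt_file": row["ckpt_file"],
        })
    return rows


# Subset of the rows above, including one failed evaluation.
sample = '''timestamp,ckpt_mtime,agent_train_step,success_rate,mean_return,mean_steps,timeout_rate,n_ep,note,ckpt_file
1779822443,1779822357,400925,0.6667,678.70,913.0,0.6667,3,ok,step_00400925.pt
1779832930,1779832919,850572,,,,,3,"error: 2_1779832928.jsonl
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00850572.pt
1779835350,1779835256,950572,1.0000,816.66,1000.0,1.0000,3,ok,step_00950572.pt
'''
rows = usable_rows(sample)
# Assumed ranking: highest success rate, ties broken by mean return.
best = max(rows, key=lambda r: (r["success_rate"], r["mean_return"]))
print(best["ckpt_file"], best["mean_return"])
```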
TDMPC2+HBench-h1-walk-v0+0/preview.mp4
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2bad368d1a3dad93006d1d0a56f9ead19fbbd8e9f2ec62f1b199ca96e1b5c70f
|
| 3 |
+
size 515283
|
TDMPC2+HBench-h1-walk-v0+0/step_950000.pt
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:21f44ac488018f64f9852b3534c41c823f50f2dc55b2cffa06cc475a921ff366
|
| 3 |
+
size 31908130
|
TDMPC2+HBench-h1-walk-v0+0/train.log
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|