wsagi committed on
Commit fd37551 · verified · 1 parent: 71316ba

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+TDMPC2+HBench-h1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text
+TDMPC2+HBench-g1-walk-v0+0/preview.mp4 filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,136 @@
---
library_name: tdmpc2
tags:
- reinforcement-learning
- humanoid
- mujoco
- humanoid-bench
- locomotion
- unitree-h1
- unitree-g1
- model-based-rl
- mpc
datasets:
- carlosferrazza/humanoid-bench
license: mit
---

# HumanoidBench-TD-MPC2 · Self-trained passing checkpoints

_Self-trained TD-MPC2 checkpoints on HumanoidBench locomotion tasks._

> 🛠 **Training source**: <https://github.com/vitorcen/humanoid-training>
> The full training scripts, patches, eval harness, and analysis docs are in this companion GitHub repo.

TD-MPC2 is a model-based RL algorithm that combines a learned world model with sample-based MPC planning. This repo hosts checkpoints **trained from scratch** on [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) tasks.

---

## 📊 Performance

| Task | success_rate | mean_return | N (episodes) | mean_steps | Notes |
|---|---|---|---|---|---|
| **`h1-walk-v0`** | **100%** | **816.7** | 3 | 1000/1000 | Stable throughout training; success = 100% from step 800k onward |
| **`g1-walk-v0`** | **50%** | **601.7 ± 271.1** | 6 | 755/1000 | High variance; 1 of the 6 episodes ended in an early fall |

`success_bar = 700` (the HumanoidBench locomotion threshold). An episode counts as a success when its return is ≥ `success_bar`.

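Given the per-episode returns of an eval run, the success rate, mean return and spread in the Performance table can be recomputed against `success_bar`. The helper below is a minimal sketch, not part of the training repo: `summarize_returns` is a hypothetical name, it reads one return per line on stdin, and it assumes the ± figure is a population standard deviation, which the table does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize per-episode returns (one number per line on stdin).
# An episode counts as a success when its return is >= the bar (700 by default).
summarize_returns() {
  awk -v bar="${1:-700}" '
    NF == 0 { next }                 # ignore blank lines
    {
      n++
      sum += $1
      sumsq += $1 * $1
      if ($1 >= bar) ok++
    }
    END {
      if (n == 0) { print "no episodes" > "/dev/stderr"; exit 1 }
      mean = sum / n
      # Population variance (divides by N); the README does not say which convention it uses.
      var = sumsq / n - mean * mean
      if (var < 0) var = 0           # guard against tiny negative rounding error
      printf "N=%d success_rate=%.1f%% mean_return=%.1f std=%.1f\n", n, 100 * ok / n, mean, sqrt(var)
    }'
}

# Example: pipe in the returns extracted from an eval run.
# printf '%s\n' 812.3 820.1 817.6 | summarize_returns 700
```

Switching to a sample standard deviation only requires dividing by `n - 1` instead of `n` in the variance line.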

### Video preview

Each task subdirectory contains a `preview.mp4` of a deterministic eval (best seed, recorded from the GUI viewer):

- **`TDMPC2+HBench-h1-walk-v0+0/preview.mp4`**: the H1 humanoid completes a full walking cycle and stays upright for all 1000 steps
- **`TDMPC2+HBench-g1-walk-v0+0/preview.mp4`**: the G1 humanoid walking, with occasional stumbles

### Comparison with self-trained DR.Q on the same tasks ([wsagi/HumanoidBench-DR.Q](https://huggingface.co/wsagi/HumanoidBench-DR.Q))

| Task | Algo | Final step | mean_return | success_rate |
|---|---|---|---|---|
| h1-walk-v0 | DR.Q | 500k | 801 | 90% (N=10) |
| h1-walk-v0 | **TD-MPC2** (this) | 950k | **817** | **100%** (N=3) ⭐ |
| g1-walk-v0 | DR.Q PDBH | 500k | 711 | 70% (N=10) |
| g1-walk-v0 | **TD-MPC2** PDBH (this) | 950k | **602** | **50%** (N=6) |

**Conclusion**: TD-MPC2 slightly outperforms DR.Q on **h1-walk** (within the same step range, and more stably). On the harder **g1-walk** (37D + PDBH wrappers, i.e. PD + BlockedHands) it falls behind DR.Q, but still clears the ≥30% success-rate pass threshold.

---

## 🔧 Training config

| Task | Robot | act_dim | Wrappers | Steps | Hardware | Wall time |
|---|---|---|---|---|---|---|
| `h1-walk-v0` | Unitree H1 | 19 | none | 1M | 4090 24GB | ~24h |
| `g1-walk-v0` | Unitree G1 | 23 | PD + BlockedHands | 1M | AutoDL 4080S 32GB | ~22h (3-seed parallel) |

- **Algorithm**: TD-MPC2 `model_size=5` (small, ~16M params)
- **Seed**: 0 for h1-walk; 0 for g1-walk (best of seeds 0/10/20, trained in parallel on the same GPU)
- **Multi-seed parallel pattern**: see [feedback_tdmpc2_multiseed.md](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/feedback_tdmpc2_multiseed.md). Running 3 seeds that time-slice one GPU raised GPU utilization from 15% to 98% and total throughput by 2.7×.

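The multi-seed pattern amounts to starting several independent training processes on one GPU and letting them share it. A minimal sketch, assuming a Hydra-style TD-MPC2 entry point that accepts `task=`, `seed=` and `model_size=` overrides; `launch_seeds`, the default `python train.py` command and the log file names are all hypothetical, not the training repo's actual launcher.

```bash
#!/usr/bin/env bash
# Hypothetical launcher: start one training process per seed in the background
# so that all of them share (time-slice) the same GPU, then wait for every one.
# TRAIN_CMD and the Hydra-style overrides are assumptions about the entry point.
launch_seeds() {
  local task=$1
  shift
  local seed pid pids="" status=0
  for seed in "$@"; do
    # Left unquoted on purpose so a multi-word command splits into arguments.
    ${TRAIN_CMD:-python train.py} task="$task" seed="$seed" model_size=5 \
      > "train_seed${seed}.log" 2>&1 &
    pids="$pids $!"
  done
  # Report failure if any seed's process exits non-zero.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Example: three g1-walk seeds pinned to GPU 0.
# CUDA_VISIBLE_DEVICES=0 launch_seeds humanoid_g1-walk-v0 0 10 20
```

Because each seed is a separate process, the GPU scheduler interleaves their kernels; this is what lets three small `model_size=5` runs fill a GPU that one run leaves mostly idle.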

### Patches applied to upstream submodules

The first two patches are **required** for g1-walk; with torque actuators only, G1 does not learn to walk ([memory record](https://github.com/vitorcen/humanoid-training/blob/main/.claude/memory/project_benchmark_validation.md)):

- `patches/g1-pos-control.patch`: replaces the torque actuators with PD position actuators
- `patches/humanoid-bench-g1-and-lazy.patch`: adds the BlockedHands wrapper, which freezes the 14 finger DoFs (irrelevant noise for the walk task)
- `patches/tdmpc2-save-agent.patch`: fixes upstream TD-MPC2 so that it actually saves weights at every eval (the only patch required for h1-walk)

Apply them with `bash patches/apply.sh` from the [training repo](https://github.com/vitorcen/humanoid-training).

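`patches/apply.sh` lives in the training repo and is not reproduced here. As a rough sketch of how such a script can apply a patch to a submodule checkout only once, the hypothetical `apply_patch` helper below uses plain `git apply`: if the patch already reverse-applies cleanly it is treated as applied. The submodule path in the example comment is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the repo's apply.sh): apply a patch to a checkout
# idempotently. A patch that reverse-applies cleanly is already in place.
apply_patch() {
  local repo=$1
  local patch
  # git -C changes directory first, so resolve the patch to an absolute path.
  patch=$(realpath "$2") || return 1
  if git -C "$repo" apply --reverse --check "$patch" 2>/dev/null; then
    echo "already applied: $patch"
  elif git -C "$repo" apply --check "$patch"; then
    git -C "$repo" apply "$patch" && echo "applied: $patch"
  else
    echo "cannot apply cleanly: $patch" >&2
    return 1
  fi
}

# Example (submodule path is an assumption):
# apply_patch third_party/tdmpc2 patches/tdmpc2-save-agent.patch
```

Checking with `--check` before applying means a patch that conflicts with an updated submodule fails loudly instead of leaving a half-patched tree.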

---

## 🚀 Inference

Full deterministic eval and GUI viewer scripts:

- `scripts/tdmpc2_eval.py`: N-episode eval that writes JSONL (headless)
- `scripts/tdmpc2_viewer.py`: GUI viewer (GLFW)

Both are in the [companion GitHub repo](https://github.com/vitorcen/humanoid-training/tree/main/scripts).

```bash
# headless N=10 eval
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_eval.py \
  --task humanoid_g1-walk-v0 \
  --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
  --seed 0 --eval 10 --out g1_eval.jsonl

# GUI replay
DISPLAY=:0 conda run -n humanoidbench python scripts/tdmpc2_viewer.py \
  --task humanoid_g1-walk-v0 \
  --ckpt TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
  --seed 0 --fps 50
```

---

## 📁 Repo layout

```
TDMPC2+HBench-h1-walk-v0+0/
├── step_950000.pt   (32 MB: agent + world model + critic)
├── train.log        (~370 KB: full training log)
├── ckpt_eval.csv    (auto-eval per ckpt, N=3 quick)
└── preview.mp4      (515 KB: GUI viewer recording)

TDMPC2+HBench-g1-walk-v0+0/
├── step_950000.pt   (32 MB)
├── train.log        (~700 KB)
└── preview.mp4      (257 KB)
```

The `+0` suffix means seed=0. If checkpoints for other seeds are published later, they will be named `+10` / `+20`.

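Judging from the two directories shipped here, run folders appear to follow an `ALGO+HBench-TASK+SEED` pattern. That pattern is inferred from the names rather than documented, so treat the hypothetical `parse_run_dir` helper below as a sketch: it splits a folder name back into algorithm, task and seed using only POSIX parameter expansion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a run folder name such as
# "TDMPC2+HBench-g1-walk-v0+0" into algorithm, task and seed.
# Assumes the inferred ALGO+HBench-TASK+SEED naming pattern.
parse_run_dir() {
  local name=${1%/}              # drop a trailing slash, if any
  name=${name##*/}               # keep only the last path component
  local seed=${name##*+}         # text after the last "+"
  local rest=${name%+*}          # everything before the last "+"
  local algo=${rest%%+*}         # text before the first "+"
  local task=${rest#*+HBench-}   # text after "+HBench-"
  printf '%s %s %s\n' "$algo" "$task" "$seed"
}

# Example:
# parse_run_dir TDMPC2+HBench-g1-walk-v0+0/
```

The parsed task is the short HumanoidBench id; the eval commands in the Inference section pass it with a `humanoid_` prefix (`humanoid_g1-walk-v0`).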

---

## 📜 License & Attribution

- **Code**: MIT (consistent with the upstream [TD-MPC2](https://github.com/nicklashansen/tdmpc2) and [HumanoidBench](https://github.com/carlosferrazza/humanoid-bench) licenses)
- **Algorithm**: [TD-MPC2 (Hansen et al., 2024)](https://www.tdmpc2.com/)
- **Benchmark**: [HumanoidBench (Sferrazza et al., 2024)](https://arxiv.org/abs/2403.10506)
- **Trained by**: <https://github.com/vitorcen> on AutoDL infrastructure
TDMPC2+HBench-g1-walk-v0+0/preview.mp4 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47b47c47d802d399552e6506d5faf781e8fab36395df8406b97704eef04b90bf
size 256928
TDMPC2+HBench-g1-walk-v0+0/step_950000.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a95587aaf9315a046668036b9224342e7227519d0e2902862b99f2bb30da77c2
size 32059682
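
Each `.pt` and `.mp4` entry in this commit is a Git LFS pointer: the repository stores only the spec version, the SHA-256 `oid` and the byte `size`, while the real file is fetched separately. A downloaded copy can be checked against its pointer. The sketch below assumes GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` prints the same digest); `verify_lfs_object` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a downloaded file against the oid/size from its LFS pointer.
verify_lfs_object() {
  local file=$1 want_oid=$2 want_size=$3
  local got_oid got_size
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  got_oid=$(sha256sum "$file" | cut -d' ' -f1)
  got_size=$(wc -c < "$file" | tr -d ' ')   # BSD wc pads with spaces
  if [ "$got_oid" = "$want_oid" ] && [ "$got_size" = "$want_size" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (sha256 $got_oid, $got_size bytes)" >&2
    return 1
  fi
}

# Values taken from the g1-walk checkpoint pointer above:
# verify_lfs_object TDMPC2+HBench-g1-walk-v0+0/step_950000.pt \
#   a95587aaf9315a046668036b9224342e7227519d0e2902862b99f2bb30da77c2 32059682
```

A size mismatch of roughly 130 bytes usually means the pointer file itself was downloaded instead of the LFS object.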
TDMPC2+HBench-g1-walk-v0+0/train.log ADDED
The diff for this file is too large to render.
 
TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv ADDED
@@ -0,0 +1,23 @@
timestamp,ckpt_mtime,agent_train_step,success_rate,mean_return,mean_steps,timeout_rate,n_ep,note,ckpt_file
1779813130,1779813085,0,0.0000,4.92,56.0,0.0000,3,ok,step_00000000.pt
1779814213,1779814198,50022,,,,,3,"error: 2_1779814210.jsonl
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00050022.pt
1779815392,1779815356,100052,0.0000,86.95,154.7,0.0000,3,ok,step_00100052.pt
1779816573,1779816516,150005,0.0000,85.98,211.3,0.0000,3,ok,step_00150005.pt
1779817757,1779817679,200101,0.0000,200.00,305.0,0.0000,3,ok,step_00200101.pt
1779818941,1779818841,250153,0.0000,131.87,286.0,0.0000,3,ok,step_00250153.pt
1779820041,1779820004,300185,0.0000,290.06,438.0,0.0000,3,ok,step_00300185.pt
1779821236,1779821168,350149,0.0000,426.35,590.7,0.0000,3,ok,step_00350149.pt
1779822443,1779822357,400925,0.6667,678.70,913.0,0.6667,3,ok,step_00400925.pt
1779823563,1779823515,450310,1.0000,784.77,1000.0,1.0000,3,ok,step_00450310.pt
1779824773,1779824695,500555,1.0000,815.97,1000.0,1.0000,3,ok,step_00500555.pt
1779825981,1779825866,550378,0.6667,721.70,953.0,0.3333,3,ok,step_00550378.pt
1779827101,1779827050,600760,1.0000,813.39,1000.0,1.0000,3,ok,step_00600760.pt
1779828307,1779828221,650589,0.3333,663.86,897.0,0.6667,3,ok,step_00650589.pt
1779829517,1779829401,700870,1.0000,810.15,1000.0,1.0000,3,ok,step_00700870.pt
1779830637,1779830563,750356,1.0000,804.41,1000.0,1.0000,3,ok,step_00750356.pt
1779831848,1779831732,800098,1.0000,817.04,1000.0,1.0000,3,ok,step_00800098.pt
1779832930,1779832919,850572,,,,,3,"error: 2_1779832928.jsonl
ERROR conda.cli.main_run:execute(127): `conda run python /hom",step_00850572.pt
1779834140,1779834083,900572,1.0000,779.23,986.0,0.3333,3,ok,step_00900572.pt
1779835350,1779835256,950572,1.0000,816.66,1000.0,1.0000,3,ok,step_00950572.pt
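
`ckpt_eval.csv` has one row per auto-evaluated checkpoint (N=3 episodes each). Rows whose `note` is not `ok` record failed eval runs, and their quoted error text spans two physical lines. A hypothetical `best_ckpt` helper can rank only the `ok` rows by `success_rate`, then `mean_return`; the sketch assumes no field of an `ok` row contains a comma, which holds for the file above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "success_rate mean_return ckpt_file" for the best
# checkpoint in a ckpt_eval.csv. Columns: 4=success_rate, 5=mean_return,
# 9=note, 10=ckpt_file. Failed rows (note != "ok"), including the second
# physical line of a multi-line quoted error, never have "ok" in column 9.
best_ckpt() {
  awk -F, 'NR > 1 && $9 == "ok" { print $4, $5, $10 }' "$1" |
    sort -k1,1gr -k2,2gr |
    head -n 1
}

# Example:
# best_ckpt TDMPC2+HBench-h1-walk-v0+0/ckpt_eval.csv
```

Applied to the h1-walk file above, this ranking puts `step_00800098.pt` (1.0000, 817.04) just ahead of the 950k checkpoint the repo ships (1.0000, 816.66); a 0.38 gap over 3 episodes is too small to separate the two reliably.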
TDMPC2+HBench-h1-walk-v0+0/preview.mp4 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bad368d1a3dad93006d1d0a56f9ead19fbbd8e9f2ec62f1b199ca96e1b5c70f
size 515283
TDMPC2+HBench-h1-walk-v0+0/step_950000.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21f44ac488018f64f9852b3534c41c823f50f2dc55b2cffa06cc475a921ff366
size 31908130
TDMPC2+HBench-h1-walk-v0+0/train.log ADDED
The diff for this file is too large to render.