wsagi committed · verified
Commit 6ba1694 · Parent(s): fee7459

Upload README.md with huggingface_hub

Files changed (1): README.md (+36 −27)
README.md CHANGED

@@ -15,17 +15,17 @@ datasets:
language:
- en
---

# DiffusionPolicy-PickOrange

A LeRobot Diffusion Policy (267M, UNet 1D + ResNet18 vision encoder) **trained from scratch** on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, with **DDIM 32-step inference hot-swapped into the ckpt `config.json`** (no retraining).

![DP eval — SO-101 PickOrange](dp-pick-orange.jpg)

**🔗 Project repos**:

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience) — Isaac Lab + LeIsaac multi-policy comparison (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) — LeIsaac fork (training scripts + design docs)

## TL;DR

@@ -40,6 +40,7 @@ _A LeRobot Diffusion Policy (~267M, UNet 1D + ResNet18 vision encoder) **trained
_Probabilistic outcomes across runs — full distribution from 0/3 to 3/3 observed, with **some rounds completing all 3 oranges**. Diffusion sampling is inherently stochastic; multi-round averaging required for meaningful comparison._

## Highlights

- **DDIM scheduler hot-swap, no retraining**: DDPM 100-step is the standard setup in the DP paper, but 100 sequential sampling steps → 393 ms/chunk → 2.96x slowdown, which strains real-time control on a 4090. DDIM is a deterministic subset of DDPM, so **the scheduler can be swapped in the config without retraining the weights**. 32 steps is the 4090 sweet spot.
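The latency arithmetic behind that swap can be sanity-checked from the numbers above. This is a sketch: the ~21 ms fixed per-chunk overhead (vision encode + transport) is an assumption inferred from the measurements, not a reported figure.

```python
def chunk_latency_ms(steps: int, per_step_ms: float, overhead_ms: float) -> float:
    """Denoising is sequential, so chunk latency scales linearly with step count."""
    return steps * per_step_ms + overhead_ms

# Back out the per-step cost from the DDPM-100 measurement (393 ms/chunk),
# assuming ~21 ms of fixed per-chunk overhead (an assumption, not a reported number).
OVERHEAD_MS = 21.0
PER_STEP_MS = (393.0 - OVERHEAD_MS) / 100  # ≈ 3.7 ms/step on a 4090

print(chunk_latency_ms(32, PER_STEP_MS, OVERHEAD_MS))  # ≈ 140 ms, near the measured 147 ms
```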
@@ -49,34 +50,36 @@ _Highlights_
- **Trained from scratch, no pretrained vision backbone**: the ResNet18 vision encoder is LeRobot diffusion's default from-scratch setting, with no ImageNet pretraining. 60 episodes of data is a stress test of how little can carry a visuomotor task.

## Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | `diffusion` (LeRobot impl.) |
| Vision encoder | ResNet18 (from scratch / no ImageNet pretrain) |
| Action head | UNet 1D denoiser |
| `n_action_steps` (output chunk) | 8 |
| Noise scheduler (training) | DDPM, 100 steps |
| Noise scheduler (inference) | **DDIM, 32 steps** (hot-swapped post-training) |
| Steps | 100,000 |
| Optimizer | AdamW |
| Hardware | RTX 4090 (24 GB) |
| Recipe credit | LeRobot diffusion baseline, [Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |

Training entrypoint in our fork: [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh).

## Eval results

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=60` (the sim rate used during DP training), dual-cam observation, `policy_action_horizon=16`.

| Config | Inference latency | Observed outcome distribution | Notes |
|---|---|---|---|
| DDPM 100-step (no swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ repeated timeouts | strains real time; motion lags badly |
| **DDIM 32-step (this ckpt's default)** | **147 ms/chunk, 1.1x slowdown** | **full spread: 0/3, 1/3, 2/3, 3/3** | some rounds place all 3 oranges ✅ |

**Key observations**:
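The slowdown figures in the table are consistent with comparing chunk latency against the wall-clock span one action chunk must cover (8 actions at 60 Hz ≈ 133 ms). A quick check, assuming that is how slowdown is defined here:

```python
N_ACTION_STEPS = 8   # output chunk size (recipe table above)
STEP_HZ = 60         # sim rate during eval

def slowdown(chunk_latency_ms: float) -> float:
    """Ratio of inference latency to the real-time budget of one action chunk."""
    budget_ms = 1000.0 * N_ACTION_STEPS / STEP_HZ  # ≈ 133.3 ms
    return chunk_latency_ms / budget_ms

print(slowdown(393.0))  # ≈ 2.95, matching the reported ~2.96x for DDPM 100-step
print(slowdown(147.0))  # ≈ 1.10, matching the reported 1.1x for DDIM 32-step
```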
@@ -94,6 +97,7 @@ _Rigorous comparison requires `eval_rounds=10+`. Single-round inferences are mis
## ⚠️ Critical inference settings

### 1. DDIM hot-swap (already applied in this ckpt)

Key fields in `config.json` (already set in this repo):
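The field values themselves are elided in this diff view. As a sketch of how such a post-training swap can be scripted, assuming LeRobot's diffusion config exposes `noise_scheduler_type` and `num_inference_steps` (check your checkpoint's `config.json` for the exact keys):

```python
import json
from pathlib import Path

def hot_swap_ddim(config_path: str, num_inference_steps: int = 32) -> dict:
    """Point an existing DP checkpoint at DDIM sampling without retraining."""
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["noise_scheduler_type"] = "DDIM"              # training still used DDPM
    cfg["num_inference_steps"] = num_inference_steps  # 32 = 4090 sweet spot per the eval table
    path.write_text(json.dumps(cfg, indent=2))
    return cfg
```

The weights are untouched; only the sampler configuration changes, which is what makes the swap safe for a DDPM-trained model.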
@@ -128,7 +132,7 @@ max_steps = (target - overhead) / per_step ≈ 23 (safe)
Measured on 4090: 30 steps → 2/3 oranges, **32 → full 3/3 success observed**, 50 → 3D compute choked (OOM-like behavior).

**Weaker-GPU recommendation**: a 3060 runs ~10 ms/step; sweet spot ~**7-8 steps**. Full calibration in the [design doc](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html).
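The budget formula in the hunk header above can be wrapped into a small helper. A sketch only: the target and overhead values below are placeholders, not the doc's calibrated numbers, and only the 3060's ~10 ms/step comes from the recommendation above.

```python
import math

def max_denoise_steps(target_ms: float, overhead_ms: float, per_step_ms: float) -> int:
    """max_steps = (target - overhead) / per_step, rounded down for safety."""
    return math.floor((target_ms - overhead_ms) / per_step_ms)

# Hypothetical 3060 budget: 100 ms target, 21 ms overhead (both assumed),
# ~10 ms/step measured → lands in the recommended 7-8 step sweet spot.
print(max_denoise_steps(100.0, 21.0, 10.0))  # → 7
```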
### 3. Action horizon setting

@@ -142,6 +146,7 @@ _DP outputs `n_action_steps=8` (fixed); the server auto-caps client `policy_acti
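The auto-capping mentioned in the hunk header above amounts to clamping the client's request to the policy's fixed chunk size. A sketch of the behavior, not the server's actual code:

```python
N_ACTION_STEPS = 8  # fixed by this DP checkpoint

def effective_horizon(requested: int) -> int:
    """The server cannot serve more actions per chunk than the policy emits."""
    return min(requested, N_ACTION_STEPS)

print(effective_horizon(16))  # client requests 16 → capped to 8
```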
## Usage

### 1. Start the LeRobot async policy_server

@@ -151,7 +156,7 @@ pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```

### 2. Launch eval via the [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

```bash
cd LeIsaac
@@ -172,6 +177,7 @@ bash scripts/evaluation/run_eval.sh -- \
_Use `eval_rounds=10` to average the success rate (DP is stochastic; single samples mislead)._

## Limitations

- **Stochastic success**: every diffusion rollout starts from different sampling noise, so the same ckpt with the same config shows run-to-run variance. **Do not** judge the model from a single round.
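The multi-round averaging recommended above is a one-liner. A sketch with made-up round outcomes (oranges placed out of 3 per round, spanning the observed 0/3 to 3/3 range):

```python
def mean_success(rounds: list[int], oranges_per_round: int = 3) -> float:
    """Average per-orange success rate across eval rounds."""
    return sum(rounds) / (oranges_per_round * len(rounds))

# Hypothetical 10-round outcome spread for illustration only.
outcomes = [0, 1, 1, 2, 2, 2, 3, 1, 2, 3]
print(mean_success(outcomes))  # 17/30 ≈ 0.567
```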
@@ -184,16 +190,18 @@ _Limitations_
_No image augmentation or domain randomization → real-world transfer is likely weak._

## Related

- Same-task comparisons:
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) — self-trained ACT (~80M), 1/1 deterministic success @ horizon=32
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — community ACT, 1/1 (deterministic)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA (~3B), completes all 3 oranges in ~30 s
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
- Design doc: [`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) — full postmortem of the DDIM swap + dynamic timeout (with SVG fitted curves)

## Acknowledgments

- LeIsaac team + LightwheelAI for the task environment and dataset
@@ -202,6 +210,7 @@ _Acknowledgments_
- DDIM scheduler swap inspired by the HuggingFace `diffusers` library

## Citation

```bibtex
 
15
  language:
16
  - en
17
  ---
 
18
  # DiffusionPolicy-PickOrange
19
 
20
+ 针对 [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) 任务**从头训练**的 LeRobot Diffusion Policy(267M,UNet 1D + ResNet18 vision encoder),**已 hot-swap 到 DDIM 32-step inference**(不重训,直接改 ckpt `config.json`)。
21
+ _A LeRobot Diffusion Policy (267M, UNet 1D + ResNet18 vision encoder) **trained from scratch** on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. **DDIM 32-step inference hot-swapped into the ckpt config** without retraining._
22
 
23
  ![DP eval — SO-101 PickOrange](dp-pick-orange.jpg)
24
 
25
  **🔗 项目仓库 / Project repos**:
26
+
27
  - [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience) — Isaac Lab + LeIsaac 多策略横评(parent project)
28
+ - [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) — LeIsaac fork(训练脚本 + 设计文档 / training scripts + design docs)
29
 
30
  ## TL;DR
31
 
 
40
  _Probabilistic outcomes across runs — full distribution from 0/3 to 3/3 observed, with **some rounds completing all 3 oranges**. Diffusion sampling is inherently stochastic; multi-round averaging required for meaningful comparison._
41
 
42
  ## 模型亮点
43
+
44
  _Highlights_
45
 
46
  - **DDIM scheduler hot-swap 不重训**:DP 论文里 DDPM 100-step 是标配,但 100 步串行采样 → 393 ms/chunk → slowdown 2.96x,4090 实时性吃力。DDIM 是 DDPM 的确定性子集,**可以直接 swap config 不重训权重**。32-step 是 4090 sweet spot。
 
50
  - **从头训练,无 pretrained vision backbone**:ResNet18 vision encoder 是 LeRobot diffusion 默认 from-scratch 设置,没用 ImageNet pretrain。60 episode 数据撑起一个 visuomotor 任务的极限测试。
51
 
52
  ## 训练配方
53
+
54
  _Training recipe_
55
 
56
+ | 项 / Item | 值 / Value |
57
+ | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
58
+ | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
59
+ | Policy | `diffusion` (LeRobot 实现 / LeRobot impl.) |
60
+ | Vision encoder | ResNet18(from scratch / no ImageNet pretrain) |
61
+ | Action head | UNet 1D denoiser |
62
+ | `n_action_steps` (输出 / output chunk) | 8 |
63
+ | Noise scheduler (训练 / training) | DDPM, 100 steps |
64
+ | Noise scheduler (推理 / inference) | **DDIM, 32 steps**(hot-swapped post-training) |
65
+ | Steps | 100,000 |
66
+ | Optimizer | AdamW |
67
+ | Hardware | RTX 4090 (24 GB) |
68
+ | Recipe credit | LeRobot diffusion baseline,[Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |
69
+
70
+ 训练入口脚本(在我们的 LeIsaac fork):[`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh)。
71
+ _Training entrypoint in our fork: [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh)._
72
 
73
  ## 评测结果
74
+
75
  _Eval results_
76
 
77
  测试环境 / Test setup:Isaac Sim 5.1,task `LeIsaac-SO101-PickOrange-v0`,`episode_length_s=120`,`step_hz=60`(DP 训练时 sim rate),dual-cam 观测,`policy_action_horizon=16`。
78
  _Test setup: Isaac Sim 5.1, dual-cam observation, `step_hz=60` matching training, `policy_action_horizon=16`._
79
 
80
+ | 配置 / Config | 推理延迟 | 观察到的结果分布 | 备注 |
81
+ | ------------------------------------- | ------------------------------------- | ---------------------------------------- | ------------------------ |
82
+ | DDPM 100-step (无 swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ 多次 timeout | 实时性吃力,运动严重滞后 |
83
  | **DDIM 32-step (本 ckpt 默认)** | **147 ms/chunk, 1.1x slowdown** | **0/3 / 1/3 / 2/3 / 3/3 全谱出现** | 部分轮能完整放完 3 颗 ✅ |
84
 
85
  **关键观察 / Key observations**:
 
97
  ## ⚠️ 推理关键配置 / Critical inference setting
98
 
99
  ### 1. DDIM hot-swap(已应用于本 ckpt)
100
+
101
  _DDIM hot-swap (already applied in this ckpt)_
102
 
103
  `config.json` 中的关键字段(本 repo 已设置):
 
132
  实测 / Measured on 4090: 30 → 2/3 oranges, **32 → 可见 3/3 完整 success**, 50 → 爆 3D 算力 OOM-like behavior。
133
  _Tested on 4090: 30 → 2/3, **32 → full 3/3 success observed**, 50 → 3D rendering choked._
134
 
135
+ **弱卡建议 / Weaker GPU recommendation**: 3060 ~10 ms/step,sweet spot ~ **7-8 steps**。完整 calibration 见 [设计文档](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html)。
136
 
137
  ### 3. Action horizon 配置 / Action horizon setting
138
 
 
146
  ```
147
 
148
  ## 使用方法
149
+
150
  _Usage_
151
 
152
  ### 1. 启动 LeRobot async policy_server
 
156
  python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
157
  ```
158
 
159
+ ### 2. 通过 [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork 启动 eval
160
 
161
  ```bash
162
  cd LeIsaac
 
177
  _Use `eval_rounds=10` to average success rate (DP is stochastic; single samples mislead)._
178
 
179
  ## 局限性
180
+
181
  _Limitations_
182
 
183
  - **Stochastic success**:每次 diffusion 采样初值不同,相同 ckpt 同 config 也会有 run-to-run 差异。**不建议**用单 round 结论判断模型好坏。
 
190
  _No image augmentation or domain randomization → real-world transfer is likely weak._
191
 
192
  ## 相关
193
+
194
  _Related_
195
 
196
  - 同任务对照 / Same-task comparisons:
197
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) — 自训 ACT (~80M),1/1 deterministic success @ horizon=32
198
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — 社区 ACT,1/1 (deterministic)
199
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA (~3B),~30s 完成 3 颗
200
+ - 完整训练 + eval 配方:[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
201
+ - 设计文档 / Design doc:[`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) — DDIM swap + dynamic timeout 完整 postmortem(含 SVG 拟合曲线)
202
 
203
  ## 致谢
204
+
205
  _Acknowledgments_
206
 
207
  - LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
 
210
  - DDIM scheduler swap inspired by HuggingFace `diffusers` library
211
 
212
  ## 引用
213
+
214
  _Citation_
215
 
216
  ```bibtex