language:
- en
---

# DiffusionPolicy-PickOrange
A LeRobot Diffusion Policy (267M, UNet 1D + ResNet18 vision encoder) **trained from scratch** on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, with **DDIM 32-step inference hot-swapped into the ckpt `config.json`** without retraining.



**🔗 Project repos**:

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience) — Isaac Lab + LeIsaac multi-policy comparison (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) — LeIsaac fork (training scripts + design docs)
## TL;DR

_Probabilistic outcomes across runs — full distribution from 0/3 to 3/3 observed, with **some rounds completing all 3 oranges**. Diffusion sampling is inherently stochastic; multi-round averaging is required for meaningful comparison._
## Highlights

- **DDIM scheduler hot-swap, no retraining**: DDPM 100-step is the standard in the Diffusion Policy paper, but 100 sequential sampling steps → 393 ms/chunk → 2.96x slowdown, which strains real-time control on a 4090. DDIM is a deterministic subset of DDPM, so **the scheduler can be swapped in the config without retraining the weights**. 32 steps is the 4090 sweet spot.
- **Trained from scratch, no pretrained vision backbone**: the ResNet18 vision encoder uses LeRobot diffusion's default from-scratch setting, with no ImageNet pretraining. A stress test of how far 60 episodes of data can carry a visuomotor task.
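The "deterministic subset" point can be made concrete. A minimal sketch (not LeRobot's code) of the "leading" timestep spacing that diffusers-style DDIM schedulers use to visit 32 of the 100 trained timesteps:

```python
# Illustrative only: a DDIM scheduler denoises over a strided subset of the
# DDPM training timesteps ("leading" spacing, as in HF diffusers).
num_train_timesteps = 100   # DDPM training schedule (this model)
num_inference_steps = 32    # DDIM inference schedule (hot-swapped)

step_ratio = num_train_timesteps // num_inference_steps           # 3
timesteps = [i * step_ratio for i in range(num_inference_steps)]  # 0, 3, ..., 93
timesteps.reverse()                                               # denoise from high t down to 0

print(len(timesteps), timesteps[0], timesteps[-1])  # 32 93 0
```

The same 100-step noise schedule is reused; only which timesteps are visited changes, which is why no retraining is needed.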
## Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6-DOF state, 30 Hz) |
| Policy | `diffusion` (LeRobot implementation) |
| Vision encoder | ResNet18 (from scratch, no ImageNet pretrain) |
| Action head | UNet 1D denoiser |
| `n_action_steps` (output chunk) | 8 |
| Noise scheduler (training) | DDPM, 100 steps |
| Noise scheduler (inference) | **DDIM, 32 steps** (hot-swapped post-training) |
| Steps | 100,000 |
| Optimizer | AdamW |
| Hardware | RTX 4090 (24 GB) |
| Recipe credit | LeRobot diffusion baseline, [Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |

Training entrypoint (in our LeIsaac fork): [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh)
## Eval results

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=60` (the sim rate used during DP training), dual-cam observation, `policy_action_horizon=16`.

| Config | Inference latency | Observed outcome distribution | Notes |
|---|---|---|---|
| DDPM 100-step (no swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ repeated timeouts | strains real time; motion lags badly |
| **DDIM 32-step (this ckpt's default)** | **147 ms/chunk, 1.1x slowdown** | **full spectrum observed: 0/3, 1/3, 2/3, 3/3** | some rounds place all 3 oranges ✅ |

**Key observations**:
_Rigorous comparison requires `eval_rounds=10+`. Single-round inferences are misleading._
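As a sanity check, the two latency rows above are internally consistent: dividing the DDPM chunk latency by its slowdown factor recovers the real-time chunk budget, and the DDIM latency against that same budget reproduces the ~1.1x figure. A quick sketch (the measured numbers are from the table; `budget_ms` is derived, not stated in the card):

```python
# Derive the implied real-time chunk budget from the DDPM row,
# then check the DDIM row against it.
ddpm_ms, ddpm_slowdown = 393.0, 2.96
ddim_ms = 147.0

budget_ms = ddpm_ms / ddpm_slowdown   # implied real-time budget per chunk
ddim_slowdown = ddim_ms / budget_ms   # should land near the card's ~1.1x

print(round(budget_ms, 1), round(ddim_slowdown, 2))  # 132.8 1.11
```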
## ⚠️ Critical inference settings

### 1. DDIM hot-swap (already applied in this ckpt)

Key fields in `config.json` (already set in this repo):
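The exact field listing is elided in this excerpt; as a sketch, the swap amounts to something like the following, assuming LeRobot's `DiffusionConfig` field names (`noise_scheduler_type`, `num_train_timesteps`, `num_inference_steps`):

```json
{
  "noise_scheduler_type": "DDIM",
  "num_train_timesteps": 100,
  "num_inference_steps": 32
}
```

The training schedule stays at 100 timesteps; only the inference scheduler and its step count change.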
`max_steps = (target - overhead) / per_step ≈ 23` (safe tier)
Measured on a 4090: 30 steps → 2/3 oranges, **32 → full 3/3 success observed**, 50 → 3D rendering choked (OOM-like behavior).

**Weaker GPU recommendation**: a 3060 runs ~10 ms/step; sweet spot ~**7-8 steps**. Full calibration in the [design doc](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html).
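The step-budget arithmetic can be reproduced from the two measured points in the eval table (393 ms @ 100 steps, 147 ms @ 32 steps) by fitting the linear model `latency = overhead + per_step × steps`. The 120 ms target below is a hypothetical example, not a number from the card:

```python
# Fit latency(steps) = overhead + per_step * steps from two measured points.
s1, t1 = 100, 393.0   # DDPM 100-step, measured ms/chunk
s2, t2 = 32, 147.0    # DDIM 32-step, measured ms/chunk

per_step = (t1 - t2) / (s1 - s2)   # ~3.62 ms per denoising step
overhead = t1 - per_step * s1      # ~31.2 ms fixed cost per chunk

target = 120.0                     # hypothetical latency budget (ms)
max_steps = int((target - overhead) / per_step)

print(round(per_step, 2), round(overhead, 1), max_steps)  # 3.62 31.2 24
```

The same arithmetic with the card's (elided) target and overhead values yields its ≈23-step "safe" figure.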
### 3. Action horizon setting

_DP outputs `n_action_steps=8` (fixed); the server auto-caps the client's `policy_action_horizon`…_
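The capping described above amounts to a `min()`; a trivial sketch (the function name is ours, not LeRobot's):

```python
def effective_horizon(policy_action_horizon: int, n_action_steps: int = 8) -> int:
    # The server cannot return more actions per chunk than the policy's
    # fixed output chunk, so the client's requested horizon is capped.
    return min(policy_action_horizon, n_action_steps)

print(effective_horizon(16))  # 8: the eval setup's horizon=16 is capped to the DP chunk of 8
```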
## Usage

### 1. Start the LeRobot async policy_server

```bash
pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```
### 2. Launch eval via the [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

```bash
cd LeIsaac
…
bash scripts/evaluation/run_eval.sh -- \
  …
```

_Use `eval_rounds=10` to average the success rate (DP is stochastic; single samples mislead)._
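Why single rounds mislead: a toy simulation (the outcome weights are invented for illustration; real outcomes come from the sim) showing how a 10-round mean stabilizes what one stochastic rollout cannot:

```python
import random

def rollout() -> int:
    # Stand-in for one stochastic DP eval round: oranges placed, 0-3.
    # This distribution is made up for illustration only.
    return random.choices([0, 1, 2, 3], weights=[2, 3, 3, 2])[0]

random.seed(0)
single = rollout()                               # any of 0..3 — uninformative alone
mean10 = sum(rollout() for _ in range(10)) / 10  # far lower variance across runs
print(single, mean10)
```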
## Limitations

- **Stochastic success**: each diffusion rollout starts from a different noise sample, so the same ckpt with the same config still varies run to run. **Do not** judge the model from a single round.
- _No image augmentation or domain randomization → real-world transfer is likely weak._
## Related

- Same-task comparisons:
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) — self-trained ACT (~80M), 1/1 deterministic success @ horizon=32
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — community ACT, 1/1 (deterministic)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA (~3B), places all 3 oranges in ~30 s
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
- Design doc: [`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) — full postmortem of the DDIM swap + dynamic timeout (with SVG fitted curves)
## Acknowledgments

- LeIsaac team + LightwheelAI for providing the task environment and dataset
- DDIM scheduler swap inspired by the HuggingFace `diffusers` library
## Citation

```bibtex