language:
- en
---

# DiffusionPolicy-PickOrange
A LeRobot Diffusion Policy (267M, UNet 1D + ResNet18 vision encoder) **trained from scratch** on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task, with **DDIM 32-step inference hot-swapped into the ckpt `config.json`** without retraining.



**🔗 Project repos**:

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience) — Isaac Lab + LeIsaac multi-policy comparison (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) — LeIsaac fork (training scripts + design docs)
## TL;DR

_Probabilistic outcomes across runs — full distribution from 0/3 to 3/3 observed, with **some rounds completing all 3 oranges**. Diffusion sampling is inherently stochastic; multi-round averaging is required for meaningful comparison._
## Highlights

- **DDIM scheduler hot-swap, no retraining**: DDPM 100-step is the standard in the Diffusion Policy paper, but 100 sequential sampling steps → 393 ms/chunk → 2.96x slowdown, which strains real-time control on a 4090. DDIM is a deterministic subset of DDPM, so **the scheduler can be swapped in the config without retraining the weights**. 32 steps is the 4090 sweet spot.
- **Trained from scratch, no pretrained vision backbone**: the ResNet18 vision encoder uses LeRobot diffusion's default from-scratch setting, with no ImageNet pretraining. A stress test of how far 60 episodes of data can carry a visuomotor task.
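The "deterministic subset" point can be made concrete. A minimal sketch (not LeRobot's code) of the "leading" timestep spacing that diffusers-style DDIM schedulers use to visit 32 of the 100 trained timesteps:

```python
# Illustrative only: a DDIM scheduler denoises over a strided subset of the
# DDPM training timesteps ("leading" spacing, as in HF diffusers).
num_train_timesteps = 100   # DDPM training schedule (this model)
num_inference_steps = 32    # DDIM inference schedule (hot-swapped)

step_ratio = num_train_timesteps // num_inference_steps           # 3
timesteps = [i * step_ratio for i in range(num_inference_steps)]  # 0, 3, ..., 93
timesteps.reverse()                                               # denoise from high t down to 0

print(len(timesteps), timesteps[0], timesteps[-1])  # 32 93 0
```

The same 100-step noise schedule is reused; only which timesteps are visited changes, which is why no retraining is needed.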
## Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6-DOF state, 30 Hz) |
| Policy | `diffusion` (LeRobot implementation) |
| Vision encoder | ResNet18 (from scratch, no ImageNet pretrain) |
| Action head | UNet 1D denoiser |
| `n_action_steps` (output chunk) | 8 |
| Noise scheduler (training) | DDPM, 100 steps |
| Noise scheduler (inference) | **DDIM, 32 steps** (hot-swapped post-training) |
| Steps | 100,000 |
| Optimizer | AdamW |
| Hardware | RTX 4090 (24 GB) |
| Recipe credit | LeRobot diffusion baseline, [Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |

Training entrypoint (in our LeIsaac fork): [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh)
## Eval results

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=60` (the sim rate used during DP training), dual-cam observation, `policy_action_horizon=16`.

| Config | Inference latency | Observed outcome distribution | Notes |
|---|---|---|---|
| DDPM 100-step (no swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ repeated timeouts | strains real time; motion lags badly |
| **DDIM 32-step (this ckpt's default)** | **147 ms/chunk, 1.1x slowdown** | **full spectrum observed: 0/3, 1/3, 2/3, 3/3** | some rounds place all 3 oranges ✅ |

**Key observations**:
_Rigorous comparison requires `eval_rounds=10+`. Single-round inferences are misleading._
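As a sanity check, the two latency rows above are internally consistent: dividing the DDPM chunk latency by its slowdown factor recovers the real-time chunk budget, and the DDIM latency against that same budget reproduces the ~1.1x figure. A quick sketch (the measured numbers are from the table; `budget_ms` is derived, not stated in the card):

```python
# Derive the implied real-time chunk budget from the DDPM row,
# then check the DDIM row against it.
ddpm_ms, ddpm_slowdown = 393.0, 2.96
ddim_ms = 147.0

budget_ms = ddpm_ms / ddpm_slowdown   # implied real-time budget per chunk
ddim_slowdown = ddim_ms / budget_ms   # should land near the card's ~1.1x

print(round(budget_ms, 1), round(ddim_slowdown, 2))  # 132.8 1.11
```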
## ⚠️ Critical inference settings

### 1. DDIM hot-swap (already applied in this ckpt)

Key fields in `config.json` (already set in this repo):
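The exact field listing is elided in this excerpt; as a sketch, the swap amounts to something like the following, assuming LeRobot's `DiffusionConfig` field names (`noise_scheduler_type`, `num_train_timesteps`, `num_inference_steps`):

```json
{
  "noise_scheduler_type": "DDIM",
  "num_train_timesteps": 100,
  "num_inference_steps": 32
}
```

The training schedule stays at 100 timesteps; only the inference scheduler and its step count change.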
`max_steps = (target - overhead) / per_step ≈ 23` (safe tier)
Measured on a 4090: 30 steps → 2/3 oranges, **32 → full 3/3 success observed**, 50 → 3D rendering choked (OOM-like behavior).

**Weaker GPU recommendation**: a 3060 runs ~10 ms/step; sweet spot ~**7-8 steps**. Full calibration in the [design doc](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html).
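The step-budget arithmetic can be reproduced from the two measured points in the eval table (393 ms @ 100 steps, 147 ms @ 32 steps) by fitting the linear model `latency = overhead + per_step × steps`. The 120 ms target below is a hypothetical example, not a number from the card:

```python
# Fit latency(steps) = overhead + per_step * steps from two measured points.
s1, t1 = 100, 393.0   # DDPM 100-step, measured ms/chunk
s2, t2 = 32, 147.0    # DDIM 32-step, measured ms/chunk

per_step = (t1 - t2) / (s1 - s2)   # ~3.62 ms per denoising step
overhead = t1 - per_step * s1      # ~31.2 ms fixed cost per chunk

target = 120.0                     # hypothetical latency budget (ms)
max_steps = int((target - overhead) / per_step)

print(round(per_step, 2), round(overhead, 1), max_steps)  # 3.62 31.2 24
```

The same arithmetic with the card's (elided) target and overhead values yields its ≈23-step "safe" figure.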
### 3. Action horizon setting

_DP outputs `n_action_steps=8` (fixed); the server auto-caps the client's `policy_action_horizon`…_
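The capping described above amounts to a `min()`; a trivial sketch (the function name is ours, not LeRobot's):

```python
def effective_horizon(policy_action_horizon: int, n_action_steps: int = 8) -> int:
    # The server cannot return more actions per chunk than the policy's
    # fixed output chunk, so the client's requested horizon is capped.
    return min(policy_action_horizon, n_action_steps)

print(effective_horizon(16))  # 8: the eval setup's horizon=16 is capped to the DP chunk of 8
```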
## Usage

### 1. Start the LeRobot async policy_server

```bash
pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```
### 2. Launch eval via the [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

```bash
cd LeIsaac
…
bash scripts/evaluation/run_eval.sh -- \
  …
```

_Use `eval_rounds=10` to average the success rate (DP is stochastic; single samples mislead)._
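Why single rounds mislead: a toy simulation (the outcome weights are invented for illustration; real outcomes come from the sim) showing how a 10-round mean stabilizes what one stochastic rollout cannot:

```python
import random

def rollout() -> int:
    # Stand-in for one stochastic DP eval round: oranges placed, 0-3.
    # This distribution is made up for illustration only.
    return random.choices([0, 1, 2, 3], weights=[2, 3, 3, 2])[0]

random.seed(0)
single = rollout()                               # any of 0..3 — uninformative alone
mean10 = sum(rollout() for _ in range(10)) / 10  # far lower variance across runs
print(single, mean10)
```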
## Limitations

- **Stochastic success**: each diffusion rollout starts from a different noise sample, so the same ckpt with the same config still varies run to run. **Do not** judge the model from a single round.
- _No image augmentation or domain randomization → real-world transfer is likely weak._
## Related

- Same-task comparisons:
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) — self-trained ACT (~80M), 1/1 deterministic success @ horizon=32
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — community ACT, 1/1 (deterministic)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA (~3B), places all 3 oranges in ~30 s
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
- Design doc: [`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) — full postmortem of the DDIM swap + dynamic timeout (with SVG fitted curves)
## Acknowledgments

- LeIsaac team + LightwheelAI for providing the task environment and dataset
- DDIM scheduler swap inspired by the HuggingFace `diffusers` library
## Citation

```bibtex