---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - diffusion-policy
  - lerobot
  - so101
  - leisaac
  - pick-orange
  - isaac-sim
  - ddim
datasets:
  - LightwheelAI/leisaac-pick-orange
language:
  - en
---
# DiffusionPolicy-PickOrange

A LeRobot Diffusion Policy (267M parameters, UNet 1D + ResNet18 vision encoder) **trained from scratch** on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. The checkpoint has been **hot-swapped to DDIM 32-step inference** without retraining, by editing only the ckpt `config.json`.

![DP eval — SO-101 PickOrange](dp-pick-orange.jpg)

**🔗 Project repos**:

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)

## TL;DR

- **Task**: `Pick up the orange and place it on the plate`. A single-arm SO-101 picks up 3 oranges one after another and places each on a plate.
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes.
- **Architecture**: Diffusion Policy (UNet 1D denoiser + ResNet18 dual-camera vision encoder + 6-DOF state input → 8-step action chunk).
- **Training**: 100k steps with the DDPM 100-step scheduler; `model.safetensors` is ~1.07 GB.
- **Inference hot-swap**: in `config.json`, change `noise_scheduler_type` from `DDPM` to `DDIM` and `num_inference_steps` from `null` to `32`, **with no retraining**. Inference latency drops from **393 ms to 147 ms per chunk** and the slowdown from 2.96x to **1.1x**, which runs in near real time on an RTX 4090.
- **Eval**: Isaac Sim 5.1 + LeIsaac. Across rounds, outcomes span the full range from 0/3 to 3/3, and **some rounds place all 3 oranges**. Diffusion sampling is inherently stochastic, so results are only meaningful when averaged over multiple rounds.

## Highlights

- **DDIM scheduler hot-swap without retraining**: DDPM with 100 steps is the standard setting in the Diffusion Policy paper, but 100 sequential denoising steps cost 393 ms/chunk (2.96x slowdown), which is too slow for real time on a 4090. DDIM uses the same trained noise-prediction network as DDPM and changes only the sampling procedure, so **the scheduler can be swapped in the config without retraining the weights**. 32 steps is the sweet spot on the RTX 4090.
- **Probabilistic full 3/3 success**: in multi-round eval, **some rounds pick up and place all 3 oranges**. The outcomes are noisier than ACT's deterministic 1/1 result, but they show that DP can complete the whole task at the edge of what the dataset covers, rather than barely grasping a single orange.
- **Trained from scratch, no pretrained vision backbone**: the ResNet18 vision encoder uses LeRobot diffusion's default from-scratch setting, with no ImageNet pretraining. This makes the checkpoint a stress test of how far 60 episodes can carry a visuomotor task.
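
To make the step-count saving concrete, here is a minimal sketch of how a 32-step sampler can reuse a network trained on 100 noise levels: it visits only a strided subset of the training timesteps. The function name `strided_timesteps` and the evenly strided rule are assumptions for illustration; the scheduler used by LeRobot may space its timesteps differently.

```python
def strided_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Return a descending, evenly strided subset of the training timesteps.

    Illustrative only: real scheduler implementations may use another spacing.
    """
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    # Ascending multiples of the stride, reversed so sampling runs noisy -> clean.
    return [i * stride for i in range(num_inference_steps)][::-1]


# 100 training timesteps, as in the DDPM training setup of this checkpoint.
ddpm_steps = strided_timesteps(100, 100)  # one network call per training timestep
ddim_steps = strided_timesteps(100, 32)   # 32 network calls, stride 3

print(len(ddpm_steps), len(ddim_steps))
```

Each listed timestep costs one UNet forward pass, so the number of sequential network calls per action chunk falls from 100 to 32.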

## Training recipe

| Item                              | Value                                                                                                             |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| Dataset                           | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6-DOF state, 30 Hz)                             |
| Policy                            | `diffusion` (LeRobot implementation)                                                                              |
| Vision encoder                    | ResNet18 (from scratch, no ImageNet pretraining)                                                                  |
| Action head                       | UNet 1D denoiser                                                                                                  |
| `n_action_steps` (output chunk)   | 8                                                                                                                 |
| Noise scheduler (training)        | DDPM, 100 steps                                                                                                   |
| Noise scheduler (inference)       | **DDIM, 32 steps** (hot-swapped post-training)                                                                    |
| Steps                             | 100,000                                                                                                           |
| Optimizer                         | AdamW                                                                                                             |
| Hardware                          | RTX 4090 (24 GB)                                                                                                  |
| Recipe credit                     | LeRobot diffusion baseline, [Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |

Training entrypoint in our LeIsaac fork: [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh).

## Eval results

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=60` (the sim rate used for DP training), dual-cam observations, `policy_action_horizon=16`.

| Config                                    | Inference latency               | Observed outcome distribution           | Notes                                           |
| ----------------------------------------- | ------------------------------- | --------------------------------------- | ----------------------------------------------- |
| DDPM 100-step (no swap)                   | 393 ms/chunk, 2.96x slowdown    | ⚠️ Repeated timeouts                    | Struggles to run in real time; motion lags badly |
| **DDIM 32-step (default for this ckpt)**  | **147 ms/chunk, 1.1x slowdown** | **0/3, 1/3, 2/3, and 3/3 all observed** | Some rounds place all 3 oranges ✅              |

**Key observations**:

1. **Diffusion sampling is stochastic**: with the same ckpt and config, every inference starts from different noise, so repeated runs of the same episode give different outcomes. **This is a property of the architecture, not a bug.**
2. **Some rounds achieve full 3/3 success**: this shows DP can reach task completion within the limits of the 60-episode dataset, not just a single-orange grasp.
3. **The outcome distribution is skewed**: the success rate for the 1st orange is far higher than for the 3rd. This is a dataset out-of-distribution (OOD) ceiling shared with ACT / SmolVLA / π0.5.

**Rigorous success-rate estimate**: a quantitative estimate requires averaging over `eval_rounds=10` or more. A single sample has large error, so **do not** draw conclusions from a single round.
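
As a sketch of what multi-round averaging could look like, the snippet below turns per-round placement counts into per-orange success rates with a Wilson confidence interval. The helper names and the `example_rounds` values are hypothetical placeholders, not measured results from this checkpoint, and the code assumes oranges are placed in order, so "placed at least k" means the k-th orange succeeded.

```python
from math import sqrt


def per_orange_success(rounds: list[int], n_oranges: int = 3) -> list[float]:
    """Fraction of rounds that placed at least k oranges, for k = 1..n_oranges."""
    if not rounds:
        raise ValueError("need at least one eval round")
    return [sum(r >= k for r in rounds) / len(rounds) for k in range(1, n_oranges + 1)]


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# HYPOTHETICAL placeholder counts for 10 rounds (oranges placed per round).
example_rounds = [3, 1, 2, 0, 1, 3, 2, 1, 0, 2]
rates = per_orange_success(example_rounds)
for k, rate in enumerate(rates, start=1):
    low, high = wilson_interval(round(rate * len(example_rounds)), len(example_rounds))
    print(f"orange {k}: {rate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With only 10 rounds the intervals stay wide, which is why a single round says very little about the policy.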

## ⚠️ Critical inference settings

### 1. DDIM hot-swap (already applied in this ckpt)

Key fields in `config.json` (already configured in this repo):

```json
{
    "noise_scheduler_type": "DDIM",
    "num_inference_steps": 32
}
```

`config.json.bak` keeps the original DDPM settings for comparison.
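
For a checkpoint that still has the DDPM settings, the same swap could be scripted roughly as follows. This is a minimal sketch, not the procedure used to produce this repo: the function name `hot_swap_to_ddim` is hypothetical, and it assumes the checkpoint directory contains a plain-JSON `config.json` with the two fields shown above.

```python
import json
import shutil
from pathlib import Path


def hot_swap_to_ddim(ckpt_dir: str, num_inference_steps: int = 32) -> dict:
    """Switch a checkpoint's config.json to DDIM sampling, keeping a backup."""
    config_path = Path(ckpt_dir) / "config.json"
    backup_path = config_path.with_name("config.json.bak")

    # Preserve the original settings once; never overwrite an existing backup.
    if not backup_path.exists():
        shutil.copy2(config_path, backup_path)

    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["noise_scheduler_type"] = "DDIM"
    config["num_inference_steps"] = num_inference_steps
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

Calling `hot_swap_to_ddim("path/to/ckpt")` touches only those two fields and leaves the weights file unchanged.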

### 2. Per-GPU DDIM step calibration

Fit measured on RTX 4090 + Isaac Sim:

```
inference_ms ≈ overhead + n_steps × per_step = 36 + n_steps × 3.3
# overhead 36 ms  = ResNet18 encode + ZMQ RTT
# per_step 3.3 ms = one UNet denoising step on the 4090

chunk_ms            = effective_chunk × (1000 / step_hz) = 8 × 16.67 ≈ 133 ms (60 Hz)
target_inference_ms = chunk_ms × safety = 133 × 0.85 ≈ 113 ms (safety 0.85)

max_steps = (target_inference_ms - overhead) / per_step
safe:     (113 - 36) / 3.3 ≈ 23   # with the 0.85 safety margin
critical: (133 - 36) / 3.3 ≈ 29   # no margin, full chunk budget
```

Measured on the 4090: 30 steps → 2/3 oranges, **32 steps → full 3/3 success observed**, 50 steps → 3D rendering choked (OOM-like behavior).

**Weaker GPU recommendation**: a 3060 takes ~10 ms/step, so the sweet spot is about **7-8 steps**. See the [design doc](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) for the full calibration.
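
The calibration arithmetic above can be wrapped in a small helper for trying other GPUs. This is a sketch under the stated fit: the function names are hypothetical, the defaults are the 4090 numbers from this section, and reusing the 36 ms overhead for a 3060 is an assumption rather than a measurement.

```python
import math


def predicted_latency_ms(n_steps: int, overhead_ms: float = 36.0, per_step_ms: float = 3.3) -> float:
    """Linear latency fit: fixed overhead plus a per-denoising-step cost."""
    return overhead_ms + n_steps * per_step_ms


def max_ddim_steps(
    effective_chunk: int = 8,
    step_hz: float = 60.0,
    safety: float = 0.85,
    overhead_ms: float = 36.0,
    per_step_ms: float = 3.3,
) -> int:
    """Largest step count whose predicted latency fits the chunk time budget."""
    budget_ms = effective_chunk * (1000.0 / step_hz) * safety
    return max(0, math.floor((budget_ms - overhead_ms) / per_step_ms))


print(max_ddim_steps())                  # 4090, safe tier
print(max_ddim_steps(safety=1.0))        # 4090, critical tier (no margin)
print(max_ddim_steps(per_step_ms=10.0))  # ~10 ms/step GPU, overhead ASSUMED unchanged
print(round(predicted_latency_ms(32), 1))
```

The fit predicts about 142 ms for 32 steps, close to the measured 147 ms/chunk, so the linear model is a reasonable first guess before measuring on a new GPU.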

### 3. Action horizon setting

The DP model outputs a fixed `n_action_steps=8`, so **when the client sets `policy_action_horizon` ≥ 8, the server automatically caps it to 8**. Setting 16, 32, or 50 on the client side is therefore equivalent.

```bash
--policy_action_horizon=16    # any value ≥ 8 works
--step_hz=60                  # sim rate used for DP training
--episode_length_s=120
```
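
The capping rule described above amounts to a `min`. The sketch below is illustrative only and is not the server's actual code; the function name is hypothetical, and passing values below 8 through unchanged is an assumption, since this card only describes the ≥ 8 case.

```python
def effective_horizon(requested: int, n_action_steps: int = 8) -> int:
    """Number of actions actually executed per chunk for a requested horizon."""
    if requested < 1:
        raise ValueError("requested horizon must be at least 1")
    # The model never emits more than n_action_steps actions per chunk.
    return min(requested, n_action_steps)


# 16, 32 and 50 all collapse to the same effective horizon.
print([effective_horizon(h) for h in (16, 32, 50)])
```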

## Usage

### 1. Start the LeRobot async policy_server

```bash
pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```

### 2. Launch eval through the [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

```bash
cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=10 \
    --episode_length_s=120 \
    --step_hz=60 \
    --policy_type=lerobot-diffusion \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/DiffusionPolicy-PickOrange \
    --policy_action_horizon=16 \
    --policy_language_instruction='Pick up the orange and place it on the plate' \
    --device=cuda --enable_cameras
```

Use `eval_rounds=10` to average the success rate (DP is stochastic, so a single sample is easy to misread).

## Limitations

- **Stochastic success**: each diffusion sampling pass starts from different noise, so the same ckpt and config still show run-to-run variance. Judging the model from a single round is **not recommended**.
- **2nd / 3rd orange is OOD for the dataset**: a ceiling shared with ACT / SmolVLA / π0.5. The dataset has 60 episodes with one "place the Nth orange" demonstration per episode, so state coverage for the 2nd and 3rd oranges is sparse. Even though DDIM 32-step unlocks real-time inference, **the success rate still drops with each successive orange**.
- **GPU bound**: the DDIM step count is tightly coupled to GPU compute. The 32-step default is tuned for the 4090; on a 3060/3070 it needs to drop to ~10 steps, which lowers sample quality and may further reduce the success rate.
- **No image augmentation, no domain randomization**: this is a sim-only ckpt, so real-robot transfer is likely weak.

## Related

- Same-task comparisons:
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange): self-trained ACT (~80M), 1/1 deterministic success @ horizon=32
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): community ACT, 1/1 (deterministic)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 SOTA (~3B), completes all 3 oranges in ~30 s
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
- Design doc: [`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html), the full postmortem of the DDIM swap and dynamic timeout (includes SVG fit curves)

## Acknowledgments

- The LeIsaac team and LightwheelAI, for the task environment and dataset
- The LeRobot team, for the Diffusion Policy implementation and the async inference framework
- Original Diffusion Policy paper: [Chi et al. 2023](https://diffusion-policy.cs.columbia.edu/)
- The DDIM scheduler swap was inspired by the HuggingFace `diffusers` library

## Citation

```bibtex
@inproceedings{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

@inproceedings{song2021denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```

## License

Apache-2.0