DiffusionPolicy-PickOrange

A LeRobot Diffusion Policy (267M parameters, 1D UNet + ResNet18 vision encoder) trained from scratch on the LeIsaac SO-101 PickOrange task. Inference has been hot-swapped to 32-step DDIM by editing the checkpoint's config.json directly, with no retraining.

DP eval — SO-101 PickOrange

🔗 Project repos

TL;DR

  • Task: "Pick up the orange and place it on the plate". A single-arm SO-101 picks up 3 oranges one after another and places each on a plate.
  • Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes.
  • Architecture: Diffusion Policy (1D UNet denoiser + dual-camera ResNet18 vision encoder + 6-DOF state input → 8-step action chunk).
  • Training: 100k steps with a 100-step DDPM scheduler; model.safetensors is ~1.07 GB.
  • Inference hot-swap: edit config.json (noise_scheduler_type: DDPM → DDIM, num_inference_steps: null → 32), no retraining. Inference latency drops from 393 to 147 ms/chunk and slowdown from 2.96x to 1.1x, which is workable in real time on an RTX 4090.
  • Eval: Isaac Sim 5.1 + LeIsaac. Outcomes vary from run to run: the full range from 0/3 to 3/3 was observed, and some rounds place all 3 oranges. Diffusion sampling is inherently stochastic, so results are only meaningful when averaged over multiple rounds.

Highlights

  • DDIM scheduler hot-swap without retraining: 100-step DDPM is the standard setup in the Diffusion Policy paper, but 100 sequential sampling steps cost 393 ms/chunk (2.96x slowdown), which strains real-time operation on an RTX 4090. DDIM reuses the noise-prediction model trained under the DDPM objective and changes only the sampling procedure (a shorter timestep schedule with deterministic updates), so the ckpt config can be swapped without retraining the weights. 32 inference steps is the RTX 4090 sweet spot.
  • Occasional full 3/3 success: across multi-round eval, some rounds pick up and place all 3 oranges. The output is noisier than ACT's deterministic 1/1, but it shows that DP can reach full task completion at the edge of the dataset's coverage (not just a marginal first-orange grasp) when the diffusion sample lands favorably.
  • Trained from scratch, no pretrained vision backbone: the ResNet18 vision encoder uses LeRobot diffusion's default from-scratch setting, with no ImageNet pretraining. This is a stress test of how far 60 episodes can carry a visuomotor task.
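To make the hot-swap argument concrete, here is a minimal sketch of why a denoiser trained on 100 timesteps can be sampled in 32 steps: DDIM only queries the model at a subset of the timesteps it was trained on, and each update is a closed-form function of the predicted noise. The "leading" stride spacing and the scalar `ddim_step` helper are illustrative assumptions, not a description of LeRobot's scheduler internals.

```python
import math


def ddim_timesteps(num_train_timesteps=100, num_inference_steps=32):
    """Evenly strided subset of the training timesteps, noisiest first.

    Assumption: "leading" spacing with an integer stride. Real schedulers
    may space the timesteps differently.
    """
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a scalar sample."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the (lower) noise level of the previous timestep.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps_pred


timesteps = ddim_timesteps()
print(len(timesteps), timesteps[:3], timesteps[-1])
```

Every timestep in the 32-step schedule lies inside the 0..99 range seen during training, which is why no weights need to change. The only randomness left is the initial noise sample.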

Training recipe

| Item | Value |
| --- | --- |
| Dataset | LightwheelAI/leisaac-pick-orange (60 episodes, dual-cam 480×640 RGB + 6-DOF state, 30 Hz) |
| Policy | diffusion (LeRobot implementation) |
| Vision encoder | ResNet18 (from scratch, no ImageNet pretraining) |
| Action head | 1D UNet denoiser |
| n_action_steps (output chunk) | 8 |
| Noise scheduler (training) | DDPM, 100 steps |
| Noise scheduler (inference) | DDIM, 32 steps (hot-swapped post-training) |
| Steps | 100,000 |
| Optimizer | AdamW |
| Hardware | RTX 4090 (24 GB) |
| Recipe credit | LeRobot diffusion baseline, Diffusion Policy paper (Chi et al. 2023) |

Training entrypoint (in our LeIsaac fork): scripts/training/diffusion_policy/train.sh.

Eval results

Test setup: Isaac Sim 5.1, task LeIsaac-SO101-PickOrange-v0, episode_length_s=120, step_hz=60 (the sim rate used for DP training), dual-camera observation, policy_action_horizon=16.

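| Config | Inference latency | Observed outcome distribution | Notes |
| --- | --- | --- | --- |
| DDPM 100-step (no swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ multiple timeouts | Struggles in real time; motion lags badly |
| DDIM 32-step (this ckpt's default) | 147 ms/chunk, 1.1x slowdown | Full range observed: 0/3, 1/3, 2/3, 3/3 | Some rounds place all 3 oranges ✅ |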

Key observations

  1. Diffusion sampling is stochastic by design: with the same ckpt and config, every inference pass starts from different noise, so repeated runs of the same episode give different outcomes. This is a property of the architecture, not a bug.
  2. Some rounds achieve a full 3/3: DP can reach task completion, not just a single grasp, within the limits of the 60-episode dataset.
  3. The outcome distribution is skewed: the success rate for the 1st orange is far higher than for the 3rd. This is a dataset OOD ceiling shared with ACT / SmolVLA / π0.5.

Rigorous success-rate estimate: a quantitative figure requires averaging over eval_rounds=10 or more. A single sample has large error, so do not draw conclusions from a single round.
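As a rough illustration of why multi-round averaging matters, the sketch below computes per-orange success rates and a 95% Wilson score interval from a list of per-round outcomes. The `example_rounds` data is hypothetical (not measured results from this ckpt), and the per-orange rate assumes the oranges are placed strictly in sequence, so a round that placed k oranges succeeded on oranges 1..k.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def per_orange_rates(placed_per_round, n_oranges=3):
    """Fraction of rounds in which the k-th orange was placed, for k = 1..n_oranges."""
    n = len(placed_per_round)
    return [
        sum(1 for placed in placed_per_round if placed >= k) / n
        for k in range(1, n_oranges + 1)
    ]


# Hypothetical outcomes for 10 rounds: number of oranges placed in each round.
example_rounds = [3, 1, 2, 0, 1, 3, 2, 1, 0, 2]
rates = per_orange_rates(example_rounds)
for k, rate in enumerate(rates, start=1):
    low, high = wilson_interval(round(rate * len(example_rounds)), len(example_rounds))
    print(f"orange {k}: {rate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Even at 10 rounds the intervals are wide (a 5/10 rate spans roughly 0.24 to 0.76), which is why a single round says very little about the policy.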

⚠️ Critical inference settings

1. DDIM hot-swap (already applied in this ckpt)

Key fields in config.json (already configured in this repo):

{
    "noise_scheduler_type": "DDIM",
    "num_inference_steps": 32
}

config.json.bak keeps the original DDPM settings for comparison.
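For anyone applying the same swap to another DDPM-trained checkpoint, a minimal sketch of the edit might look like the following. The `hot_swap_to_ddim` helper is hypothetical (it is not part of LeRobot); it only assumes that the checkpoint directory contains a config.json with the two fields shown above.

```python
import json
import shutil
from pathlib import Path


def hot_swap_to_ddim(ckpt_dir, num_inference_steps=32):
    """Switch a checkpoint's config.json from DDPM to DDIM sampling in place."""
    config_path = Path(ckpt_dir) / "config.json"
    backup_path = config_path.with_name("config.json.bak")

    # Keep the original DDPM settings, but never overwrite an existing backup.
    if not backup_path.exists():
        shutil.copy2(config_path, backup_path)

    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["noise_scheduler_type"] = "DDIM"
    config["num_inference_steps"] = num_inference_steps
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

Only the two scheduler fields change; every other key in config.json, and the weights in model.safetensors, are left untouched.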

2. Per-GPU DDIM step calibration

Fit measured on RTX 4090 + Isaac Sim:

inference_ms ≈ 36 + n_steps × 3.3
# overhead 36 ms  = ResNet18 encode + ZMQ RTT
# per_step 3.3 ms = one UNet denoising step on the 4090

target_inference_ms = effective_chunk × (1000 / step_hz) × safety
  safe     (safety 0.85): 8 × 16.67 × 0.85 ≈ 113 ms (60 Hz)
  critical (safety 1.00): 8 × 16.67 × 1.00 ≈ 133 ms (60 Hz)

max_steps = (target_inference_ms - overhead) / per_step
  safe:     (113 - 36) / 3.3 ≈ 23
  critical: (133 - 36) / 3.3 ≈ 29

Measured on the RTX 4090: 30 steps → 2/3 oranges; 32 steps → full 3/3 success observed; 50 steps → 3D rendering choked (OOM-like behavior).

Weaker GPU recommendation: an RTX 3060 runs at ~10 ms/step, so the sweet spot is ~7-8 steps. See the design doc for the full calibration.
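The budget arithmetic above can be wrapped in a small helper for other GPUs. This is a sketch under the linear fit quoted in this section. The function names are made up for illustration, and reusing the 36 ms overhead for a weaker GPU is an assumption (the encoder and ZMQ overhead will likely differ), so measure both constants on your own hardware.

```python
import math


def chunk_playback_ms(effective_chunk=8, step_hz=60):
    """Wall-clock time the simulator needs to play one action chunk."""
    return effective_chunk * 1000.0 / step_hz


def predicted_inference_ms(n_steps, overhead_ms=36.0, per_step_ms=3.3):
    """Linear latency fit: fixed overhead plus a per-denoising-step cost."""
    return overhead_ms + n_steps * per_step_ms


def max_ddim_steps(per_step_ms=3.3, overhead_ms=36.0, effective_chunk=8,
                   step_hz=60, safety=0.85):
    """Largest step count whose predicted latency fits the chunk budget."""
    target_ms = chunk_playback_ms(effective_chunk, step_hz) * safety
    return max(0, math.floor((target_ms - overhead_ms) / per_step_ms))


print(max_ddim_steps())                  # RTX 4090, safe tier
print(max_ddim_steps(safety=1.0))        # RTX 4090, critical tier
print(max_ddim_steps(per_step_ms=10.0))  # ~10 ms/step GPU, overhead assumed unchanged
```

With the 4090 constants this reproduces the 23-step and 29-step tiers, and with ~10 ms/step it lands at 7 steps, in line with the 7-8 step recommendation for the RTX 3060.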

3. Action horizon setting

The DP model outputs a fixed n_action_steps=8, so whenever the client's policy_action_horizon is ≥ 8 the server automatically truncates it to 8. Setting 16, 32, or 50 on the client side is therefore equivalent.

--policy_action_horizon=16    # any value ≥ 8 works
--step_hz=60                  # sim rate used for DP training
--episode_length_s=120
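The capping rule can be expressed as a one-line sketch. `effective_action_chunk` and `chunks_per_episode` are hypothetical helpers, not LeRobot or LeIsaac functions, and the behavior for horizons below 8 (passing the smaller value through) is an assumption, since only the ≥ 8 case is described above.

```python
import math


def effective_action_chunk(policy_action_horizon, n_action_steps=8):
    """Actions actually executed per inference call after the server-side cap."""
    if policy_action_horizon < 1:
        raise ValueError("policy_action_horizon must be at least 1")
    # Assumption: horizons below n_action_steps are passed through unchanged.
    return min(policy_action_horizon, n_action_steps)


def chunks_per_episode(episode_length_s=120, step_hz=60, policy_action_horizon=16):
    """Number of inference calls needed to cover one full-length episode."""
    total_steps = episode_length_s * step_hz
    return math.ceil(total_steps / effective_action_chunk(policy_action_horizon))


print([effective_action_chunk(h) for h in (16, 32, 50)])
print(chunks_per_episode())
```

With the settings above, a full 120 s episode at 60 Hz is 7200 sim steps, or 900 inference calls of 8 actions each.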

Usage

1. Start the LeRobot async policy_server

pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080

2. Launch eval through the vitorcen/LeIsaac-Training fork

cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=10 \
    --episode_length_s=120 \
    --step_hz=60 \
    --policy_type=lerobot-diffusion \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/DiffusionPolicy-PickOrange \
    --policy_action_horizon=16 \
    --policy_language_instruction='Pick up the orange and place it on the plate' \
    --device=cuda --enable_cameras

Use eval_rounds=10 to average the success rate (DP is stochastic, so a single sample is easy to misjudge).

Limitations

  • Stochastic outcomes: each diffusion sampling pass starts from different noise, so the same ckpt and config still show run-to-run variance. Do not judge the model from a single round.
  • 2nd / 3rd orange is dataset OOD: a ceiling shared with ACT / SmolVLA / π0.5. The dataset has 60 episodes with only one "place the Nth orange" demonstration per episode, so state coverage for the 2nd and 3rd oranges is sparse. Even with DDIM 32-step unlocking real-time inference, success rate still decays with each successive orange.
  • GPU-bound: the DDIM step count is tightly coupled to GPU compute. This ckpt's 32-step default is tuned for the RTX 4090; on a 3060/3070 it needs to drop to ~10 steps (lower sample quality, and possibly a further loss in success rate).
  • No image augmentation and no domain randomization: this is a sim-only ckpt, so real-world transfer is likely weak.

Related

Acknowledgments

  • The LeIsaac team and LightwheelAI for the task environment and dataset
  • The LeRobot team for the Diffusion Policy implementation and the async inference framework
  • The original Diffusion Policy paper: Chi et al. 2023
  • The DDIM scheduler swap was inspired by the HuggingFace diffusers library

Citation

@inproceedings{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

@inproceedings{song2021denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

License

Apache-2.0
