---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
- diffusion-policy
- lerobot
- so101
- leisaac
- pick-orange
- isaac-sim
- ddim
datasets:
- LightwheelAI/leisaac-pick-orange
language:
- en
---
# DiffusionPolicy-PickOrange

A LeRobot Diffusion Policy (267M parameters; UNet 1D denoiser + ResNet18 vision encoder) trained from scratch on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task. Inference has been hot-swapped to 32-step DDIM by editing the checkpoint's `config.json`, without retraining.

![DP eval: SO-101 PickOrange](dp-pick-orange.jpg)

🔗 Project repos:

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): multi-policy comparison on Isaac Lab + LeIsaac (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)

## TL;DR

- Task: `Pick up the orange and place it on the plate`. A single SO-101 arm picks up 3 oranges one after another and places each on a plate.
- Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes.
- Architecture: Diffusion Policy (UNet 1D denoiser + ResNet18 vision encoder over two cameras + 6-DOF state input → 8-step action chunk).
- Training: 100k steps with the 100-step DDPM noise scheduler; ~1.07 GB `model.safetensors`.
- Inference hot-swap: in `config.json`, change `noise_scheduler_type: DDPM → DDIM` and `num_inference_steps: null → 32`, with no retraining. Inference latency drops from 393 ms to 147 ms per chunk and the slowdown from 2.96x to 1.1x, close enough to real time to run on an RTX 4090.
- Eval: Isaac Sim 5.1 + LeIsaac. Outcomes are probabilistic: across runs the full range from 0/3 to 3/3 was observed, and some rounds placed all 3 oranges. Diffusion sampling is inherently stochastic, so results are only meaningful when averaged over multiple rounds.
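
A quick consistency check on the numbers above: 267M parameters stored as 4-byte floats come to about 1.07 GB, which matches the size of `model.safetensors`. The float32 dtype is an assumption (the card does not state it), and the small safetensors header is ignored.

```python
# Sanity check: checkpoint size implied by the parameter count.
# ASSUMPTION: weights are stored as float32 (4 bytes each); the safetensors header is ignored.
N_PARAMS = 267_000_000   # "267M" from this model card
BYTES_PER_PARAM = 4      # float32 (assumed)


def checkpoint_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


size_gb = checkpoint_size_gb(N_PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.2f} GB")  # 1.07 GB
```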

## Highlights

- DDIM scheduler hot-swap, no retraining: 100-step DDPM is the standard setting in the Diffusion Policy paper, but 100 sequential denoising steps cost 393 ms/chunk (2.96x slowdown), which is hard to run in real time even on a 4090. DDIM shares DDPM's training objective and noise schedule, so a DDPM-trained noise predictor can be sampled with DDIM over a shorter subsequence of timesteps just by swapping the config, without retraining the weights. 32 steps is the sweet spot on a 4090.
- Probabilistic full 3/3 success: in multi-round eval, some rounds pick and place all 3 oranges. Outcomes are noisier than ACT's deterministic 1/1, but they show that DP can reach full task completion within the dataset's coverage when the diffusion sample lands favorably, rather than just barely grasping the first orange.
- Trained from scratch, no pretrained vision backbone: the ResNet18 vision encoder uses LeRobot diffusion's default from-scratch setting, without ImageNet pretraining. The model is a stress test of how far 60 episodes can carry a visuomotor task.
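
The no-retraining claim rests on DDIM reusing DDPM's noise schedule and ε-prediction network: the same network is evaluated on a shorter subsequence of timesteps with a deterministic (η = 0) update. Below is a minimal pure-Python sketch of that update, written for illustration rather than taken from LeRobot or `diffusers`. It assumes a cosine (`squaredcos_cap_v2`-style) schedule with 100 training steps, "leading" timestep spacing, and a final step to ᾱ = 1, and it omits details such as x0 clipping; `oracle_eps` is a hypothetical stand-in for the trained UNet. Because the update is deterministic, the only source of run-to-run variance is the random initial noise `x_T`.

```python
import math
import random


def cosine_alphas_cumprod(num_train_timesteps: int = 100, max_beta: float = 0.999) -> list[float]:
    """Cumulative alpha_bar_t of a cosine (squaredcos_cap_v2-style) beta schedule."""
    def alpha_bar(t: float) -> float:
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    cumprod, acc = [], 1.0
    for i in range(num_train_timesteps):
        t1, t2 = i / num_train_timesteps, (i + 1) / num_train_timesteps
        beta = min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta)
        acc *= 1.0 - beta
        cumprod.append(acc)
    return cumprod


def ddim_timesteps(num_train_timesteps: int = 100, num_inference_steps: int = 32) -> list[int]:
    """'Leading' spacing: every (T // n)-th training timestep, visited from high to low."""
    ratio = num_train_timesteps // num_inference_steps
    return [i * ratio for i in range(num_inference_steps)][::-1]


def ddim_sample(eps_model, x_T: list[float], alphas_cumprod: list[float], timesteps: list[int]) -> list[float]:
    """Deterministic DDIM (eta = 0): same network and alpha_bar as DDPM, fewer steps."""
    x = list(x_T)
    for k, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        # alpha_bar of the next timestep in the subsequence; the last step targets alpha_bar = 1.
        a_prev = alphas_cumprod[timesteps[k + 1]] if k + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        # Predict the clean sample, then re-noise it to the previous timestep's noise level.
        x0_pred = [(xi - math.sqrt(1 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1 - a_prev) * ei for x0, ei in zip(x0_pred, eps)]
    return x


alphas_cumprod = cosine_alphas_cumprod(100)
timesteps = ddim_timesteps(100, 32)
print(len(timesteps), timesteps[:3], timesteps[-1])  # 32 [93, 90, 87] 0

target = [0.1, -0.2, 0.3]  # made-up clean action values


def oracle_eps(x: list[float], t: int) -> list[float]:
    """Hypothetical perfect denoiser that knows the clean target (stand-in for the UNet)."""
    a = alphas_cumprod[t]
    return [(xi - math.sqrt(a) * ti) / math.sqrt(1 - a) for xi, ti in zip(x, target)]


rng = random.Random(0)  # the initial noise is the only random input
x_T = [rng.gauss(0.0, 1.0) for _ in target]
print(ddim_sample(oracle_eps, x_T, alphas_cumprod, timesteps))  # approximately [0.1, -0.2, 0.3]
```

With 100 training steps and 32 inference steps, this sampler visits every third training timestep (93, 90, ..., 0). Replacing `oracle_eps` with a real denoiser changes the output quality, not the structure of the loop.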

## Training recipe

| Item | Value |
| --- | --- |
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 episodes, dual-camera 480×640 RGB + 6-DOF state, 30 Hz) |
| Policy | `diffusion` (LeRobot implementation) |
| Vision encoder | ResNet18 (from scratch, no ImageNet pretraining) |
| Action head | UNet 1D denoiser |
| `n_action_steps` (output chunk) | 8 |
| Noise scheduler (training) | DDPM, 100 steps |
| Noise scheduler (inference) | DDIM, 32 steps (hot-swapped after training) |
| Steps | 100,000 |
| Optimizer | AdamW |
| Hardware | RTX 4090 (24 GB) |
| Recipe credit | LeRobot diffusion baseline, [Diffusion Policy paper (Chi et al. 2023)](https://diffusion-policy.cs.columbia.edu/) |

Training entrypoint in our LeIsaac fork: [`scripts/training/diffusion_policy/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/diffusion_policy/train.sh).

## Eval results

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=60` (the sim rate used for DP training), dual-camera observations, `policy_action_horizon=16`.

| Config | Inference latency | Observed outcome distribution | Notes |
| --- | --- | --- | --- |
| DDPM 100-step (no swap) | 393 ms/chunk, 2.96x slowdown | ⚠️ repeated timeouts | Struggles to keep up in real time; motion lags badly |
| DDIM 32-step (default in this ckpt) | 147 ms/chunk, 1.1x slowdown | 0/3, 1/3, 2/3 and 3/3 all observed | Some rounds place all 3 oranges ✅ |
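
The slowdown figures are consistent with dividing inference latency by the wall-clock time one action chunk covers (8 actions at `step_hz=60`, about 133 ms). That definition is an assumption inferred from the numbers, not something stated by the eval scripts; a minimal sketch with hypothetical helper names:

```python
def chunk_duration_ms(n_action_steps: int = 8, step_hz: float = 60.0) -> float:
    """Wall-clock time the robot spends executing one action chunk."""
    return n_action_steps * 1000.0 / step_hz


def slowdown(inference_ms: float, n_action_steps: int = 8, step_hz: float = 60.0) -> float:
    """> 1.0 means chunks are produced more slowly than the robot consumes them (assumed definition)."""
    return inference_ms / chunk_duration_ms(n_action_steps, step_hz)


for name, latency_ms in [("DDPM 100-step", 393.0), ("DDIM 32-step", 147.0)]:
    print(f"{name}: {slowdown(latency_ms):.2f}x")
# DDPM 100-step: 2.95x   (table: 2.96x)
# DDIM 32-step: 1.10x    (table: 1.1x)
```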

Key observations:

1. Diffusion sampling is stochastic: with the same ckpt and config, each inference starts from different noise, so running the same episode several times gives different outcomes. This is a property of the architecture, not a bug.
2. Some rounds reach full 3/3 success, showing that DP can complete the task within the coverage of the 60-episode dataset, not just grasp a single orange.
3. The outcome distribution is skewed: the success rate on the 1st orange is far higher than on the 3rd. This is the same dataset OOD ceiling seen with ACT / SmolVLA / π0.5.

Rigorous estimate: a quantitative success rate requires averaging over `eval_rounds=10` or more. A single sample has large error; do not draw conclusions from a single round.
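
To make the averaging concrete, here is a minimal sketch that turns per-round outcomes (oranges placed, 0 to 3) into a mean, its standard error, and a per-orange success rate, where orange k counts as a success in a round if at least k oranges were placed. The function name and the per-orange definition are assumptions for illustration, and the round counts are made-up placeholders, not measured results.

```python
import math
from statistics import mean, stdev


def summarize_rounds(oranges_placed: list[int], n_oranges: int = 3) -> dict:
    """Aggregate per-round outcomes; needs at least 2 rounds to estimate the spread."""
    n = len(oranges_placed)
    avg = mean(oranges_placed)
    sem = stdev(oranges_placed) / math.sqrt(n)  # standard error of the mean
    # Oranges are placed sequentially, so orange k succeeded if at least k were placed.
    per_orange = [sum(c >= k for c in oranges_placed) / n for k in range(1, n_oranges + 1)]
    return {"rounds": n, "mean": avg, "sem": sem, "per_orange_success": per_orange}


# Placeholder outcomes for 10 rounds (illustrative only, NOT measured data).
rounds = [1, 0, 2, 3, 1, 1, 2, 0, 1, 2]
print(summarize_rounds(rounds))
```

With fewer than two rounds, `statistics.stdev` raises `StatisticsError`: a single round gives no estimate of the spread at all, which is why single-round conclusions cannot be trusted.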

## ⚠️ Critical inference settings

### 1. DDIM hot-swap (already applied in this ckpt)

Key fields in `config.json` (already set in this repo):

```json
{
  "noise_scheduler_type": "DDIM",
  "num_inference_steps": 32
}
```

`config.json.bak` keeps the original DDPM settings for comparison.
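
To apply the same swap to another DP checkpoint, a minimal standard-library sketch: back up `config.json` to `config.json.bak` (mirroring the backup kept in this repo), then set the two fields. The function name and the assumption that `ckpt_dir` is a local directory containing `config.json` are hypothetical; only the field names and values come from this card.

```python
import json
import shutil
from pathlib import Path


def hot_swap_to_ddim(ckpt_dir: str, num_inference_steps: int = 32) -> dict:
    """Switch a diffusion checkpoint's sampler to DDIM by editing config.json only.

    The weights file is never touched, so no retraining is involved.
    """
    cfg_path = Path(ckpt_dir) / "config.json"
    bak_path = cfg_path.with_name("config.json.bak")
    if not bak_path.exists():  # back up the original DDPM settings only once
        shutil.copy2(cfg_path, bak_path)

    cfg = json.loads(cfg_path.read_text())
    cfg["noise_scheduler_type"] = "DDIM"
    cfg["num_inference_steps"] = num_inference_steps
    cfg_path.write_text(json.dumps(cfg, indent=2))
    return cfg
```

Call it on a local checkpoint directory, for example `hot_swap_to_ddim("path/to/checkpoint")` (placeholder path). To revert, copy `config.json.bak` back over `config.json`.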

### 2. Per-GPU DDIM step calibration

Linear fit measured on RTX 4090 + Isaac Sim:

```
inference_ms ≈ 36 + n_steps × 3.3
# overhead 36 ms = ResNet18 encode + ZMQ RTT
# per_step 3.3 ms = one UNet denoising step on the 4090

target_inference_ms = effective_chunk × (1000 / step_hz) × safety
                    = 8 × 16.67 × 0.85 ≈ 113 ms     (60 Hz, safety 0.85)

max_steps = (target_inference_ms - overhead) / per_step
          = (113 - 36) / 3.3 ≈ 23                    (safe, safety 0.85)
          = (133 - 36) / 3.3 ≈ 29                    (critical, safety 1.0)
```
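
The same arithmetic as a small calculator, handy for re-deriving the step budget on a different GPU. It assumes the linear latency fit above and that the 36 ms overhead stays the same on other cards (not measured here); the function names are hypothetical. It also caps the client horizon at the model's fixed `n_action_steps=8`, as the server does.

```python
import math


def effective_chunk(policy_action_horizon: int, n_action_steps: int = 8) -> int:
    """The server never returns more than n_action_steps actions per chunk."""
    return min(policy_action_horizon, n_action_steps)


def predicted_inference_ms(n_steps: int, overhead_ms: float = 36.0, per_step_ms: float = 3.3) -> float:
    """Linear fit from the RTX 4090 measurements: fixed overhead + per-step UNet cost."""
    return overhead_ms + n_steps * per_step_ms


def max_ddim_steps(per_step_ms: float, overhead_ms: float = 36.0, step_hz: float = 60.0,
                   policy_action_horizon: int = 16, safety: float = 0.85) -> int:
    """Largest step count whose predicted latency fits within one chunk's time budget."""
    chunk = effective_chunk(policy_action_horizon)
    target_ms = chunk * (1000.0 / step_hz) * safety
    return math.floor((target_ms - overhead_ms) / per_step_ms)


print(max_ddim_steps(3.3))                # 23  (4090, safe)
print(max_ddim_steps(3.3, safety=1.0))    # 29  (4090, critical)
print(max_ddim_steps(10.0))               # 7   (~10 ms/step GPU, same overhead assumed)
print(round(predicted_inference_ms(32)))  # 142 (fit value for 32 steps)
```

For a ~10 ms/step card the calculator lands at 7 steps, in line with the ~7-8 step recommendation for an RTX 3060 below.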

Measured on the 4090: 30 steps → 2/3 oranges; 32 steps → full 3/3 success observed; 50 steps → GPU compute saturated and 3D rendering chokes (OOM-like behavior).

Weaker GPUs: an RTX 3060 takes ~10 ms/step, so its sweet spot is ~7-8 steps. See the [design doc](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html) for the full calibration.

### 3. Action horizon setting

The DP model always outputs `n_action_steps=8` actions per chunk, so when the client's `policy_action_horizon` is ≥ 8 the server caps it at 8. Setting 16, 32, or 50 is therefore equivalent.

```bash
--policy_action_horizon=16   # any value ≥ 8 works
--step_hz=60                 # sim rate used for DP training
--episode_length_s=120
```

## Usage

### 1. Start the LeRobot async policy_server

```bash
pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```

### 2. Launch eval via the [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

```bash
cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
  --task=LeIsaac-SO101-PickOrange-v0 \
  --eval_rounds=10 \
  --episode_length_s=120 \
  --step_hz=60 \
  --policy_type=lerobot-diffusion \
  --policy_host=127.0.0.1 --policy_port=8080 \
  --policy_checkpoint_path=wsagi/DiffusionPolicy-PickOrange \
  --policy_action_horizon=16 \
  --policy_language_instruction='Pick up the orange and place it on the plate' \
  --device=cuda --enable_cameras
```

Use `eval_rounds=10` and average the success rate: DP is stochastic, so a single sample is easily misleading.

## Limitations

- Stochastic success: each diffusion sampling pass starts from different initial noise, so the same ckpt and config show run-to-run variance. Do not judge the model from a single round.
- 2nd/3rd-orange dataset OOD: this is the same ceiling shared with ACT / SmolVLA / π0.5. Each of the 60 episodes contains only one demonstration of placing the N-th orange, so state coverage for the 2nd and 3rd oranges is sparse. Even with DDIM-32 unlocking real-time inference, the success rate still decays with each successive orange.
- GPU-bound: the DDIM step count is tightly coupled to GPU compute. The 32-step default in this ckpt is tuned for the 4090; a 3060/3070 needs to drop to ~10 steps (lower sample quality and possibly a further hit to success rate).
- No image augmentation and no domain randomization: this is a sim-only ckpt, so real-world transfer is likely weak.

## Related

- Same-task comparisons:
  - [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange): our own ACT (~80M), 1/1 deterministic success @ horizon=32
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): community ACT, 1/1 (deterministic)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 SOTA (~3B), places all 3 oranges in ~30 s
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
- Design doc: [`docs/training/dp_inference_speedup_and_dynamic_timeout.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/dp_inference_speedup_and_dynamic_timeout.html), a full postmortem of the DDIM swap and dynamic timeout (including SVG fit curves)

## Acknowledgments

- The LeIsaac team and LightwheelAI for the task environment and dataset
- The LeRobot team for the Diffusion Policy implementation and the async inference framework
- Original Diffusion Policy paper: [Chi et al. 2023](https://diffusion-policy.cs.columbia.edu/)
- DDIM scheduler swap inspired by the HuggingFace `diffusers` library

## Citation

```bibtex
@inproceedings{chi2023diffusion,
  title={Diffusion Policy: Visuomotor Policy Learning via Action Diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

@inproceedings{song2021denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```

## License

Apache-2.0