	---
	license: apache-2.0
	library_name: lerobot
	pipeline_tag: robotics
	tags:
	- act
	- lerobot
	- so101
	- leisaac
	- pick-orange
	- isaac-sim
	datasets:
	- LightwheelAI/leisaac-pick-orange
	language:
	- en
	base_model: lerobot/act
	---

	# ACT-PickOrange

	An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) policy trained from scratch on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task.

	![ACT-PickOrange — SO-101 in Isaac Sim](act-pick-orange.png)

	🔗 Project repos:
	- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
	- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)

	## TL;DR

	- Task: `Pick up the orange and place it on the plate`. A single-arm SO-101 picks up 3 oranges one after another and places each on a plate.
	- Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes.
	- Architecture: ACT with chunk_size=100, ~80M parameters, pure vision + joint state → action-chunk regression (no LLM, no diffusion).
	- Training: batch=8, lr=1e-5, 10k steps, image augmentation disabled, ~5 h on an RTX 4090.
	- Eval: Isaac Sim 5.1 + LeIsaac, 1/1 success within 120 s of sim time (all 3 oranges placed on the plate).
	- ⚠️ Critical inference setting: `policy_action_horizon=32`. The default of 16 leaves the model stuck on the second orange (gripper jitter), and 8 leaves it stuck on the first. See [Inference caveat](#-推理关键配置--critical-inference-caveat) below.

	## Highlights

	- Reproduces and validates the [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) recipe, with a comparable or better success rate.
	- Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`: an ACT model with chunk_size=100 needs horizon ≥ 32 so that the macro-motion segment of each chunk gets executed (in our test 16 failed and 32 succeeded). See the diagnosis in the critical inference caveat section.
	- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.

	## Training recipe

	| Item | Value |
	|---|---|
	| Dataset | `LightwheelAI/leisaac-pick-orange` (60 episodes, dual-cam 480×640 RGB + 6-DoF state, 30 Hz) |
	| Policy | `act` (LeRobot implementation) |
	| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
	| `chunk_size` | 100 |
	| `n_action_steps` | 100 |
	| Batch size | 8 |
	| Optimizer | AdamW |
	| Learning rate | 1e-5 (constant) |
	| Steps | 10,000 |
	| Image augmentation | disabled |
	| Hardware | RTX 4090 (24 GB) |
	| Wall-clock | ~5 hours |
	| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |

	The training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh).

	## Eval results

	| Config | Orange 1 | Orange 2 | Orange 3 | Episode success |
	|---|---|---|---|---|
	| horizon=8 | 🔴 Stuck (grasps, then stops moving) | not reached | not reached | 0/1 |
	| horizon=16 | ✅ Success | 🟡 Gripper jitter | not reached | 0/1 |
	| horizon=32 | ✅ Success | ✅ Success after some struggling | ✅ Success after some struggling | 1/1 ✅ |

	Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations.

	Single-sample caveat: each result above comes from a single episode per setting, not a statistical average over many runs. Even so, the monotonic progression across horizon=8/16/32 (stuck → jitter → success) is enough to falsify the hypothesis that the model itself is at fault: this is a configuration issue, not a model-capability issue.
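
To put a number on the single-sample caveat, the sketch below computes a Wilson score interval for a binomial success rate. It is an illustration, not part of the evaluation pipeline: the helper name `wilson_interval` and the 95% level (z = 1.96) are assumptions introduced here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # (successes, trials): the reported 1/1, a 0/1 failure, and a hypothetical 3/3.
    for successes, trials in [(1, 1), (0, 1), (3, 3)]:
        low, high = wilson_interval(successes, trials)
        print(f"{successes}/{trials}: [{low:.2f}, {high:.2f}]")
```

Under these assumptions, 1/1 gives an interval of roughly [0.21, 1.00], and even a hypothetical 3/3 would only raise the lower bound to about 0.44. The horizon comparison is therefore best read as a qualitative diagnosis, not a measured success rate.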

	## ⚠️ 推理关键配置 / Critical inference caveat

	ACT with chunk_size=100 plus the default horizon=16 never gets past the 2nd orange. This is not an ACT weakness; it is a hidden trap in LeIsaac's default config.

	### Root cause

	Each ACT chunk is a complete 100-step plan: the first ~10 steps are "startup / acceleration", and only the middle section (steps 20-80) carries the real macro-motion (approach → grasp → lift → transport → release). The LeRobot async client executes chunks in a receding-horizon fashion, re-querying the policy every `policy_action_horizon` steps and discarding the rest of the previous chunk.

	- horizon=8 → only the first 8 steps of each chunk run before it is discarded and the policy is re-queried → execution never leaves the "startup" segment and never reaches the macro-motion → stuck.
	- horizon=16 → enough for the simple "approach → grasp" of orange #1, but the more complex "place → retreat → approach orange #2" transition needs a longer execution window → out-of-distribution (OOD) states and a short horizon compound → jitter.
	- horizon=32 → gives the macro-motion room to execute; 1/1 passes.
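
The effect of the horizon on which part of a chunk ever runs can be sketched with a toy model. It assumes that every query returns a fresh 100-step chunk and that only its first `horizon` steps execute before the next query; the segment boundaries (startup < 10, macro-motion 20-80) are the approximate figures quoted above, and the names `executed_indices` and `segment_counts` are hypothetical. This is not LeIsaac or LeRobot code.

```python
CHUNK_SIZE = 100                   # ACT chunk_size from the training recipe
STARTUP_END = 10                   # approx. end of the "startup" segment (assumed)
MACRO_START, MACRO_END = 20, 80    # approx. macro-motion segment (assumed)


def executed_indices(horizon: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk-local step indices that run before the policy is re-queried."""
    if not 0 < horizon <= chunk_size:
        raise ValueError("horizon must be in 1..chunk_size")
    return range(horizon)


def segment_counts(horizon: int) -> dict[str, int]:
    """Count how many executed steps fall in each segment of a chunk."""
    indices = executed_indices(horizon)
    startup = sum(1 for i in indices if i < STARTUP_END)
    macro = sum(1 for i in indices if MACRO_START <= i < MACRO_END)
    return {"startup": startup, "macro": macro, "other": len(indices) - startup - macro}


if __name__ == "__main__":
    for horizon in (8, 16, 32):
        print(horizon, segment_counts(horizon))
```

With these boundaries, horizon=8 executes startup steps only, horizon=16 still stops short of index 20, and horizon=32 is the first of the three settings that executes any macro-motion steps (12 of them per chunk). The boundaries are rough, so the counts indicate a trend and are not exact.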

	### Recommended settings

	```bash
	--policy_type=lerobot-act
	--policy_action_horizon=32
	--policy_checkpoint_path=<path-to-this-model>
	--step_hz=30 # 对齐 dataset 30Hz / matches dataset 30Hz
	--episode_length_s=120
	```
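
As a quick sanity check on what the settings above imply, the following sketch derives the control-step budget of one episode and the number of policy queries for a given horizon. It assumes exactly one query per `policy_action_horizon` executed steps with no overlap; `episode_budget` is a hypothetical helper name.

```python
import math

STEP_HZ = 30             # --step_hz
EPISODE_LENGTH_S = 120   # --episode_length_s


def episode_budget(horizon: int, step_hz: int = STEP_HZ,
                   episode_length_s: int = EPISODE_LENGTH_S) -> tuple[int, int, float]:
    """Return (total control steps, policy queries, seconds executed per query)."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    total_steps = step_hz * episode_length_s
    queries = math.ceil(total_steps / horizon)   # the last window may be partial
    seconds_per_query = horizon / step_hz
    return total_steps, queries, seconds_per_query


if __name__ == "__main__":
    for horizon in (8, 16, 32):
        steps, queries, seconds = episode_budget(horizon)
        print(f"horizon={horizon}: {steps} steps, {queries} queries, {seconds:.2f} s per query")
```

At horizon=32 each query covers about 1.07 s of motion and a 120 s episode needs 113 queries. At horizon=8 the same episode needs 450 queries of about 0.27 s each.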

	## Usage

	### 1. Start the LeRobot async policy_server

	```bash
	pip install lerobot
	python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
	```

	### 2. Launch the LeIsaac eval from the client

	Via our [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork:

	```bash
	cd LeIsaac
	bash scripts/evaluation/run_eval.sh -- \
	--task=LeIsaac-SO101-PickOrange-v0 \
	--eval_rounds=3 \
	--episode_length_s=120 \
	--step_hz=30 \
	--policy_type=lerobot-act \
	--policy_host=127.0.0.1 --policy_port=8080 \
	--policy_checkpoint_path=wsagi/ACT-PickOrange \
	--policy_action_horizon=32 \
	--policy_language_instruction="Pick up the orange and place it on the plate" \
	--device=cuda --enable_cameras
	```

	`run_eval.sh` automatically computes a wall-clock timeout from a user-patience cap, so slow inference fails fast instead of leaving you waiting.
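
To repeat the horizon comparison without hand-editing the command, one option is a small wrapper that assembles the same argument list. The sketch below only builds and prints the command; the script path and every flag are copied from the command above, while `build_eval_command` is a hypothetical helper and not part of LeIsaac.

```python
import shlex

EVAL_SCRIPT = "scripts/evaluation/run_eval.sh"  # path taken from the command above


def build_eval_command(horizon: int = 32,
                       checkpoint: str = "wsagi/ACT-PickOrange") -> list[str]:
    """Assemble the run_eval.sh invocation for a given policy_action_horizon."""
    flags = {
        "task": "LeIsaac-SO101-PickOrange-v0",
        "eval_rounds": 3,
        "episode_length_s": 120,
        "step_hz": 30,
        "policy_type": "lerobot-act",
        "policy_host": "127.0.0.1",
        "policy_port": 8080,
        "policy_checkpoint_path": checkpoint,
        "policy_action_horizon": horizon,
        "policy_language_instruction": "Pick up the orange and place it on the plate",
        "device": "cuda",
    }
    command = ["bash", EVAL_SCRIPT, "--"]
    command += [f"--{name}={value}" for name, value in flags.items()]
    command.append("--enable_cameras")  # boolean switch, takes no value
    return command


if __name__ == "__main__":
    # Print one shell-quoted command per horizon from the eval table.
    for horizon in (8, 16, 32):
        print(shlex.join(build_eval_command(horizon)))
```

Each printed line can be pasted into a shell from the fork's root directory, or the list can be passed to `subprocess.run` once the policy server from step 1 is running.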

	## Limitations

	- Dataset OOD on the 2nd and 3rd oranges: the dataset has 60 episodes, each with a single "place the N-th orange" demonstration. State coverage for oranges 2 and 3 is roughly an order of magnitude sparser than for orange 1, so the task gets monotonically harder there and the motion visibly more hesitant. Even though horizon=32 rescues the nominal success rate, precision still degrades with each successive orange. This is a data issue, not a model issue.
	- The same failure appears across three independent architectures and four checkpoints (our ACT, Diffusion Policy, SmolVLA, and the public shadowHokage ACT): on this dataset all of them go OOD on the 3rd orange, a weakness shared by the whole family.
	- No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed.

	## Related

	- Same-task comparisons:
	  - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange): our self-trained Diffusion Policy (267M, DDIM 32-step swap)
	  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): public checkpoint with the same recipe (our reproduction reference)
	  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 SOTA (completes all 3 oranges in 30 s)
	- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

	## Acknowledgments

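	- The LeIsaac team and LightwheelAI for the task environment and dataset
	- The LeRobot team for the ACT implementation and the async inference framework
	- shadowHokage for publishing the training recipe used as our reproduction baseline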

	## Citation

	```bibtex
	@inproceedings{zhao2023learning,
	title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
	author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
	booktitle={Robotics: Science and Systems},
	year={2023}
	}
	```

	## License

	Apache-2.0