---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - act
  - lerobot
  - so101
  - leisaac
  - pick-orange
  - isaac-sim
datasets:
  - LightwheelAI/leisaac-pick-orange
language:
  - en
base_model: lerobot/act
---

# ACT-PickOrange

An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) policy trained from scratch on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task.

![ACT-PickOrange — SO-101 in Isaac Sim](act-pick-orange.png)

**🔗 Project repos**

- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)

## TL;DR

- **Task**: `Pick up the orange and place it on the plate`. A single-arm SO-101 picks up 3 oranges in sequence and places each one on a plate.
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes.
- **Architecture**: ACT with chunk_size=100, ~80M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion).
- **Training**: batch=8, lr=1e-5, 10k steps, **image augmentation disabled**, ~5 h on an RTX 4090.
- **Eval**: Isaac Sim 5.1 + LeIsaac, **1/1 success @ 120 s sim time** (all 3 oranges placed on the plate).
- **⚠️ Critical inference setting**: `policy_action_horizon=32`.
  With the default value of 16 the model gets stuck on the second orange (gripper jitter); with 8 it gets stuck on the first. See [Inference caveat](#-推理关键配置--critical-inference-caveat) below.

## Highlights

- **Reproduces and validates the [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) recipe**, with a comparable or better success rate.
- **Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`**: an ACT model with chunk_size=100 needs horizon ≥ 32 so that the macro-motion segment of each chunk can execute. See the root-cause diagnosis later in this README.
- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.

## Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | `act` (LeRobot implementation) |
| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
| `chunk_size` | 100 |
| `n_action_steps` | 100 |
| Batch size | 8 |
| Optimizer | AdamW |
| Learning rate | 1e-5 (constant) |
| Steps | 10,000 |
| Image augmentation | **disabled** |
| Hardware | RTX 4090 (24 GB) |
| Wall-clock | ~5 hours |
| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |

The training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh).
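As a quick sanity check on the recipe table, the numbers can be turned into a rough training budget. This is only back-of-the-envelope arithmetic on the values listed above; the 5-hour figure is approximate, so the per-step time is approximate too.

```python
# Rough training budget derived from the recipe table (values copied from it).
STEPS = 10_000           # optimizer steps
BATCH_SIZE = 8           # samples per step
WALL_CLOCK_HOURS = 5.0   # "~5 hours" on an RTX 4090 (approximate)

# Total samples drawn from the dataset over the whole run.
samples_seen = STEPS * BATCH_SIZE

# Average wall-clock time per optimizer step, data loading included.
seconds_per_step = WALL_CLOCK_HOURS * 3600 / STEPS

print(f"samples drawn during training: {samples_seen}")
print(f"average time per step: ~{seconds_per_step:.1f} s")
```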

## Eval results

| Config | Orange 1 | Orange 2 | Orange 3 | Episode success rate |
|---|---|---|---|---|
| horizon=8  | 🔴 stuck (grasps, then stops moving) | n/a | n/a | 0/1 |
| horizon=16 | ✅ success | 🟡 gripper jitter | n/a | 0/1 |
| **horizon=32** | ✅ success | ✅ success after struggling | ✅ success after struggling | **1/1** ✅ |

Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations.

**Single-sample caveat**: the 1/1 result above comes from a single episode, not from a statistically averaged set of runs. Still, the monotonic pattern across horizon=8/16/32 (stuck → jitter → success) points to a configuration issue rather than a limit of the model's capability.

## ⚠️ 推理关键配置 / Critical inference caveat

**ACT chunk_size=100 + the default horizon=16 never gets past the 2nd orange.** This is not an ACT weakness; it is a hidden trap in LeIsaac's default config.

### Root cause

Each ACT chunk outputs a 100-step **planned trajectory**: the first ~10 steps are "startup / acceleration", and the middle section (steps 20-80) carries the actual **macro-motion** (approach → grasp → lift → transport → release). The LeRobot async client uses a receding-horizon window, re-querying the policy every `policy_action_horizon` steps.

- horizon=8 → only the first 8 steps of each chunk run before it is discarded and re-queried → the policy keeps replaying the startup segment and **never reaches the macro-motion** → stuck.
- horizon=16 → enough for the simple "approach → grasp" of orange #1, but the more complex "place → retreat → approach orange #2" transition needs a longer execution window → an OOD state and a short horizon compound → jitter.
- horizon=32 → gives the macro-motion room to execute; the episode passes 1/1.
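The receding-horizon argument can be made concrete with a small counting sketch. It takes the approximate segment boundaries quoted above (startup up to step ~10, macro-motion over steps 20-80) at face value and assumes the client executes exactly the first `horizon` steps of every chunk; it does not model the policy or the simulator, and `executed_summary` is a hypothetical helper written for this illustration.

```python
import math

CHUNK_SIZE = 100                   # ACT chunk_size from the training recipe
STARTUP_END = 10                   # "first ~10 steps" (approximate)
MACRO_START, MACRO_END = 20, 80    # "steps 20-80" (approximate)
EPISODE_STEPS = 120 * 30           # episode_length_s * step_hz

def executed_summary(horizon: int) -> dict:
    """Count which parts of a chunk run before the client re-queries."""
    executed = range(min(horizon, CHUNK_SIZE))  # chunk indices that actually run
    startup = sum(1 for i in executed if i < STARTUP_END)
    macro = sum(1 for i in executed if MACRO_START <= i < MACRO_END)
    return {
        "horizon": horizon,
        "startup_steps": startup,
        "macro_steps": macro,
        "discarded_steps": CHUNK_SIZE - len(executed),
        "queries_per_episode": math.ceil(EPISODE_STEPS / horizon),
    }

for h in (8, 16, 32):
    print(executed_summary(h))
```

Under these assumptions, horizon=32 is the only one of the three settings whose executed window reaches into the step 20-80 range at all, at the cost of fewer re-queries per episode.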

### Recommended settings

```bash
--policy_type=lerobot-act
--policy_action_horizon=32
--policy_checkpoint_path=<path-to-this-model>
--step_hz=30                  # matches the dataset's 30 Hz
--episode_length_s=120
```
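If the eval is launched from a script, a small pre-flight helper can catch a too-short horizon before the simulator starts. The sketch below is hypothetical: `build_policy_flags` and `MIN_HORIZON_FOR_CHUNK_100` are names invented for this example and are not part of LeIsaac or LeRobot. It only assembles the flags listed above and warns when the horizon is below the one setting that passed in the eval table.

```python
import warnings

# Smallest horizon that passed in the single-episode eval above (assumption:
# treated here as a lower bound for chunk_size=100 checkpoints).
MIN_HORIZON_FOR_CHUNK_100 = 32

def build_policy_flags(checkpoint_path: str, horizon: int = 32) -> list[str]:
    """Return the recommended policy flags, warning on a short horizon."""
    if horizon < MIN_HORIZON_FOR_CHUNK_100:
        warnings.warn(
            f"policy_action_horizon={horizon} is below "
            f"{MIN_HORIZON_FOR_CHUNK_100}; this checkpoint got stuck at "
            "horizon 8 and 16 in the reported eval."
        )
    return [
        "--policy_type=lerobot-act",
        f"--policy_action_horizon={horizon}",
        f"--policy_checkpoint_path={checkpoint_path}",
        "--step_hz=30",              # matches the dataset's 30 Hz
        "--episode_length_s=120",
    ]

print(" ".join(build_policy_flags("wsagi/ACT-PickOrange")))
```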

## Usage

### 1. Start the LeRobot async policy_server

```bash
pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```

### 2. Start the LeIsaac eval client

Run it through our [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork:

```bash
cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=3 \
    --episode_length_s=120 \
    --step_hz=30 \
    --policy_type=lerobot-act \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/ACT-PickOrange \
    --policy_action_horizon=32 \
    --policy_language_instruction="Pick up the orange and place it on the plate" \
    --device=cuda --enable_cameras
```

`run_eval.sh` automatically computes a wall-clock timeout from a user-patience cap, so slow inference fails fast instead of leaving you waiting.

## Limitations

- **Dataset OOD on the 2nd and 3rd oranges**: the dataset has 60 episodes, each with a single demonstration of placing the N-th orange. State coverage for the 2nd and 3rd oranges is about an order of magnitude sparser than for the 1st, so the task gets monotonically harder there and the motions become visibly more laborious. Even though horizon=32 rescues the nominal success rate, **precision still degrades roughly linearly with the orange count**. This is a data issue, not a model issue.
- Four checkpoints across three independent architectures (our ACT, Diffusion Policy, SmolVLA, and the public shadowHokage ACT) trained on the same dataset **all go OOD on the 3rd orange**; the weakness is shared across the whole family.
- No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed.

## Related

- Same-task comparisons:
  - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange): self-trained Diffusion Policy (267M, DDIM 32-step swap)
  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): public checkpoint with the same recipe (our reproduction reference)
  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 SOTA (all 3 oranges in 30 s)
- Full training + eval recipe: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork

## Acknowledgments

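- The LeIsaac team and LightwheelAI for the task environment and dataset
- The LeRobot team for the ACT implementation and the async inference framework
- shadowHokage for publishing the training recipe used as our reproduction baseline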

## Citation

```bibtex
@inproceedings{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  booktitle={Robotics: Science and Systems},
  year={2023}
}
```

## License

Apache-2.0