ACT-PickOrange / README.md
wsagi's picture
Upload README.md with huggingface_hub
c7d8732 verified
metadata
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - act
  - lerobot
  - so101
  - leisaac
  - pick-orange
  - isaac-sim
datasets:
  - LightwheelAI/leisaac-pick-orange
language:
  - en
base_model: lerobot/act

ACT-PickOrange

针对 LeIsaac SO-101 PickOrange 任务从头训练的 ACT (Action Chunking Transformer) 策略。 An ACT (Action Chunking Transformer) policy trained from scratch on the LeIsaac SO-101 PickOrange task.

ACT-PickOrange — SO-101 in Isaac Sim

🔗 项目仓库 / Project repos

TL;DR

  • 任务 / TaskPick up the orange and place it on the plate — SO-101 单臂依次夹起 3 颗橙子并放盘子。 Single-arm SO-101 picks 3 oranges sequentially and places each on a plate.
  • 数据集 / DatasetLightwheelAI/leisaac-pick-orange — 60 episode 遥操示范。
  • 架构 / Architecture:ACT chunk_size=100,~80M 参数,纯 vision + joint state → action chunk regression(无 LLM / 无 diffusion)。
  • 训练 / Training:batch=8 / lr=1e-5 / 10k step / 关闭图像增强,~5h on RTX 4090。
  • 评测 / Eval:Isaac Sim 5.1 + LeIsaac,1/1 success @ 120s sim time(3 颗全部放盘成功)。
  • ⚠️ 关键 inference 配置 / Critical inference settingpolicy_action_horizon=32。 默认值 16 会让模型卡在第二颗橙子(爪子抖),8 会卡在第一颗。详见下方 Inference caveat

模型亮点

Highlights

  • 复刻 + 验证 shadowHokage/act_policy 的配方,得到等价或更好的成功率。 Reproduces and validates the shadowHokage/act_policy recipe with comparable or better success rate.
  • 暴露了 LeIsaac 默认 policy_action_horizon=16 的隐性陷阱:chunk_size=100 的 ACT 需要 horizon ≥ 32 才能让宏观运动段完整执行,详见 README 的诊断章节。 Exposes a hidden trap in LeIsaac's default policy_action_horizon=16: ACT models with chunk_size=100 require horizon ≥ 32 to let the macro-motion segment of each chunk execute.
  • 无 image augmentation、无 weight decay 调参、无 special trick — 干净的 ACT baseline。

训练配方

Training recipe

项 / Item 值 / Value
Dataset LightwheelAI/leisaac-pick-orange (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz)
Policy act (LeRobot 实现 / LeRobot impl.)
Backbone ResNet18 vision encoder + Transformer encoder/decoder
chunk_size 100
n_action_steps 100
Batch size 8
Optimizer AdamW
Learning rate 1e-5 (constant)
Steps 10,000
Image augmentation disabled
Hardware RTX 4090 (24 GB)
Wall-clock ~5 hours
Recipe credit shadowHokage/act_policy

训练入口脚本在我们的 LeIsaac fork:scripts/training/act/train.shTraining entrypoint script lives in our LeIsaac fork: scripts/training/act/train.sh.

评测结果

Eval results

配置 / Config 第 1 颗 第 2 颗 第 3 颗 Episode 成功率
horizon=8 🔴 卡死(夹住不动) 0/1
horizon=16 ✅ 成功 🟡 爪子抖 / muting 0/1
horizon=32 ✅ 成功 ✅ 折腾后成功 ✅ 折腾后成功 1/1

测试环境 / Test setup:Isaac Sim 5.1,task LeIsaac-SO101-PickOrange-v0episode_length_s=120step_hz=30,dual-cam 观测。 Test setup: Isaac Sim 5.1, task LeIsaac-SO101-PickOrange-v0, episode_length_s=120, step_hz=30, dual-cam observations.

单 sample 警告 / Single-sample caveat:以上 1/1 是单一 episode 结果,未跑统计意义上的多轮平均。但 horizon=8 / 16 / 32 三个失败模式的 monotonic 趋势 (失败 → 部分失败 → 成功) 足以做 falsification — 不是模型问题,是配置问题。 The 1/1 success rate is from a single episode, not statistically averaged. However, the monotonic failure-mode pattern across horizon=8/16/32 (stuck → jitter → success) is sufficient as a falsification: this is a configuration issue, not a model capability issue.

⚠️ 推理关键配置 / Critical inference caveat

ACT chunk_size=100 + 默认 horizon=16 = 第二颗橙子永远过不去。 这不是 ACT 的弱点,是 LeIsaac 默认配置的隐性陷阱。 ACT chunk_size=100 + the default horizon=16 will deadlock on the 2nd orange. This is not an ACT weakness; it's a hidden trap in LeIsaac's default config.

根因 / Root cause

ACT 每个 chunk 输出 100 步动作,是一段完整规划:前 ~10 步是"启动 / 加速",中段 (step 20-80) 才是真正的宏观运动(接近 → 夹起 → 提起 → 运送 → 释放)。LeRobot async client 用直接窗口 (receding horizon),每 policy_action_horizon 步重新查询一次。 Each ACT chunk outputs a 100-step planned trajectory: the first ~10 steps are "startup", and steps 20-80 are the macro-motion (approach → grasp → lift → transport → release). The LeRobot async client uses a sliding window, re-querying every policy_action_horizon steps.

  • horizon=8 → 每次只执行前 8 步就丢掉重 query → 永远在执行"启动段",根本到不了宏观运动 → 卡死。 horizon=8 → only the first 8 startup steps are ever executed → the macro-motion never fires → deadlock.
  • horizon=16 → 够第 1 颗的简单"靠近→夹起",但第 2 颗的"放→后退→接近第 2 颗"复杂段需要更长执行窗 → 模型 OOD + 短 horizon 双重打击 → 抖。 horizon=16 → enough for the simple "approach → grasp" of orange #1, but the post-1st-orange transition demands a longer execution window → OOD state + short horizon compound → jitter.
  • horizon=32 → 给 macro-motion 完整执行机会,1/1 通过。

推荐配置 / Recommended settings

--policy_type=lerobot-act
--policy_action_horizon=32
--policy_checkpoint_path=<path-to-this-model>
--step_hz=30                  # 对齐 dataset 30Hz / matches dataset 30Hz
--episode_length_s=120

使用方法

Usage

1. 启动 LeRobot async policy_server

pip install lerobot
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080

2. 客户端启动 LeIsaac eval

通过我们的 vitorcen/LeIsaac-Training fork:

cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=3 \
    --episode_length_s=120 \
    --step_hz=30 \
    --policy_type=lerobot-act \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/ACT-PickOrange \
    --policy_action_horizon=32 \
    --policy_language_instruction="Pick up the orange and place it on the plate" \
    --device=cuda --enable_cameras

run_eval.sh 自动按 user-patience cap 计算 wall-clock timeout,避免无意义等待慢推理。 run_eval.sh auto-computes a user-patience wall-clock timeout so slow inference fails fast.

局限性

Limitations

  • 数据集 OOD on 2nd-3rd orange:dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级,model 在那里 monotonic 变难、动作变"折腾"。即便 horizon=32 救了形式上的成功率,精度仍随颗数线性退化。这是数据问题不是模型问题。 Dataset OOD on 2nd–3rd orange: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=32 the policy gets visibly more jittery on later oranges. This is a data issue, not a model issue.
  • 三个独立架构 (我们的 ACT / Diffusion Policy / SmolVLA / 公开 shadowHokage ACT) 在同一 dataset 上 共同 OOD on 3rd orange — 全 family 共病。
  • 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证,不保证真机 deploy。 No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed.

相关

Related

致谢

Acknowledgments

  • LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
  • LeRobot 团队提供 ACT 实现 + async inference 框架
  • shadowHokage 公开训练配方作为复刻基线

引用

Citation

@inproceedings{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

License

Apache-2.0