| license: apache-2.0 | |
| library_name: lerobot | |
| pipeline_tag: robotics | |
| tags: | |
| - pi05 | |
| - openpi | |
| - lerobot | |
| - so101 | |
| - leisaac | |
| - pick-orange | |
| - isaac-sim | |
| - flow-matching | |
| - vla | |
| - negative-result | |
| datasets: | |
| - LightwheelAI/leisaac-pick-orange | |
| language: | |
| - en | |
| # Pi0.5-PickOrange — π0.5 PyTorch expert-only FT (⚠️ negative result) | |
| **⚠️ This is a deliberately published failure: a documented, educational negative result.** The 20-round strict benchmark scored **1/60 oranges (1.7%)**, last place on the [STRICT_LEADERBOARD](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/STRICT_LEADERBOARD.md) and **about 15× below SmolVLA** (25.0%) on the same task. The checkpoint is published to anchor the question of why π0.5 does not learn LeIsaac PickOrange with a real artifact that others can reproduce or refute. | |
| **🔗 Project repos**: | |
| - [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy benchmark (parent project) | |
| - [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs) | |
| - Full negative-result report (HTML): [`pi05_pytorch_expert_ft_negative.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/pi05_pytorch_expert_ft_negative.html) | |
| ## 🎥 The failure, on video | |
| <video controls src="https://huggingface.co/wsagi/Pi0.5-PickOrange/resolve/main/Pi0.5-PickOrange.mp4"></video> | |
| _Real screen capture of the π0.5 expert-FT checkpoint on LeIsaac PickOrange: the arm keeps moving for the full 180 s but places **0/3** oranges. This is not a bug; it is the genuine behavior under the SigLIP@224 vision bottleneck, where the policy effectively cannot see the oranges. Compare against the models that actually succeed (GR00T-N1.7 / ACT) below._ | |
| ## TL;DR | |
| | Item | Value | | |
| |------|-------| | |
| **Task** | SO-101 PickOrange: a single arm picks up 3 oranges one after another and places them on a plate | |
| **Dataset** | [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) (60 demos, 30 Hz) | |
| **Architecture** | π0.5 = PaliGemma-2B VLM (frozen) + Gemma-300M action expert (trainable) + flow-matching | |
| **Trainable params** | 693M (gemma_expert layers 425M + lm_head 263M + norm 3M) | |
| **Recipe** | `train_expert_only=true`, `freeze_vision_encoder=true`, bf16, lr=2.5e-5, chunk=50, batch=1 + grad_accum=8, 10k steps | |
| **Vision input** | **SigLIP @ 224×224** (hardcoded in PaliGemma; **main suspect**) | |
| **Strict benchmark** | **1/60 oranges (1.7%)**: 20 rounds × 3 ep × 1 orange/ep, ckpt-2000 | |
| **σ (5-round)** | 0.50 / 15 (3.3%); μ − 1σ = **−0.25 / 15**, so the worst case is effectively 0 | |
| **Leaderboard rank** | **6/6 (last)**, about 15× below SmolVLA | |
| | **Inference latency** | ~108 ms / chunk (50-step flow matching, RTX 4090) | | |
| | **GPU hours** | ~3.5 h on RTX Pro 6000 (bf16, ZeRO-2 offload) | | |
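| ## Why publish a failed model | |
| Negative results usually end up in a drawer, but they are as valuable as positive ones: | |
| 1. **Pin down the hypothesis**: later researchers can load this checkpoint and directly check whether this recipe really fails on this dataset, instead of repeating the same mistakes. | |
| 2. **Isolate variables**: the training-side dataloader, preprocessor, postprocessor, camera mapping, and freeze configuration are all verified working (4 infrastructure bugs were fixed), so the failure is not infra noise but a real **architecture vs. task** signal. | |
| 3. **Quantify the "occasional one orange"**: an early 3-round run scored 2/9 and looked promising, but the 20-round result of 1/60 shows that it was a Bernoulli outlier (p ≈ 1.7%). | |
| _This checkpoint lets others verify the failure mode without re-spending the GPU hours._ | |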
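One way to see why the 3-round 2/9 looked more promising than it was is to put a confidence interval around each estimate. The sketch below uses a Wilson score interval, which is a swapped-in illustration and not part of the original evaluation scripts; the helper name `wilson_interval` and the 95% level are assumptions.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, center - half), min(1.0, center + half)


# Early 3-round run: 2 oranges placed out of 9 attempts.
lo_3r, hi_3r = wilson_interval(2, 9)
# 20-round strict benchmark: 1 orange placed out of 60 attempts.
lo_20r, hi_20r = wilson_interval(1, 60)

print(f"3-round  2/9 : [{lo_3r:.3f}, {hi_3r:.3f}]")
print(f"20-round 1/60: [{lo_20r:.3f}, {hi_20r:.3f}]")
```

Under these assumptions the 3-round interval is wide enough to be compatible with both a near-zero and a respectable success rate, while the 20-round interval stays in the single digits of percent.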
| ## Root cause (main suspect, 80% confidence) | |
| **PaliGemma-2B's SigLIP vision encoder is hardcoded to a 224×224 input**, while LeIsaac renders natively at 640×480. After the 2.86× downscale (640 / 224), an orange spans only **10–17 px**, which is roughly **one SigLIP patch (14 px)**. | |
| For comparison, the models that work on the same task: | |
| | Model | Vision encoder | Input res | Orange size after resize | Result | | |
| |-------|---------------|-----------|--------------------------|--------| | |
| | GR00T-N1.7 | Eagle-2 ViT | 448 | 22-34 px (1.5–2.4 patch) | 68.3% ✅ | | |
| | SmolVLA | SigLIP | 512 | 24-40 px (1.7–2.9 patch) | 25.0% ✅ | | |
| **π0.5 (this)** | **SigLIP** | **224** | **10-17 px (0.7–1.2 patch)** | **1.7% ❌** | |
| → The oranges are nearly invisible in the vision tokens. With the whole PaliGemma frozen and only the action expert trained, no amount of expert-side training can recover information already lost at the vision encoder. | |
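The pixel and patch figures above follow from simple scaling arithmetic, sketched here. The native orange width of about 29–49 px is an assumption back-computed from the stated 10–17 px at 224, and a 14 px patch is applied to every encoder only to mirror the comparison table; neither value is taken from the encoders' configs.

```python
SRC_WIDTH = 640               # LeIsaac native frame width (from the text)
PATCH_PX = 14                 # SigLIP patch size quoted in the text
NATIVE_ORANGE_PX = (29, 49)   # ASSUMPTION: back-computed from 10-17 px at 224


def orange_size(input_res: int, native_px: float) -> tuple[float, float]:
    """Return (pixels, patches) an orange spans after resizing to input_res."""
    scale = SRC_WIDTH / input_res      # e.g. 640 / 224 is about 2.86
    px = native_px / scale
    return px, px / PATCH_PX


for res in (224, 448, 512):
    (px_lo, patch_lo) = orange_size(res, NATIVE_ORANGE_PX[0])
    (px_hi, patch_hi) = orange_size(res, NATIVE_ORANGE_PX[1])
    print(f"{res:>3}px input: {px_lo:.0f}-{px_hi:.0f} px "
          f"({patch_lo:.1f}-{patch_hi:.1f} patches)")

px_224_lo, patches_224_lo = orange_size(224, NATIVE_ORANGE_PX[0])
px_224_hi, patches_224_hi = orange_size(224, NATIVE_ORANGE_PX[1])
```

At 224 the orange lands on about one patch, whereas at 448 or 512 it covers roughly two, which is the gap the table highlights.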
| ## Training recipe | |
| ```bash | |
| # Training entry point | |
| bash LeIsaac/scripts/training/pi05_pt/train.sh | |
| # Key flags (listed one per line for readability) | |
| --policy.train_expert_only=true # freeze PaliGemma, train only gemma_expert | |
| --policy.freeze_vision_encoder=true # explicit redundant lock | |
| --policy.gradient_checkpointing=true # 24GB VRAM under bf16 | |
| --policy.dtype=bfloat16 | |
| --policy.chunk_size=50 | |
| --policy.n_action_steps=50 | |
| --policy.max_state_dim=32 | |
| --policy.max_action_dim=32 | |
| --policy.optimizer_lr=2.5e-5 | |
| --steps=10000 --save_freq=1000 --batch_size=1 | |
| ``` | |
| Camera rename (LeIsaac 2-cam → π0.5 3-cam, missing `left_wrist` auto-padded inside modeling_pi05.py:1195): | |
| ```python | |
| rename_map = { | |
| "observation.images.front": "observation.images.base_0_rgb", | |
| "observation.images.wrist": "observation.images.right_wrist_0_rgb", | |
| } | |
| ``` | |
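As a rough illustration of what the rename does to an observation dictionary, the sketch below applies the same mapping to a toy batch. The helper `rename_observation_keys` and the placeholder values are hypothetical and are not LeRobot APIs; the actual renaming and the padding of the missing `left_wrist` camera happen inside the training pipeline.

```python
rename_map = {
    "observation.images.front": "observation.images.base_0_rgb",
    "observation.images.wrist": "observation.images.right_wrist_0_rgb",
}


def rename_observation_keys(obs: dict, mapping: dict) -> dict:
    """Hypothetical helper: rename keys found in `mapping`, keep all others."""
    return {mapping.get(key, key): value for key, value in obs.items()}


# Toy observation; strings stand in for image tensors.
obs = {
    "observation.images.front": "front_frame",
    "observation.images.wrist": "wrist_frame",
    "observation.state": [0.0] * 6,
}

renamed = rename_observation_keys(obs, rename_map)
print(sorted(renamed))
```

Keys that are not in the map, such as `observation.state`, pass through unchanged.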
| ## Reproduce | |
| ```python | |
| from lerobot.policies.pi05 import PI05Policy | |
| policy = PI05Policy.from_pretrained("wsagi/Pi0.5-PickOrange") | |
| # Then plug into the LeIsaac Isaac Sim eval pipeline: | |
| # scripts/benchmark/run_one_strict.sh | |
| ``` | |
| 20-round strict benchmark (per-round distribution; 20 rounds × 3 episodes): | |
| P(placed=0) | P(placed=1) | P(placed=2) | P(placed=3) | E(🍊)/round | |
|-------------|-------------|-------------|-------------|----------| |
| **95% (19/20)** | **5% (1/20)** | 0% | 0% | **0.05** | |
| 19 of 20 rounds scored 0/3 and one round scored 1/3 (Episode 8: placed=[F, T, F]). The distribution looks like Bernoulli noise, with no task-completion signal. | |
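The summary figures can be re-derived from the per-round counts reported above (19 rounds with nothing placed, one round with a single orange). This is a minimal sketch; the list `placed_per_round` is reconstructed from those counts and is not read from the benchmark logs.

```python
from collections import Counter

ORANGES_PER_ROUND = 3
# Reconstructed from the text: 19 rounds at 0/3, one round at 1/3.
placed_per_round = [0] * 19 + [1]

n_rounds = len(placed_per_round)
counts = Counter(placed_per_round)

# Share of rounds that ended with k oranges on the plate, k = 0..3.
distribution = {k: counts.get(k, 0) / n_rounds for k in range(ORANGES_PER_ROUND + 1)}
mean_per_round = sum(placed_per_round) / n_rounds
per_orange_rate = sum(placed_per_round) / (n_rounds * ORANGES_PER_ROUND)

print(distribution)
print(f"mean oranges per round: {mean_per_round:.2f}")
print(f"per-orange success rate: {per_orange_rate:.3%}")
```

The per-round mean and the per-orange rate differ by the factor of 3 oranges per round, which is why 0.05 and 1.7% describe the same result.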
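| ## Checkpoints evaluated | |
| The 10k-step run saved a checkpoint every 1k steps, and 13 checkpoints (500 / 1k / 1.5k / ... / 10k) each went through a 3-round sweep = **1/60 oranges across 13 ckpts**. Scores were 0/9 or 1/9 apart from ckpt-2000, with no sign of monotonic convergence. ckpt-2000 reached 2/9 in the 3-round sweep (the highest score), but over 20 rounds it regressed to 1/60, which confirms that the 2/9 was a noise outlier and not signal. | |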
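A quick binomial calculation shows how plausible a lone 2/9 is when 13 checkpoints are swept. The sketch assumes every checkpoint shares the same true per-orange success rate of 1/60 and that attempts are independent; both are simplifying assumptions, not measured facts.

```python
from math import comb


def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_true = 1 / 60       # ASSUMPTION: the 20-round rate is the true rate
n_attempts = 9        # 3 rounds x 3 oranges
n_checkpoints = 13    # size of the sweep

# Chance that one checkpoint shows 2/9 or better purely by luck.
tail_single = binom_tail(2, n_attempts, p_true)
# Chance that at least one of the 13 checkpoints does so.
p_any = 1 - (1 - tail_single) ** n_checkpoints

print(f"P(>=2/9) for one checkpoint: {tail_single:.4f}")
print(f"P(at least one of 13 hits >=2/9): {p_any:.4f}")
```

Under these assumptions a single checkpoint hits 2/9 about 1% of the time, but across the whole sweep the chance rises to roughly one in ten, so one lucky checkpoint is unsurprising.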
| ## When (not) to use | |
| ❌ **Do not use in production**: a 1.7% success rate has no task-completion value. | |
| ✅ **Useful as**: | |
| - a baseline reference for π0.5 on tasks bottlenecked by a low-resolution VLM | |
| - a reproduction checkpoint for the failure of the "freeze VLM + train expert only" recipe | |
| - a validation fixture for the π0.5 wire protocol in the LeIsaac eval pipeline | |
| ## Alternatives (better on same task) | |
| These models actually place oranges on the plate in the same task: | |
| | Model | Strict | Where | | |
| |-------|--------|-------| | |
| | 🥇 GR00T-N1.7 (self-trained) | **68.3%** (2.05/3) | [`wsagi/GR00T-N1.7-PickOrange`](https://huggingface.co/wsagi/GR00T-N1.7-PickOrange) | | |
| | 🥈 ACT (self, h=70) | **43.3%** (1.30/3) | [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) | | |
| 🥉 SmolVLA (self-trained) | 25.0% | wsagi (pending release) | |
| Diffusion Policy DDIM | stochastic (3/3 in some runs) | [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) | |
| ## License & Attribution | |
| - Apache-2.0 | |
| - Base model: `lerobot/pi05_base` (Physical Intelligence × LeRobot) | |
| - Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) | |
| - Trained on RTX Pro 6000 96GB | |
| - Evaluated in Isaac Sim 5.1 + LeIsaac | |