Pi0.5-PickOrange — π0.5 PyTorch expert-only FT (⚠️ negative result)

⚠️ This is a deliberately published failure: a documented, educational negative result. The 20-round strict benchmark scored 1/60 oranges (1.7%), last place on STRICT_LEADERBOARD and roughly 15× worse than SmolVLA on the same task. The checkpoint is published to anchor the claim "why π0.5 doesn't learn LeIsaac PickOrange" to a real artifact, so that later researchers can reproduce or refute it.

🔗 Project repos:

vitorcen/isaaclab-experience: Isaac Lab + LeIsaac multi-policy comparison (parent project)
vitorcen/LeIsaac-Training: LeIsaac fork (training scripts + design docs)
Full negative report (HTML): pi05_pytorch_expert_ft_negative.html

🎥 The failure, on video

Real screen capture of the π0.5 expert-FT checkpoint on LeIsaac PickOrange: the arm keeps moving for the full 180 s but places 0/3 oranges. This is not a bug; it is the genuine behavior of a policy that cannot see the oranges under the SigLIP@224 vision bottleneck. Compare against the models that actually succeed (GR00T-N1.7 / ACT) below.

TL;DR

| Item | Value |
| --- | --- |
| Task | SO-101 PickOrange: a single arm picks up 3 oranges one at a time and places them on a plate |
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 demos, 30 Hz) |
| Architecture | π0.5 = PaliGemma-2B VLM (frozen) + Gemma-300M action expert (trainable) + flow matching |
| Trainable params | 693M (gemma_expert layers 425M + lm_head 263M + norm 3M) |
| Recipe | `train_expert_only=true`, `freeze_vision_encoder=true`, bf16, lr=2.5e-5, chunk=50, batch=1 + grad_accum=8, 10k steps |
| Vision input | SigLIP @ 224×224 (hardcoded in PaliGemma; the main suspect) |
| Strict benchmark | 1/60 oranges (1.7%): 20 rounds × 3 ep × 1 orange/ep, ckpt-2000 |
| σ (5-round) | 0.50 / 15 (3.3%); worst case (μ − 1σ) = −0.25 / 15 |
| Leaderboard rank | 6/6 (last), 15× below SmolVLA |
| Inference latency | ~108 ms / chunk (50-step flow matching, RTX 4090) |
| GPU hours | ~3.5 h on RTX Pro 6000 (bf16, ZeRO-2 offload) |

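Why publish a failed model

In research, negative results usually end up in a drawer, yet they are as valuable as successes:

Pin down the hypothesis: later researchers can load this checkpoint and directly verify whether this recipe really fails on this dataset, instead of hitting the same pitfall again.
Isolate variables: the training-side dataloader, preprocessor, postprocessor, camera mapping, and freeze configuration have all been debugged (4 infrastructure bugs fixed), so the failure is not infra noise but a real architecture-vs-task signal.
**Quantify the "occasional one"**: an early 3-round run scored 2/9 and looked promising, but the 20-round result of 1/60 shows it was a Bernoulli outlier (p≈1.7%).

This checkpoint lets others verify the failure mode without re-spending the GPU hours.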

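Root cause (main suspect, 80% confidence)

PaliGemma-2B's SigLIP vision encoder is hardcoded to 224×224 input, while LeIsaac's native frames are 640×480. After the resulting 2.86× downscale an orange spans only 10–17 px, **about one SigLIP patch (14 px)**, i.e. 0.7–1.2 patch widths.

Compared with the models that work on the same task: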

| Model | Vision encoder | Input res | Orange size after resize | Result |
| --- | --- | --- | --- | --- |
| GR00T-N1.7 | Eagle-2 ViT | 448 | 22–34 px (1.5–2.4 patch) | 68.3% ✅ |
| SmolVLA | SigLIP | 512 | 24–40 px (1.7–2.9 patch) | 25.0% ✅ |
| π0.5 (this) | SigLIP | 224 | 10–17 px (≤1 patch) | 1.7% ❌ |

→ The oranges are nearly invisible in the vision tokens. PaliGemma's SigLIP is hardcoded to 224×224, and with the whole of PaliGemma frozen, no amount of expert-only training can recover information already lost at the vision encoder.
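
The resolution argument reduces to simple arithmetic, sketched below in Python. It takes the post-resize orange sizes quoted in the comparison table and divides them by the 14 px SigLIP patch size; applying the same 14 px patch to every encoder is an assumption that matches the table's own patch counts, not a verified property of each model. The helper name `patches_spanned` is illustrative.

```python
PATCH_PX = 14  # SigLIP patch size quoted above; assumed here for all three encoders

# Orange diameter after resize, in pixels, copied from the comparison table.
ORANGE_PX = {
    "GR00T-N1.7 @ 448": (22, 34),
    "SmolVLA @ 512": (24, 40),
    "pi0.5 @ 224": (10, 17),
}


def patches_spanned(px: float, patch_px: int = PATCH_PX) -> float:
    """Number of patch widths an object of `px` pixels covers along one axis."""
    return px / patch_px


# Width downscale from LeIsaac's native 640 px to PaliGemma's 224 px.
downscale = 640 / 224

print(f"downscale: {downscale:.2f}x")
for name, (lo, hi) in ORANGE_PX.items():
    print(f"{name}: {patches_spanned(lo):.1f} to {patches_spanned(hi):.1f} patch widths")
```

For π0.5 the orange covers roughly 0.7 to 1.2 patch widths, so it can fall almost entirely inside a single vision token, while the two working models spread it over several.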

Training recipe

# training entry
bash LeIsaac/scripts/training/pi05_pt/train.sh

# key flags
--policy.train_expert_only=true       # freeze PaliGemma, train only gemma_expert
--policy.freeze_vision_encoder=true   # explicit redundant lock
--policy.gradient_checkpointing=true  # 24GB VRAM under bf16
--policy.dtype=bfloat16
--policy.chunk_size=50
--policy.n_action_steps=50
--policy.max_state_dim=32
--policy.max_action_dim=32
--policy.optimizer_lr=2.5e-5
--steps=10000  --save_freq=1000  --batch_size=1

Camera rename (LeIsaac 2-cam → π0.5 3-cam, missing left_wrist auto-padded inside modeling_pi05.py:1195):

rename_map = {
    "observation.images.front":  "observation.images.base_0_rgb",
    "observation.images.wrist":  "observation.images.right_wrist_0_rgb",
}
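
A minimal sketch of how such a rename map might be applied to an observation dictionary before it reaches the policy. The functions `rename_observation` and `missing_cameras` and the placeholder frame values are hypothetical, and the key for the third camera slot, `observation.images.left_wrist_0_rgb`, is an assumption made by analogy with the two keys above; the actual padding happens inside `modeling_pi05.py` and is not reproduced here.

```python
rename_map = {
    "observation.images.front": "observation.images.base_0_rgb",
    "observation.images.wrist": "observation.images.right_wrist_0_rgb",
}

# Assumed key names for the three pi0.5 camera slots (the third one is a guess).
EXPECTED_CAMERAS = [
    "observation.images.base_0_rgb",
    "observation.images.right_wrist_0_rgb",
    "observation.images.left_wrist_0_rgb",
]


def rename_observation(obs: dict, mapping: dict) -> dict:
    """Return a copy of `obs` with keys renamed; unmapped keys pass through."""
    return {mapping.get(key, key): value for key, value in obs.items()}


def missing_cameras(obs: dict, expected: list) -> list:
    """List the expected camera keys that the observation does not provide."""
    return [key for key in expected if key not in obs]


# Placeholder strings stand in for image tensors; the state vector is a dummy.
leisaac_obs = {
    "observation.images.front": "front_frame",
    "observation.images.wrist": "wrist_frame",
    "observation.state": [0.0, 0.0, 0.0],
}

pi05_obs = rename_observation(leisaac_obs, rename_map)
print(sorted(pi05_obs))
print("cameras left for padding:", missing_cameras(pi05_obs, EXPECTED_CAMERAS))
```

Checking for missing camera keys before inference is a cheap way to confirm that only the intended slot is being padded rather than a misspelled one.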

Reproduce

from lerobot.policies.pi05 import PI05Policy
policy = PI05Policy.from_pretrained("wsagi/Pi0.5-PickOrange")
# Then plug into the LeIsaac Isaac Sim eval pipeline:
#   scripts/benchmark/run_one_strict.sh

20-round strict benchmark (per-round distribution of oranges placed; 20 rounds × 3 episodes):

| P(placed=0) | P(placed=1) | P(placed=2) | P(placed=3) | E(🍊)/round |
| --- | --- | --- | --- | --- |
| 95% (19/20) | 5% (1/20) | 0% | 0% | 0.05 |

19 of 20 rounds scored 0/3 and one round scored 1/3 (Episode 8: placed=[F, T, F]). This is a Bernoulli noise distribution with no task-completion signal.
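
The headline numbers can be re-derived from the per-round outcomes with a few lines of Python. This sketch assumes one round scored 1/3 and the other 19 scored 0/3, as reported above, and further assumes that σ(5-round) in the TL;DR is the sample standard deviation over four non-overlapping blocks of five rounds; that reading is inferred from the numbers, not confirmed by the benchmark script. The position of the successful round in the list is arbitrary.

```python
import statistics

# Oranges placed per round (max 3). One successful round; its index is arbitrary.
oranges_per_round = [0] * 20
oranges_per_round[7] = 1

total_placed = sum(oranges_per_round)
total_slots = 3 * len(oranges_per_round)  # 60 orange attempts
success_rate = total_placed / total_slots

# Group into non-overlapping 5-round blocks (15 orange attempts each).
blocks = [sum(oranges_per_round[i:i + 5]) for i in range(0, 20, 5)]
mu = statistics.mean(blocks)
sigma = statistics.stdev(blocks)  # sample standard deviation

print(f"strict: {total_placed}/{total_slots} = {success_rate:.1%}")
print(f"5-round blocks: {blocks}, mu={mu}, sigma={sigma}, mu-sigma={mu - sigma}")
```

Under these assumptions the block statistics come out to μ = 0.25 and σ = 0.50 oranges per 15 attempts, matching the TL;DR row. The negative μ − 1σ simply means the realistic lower bound is zero oranges.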

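Checkpoints evaluated

The 10k-step run saved a checkpoint every 1k steps, and 13 checkpoints (500/1k/1.5k/.../10k) were each given a 3-round evaluation. Nearly all of them scored 0/9 or 1/9, with no sign of monotonic convergence. ckpt-2000 was the one that reached 2/9 in the 3-round sweep (the highest); over 20 rounds it regressed to 1/60, confirming that the 2/9 was a noise outlier rather than a signal.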
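
Why a 2/9 reading is compatible with a true rate near 1/60 can be checked with a binomial tail, sketched here in Python. The model is an assumption: each orange attempt is treated as an independent Bernoulli trial with p = 1/60, and all 13 checkpoints are treated as sharing that rate, which the sweep itself cannot confirm.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 1 / 60    # per-orange success rate from the 20-round benchmark
n = 9         # 3 rounds x 3 oranges
n_ckpts = 13  # checkpoints in the sweep

single = binom_tail(n, 2, p)            # one checkpoint scores >= 2/9
any_ckpt = 1 - (1 - single) ** n_ckpts  # at least one of the 13 does

print(f"P(>=2/9) for one checkpoint: {single:.4f}")
print(f"P(some checkpoint >=2/9):    {any_ckpt:.3f}")
```

Under these assumptions a single 3-round run reaches 2/9 a little under 1% of the time, but taking the best of 13 such runs raises that to roughly 11%. Selecting the top checkpoint from a short sweep therefore overstates its true rate, which is what the 20-round re-run exposed.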

When (not) to use

❌ Do not use in production: a 1.7% success rate has no task-completion value.
✅ Suitable as:

A baseline reference for π0.5 on tasks bottlenecked by a low-resolution VLM
A reproducible checkpoint for the failure of the "freeze VLM + train expert only" recipe
A fixture for validating the π0.5 wire protocol in the LeIsaac eval pipeline

Alternatives (better on same task)

These models actually place the orange on the same task; go here to see a success:

| Model | Strict | Where |
| --- | --- | --- |
| 🥇 GR00T-N1.7 (self-trained) | 68.3% (2.05/3) | `wsagi/GR00T-N1.7-PickOrange` |
| 🥈 ACT (self, h=70) | 43.3% (1.30/3) | `wsagi/ACT-PickOrange` |
| 🥉 SmolVLA (self-trained) | 25.0% | wsagi (pending release) |
| Diffusion Policy DDIM | stochastic (occasional 3/3) | `wsagi/DiffusionPolicy-PickOrange` |

License & Attribution

Apache-2.0
Base model: lerobot/pi05_base (Physical Intelligence × LeRobot)
Dataset: LightwheelAI/leisaac-pick-orange
Trained on RTX Pro 6000 96GB
Evaluated in Isaac Sim 5.1 + LeIsaac


Safetensors: 4B params; tensor types F32, BF16
