	---
	license: apache-2.0
	library_name: lerobot
	pipeline_tag: robotics
	tags:
	- pi05
	- openpi
	- lerobot
	- so101
	- leisaac
	- pick-orange
	- isaac-sim
	- flow-matching
	- vla
	- negative-result
	datasets:
	- LightwheelAI/leisaac-pick-orange
	language:
	- en
	---

	# Pi0.5-PickOrange — π0.5 PyTorch expert-only FT (⚠️ negative result)

	⚠️ This is a deliberately published failure: a documented negative result, released for its educational value. The 20-round strict benchmark scored 1/60 oranges (1.7%), which is last place on the [STRICT_LEADERBOARD](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/STRICT_LEADERBOARD.md) and roughly 15× worse than SmolVLA on the same task. The checkpoint is published to anchor the claim "why π0.5 doesn't learn LeIsaac PickOrange" to a real artifact, so that later researchers can reproduce or refute it.

	🔗 Project repos:

	- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
	- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)
	- Full negative-result report (HTML): [`pi05_pytorch_expert_ft_negative.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/pi05_pytorch_expert_ft_negative.html)

	## 🎥 The failure, on video

	<video controls src="https://huggingface.co/wsagi/Pi0.5-PickOrange/resolve/main/Pi0.5-PickOrange.mp4"></video>

	_Real screen capture of the π0.5 expert-FT checkpoint on LeIsaac PickOrange: the arm keeps moving for the full 180 s but places 0/3 oranges. This is not a bug; it is the genuine behavior under the SigLIP@224 vision bottleneck, where the model effectively cannot see the oranges. Compare against the models that actually succeed (GR00T-N1.7 / ACT) below._

	## TL;DR

	| Item | Value |
	|------|-------|
	| Task | SO-101 PickOrange: a single arm picks up 3 oranges one at a time and places them on a plate |
	| Dataset | [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) (60 demos, 30Hz) |
	| Architecture | π0.5 = PaliGemma-2B VLM (frozen) + Gemma-300M action expert (trainable) + flow-matching |
	| Trainable params | 693M (gemma_expert layers 425M + lm_head 263M + norm 3M) |
	| Recipe | `train_expert_only=true`, `freeze_vision_encoder=true`, bf16, lr=2.5e-5, chunk=50, batch=1 + grad_accum=8, 10k steps |
	| Vision input | SigLIP @ 224×224 (hardcoded in PaliGemma; the main suspect) |
	| Strict benchmark | 1/60 oranges (1.7%): 20 rounds × 3 ep × 1 orange/ep, ckpt-2000 |
	| σ(5-round) | 0.50 / 15 (3.3%); worst-case (μ-1σ) = -0.25 / 15 |
	| Leaderboard rank | 6/6 (last place), 15× below SmolVLA |
	| Inference latency | ~108 ms / chunk (50-step flow matching, RTX 4090) |
	| GPU hours | ~3.5 h on RTX Pro 6000 (bf16, ZeRO-2 offload) |

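	## Why publish a failed model

	In research, negative results usually end up in a drawer, but they are as valuable as successes:

	1. Pin down the hypothesis: later researchers can load this checkpoint and directly verify whether this recipe really fails on this dataset, instead of repeatedly falling into the same pit.
	2. Isolate variables: the training-side dataloader, preprocessor, postprocessor, camera mapping, and freeze configuration are all working (4 infrastructure bugs fixed), so the failure is not infra noise but a real architecture-vs-task signal.
	3. Quantify the "occasional one": an initial 3-round run scored 2/9 and looked promising, but the 20-round 1/60 result shows that it was just a Bernoulli outlier (p≈1.7%).

	_This ckpt lets others verify the failure mode without re-spending the GPU hours._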

	## Root cause (main suspect, 80% confidence)

	PaliGemma-2B's SigLIP vision encoder is hardcoded to 224×224 input, while LeIsaac renders natively at 640×480. After the 2.86× downscale an orange is only 10–17 px across, roughly a single 14 px SigLIP patch.

	Compare with the models that work on the same task:

	| Model | Vision encoder | Input res | Orange size after resize | Result |
	|-------|----------------|-----------|--------------------------|--------|
	| GR00T-N1.7 | Eagle-2 ViT | 448 | 22-34 px (1.5–2.4 patch) | 68.3% ✅ |
	| SmolVLA | SigLIP | 512 | 24-40 px (1.7–2.9 patch) | 25.0% ✅ |
	| π0.5 (this) | SigLIP | 224 | 10-17 px (≤1 patch) | 1.7% ❌ |

	→ The oranges are nearly invisible in the vision tokens. With the whole PaliGemma frozen and only the action expert trained, no amount of expert-only training or extra tokens can make up for the vision bottleneck: information already lost at the vision encoder cannot be recovered downstream.
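The patch arithmetic behind the comparison table can be checked with a few lines of Python. This is a minimal sketch added for illustration, not part of the training code: it assumes the 14 px patch size quoted in this section applies to every row of the table, and the helper names (`downscale_factor`, `patches_covered`) are hypothetical.

```python
# Illustrative arithmetic only; helper names are hypothetical.
PATCH_PX = 14        # SigLIP patch size quoted in this card
NATIVE_WIDTH = 640   # LeIsaac native frame width quoted in this card


def downscale_factor(target_width: int, native_width: int = NATIVE_WIDTH) -> float:
    """How much a frame shrinks when it is resized to target_width."""
    return native_width / target_width


def patches_covered(size_px: float, patch_px: int = PATCH_PX) -> float:
    """Object width expressed in units of one vision patch."""
    return size_px / patch_px


# Orange sizes after resize (px), copied from the comparison table.
ORANGE_PX = {
    "GR00T-N1.7 @ 448": (22, 34),
    "SmolVLA @ 512": (24, 40),
    "pi0.5 @ 224": (10, 17),
}

if __name__ == "__main__":
    print(f"640 -> 224 downscale: {downscale_factor(224):.2f}x")
    for model, (lo, hi) in ORANGE_PX.items():
        print(f"{model}: {patches_covered(lo):.2f} to {patches_covered(hi):.2f} patches")
```

For scale, a 224×224 frame split into 14 px patches is a 16×16 grid, so an orange that fills about one patch is a single token out of 256.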

	## Training recipe

	```bash
	# Training entry point
	bash LeIsaac/scripts/training/pi05_pt/train.sh

	# Key flags (listed for reference; not a standalone command)
	--policy.train_expert_only=true # freeze PaliGemma, train only gemma_expert
	--policy.freeze_vision_encoder=true # explicit redundant lock
	--policy.gradient_checkpointing=true # 24GB VRAM under bf16
	--policy.dtype=bfloat16
	--policy.chunk_size=50
	--policy.n_action_steps=50
	--policy.max_state_dim=32
	--policy.max_action_dim=32
	--policy.optimizer_lr=2.5e-5
	--steps=10000 --save_freq=1000 --batch_size=1
	```

	Camera rename (LeIsaac's 2 cameras → π0.5's 3 camera slots; the missing `left_wrist` view is auto-padded inside `modeling_pi05.py:1195`):

	```python
	rename_map = {
	    "observation.images.front": "observation.images.base_0_rgb",
	    "observation.images.wrist": "observation.images.right_wrist_0_rgb",
	}
	```
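To illustrate what the mapping does, the sketch below applies it to a dictionary of camera frames. It is a standalone illustration and not how the training pipeline applies the rename: `apply_rename` is a hypothetical helper, `observation.state` is only an example of a non-camera key, and the placeholder strings stand in for image tensors.

```python
# Hypothetical helper for illustration only.
rename_map = {
    "observation.images.front": "observation.images.base_0_rgb",
    "observation.images.wrist": "observation.images.right_wrist_0_rgb",
}


def apply_rename(observation: dict, mapping: dict) -> dict:
    """Return a copy of observation with keys renamed according to mapping.

    Keys that are not in the mapping pass through unchanged.
    """
    return {mapping.get(key, key): value for key, value in observation.items()}


if __name__ == "__main__":
    # Placeholder strings stand in for real image tensors.
    obs = {
        "observation.images.front": "<front frame>",
        "observation.images.wrist": "<wrist frame>",
        "observation.state": "<joint positions>",
    }
    renamed = apply_rename(obs, rename_map)
    print(sorted(renamed))
```

After the rename only two of the three camera slots carry a real image; the third is the one that gets padded.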

	## Reproduce

	```python
	from lerobot.policies.pi05 import PI05Policy
	policy = PI05Policy.from_pretrained("wsagi/Pi0.5-PickOrange")
	# Then plug the policy into the LeIsaac Isaac Sim eval pipeline:
	# scripts/benchmark/run_one_strict.sh
	```

	20-round strict benchmark (distribution, 20 rounds × 3 episodes):

	| P(placed=0) | P(placed=1) | P(placed=2) | P(placed=3) | E(🍊)/ep |
	|-------------|-------------|-------------|-------------|----------|
	| 95% (57/60) | 5% (3/60) | 0% | 0% | 0.05 |

	19 of 20 rounds scored 0/3; one round scored 1/3 (Episode 8: placed=[F, T, F]). The distribution looks like Bernoulli noise, with no task-completion signal.
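One way to put error bars on the 1/60 figure is a Wilson score interval for a binomial proportion. The sketch below is an added illustration, not part of the benchmark scripts; it assumes the 60 attempts are independent, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] so the result is always a valid probability range.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(successes=1, trials=60)
    print(f"1/60 -> approx. 95% interval: {low:.1%} to {high:.1%}")
```

Under the independence assumption, the upper end of the interval stays well below SmolVLA's 25.0% on the same task, so the gap between the two models is unlikely to come from sampling noise alone.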

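	## Checkpoints evaluated

	The 10k-step run saved a checkpoint every 1k steps, and 13 checkpoints (500/1k/1.5k/.../10k) were each evaluated in a 3-round sweep = 1/60 oranges across 13 ckpts, all scoring 0/9 or 1/9, with no sign of monotonic convergence. ckpt-2000 is the one that hit 2/9 in its 3-round run (the highest); over 20 rounds it regressed to 1/60, confirming that the 2/9 was a noise outlier rather than signal.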
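How surprising is a 2/9 run if the true per-attempt success rate is about 1/60? The sketch below computes that binomial tail probability, along with the chance of seeing at least one such run somewhere in a 13-checkpoint sweep. It is an added illustration that assumes independent attempts, a fixed success rate shared by all checkpoints, and 9 attempts per 3-round run; `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 1 / 60             # success rate measured by the 20-round benchmark
    attempts_per_run = 9   # 3 rounds x 3 episodes x 1 orange (assumed)
    n_checkpoints = 13     # checkpoints in the sweep

    single = prob_at_least(2, attempts_per_run, p)
    any_ckpt = 1 - (1 - single) ** n_checkpoints
    print(f"P(>=2 of 9) for one checkpoint: {single:.3%}")
    print(f"P(at least one such run across 13 checkpoints): {any_ckpt:.1%}")
```

A lucky short run is unlikely for any single checkpoint but becomes noticeably more plausible once many checkpoints each get a short evaluation, which is why the 20-round benchmark is the more trustworthy number.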

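	## When (not) to use

	❌ Do not use in production: a 1.7% success rate has no task-completion value.
	✅ Reasonable uses:
	- A baseline reference for π0.5 on tasks bottlenecked by a low-resolution VLM
	- A reproducible checkpoint for the failure case of the "freeze VLM + train expert only" recipe
	- A fixture for validating the π0.5 wire protocol in the LeIsaac eval pipeline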

	## Alternatives (better on same task)

	These are the models that actually place the orange on the plate in the same task; go here if you want to see a success:

	| Model | Strict | Where |
	|-------|--------|-------|
	| 🥇 GR00T-N1.7 (self-trained) | 68.3% (2.05/3) | [`wsagi/GR00T-N1.7-PickOrange`](https://huggingface.co/wsagi/GR00T-N1.7-PickOrange) |
	| 🥈 ACT (self, h=70) | 43.3% (1.30/3) | [`wsagi/ACT-PickOrange`](https://huggingface.co/wsagi/ACT-PickOrange) |
	| 🥉 SmolVLA (self-trained) | 25.0% | wsagi (pending release) |
	| Diffusion Policy DDIM | stochastic 3/3 | [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) |

	## License & Attribution

	- Apache-2.0
	- Base model: `lerobot/pi05_base` (Physical Intelligence × LeRobot)
	- Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange)
	- Trained on RTX Pro 6000 96GB
	- Evaluated in Isaac Sim 5.1 + LeIsaac