wsagi committed
Commit fc41b60 · verified · 1 Parent(s): 795d1db

Add files using upload-large-folder tool

README.md ADDED
@@ -0,0 +1,149 @@
+ ---
+ license: apache-2.0
+ library_name: lerobot
+ pipeline_tag: robotics
+ tags:
+ - pi05
+ - openpi
+ - lerobot
+ - so101
+ - leisaac
+ - pick-orange
+ - isaac-sim
+ - flow-matching
+ - vla
+ - negative-result
+ datasets:
+ - LightwheelAI/leisaac-pick-orange
+ language:
+ - en
+ ---
+
+ # Pi0.5-PickOrange: π0.5 PyTorch expert-only FT (⚠️ negative result)
+
+ **⚠️ This is a deliberately published failure: a documented negative result, released as a cautionary example.**
+ The 20-round strict benchmark gives **1/60 oranges (1.7%)**, last place on the [STRICT_LEADERBOARD](https://github.com/vitorcen/isaaclab-experience/blob/main/scripts/benchmark/STRICT_LEADERBOARD.md) and **about 15× worse than SmolVLA on the same task**. It is published to anchor the question "why does π0.5 fail to learn LeIsaac PickOrange?" to a real checkpoint, so that later researchers can reproduce or refute the result.
+
+ **🔗 Project repos**:
+
+ - [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
+ - [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)
+ - Full negative report (HTML): [`pi05_pytorch_expert_ft_negative.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/pi05_pytorch_expert_ft_negative.html)
+
+ ## TL;DR
+
+ | Item | Value |
+ |------|-------|
+ | **Task** | SO-101 PickOrange: a single arm picks up 3 oranges one after another and places them on a plate |
+ | **Dataset** | [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) (60 demos, 30 Hz) |
+ | **Architecture** | π0.5 = PaliGemma-2B VLM (frozen) + Gemma-300M action expert (trainable) + flow-matching |
+ | **Trainable params** | 693M (gemma_expert layers 425M + lm_head 263M + norm 3M) |
+ | **Recipe** | `train_expert_only=true`, `freeze_vision_encoder=true`, bf16, lr=2.5e-5, chunk=50, batch=1 + grad_accum=8, 10k steps |
+ | **Vision input** | **SigLIP @ 224×224** (hardcoded in PaliGemma; **main suspect**) |
+ | **Strict benchmark** | **1/60 oranges (1.7%)**: 20 rounds × 3 ep × 1 orange/ep, ckpt-2000 |
+ | **σ (5-round)** | 0.50 / 15 (3.3%); worst case (μ − 1σ) = **−0.25 / 15** |
+ | **Leaderboard rank** | **6/6 (last)**, 15× below SmolVLA |
+ | **Inference latency** | ~108 ms / chunk (50-step flow matching, RTX 4090) |
+ | **GPU hours** | ~3.5 h on RTX Pro 6000 (bf16, ZeRO-2 offload) |
+
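The σ (5-round) row can be reproduced with a few lines of arithmetic. The sketch below assumes, which the README does not state, that the 20 rounds are grouped into four consecutive 5-round blocks of 15 attempts each and that the single placed orange falls into one of them. Under that assumption the sample standard deviation and the μ − 1σ figure match the table.

```python
from statistics import mean, stdev

# Assumption: oranges placed per 5-round block (5 rounds x 3 episodes = 15 attempts).
# Which block holds the single success is hypothetical; the statistics do not depend on it.
placed_per_block = [1, 0, 0, 0]
ATTEMPTS_PER_BLOCK = 15

mu = mean(placed_per_block)       # average oranges placed per block
sigma = stdev(placed_per_block)   # sample standard deviation (n - 1 denominator)
worst_case = mu - sigma           # the "mu - 1 sigma" figure from the table

print(f"mu         = {mu:.2f} / {ATTEMPTS_PER_BLOCK}")
print(f"sigma      = {sigma:.2f} / {ATTEMPTS_PER_BLOCK}")
print(f"mu - sigma = {worst_case:.2f} / {ATTEMPTS_PER_BLOCK}")
```

A negative μ − 1σ is not a meaningful success count; it only says that the spread is larger than the mean.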
+ ## Why publish a failed model
+
+ Negative results usually end up in a drawer, yet they are as valuable as positive ones:
+
+ 1. **Pin down the hypothesis**: later researchers can load this checkpoint and directly verify whether this recipe really fails on this dataset, instead of falling into the same pit again.
+ 2. **Isolate variables**: on the training side, the dataloader / preprocessor / postprocessor / camera mapping / freeze configuration were all debugged (4 infrastructure bugs fixed), so the failure is not infra noise but a genuine **architecture vs. task** signal.
+ 3. **Quantify the "occasional orange"**: an initial 3-round run scored 2/9 and looked promising, but the 20-round 1/60 shows it was a Bernoulli outlier (p≈1.7%).
+
+ This checkpoint lets others verify the failure mode without re-spending the GPU hours.
+
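To put the "2/9 in three rounds versus 1/60 in twenty" contrast into numbers, here is a minimal sketch. It assumes every orange attempt is an independent Bernoulli trial with the 20-round estimate p = 1/60, which is a strong simplification; `binom_tail` is a helper name introduced for illustration.

```python
from math import comb


def binom_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_hat = 1 / 60                            # 20-round estimate quoted in the README
p_two_of_nine = binom_tail(9, 2, p_hat)   # chance of at least 2 placed out of 9 attempts

print(f"P(>= 2 of 9 | p = 1/60) = {p_two_of_nine:.4f}")
```

Because several checkpoints were screened with short 3-round runs, the chance that at least one of them shows such a score is higher than this single-run figure.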
+ ## Root cause (main suspect, 80% confidence)
+
+ **PaliGemma-2B's SigLIP vision encoder is hardcoded to 224×224 input**, whereas LeIsaac renders natively at 640×480. After the 2.86× downscale an orange spans only **10–17 px**, roughly **one SigLIP patch (14 px)**.
+
+ Compare with the models that do work on the same task:
+
+ | Model | Vision encoder | Input res | Orange size after resize | Result |
+ |-------|---------------|-----------|--------------------------|--------|
+ | GR00T-N1.7 | Eagle-2 ViT | 448 | 22–34 px (1.5–2.4 patches) | 68.3% ✅ |
+ | SmolVLA | SigLIP | 512 | 24–40 px (1.7–2.9 patches) | 25.0% ✅ |
+ | **π0.5 (this)** | **SigLIP** | **224** | **10–17 px (0.7–1.2 patches)** | **1.7% ❌** |
+
+ → The oranges are nearly invisible in the vision tokens. With the whole PaliGemma frozen and only the action expert trained, no number of extra tokens can compensate for the vision bottleneck, because the information is already lost at the vision encoder.
+
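The resolution argument is simple arithmetic, sketched below. The pixel widths are copied from the comparison table. Applying a 14 px patch to every row mirrors what the table appears to do and is an assumption here: the other encoders may use a different patch size. `downscale_factor` and `patches_spanned` are illustrative helper names.

```python
NATIVE_WIDTH_PX = 640   # LeIsaac camera width quoted in the README
PATCH_PX = 14           # SigLIP patch size quoted in the README (assumed for every row)


def downscale_factor(target_width_px: int) -> float:
    """How much the native frame shrinks along its width."""
    return NATIVE_WIDTH_PX / target_width_px


def patches_spanned(object_px: float) -> float:
    """How many patch widths an object of the given pixel width covers."""
    return object_px / PATCH_PX


# Orange widths after resize, copied from the comparison table.
orange_px = {
    "GR00T-N1.7 @ 448": (22, 34),
    "SmolVLA @ 512": (24, 40),
    "pi0.5 @ 224": (10, 17),
}

print(f"640 -> 224 downscale: {downscale_factor(224):.2f}x")
for name, (small, large) in orange_px.items():
    print(f"{name}: {patches_spanned(small):.1f} to {patches_spanned(large):.1f} patches")
```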
+ ## Training recipe
+
+ ```bash
+ # training entry point
+ bash LeIsaac/scripts/training/pi05_pt/train.sh
+
+ # key flags
+ --policy.train_expert_only=true # freeze PaliGemma, train only gemma_expert
+ --policy.freeze_vision_encoder=true # explicit redundant lock
+ --policy.gradient_checkpointing=true # 24GB VRAM under bf16
+ --policy.dtype=bfloat16
+ --policy.chunk_size=50
+ --policy.n_action_steps=50
+ --policy.max_state_dim=32
+ --policy.max_action_dim=32
+ --policy.optimizer_lr=2.5e-5
+ --steps=10000 --save_freq=1000 --batch_size=1
+ ```
+
+ Camera rename (LeIsaac 2-cam → π0.5 3-cam; the missing `left_wrist` view is auto-padded inside modeling_pi05.py:1195):
+
+ ```python
+ rename_map = {
+     "observation.images.front": "observation.images.base_0_rgb",
+     "observation.images.wrist": "observation.images.right_wrist_0_rgb",
+ }
+ ```
+
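The effect of the rename map can be illustrated without LeRobot. In the sketch below, `rename_observation` and `missing_cameras` are hypothetical helpers, not LeRobot functions, and the camera frames are placeholder strings. It only shows which of the three expected camera keys is left unfilled after renaming; how the model pads that view is not reproduced here.

```python
RENAME_MAP = {
    "observation.images.front": "observation.images.base_0_rgb",
    "observation.images.wrist": "observation.images.right_wrist_0_rgb",
}

# The three camera inputs listed under input_features in config.json.
EXPECTED_CAMERAS = (
    "observation.images.base_0_rgb",
    "observation.images.left_wrist_0_rgb",
    "observation.images.right_wrist_0_rgb",
)


def rename_observation(obs: dict, rename_map: dict) -> dict:
    """Return a copy of obs with keys renamed; unmapped keys pass through."""
    return {rename_map.get(key, key): value for key, value in obs.items()}


def missing_cameras(obs: dict, expected=EXPECTED_CAMERAS) -> list:
    """List the expected camera keys that the observation does not provide."""
    return [key for key in expected if key not in obs]


# Placeholder observation: strings stand in for image tensors.
raw = {
    "observation.images.front": "front-frame",
    "observation.images.wrist": "wrist-frame",
    "observation.state": [0.0] * 6,
}
renamed = rename_observation(raw, RENAME_MAP)
print(missing_cameras(renamed))
```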
+ ## Reproduce
+
+ ```python
+ from lerobot.policies.pi05 import PI05Policy
+
+ policy = PI05Policy.from_pretrained("wsagi/Pi0.5-PickOrange")
+ # Then plug the policy into the LeIsaac Isaac Sim eval pipeline:
+ # scripts/benchmark/run_one_strict.sh
+ ```
+
+ 20-round strict benchmark (distribution over 20 rounds × 3 episodes):
+
+ | P(placed=0) | P(placed=1) | P(placed=2) | P(placed=3) | E(🍊)/ep |
+ |-------------|-------------|-------------|-------------|----------|
+ | **98.3% (59/60)** | **1.7% (1/60)** | 0% | 0% | **0.017** |
+
+ 19 of 20 rounds scored 0/3 and one round scored 1/3 (Episode 8: placed=[F, T, F]). The distribution looks like Bernoulli noise, with no task-completion signal.
+
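One way to express how uncertain a count of 1 success in 60 attempts is: a Wilson score interval. This is a minimal sketch that treats the 60 attempts as independent trials, which is an assumption; the function name is illustrative.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


low, high = wilson_interval(1, 60)
print(f"1/60 success rate, approx. 95% interval: {low:.3%} to {high:.3%}")
```

Comparing the upper bound with the success rates of the other policies on the leaderboard shows whether the gap can be explained by sampling noise alone.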
+ ## Checkpoints evaluated
+
+ The 10k-step run saved a checkpoint every 1k steps. All 13 checkpoints (500/1k/1.5k/.../10k) went through a 3-round comparison = **1/60 oranges across 13 ckpts**, **every one scoring 0/9 or 1/9**, with no sign of monotonic convergence. ckpt-2000 is the checkpoint that reached 2/9 in a 3-round run (the highest score); over 20 rounds it regressed to 1/60, which confirms a noise outlier rather than signal.
+
+ ## When (not) to use
+
+ ❌ **Do not use in production**: a 1.7% success rate has no task-completion value.
+ ✅ **Reasonable uses**:
+ - a baseline reference for π0.5 on tasks bottlenecked by low VLM input resolution
+ - a reproduction checkpoint for the failed "freeze VLM + train expert only" recipe
+ - a fixture for validating the π0.5 wire protocol in the LeIsaac eval pipeline
+
+ ## Alternatives (better on same task)
+
+ | Model | Strict | Where |
+ |-------|--------|-------|
+ | 🥇 GR00T-N1.7 (self-trained) | 68.3% | [`wsagi/GR00T-N1.6-PickOrange`](https://huggingface.co/wsagi) |
+ | 🥈 SmolVLA (self-trained) | 25.0% | wsagi (pending release) |
+ | 🥉 Diffusion Policy DDIM | 3/3 on some runs (stochastic) | [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) |
+
+ ## License & Attribution
+
+ - Apache-2.0
+ - Base model: `lerobot/pi05_base` (Physical Intelligence × LeRobot)
+ - Dataset: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange)
+ - Trained on RTX Pro 6000 96GB
+ - Evaluated in Isaac Sim 5.1 + LeIsaac
config.json ADDED
@@ -0,0 +1,107 @@
+ {
+     "type": "pi05",
+     "n_obs_steps": 1,
+     "input_features": {
+         "observation.images.base_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.images.left_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.images.right_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.state": {
+             "type": "STATE",
+             "shape": [
+                 32
+             ]
+         }
+     },
+     "output_features": {
+         "action": {
+             "type": "ACTION",
+             "shape": [
+                 6
+             ]
+         }
+     },
+     "device": "cuda",
+     "use_amp": false,
+     "use_peft": false,
+     "push_to_hub": false,
+     "repo_id": null,
+     "private": null,
+     "tags": null,
+     "license": null,
+     "pretrained_path": "lerobot/pi05_base",
+     "paligemma_variant": "gemma_2b",
+     "action_expert_variant": "gemma_300m",
+     "dtype": "bfloat16",
+     "chunk_size": 50,
+     "n_action_steps": 50,
+     "max_state_dim": 32,
+     "max_action_dim": 32,
+     "num_inference_steps": 10,
+     "time_sampling_beta_alpha": 1.5,
+     "time_sampling_beta_beta": 1.0,
+     "time_sampling_scale": 0.999,
+     "time_sampling_offset": 0.001,
+     "min_period": 0.004,
+     "max_period": 4.0,
+     "use_relative_actions": false,
+     "relative_exclude_joints": [
+         "gripper"
+     ],
+     "action_feature_names": [
+         "shoulder_pan.pos",
+         "shoulder_lift.pos",
+         "elbow_flex.pos",
+         "wrist_flex.pos",
+         "wrist_roll.pos",
+         "gripper.pos"
+     ],
+     "rtc_config": null,
+     "image_resolution": [
+         224,
+         224
+     ],
+     "empty_cameras": 0,
+     "tokenizer_max_length": 200,
+     "normalization_mapping": {
+         "VISUAL": "IDENTITY",
+         "STATE": "QUANTILES",
+         "ACTION": "QUANTILES"
+     },
+     "gradient_checkpointing": true,
+     "compile_model": false,
+     "compile_mode": "max-autotune",
+     "freeze_vision_encoder": true,
+     "train_expert_only": true,
+     "optimizer_lr": 2.5e-05,
+     "optimizer_betas": [
+         0.9,
+         0.95
+     ],
+     "optimizer_eps": 1e-08,
+     "optimizer_weight_decay": 0.01,
+     "optimizer_grad_clip_norm": 1.0,
+     "scheduler_warmup_steps": 1000,
+     "scheduler_decay_steps": 30000,
+     "scheduler_decay_lr": 2.5e-06
+ }
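In this config the action output has 6 entries (`action_feature_names`) while `max_action_dim` is 32. A common way to reconcile the two, and the one assumed in this sketch, is to zero-pad vectors up to the maximum dimension on the way in and slice them back on the way out. `pad_vector` and `unpad_vector` are illustrative names, and the joint values are made up; the actual LeRobot code path is not shown in this diff.

```python
ACTION_DIM = 6        # length of action_feature_names in config.json
MAX_ACTION_DIM = 32   # max_action_dim in config.json


def pad_vector(values: list, target_dim: int) -> list:
    """Zero-pad a vector on the right up to target_dim entries."""
    if len(values) > target_dim:
        raise ValueError(f"vector of length {len(values)} exceeds {target_dim}")
    return list(values) + [0.0] * (target_dim - len(values))


def unpad_vector(values: list, original_dim: int) -> list:
    """Drop the padding and keep the first original_dim entries."""
    return list(values[:original_dim])


# Hypothetical joint targets in the order given by action_feature_names.
joint_targets = [0.1, -0.2, 0.3, 0.0, 0.5, 1.0]
padded = pad_vector(joint_targets, MAX_ACTION_DIM)
restored = unpad_vector(padded, ACTION_DIM)

print(len(padded), restored)
```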
model-00001-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca835f0e50fcf350dc518ea1dbd04ac6609744c4aaf079abc8e97d6e4a72bc7c
+ size 898280512
model-00002-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e662f131ce8545c87b1ed74b0af238725ee50bca4529a7ee73513e79fdeec483
+ size 1053294760
model-00003-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f9ba4311a7118eec21bac703085a38c91f36771758c59d188a320020b2a48ab
+ size 1053294784
model-00004-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b702accf389b55f313872ee796d522cf9630086042712102250b63a8bd4b91ff
+ size 851767984
model-00005-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d207be560b6bd5078bf0714a7e71952b31b9e894cf259e4d356b4f7e9a584c7
+ size 880875344
model-00006-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bfc8b874024ff4c38c5862b611cfd1572d809a132f29ef78653e6d1601ecd72
+ size 880875360
model-00007-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1439d7dd98da0154639b9fd5245c183784a31281ad76d4961b1181da7d7142e6
+ size 880875336
model-00008-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc3418e4b52fb237dd8b065a8e3f89559a6d969113104aaa941c87e310097921
+ size 880875320
model-00009-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:385d990ea155f324c1dc98946ec27dfcfc28c386fadc5baa8101601f83d8458b
+ size 888081800
model-00010-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:011cf1287caf19d5402e7e69c52e18cb603e644d26cd7990dd41ee749fae2ed9
+ size 894575464
model-00011-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f3f34536e9854636f2b100a3dd1cd09cd390ececce0f9d24e5f379f356f19b0
+ size 191252832
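Each shard above is committed as a Git LFS pointer: three `key value` lines rather than the weights themselves. The sketch below parses one pointer and sums the shard sizes listed in this commit. `parse_lfs_pointer` is an illustrative helper and assumes the simple three-line layout shown here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2f3f34536e9854636f2b100a3dd1cd09cd390ececce0f9d24e5f379f356f19b0
size 191252832
"""
info = parse_lfs_pointer(pointer)

# Sizes in bytes of the 11 model shards, copied from the pointers in this commit.
shard_sizes = [
    898280512, 1053294760, 1053294784, 851767984, 880875344, 880875360,
    880875336, 880875320, 888081800, 894575464, 191252832,
]
total_bytes = sum(shard_sizes)

print(info["algo"], info["size"])
print(f"{len(shard_sizes)} shards, {total_bytes / 1e9:.2f} GB in total")
```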
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
policy_postprocessor.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "name": "policy_postprocessor",
+   "steps": [
+     {
+       "registry_name": "unnormalizer_processor",
+       "config": {
+         "eps": 1e-08,
+         "features": {
+           "action": {
+             "type": "ACTION",
+             "shape": [
+               6
+             ]
+           }
+         },
+         "norm_map": {
+           "VISUAL": "IDENTITY",
+           "STATE": "QUANTILES",
+           "ACTION": "QUANTILES"
+         }
+       },
+       "state_file": "policy_postprocessor_step_0_unnormalizer_processor.safetensors"
+     },
+     {
+       "registry_name": "device_processor",
+       "config": {
+         "device": "cpu",
+         "float_dtype": null
+       }
+     }
+   ]
+ }
policy_postprocessor_step_0_unnormalizer_processor.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ac4af145fa293fb9282322bee7c87eb369ba8aca3e09dbf1db7600f46142fd5
+ size 7552
policy_preprocessor.json ADDED
@@ -0,0 +1,90 @@
+ {
+   "name": "policy_preprocessor",
+   "steps": [
+     {
+       "registry_name": "rename_observations_processor",
+       "config": {
+         "rename_map": {
+           "observation.images.front": "observation.images.base_0_rgb",
+           "observation.images.wrist": "observation.images.right_wrist_0_rgb"
+         }
+       }
+     },
+     {
+       "registry_name": "to_batch_processor",
+       "config": {}
+     },
+     {
+       "registry_name": "normalizer_processor",
+       "config": {
+         "eps": 1e-08,
+         "features": {
+           "observation.images.base_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+               3,
+               224,
+               224
+             ]
+           },
+           "observation.images.left_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+               3,
+               224,
+               224
+             ]
+           },
+           "observation.images.right_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+               3,
+               224,
+               224
+             ]
+           },
+           "observation.state": {
+             "type": "STATE",
+             "shape": [
+               32
+             ]
+           },
+           "action": {
+             "type": "ACTION",
+             "shape": [
+               6
+             ]
+           }
+         },
+         "norm_map": {
+           "VISUAL": "IDENTITY",
+           "STATE": "QUANTILES",
+           "ACTION": "QUANTILES"
+         }
+       },
+       "state_file": "policy_preprocessor_step_2_normalizer_processor.safetensors"
+     },
+     {
+       "registry_name": "pi05_prepare_state_tokenizer_processor_step",
+       "config": {}
+     },
+     {
+       "registry_name": "tokenizer_processor",
+       "config": {
+         "max_length": 200,
+         "task_key": "task",
+         "padding_side": "right",
+         "padding": "max_length",
+         "truncation": true,
+         "tokenizer_name": "google/paligemma-3b-pt-224"
+       }
+     },
+     {
+       "registry_name": "device_processor",
+       "config": {
+         "device": "cuda",
+         "float_dtype": null
+       }
+     }
+   ]
+ }
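Both processor configs map `STATE` and `ACTION` to `QUANTILES`, with the preprocessor normalizing and the postprocessor undoing it. The sketch below shows what such a pair could look like, assuming a low and a high quantile are rescaled to [-1, 1]. The exact formula and the stored statistic names in LeRobot are not visible in this diff, and the quantile values are made up for illustration.

```python
EPS = 1e-8  # same value as the "eps" field in both processor configs


def normalize_quantile(x: float, q_low: float, q_high: float, eps: float = EPS) -> float:
    """Map [q_low, q_high] linearly onto approximately [-1, 1]."""
    return 2.0 * (x - q_low) / (q_high - q_low + eps) - 1.0


def unnormalize_quantile(y: float, q_low: float, q_high: float, eps: float = EPS) -> float:
    """Inverse of normalize_quantile."""
    return (y + 1.0) / 2.0 * (q_high - q_low + eps) + q_low


# Hypothetical quantiles for one joint; real values live in the .safetensors state file.
q_low, q_high = -1.2, 0.8
x = 0.3
y = normalize_quantile(x, q_low, q_high)
x_back = unnormalize_quantile(y, q_low, q_high)

print(f"{x} -> {y:.4f} -> {x_back:.4f}")
```

The round trip only holds when both steps use identical statistics, which is consistent with the two state files in this commit sharing the same `oid`.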
policy_preprocessor_step_2_normalizer_processor.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ac4af145fa293fb9282322bee7c87eb369ba8aca3e09dbf1db7600f46142fd5
+ size 7552
train_config.json ADDED
@@ -0,0 +1,248 @@
+ {
+     "dataset": {
+         "repo_id": "LightwheelAI/leisaac-pick-orange",
+         "root": "/home/david/work/isaaclab-experience/LeIsaac/datasets/raw/leisaac-pick-orange",
+         "episodes": null,
+         "image_transforms": {
+             "enable": false,
+             "max_num_transforms": 3,
+             "random_order": false,
+             "tfs": {
+                 "brightness": {
+                     "weight": 1.0,
+                     "type": "ColorJitter",
+                     "kwargs": {
+                         "brightness": [
+                             0.8,
+                             1.2
+                         ]
+                     }
+                 },
+                 "contrast": {
+                     "weight": 1.0,
+                     "type": "ColorJitter",
+                     "kwargs": {
+                         "contrast": [
+                             0.8,
+                             1.2
+                         ]
+                     }
+                 },
+                 "saturation": {
+                     "weight": 1.0,
+                     "type": "ColorJitter",
+                     "kwargs": {
+                         "saturation": [
+                             0.5,
+                             1.5
+                         ]
+                     }
+                 },
+                 "hue": {
+                     "weight": 1.0,
+                     "type": "ColorJitter",
+                     "kwargs": {
+                         "hue": [
+                             -0.05,
+                             0.05
+                         ]
+                     }
+                 },
+                 "sharpness": {
+                     "weight": 1.0,
+                     "type": "SharpnessJitter",
+                     "kwargs": {
+                         "sharpness": [
+                             0.5,
+                             1.5
+                         ]
+                     }
+                 },
+                 "affine": {
+                     "weight": 1.0,
+                     "type": "RandomAffine",
+                     "kwargs": {
+                         "degrees": [
+                             -5.0,
+                             5.0
+                         ],
+                         "translate": [
+                             0.05,
+                             0.05
+                         ]
+                     }
+                 }
+             }
+         },
+         "revision": null,
+         "use_imagenet_stats": true,
+         "video_backend": "torchcodec",
+         "return_uint8": false,
+         "streaming": false
+     },
+     "env": null,
+     "policy": {
+         "type": "pi05",
+         "n_obs_steps": 1,
+         "input_features": {
+             "observation.images.base_0_rgb": {
+                 "type": "VISUAL",
+                 "shape": [
+                     3,
+                     224,
+                     224
+                 ]
+             },
+             "observation.images.left_wrist_0_rgb": {
+                 "type": "VISUAL",
+                 "shape": [
+                     3,
+                     224,
+                     224
+                 ]
+             },
+             "observation.images.right_wrist_0_rgb": {
+                 "type": "VISUAL",
+                 "shape": [
+                     3,
+                     224,
+                     224
+                 ]
+             },
+             "observation.state": {
+                 "type": "STATE",
+                 "shape": [
+                     32
+                 ]
+             }
+         },
+         "output_features": {
+             "action": {
+                 "type": "ACTION",
+                 "shape": [
+                     6
+                 ]
+             }
+         },
+         "device": "cuda",
+         "use_amp": false,
+         "use_peft": false,
+         "push_to_hub": false,
+         "repo_id": null,
+         "private": null,
+         "tags": null,
+         "license": null,
+         "pretrained_path": "lerobot/pi05_base",
+         "paligemma_variant": "gemma_2b",
+         "action_expert_variant": "gemma_300m",
+         "dtype": "bfloat16",
+         "chunk_size": 50,
+         "n_action_steps": 50,
+         "max_state_dim": 32,
+         "max_action_dim": 32,
+         "num_inference_steps": 10,
+         "time_sampling_beta_alpha": 1.5,
+         "time_sampling_beta_beta": 1.0,
+         "time_sampling_scale": 0.999,
+         "time_sampling_offset": 0.001,
+         "min_period": 0.004,
+         "max_period": 4.0,
+         "use_relative_actions": false,
+         "relative_exclude_joints": [
+             "gripper"
+         ],
+         "action_feature_names": [
+             "shoulder_pan.pos",
+             "shoulder_lift.pos",
+             "elbow_flex.pos",
+             "wrist_flex.pos",
+             "wrist_roll.pos",
+             "gripper.pos"
+         ],
+         "rtc_config": null,
+         "image_resolution": [
+             224,
+             224
+         ],
+         "empty_cameras": 0,
+         "tokenizer_max_length": 200,
+         "normalization_mapping": {
+             "VISUAL": "IDENTITY",
+             "STATE": "QUANTILES",
+             "ACTION": "QUANTILES"
+         },
+         "gradient_checkpointing": true,
+         "compile_model": false,
+         "compile_mode": "max-autotune",
+         "freeze_vision_encoder": true,
+         "train_expert_only": true,
+         "optimizer_lr": 2.5e-05,
+         "optimizer_betas": [
+             0.9,
+             0.95
+         ],
+         "optimizer_eps": 1e-08,
+         "optimizer_weight_decay": 0.01,
+         "optimizer_grad_clip_norm": 1.0,
+         "scheduler_warmup_steps": 1000,
+         "scheduler_decay_steps": 30000,
+         "scheduler_decay_lr": 2.5e-06
+     },
+     "reward_model": null,
+     "output_dir": "/home/david/work/isaaclab-experience/LeIsaac/outputs/pi05-expert-leisaac-pick-orange",
+     "job_name": "pi05",
+     "resume": false,
+     "seed": 1000,
+     "cudnn_deterministic": false,
+     "num_workers": 4,
+     "batch_size": 1,
+     "prefetch_factor": 4,
+     "persistent_workers": true,
+     "steps": 2500,
+     "eval_freq": 20000,
+     "log_freq": 200,
+     "tolerance_s": 0.0001,
+     "save_checkpoint": true,
+     "save_freq": 500,
+     "use_policy_training_preset": true,
+     "optimizer": {
+         "type": "adamw",
+         "lr": 2.5e-05,
+         "weight_decay": 0.01,
+         "grad_clip_norm": 1.0,
+         "betas": [
+             0.9,
+             0.95
+         ],
+         "eps": 1e-08
+     },
+     "scheduler": {
+         "type": "cosine_decay_with_warmup",
+         "num_warmup_steps": 1000,
+         "num_decay_steps": 30000,
+         "peak_lr": 2.5e-05,
+         "decay_lr": 2.5e-06
+     },
+     "eval": {
+         "n_episodes": 50,
+         "batch_size": 22,
+         "use_async_envs": true
+     },
+     "wandb": {
+         "enable": false,
+         "disable_artifact": false,
+         "project": "lerobot",
+         "entity": null,
+         "notes": null,
+         "run_id": null,
+         "mode": null,
+         "add_tags": true
+     },
+     "peft": null,
+     "sample_weighting": null,
+     "rename_map": {
+         "observation.images.front": "observation.images.base_0_rgb",
+         "observation.images.wrist": "observation.images.right_wrist_0_rgb"
+     },
+     "checkpoint_path": null
+ }
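The `scheduler` block describes a warmup followed by a cosine decay from `peak_lr` to `decay_lr`. The sketch below is a generic version of that shape using the four values from this file: linear warmup from zero, then a cosine over the remaining decay steps. It is an assumption about the shape only. LeRobot's actual `cosine_decay_with_warmup` may differ in details, for example in where the cosine starts or in rescaling the schedule to the run length.

```python
from math import cos, pi

PEAK_LR = 2.5e-5       # scheduler.peak_lr
DECAY_LR = 2.5e-6      # scheduler.decay_lr
WARMUP_STEPS = 1000    # scheduler.num_warmup_steps
DECAY_STEPS = 30000    # scheduler.num_decay_steps


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumed schedule shape."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from PEAK_LR to DECAY_LR, clamped once DECAY_STEPS is reached.
    progress = min(1.0, (step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS))
    return DECAY_LR + 0.5 * (PEAK_LR - DECAY_LR) * (1.0 + cos(pi * progress))


for step in (0, 500, 1000, 2500, 10000, 30000):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under this shape, a run that stops at the `steps` value recorded in this file (2500) spends most of its time in warmup or close to the peak learning rate, because the decay horizon is 30000 steps.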