ConorWang committed on
Commit
7d25255
·
verified ·
1 Parent(s): 331dce5

Upload pi05_base originals + sigma-renamed weight copies (flat root)

README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: gemma
+ language:
+ - en
+ ---
+ # π₀.₅ (Pi05)
+
+ These weights come directly from openpi's PyTorch conversion script, applied to their `pi05_base` model.
+
+ π₀.₅ is a **Vision-Language-Action model with open-world generalization** from Physical Intelligence. The LeRobot implementation is adapted from their open-source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.
+
+ ## Model Overview
+
+ π₀.₅ represents a significant evolution from π₀, developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi05) to address a central challenge in robotics: **open-world generalization**. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
+
+ ### The Generalization Challenge
+
+ As Physical Intelligence explains, the fundamental challenge isn't agility or dexterity but generalization: the ability to correctly perform tasks in new settings with new objects. Consider a robot cleaning different homes: each home has different objects in different places. Generalization must occur at multiple levels:
+
+ - **Physical Level**: Understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
+ - **Semantic Level**: Understanding task semantics, such as where clothes and shoes belong (the laundry hamper, not the bed) and which tools are appropriate for cleaning spills
+ - **Environmental Level**: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals
+
+ ### Co-Training on Heterogeneous Data
+
+ The breakthrough innovation in π₀.₅ is **co-training on heterogeneous data sources**. The model learns from:
+
+ 1. **Multimodal Web Data**: Image captioning, visual question answering, object detection
+ 2. **Verbal Instructions**: Humans coaching robots through complex tasks step by step
+ 3. **Subtask Commands**: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
+ 4. **Cross-Embodiment Robot Data**: Data from various robot platforms with different capabilities
+ 5. **Multi-Environment Data**: Static robots deployed across many different homes
+ 6. **Mobile Manipulation Data**: ~400 hours of mobile robot demonstrations
+
+ This diverse training mixture forms a "curriculum" that enables generalization at the physical, visual, and semantic levels simultaneously.
+
+ ## Training
+
+ Here's a complete training command for finetuning the base π₀.₅ model on your own dataset:
+
+ ```bash
+ python src/lerobot/scripts/train.py \
+     --dataset.repo_id=your_dataset \
+     --policy.type=pi05 \
+     --output_dir=./outputs/pi05_training \
+     --job_name=pi05_training \
+     --policy.repo_id=your_repo_id \
+     --policy.pretrained_path=lerobot/pi05_base \
+     --policy.compile_model=true \
+     --policy.gradient_checkpointing=true \
+     --wandb.enable=true \
+     --policy.dtype=bfloat16 \
+     --steps=3000 \
+     --policy.scheduler_decay_steps=3000 \
+     --policy.device=cuda \
+     --batch_size=32
+ ```
+
+ ## Citation
+
+ If you use this model, please cite the original OpenPI work:
+
+ ```bibtex
+ @article{openpi2024,
+     title={Open-World Robotic Manipulation with Vision-Language-Action Models},
+     author={Physical Intelligence},
+     year={2024},
+     url={https://github.com/Physical-Intelligence/openpi}
+ }
+ ```
+
+ ## Original Repository
+
+ [OpenPI GitHub Repository](https://github.com/Physical-Intelligence/openpi)
+
+ ## License
+
+ This model follows the same license as the original OpenPI repository.
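The config shipped with these weights sets `chunk_size` and `n_action_steps` to 50: each inference call predicts a chunk of 50 future actions, and the controller executes that prefix before querying the model again. A minimal sketch of this receding-horizon loop, with a hypothetical `predict_chunk` stub standing in for the real policy:

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per model call ("chunk_size")
N_ACTION_STEPS = 50   # actions executed before re-planning ("n_action_steps")

def predict_chunk(observation):
    """Stub for the policy forward pass: returns CHUNK_SIZE dummy actions."""
    return [f"action_{observation}_{i}" for i in range(CHUNK_SIZE)]

def run_episode(observations):
    """Execute N_ACTION_STEPS actions from each predicted chunk, then re-plan."""
    queue = deque()
    executed = []
    model_calls = 0
    for obs in observations:
        if not queue:
            # Only call the (expensive) policy when the current chunk is used up.
            queue.extend(predict_chunk(obs)[:N_ACTION_STEPS])
            model_calls += 1
        executed.append(queue.popleft())
    return executed, model_calls
```

With both values at 50, a 100-step episode needs only two policy calls; lowering `n_action_steps` below `chunk_size` would re-plan more often at higher compute cost.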
config.json ADDED
@@ -0,0 +1,82 @@
+ {
+     "type": "pi05",
+     "n_obs_steps": 1,
+     "input_features": {
+         "observation.images.base_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.images.left_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.images.right_wrist_0_rgb": {
+             "type": "VISUAL",
+             "shape": [
+                 3,
+                 224,
+                 224
+             ]
+         },
+         "observation.state": {
+             "type": "STATE",
+             "shape": [
+                 32
+             ]
+         }
+     },
+     "output_features": {
+         "action": {
+             "type": "ACTION",
+             "shape": [
+                 32
+             ]
+         }
+     },
+     "device": "mps",
+     "use_amp": false,
+     "push_to_hub": true,
+     "repo_id": null,
+     "private": null,
+     "tags": null,
+     "license": null,
+     "paligemma_variant": "gemma_2b",
+     "action_expert_variant": "gemma_300m",
+     "dtype": "float32",
+     "chunk_size": 50,
+     "n_action_steps": 50,
+     "max_action_dim": 32,
+     "max_state_dim": 32,
+     "num_inference_steps": 10,
+     "time_sampling_beta_alpha": 1.5,
+     "time_sampling_beta_beta": 1.0,
+     "min_period": 0.004,
+     "max_period": 4.0,
+     "image_resolution": [
+         224,
+         224
+     ],
+     "gradient_checkpointing": false,
+     "compile_model": false,
+     "compile_mode": "max-autotune",
+     "optimizer_lr": 2.5e-05,
+     "optimizer_betas": [
+         0.9,
+         0.95
+     ],
+     "optimizer_eps": 1e-08,
+     "optimizer_weight_decay": 0.01,
+     "optimizer_grad_clip_norm": 1.0,
+     "scheduler_warmup_steps": 1000,
+     "scheduler_decay_steps": 30000,
+     "scheduler_decay_lr": 2.5e-06,
+     "tokenizer_max_length": 200
+ }
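The `optimizer_lr`, `scheduler_warmup_steps`, `scheduler_decay_steps`, and `scheduler_decay_lr` fields above describe a warmup-then-decay learning-rate schedule. A minimal sketch, assuming (as in openpi's training setup) linear warmup to the peak LR followed by cosine decay to the floor LR:

```python
import math

OPTIMIZER_LR = 2.5e-5   # peak LR ("optimizer_lr")
WARMUP_STEPS = 1000     # "scheduler_warmup_steps"
DECAY_STEPS = 30000     # "scheduler_decay_steps"
DECAY_LR = 2.5e-6       # final LR ("scheduler_decay_lr")

def lr_at(step):
    """Linear warmup to OPTIMIZER_LR, then cosine decay down to DECAY_LR."""
    if step < WARMUP_STEPS:
        return OPTIMIZER_LR * (step + 1) / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0 over decay window
    return DECAY_LR + (OPTIMIZER_LR - DECAY_LR) * cosine
```

Note that the training command in the README overrides `scheduler_decay_steps` to 3000 to match its shorter 3000-step finetuning run.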
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0eb11ca9587678c1d2ef8cf32807c29f8ce53a2bfdfc1aa4a4c96f16fca59b0f
+ size 14467165872
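`model.safetensors` is stored as a Git LFS pointer: a three-line text stub recording the spec version, the SHA-256 of the actual blob, and its byte size (about 14.5 GB here). A small parser for that key/value format:

```python
def parse_lfs_pointer(text):
    """Parse a git-lfs pointer file into a dict of its 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0eb11ca9587678c1d2ef8cf32807c29f8ce53a2bfdfc1aa4a4c96f16fca59b0f
size 14467165872"""

info = parse_lfs_pointer(pointer)
size_gb = int(info["size"]) / 1e9  # roughly 14.47 GB
```

Downloading via the Hub (or `git lfs pull`) resolves the pointer to the real weight file; cloning without LFS leaves only this stub on disk.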
policy_postprocessor.json ADDED
@@ -0,0 +1,24 @@
+ {
+     "name": "policy_postprocessor",
+     "steps": [
+         {
+             "registry_name": "unnormalizer_processor",
+             "config": {
+                 "eps": 1e-08,
+                 "features": {},
+                 "norm_map": {
+                     "VISUAL": "IDENTITY",
+                     "STATE": "QUANTILES",
+                     "ACTION": "QUANTILES"
+                 }
+             }
+         },
+         {
+             "registry_name": "device_processor",
+             "config": {
+                 "device": "cpu",
+                 "float_dtype": null
+             }
+         }
+     ]
+ }
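The `norm_map` above says states and actions use `QUANTILES` normalization while images pass through unchanged (`IDENTITY`). A sketch of what quantile (un)normalization does, assuming the common convention of mapping the dataset's 1st/99th percentiles onto `[-1, 1]`; the exact statistics used by the real processors come from the dataset, not from this file:

```python
def normalize_quantiles(x, q01, q99, eps=1e-8):
    """Map a raw value into [-1, 1] using per-dimension quantile statistics."""
    return 2.0 * (x - q01) / (q99 - q01 + eps) - 1.0

def unnormalize_quantiles(x, q01, q99, eps=1e-8):
    """Inverse map: take a model output in [-1, 1] back to raw units."""
    return (x + 1.0) / 2.0 * (q99 - q01 + eps) + q01
```

The postprocessor applies the unnormalize direction to the policy's action outputs; `eps` (matching the `"eps": 1e-08` field) guards against degenerate ranges where `q99 == q01`.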
policy_preprocessor.json ADDED
@@ -0,0 +1,49 @@
+ {
+     "name": "policy_preprocessor",
+     "steps": [
+         {
+             "registry_name": "rename_observations_processor",
+             "config": {
+                 "rename_map": {}
+             }
+         },
+         {
+             "registry_name": "to_batch_processor",
+             "config": {}
+         },
+         {
+             "registry_name": "normalizer_processor",
+             "config": {
+                 "eps": 1e-08,
+                 "features": {},
+                 "norm_map": {
+                     "VISUAL": "IDENTITY",
+                     "STATE": "QUANTILES",
+                     "ACTION": "QUANTILES"
+                 }
+             }
+         },
+         {
+             "registry_name": "pi05_prepare_state_tokenizer_processor_step",
+             "config": {}
+         },
+         {
+             "registry_name": "tokenizer_processor",
+             "config": {
+                 "max_length": 200,
+                 "task_key": "task",
+                 "padding_side": "right",
+                 "padding": "max_length",
+                 "truncation": true,
+                 "tokenizer_name": "google/paligemma-3b-pt-224"
+             }
+         },
+         {
+             "registry_name": "device_processor",
+             "config": {
+                 "device": "cpu",
+                 "float_dtype": null
+             }
+         }
+     ]
+ }
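The preprocessor's `steps` array is executed in order: rename observations, batch, normalize, prepare the state for tokenization, tokenize the task string, and move to device. A toy sketch of that sequential-pipeline pattern, with simplified stand-ins for the registry entries (the `"cam_high"` key in the usage example is hypothetical, not a key this model expects):

```python
class Pipeline:
    """Apply processing steps to an observation dict, in the order listed."""
    def __init__(self, steps):
        self.steps = steps  # list of callables, mirroring the "steps" array

    def __call__(self, obs):
        for step in self.steps:
            obs = step(obs)
        return obs

def rename_observations(rename_map):
    """Stand-in for rename_observations_processor: remap observation keys."""
    def step(obs):
        return {rename_map.get(k, k): v for k, v in obs.items()}
    return step

def add_batch_dim(obs):
    """Stand-in for to_batch_processor: wrap each value in a batch of size 1."""
    return {k: [v] for k, v in obs.items()}

preprocessor = Pipeline([
    rename_observations({"cam_high": "observation.images.base_0_rgb"}),
    add_batch_dim,
])
out = preprocessor({"cam_high": [0.1, 0.2], "observation.state": [0.0]})
```

The empty `rename_map` in the shipped config means no renaming happens by default; finetuners whose datasets use different camera keys can populate it instead of renaming dataset features.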