Upload pi05_base originals + sigma-renamed weight copies (flat root)
- README.md +79 -3
- config.json +82 -0
- model.safetensors +3 -0
- policy_postprocessor.json +24 -0
- policy_preprocessor.json +49 -0
README.md
CHANGED
@@ -1,3 +1,79 @@
- ---
- license:
-
---
license: gemma
language:
- en
---

# π₀.₅ (Pi05)

These weights come directly from openpi's PyTorch conversion script, applied to their `pi05_base` model.

π₀.₅ is a **Vision-Language-Action model with open-world generalization** from Physical Intelligence. The LeRobot implementation is adapted from their open-source [OpenPI](https://github.com/Physical-Intelligence/openpi) repository.

## Model Overview

π₀.₅ represents a significant evolution from π₀, developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi05) to address a central challenge in robotics: **open-world generalization**. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
### The Generalization Challenge

As Physical Intelligence explains, the fundamental challenge is not agility or dexterity but generalization: the ability to correctly perform tasks in new settings with new objects. Consider a robot cleaning different homes: each home has different objects in different places. Generalization must occur at multiple levels:

- **Physical Level**: Understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
- **Semantic Level**: Understanding task semantics, such as where to put clothes and shoes (the laundry hamper, not the bed) and which tools are appropriate for cleaning spills
- **Environmental Level**: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals

### Co-Training on Heterogeneous Data

The breakthrough innovation in π₀.₅ is **co-training on heterogeneous data sources**. The model learns from:

1. **Multimodal Web Data**: Image captioning, visual question answering, object detection
2. **Verbal Instructions**: Humans coaching robots through complex tasks step-by-step
3. **Subtask Commands**: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
4. **Cross-Embodiment Robot Data**: Data from various robot platforms with different capabilities
5. **Multi-Environment Data**: Static robots deployed across many different homes
6. **Mobile Manipulation Data**: ~400 hours of mobile robot demonstrations

This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously.
## Training

Here is a complete training command for fine-tuning the base π₀.₅ model on your own dataset:
```bash
python src/lerobot/scripts/train.py \
  --dataset.repo_id=your_dataset \
  --policy.type=pi05 \
  --output_dir=./outputs/pi05_training \
  --job_name=pi05_training \
  --policy.repo_id=your_repo_id \
  --policy.pretrained_path=lerobot/pi05_base \
  --policy.compile_model=true \
  --policy.gradient_checkpointing=true \
  --wandb.enable=true \
  --policy.dtype=bfloat16 \
  --steps=3000 \
  --policy.scheduler_decay_steps=3000 \
  --policy.device=cuda \
  --batch_size=32
```
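The `--policy.scheduler_decay_steps` flag pairs with the `scheduler_warmup_steps` and `scheduler_decay_lr` values in `config.json`. Below is a minimal sketch of the resulting learning-rate curve, assuming a linear-warmup-then-cosine-decay shape; the exact schedule is defined by the LeRobot trainer, so treat this as illustrative:

```python
import math

def lr_at_step(step, peak_lr=2.5e-5, decay_lr=2.5e-6,
               warmup_steps=1000, decay_steps=3000):
    """Linear warmup to peak_lr, then cosine decay down to decay_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min((step - warmup_steps) / (decay_steps - warmup_steps), 1.0)
    return decay_lr + 0.5 * (peak_lr - decay_lr) * (1 + math.cos(math.pi * progress))

# With --steps=3000 and --policy.scheduler_decay_steps=3000, the decay
# finishes exactly at the end of training.
print(lr_at_step(0))      # 0.0
print(lr_at_step(1000))   # ~2.5e-05 (peak)
print(lr_at_step(3000))   # ~2.5e-06 (final)
```

Note that the stock `config.json` ships with `scheduler_decay_steps: 30000`; the command above overrides it to match the shorter 3000-step run.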
## Citation

If you use this model, please cite the original OpenPI work:

```bibtex
@article{openpi2024,
  title={Open-World Robotic Manipulation with Vision-Language-Action Models},
  author={Physical Intelligence},
  year={2024},
  url={https://github.com/Physical-Intelligence/openpi}
}
```

## Original Repository

[OpenPI GitHub Repository](https://github.com/Physical-Intelligence/openpi)

## License

This model follows the same license as the original OpenPI repository.
config.json
ADDED
@@ -0,0 +1,82 @@
{
  "type": "pi05",
  "n_obs_steps": 1,
  "input_features": {
    "observation.images.base_0_rgb": {
      "type": "VISUAL",
      "shape": [3, 224, 224]
    },
    "observation.images.left_wrist_0_rgb": {
      "type": "VISUAL",
      "shape": [3, 224, 224]
    },
    "observation.images.right_wrist_0_rgb": {
      "type": "VISUAL",
      "shape": [3, 224, 224]
    },
    "observation.state": {
      "type": "STATE",
      "shape": [32]
    }
  },
  "output_features": {
    "action": {
      "type": "ACTION",
      "shape": [32]
    }
  },
  "device": "mps",
  "use_amp": false,
  "push_to_hub": true,
  "repo_id": null,
  "private": null,
  "tags": null,
  "license": null,
  "paligemma_variant": "gemma_2b",
  "action_expert_variant": "gemma_300m",
  "dtype": "float32",
  "chunk_size": 50,
  "n_action_steps": 50,
  "max_action_dim": 32,
  "max_state_dim": 32,
  "num_inference_steps": 10,
  "time_sampling_beta_alpha": 1.5,
  "time_sampling_beta_beta": 1.0,
  "min_period": 0.004,
  "max_period": 4.0,
  "image_resolution": [224, 224],
  "gradient_checkpointing": false,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "optimizer_lr": 2.5e-05,
  "optimizer_betas": [0.9, 0.95],
  "optimizer_eps": 1e-08,
  "optimizer_weight_decay": 0.01,
  "optimizer_grad_clip_norm": 1.0,
  "scheduler_warmup_steps": 1000,
  "scheduler_decay_steps": 30000,
  "scheduler_decay_lr": 2.5e-06,
  "tokenizer_max_length": 200
}
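A quick way to sanity-check a config like the one above is to parse it and verify the declared feature shapes against `image_resolution` and `max_state_dim`. This is only a sketch; a subset of the config is inlined for self-containment:

```python
import json

# Inlined subset of the pi05 config shown above.
config = json.loads("""
{
  "input_features": {
    "observation.images.base_0_rgb":        {"type": "VISUAL", "shape": [3, 224, 224]},
    "observation.images.left_wrist_0_rgb":  {"type": "VISUAL", "shape": [3, 224, 224]},
    "observation.images.right_wrist_0_rgb": {"type": "VISUAL", "shape": [3, 224, 224]},
    "observation.state":                    {"type": "STATE",  "shape": [32]}
  },
  "image_resolution": [224, 224],
  "max_state_dim": 32
}
""")

h, w = config["image_resolution"]
for name, feat in config["input_features"].items():
    if feat["type"] == "VISUAL":
        # Visual features are CHW; spatial dims must match image_resolution.
        assert feat["shape"][1:] == [h, w], name
    elif feat["type"] == "STATE":
        # State vectors are padded up to max_state_dim by the policy.
        assert feat["shape"][0] <= config["max_state_dim"], name

print("config OK")
```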
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0eb11ca9587678c1d2ef8cf32807c29f8ce53a2bfdfc1aa4a4c96f16fca59b0f
size 14467165872
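This file is a Git LFS pointer rather than the weights themselves; the `oid` and `size` lines identify the ~14.5 GB blob that `git lfs pull` (or the Hub's resolve endpoint) fetches. A small sketch of parsing the pointer format:

```python
def parse_lfs_pointer(text):
    """Parse a git-lfs pointer file into a dict of its key/value lines."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0eb11ca9587678c1d2ef8cf32807c29f8ce53a2bfdfc1aa4a4c96f16fca59b0f
size 14467165872"""

fields = parse_lfs_pointer(pointer)
algo, digest = fields["oid"].split(":")
size_gb = int(fields["size"]) / 1e9
print(algo, len(digest), f"{size_gb:.1f} GB")  # sha256 64 14.5 GB
```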
policy_postprocessor.json
ADDED
@@ -0,0 +1,24 @@
{
  "name": "policy_postprocessor",
  "steps": [
    {
      "registry_name": "unnormalizer_processor",
      "config": {
        "eps": 1e-08,
        "features": {},
        "norm_map": {
          "VISUAL": "IDENTITY",
          "STATE": "QUANTILES",
          "ACTION": "QUANTILES"
        }
      }
    },
    {
      "registry_name": "device_processor",
      "config": {
        "device": "cpu",
        "float_dtype": null
      }
    }
  ]
}
policy_preprocessor.json
ADDED
@@ -0,0 +1,49 @@
{
  "name": "policy_preprocessor",
  "steps": [
    {
      "registry_name": "rename_observations_processor",
      "config": {
        "rename_map": {}
      }
    },
    {
      "registry_name": "to_batch_processor",
      "config": {}
    },
    {
      "registry_name": "normalizer_processor",
      "config": {
        "eps": 1e-08,
        "features": {},
        "norm_map": {
          "VISUAL": "IDENTITY",
          "STATE": "QUANTILES",
          "ACTION": "QUANTILES"
        }
      }
    },
    {
      "registry_name": "pi05_prepare_state_tokenizer_processor_step",
      "config": {}
    },
    {
      "registry_name": "tokenizer_processor",
      "config": {
        "max_length": 200,
        "task_key": "task",
        "padding_side": "right",
        "padding": "max_length",
        "truncation": true,
        "tokenizer_name": "google/paligemma-3b-pt-224"
      }
    },
    {
      "registry_name": "device_processor",
      "config": {
        "device": "cpu",
        "float_dtype": null
      }
    }
  ]
}
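The preprocessor applies its steps strictly in list order (rename → batch → normalize → state-token prep → tokenize → device placement), each looked up by `registry_name`; the postprocessor mirrors the normalization on the way out. A minimal sketch of that registry-dispatch pattern — the step functions here are illustrative stand-ins, not the actual LeRobot processors:

```python
import json

REGISTRY = {}

def register(name):
    """Decorator that records a step function under its registry name."""
    def deco(fn):
        REGISTRY[name] = fn
        return fn
    return deco

@register("rename_observations_processor")
def rename_step(batch, rename_map=None, **_):
    # Remap observation keys; an empty rename_map is a no-op.
    return {(rename_map or {}).get(k, k): v for k, v in batch.items()}

@register("to_batch_processor")
def to_batch_step(batch, **_):
    # Stand-in for adding a leading batch dimension.
    batch["batched"] = True
    return batch

def run_pipeline(spec, batch):
    # Apply each configured step in order, looked up by registry_name.
    for step in spec["steps"]:
        batch = REGISTRY[step["registry_name"]](batch, **step["config"])
    return batch

spec = json.loads("""
{"steps": [
  {"registry_name": "rename_observations_processor", "config": {"rename_map": {}}},
  {"registry_name": "to_batch_processor", "config": {}}
]}
""")

out = run_pipeline(spec, {"observation.state": [0.0] * 32})
print(sorted(out))  # ['batched', 'observation.state']
```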