---
license: mit
tags:
- robotics
- pi-zero
- diffusion
- vision-language-action
- aloha
- manipulation
- bolt-nut-sorting
base_model: google/paligemma-3b-pt-224
library_name: openpi
pipeline_tag: robotics
---

# Pi-0 Bolt Nut Sort Model
This is a Pi-0 (Pi-Zero) model fine-tuned for bolt and nut sorting tasks using the OpenPI framework.
## Model Description

- **Architecture**: Pi-0 (diffusion-based vision-language-action model)
- **Base Model**: PaliGemma 3B with SigLIP vision encoder
- **Task**: Sorting bolts and nuts into separate baskets
- **Robot**: Dual-arm ALOHA setup
- **Action Space**: 14-DoF (7 per arm: 6 joints + 1 gripper)
- **Training Steps**: 9,999
- **Action Horizon**: 50 steps
- **Image Resolution**: 224x224
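As a concrete reading of the action space above, the sketch below unpacks a 14-dimensional action vector into per-arm joint and gripper components. The index layout (left arm first, gripper last within each arm) is an assumption for illustration only; it is not documented by this card.

```python
import numpy as np

# Hypothetical layout of the 14-DoF ALOHA action vector:
# [left arm: 6 joints + 1 gripper | right arm: 6 joints + 1 gripper].
# The ordering is assumed for illustration, not taken from the card.
action = np.arange(14, dtype=np.float32)

left_joints, left_gripper = action[0:6], action[6]
right_joints, right_gripper = action[7:13], action[13]

print(left_joints.shape, right_joints.shape)  # (6,) (6,)
```

Whatever the true index order, the 6+1 split per arm matches the "7 per arm" figure listed above.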
## Dataset

Trained on the `naungth/pi0_bolt_nut_sort` dataset with the task instruction:
"sort the bolts and the nuts into separate baskets"
## Usage

### With OpenPI

```python
from openpi.policies import policy_config
from openpi.training import config

# Load the model configuration
config_name = "pi0_bns"
train_config = config.get_config(config_name)

# Create policy from your local checkpoint
policy = policy_config.create_trained_policy(
    train_config,
    "path/to/checkpoint",
    default_prompt="sort the bolts and the nuts into separate baskets",
)

# Use for inference
observation = {
    "images": {
        "cam_high": image_array,               # [H, W, 3] uint8
        "cam_left_wrist": left_wrist_image,    # [H, W, 3] uint8
        "cam_right_wrist": right_wrist_image,  # [H, W, 3] uint8
    },
    "state": joint_positions,                  # [14] float32
    "prompt": "sort the bolts and the nuts into separate baskets",
}

actions = policy.infer(observation)["actions"]  # [50, 14]
```
### With Policy Server

Start the policy server:

```bash
uv run scripts/serve_policy.py policy:checkpoint \
  --policy.config=pi0_bns \
  --policy.dir=path/to/checkpoint
```

Then query it from a client:

```python
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy("localhost", 8000)
actions = client.infer(observation)
```
## Model Architecture

- **Vision Encoder**: SigLIP-So400m/14
- **Language Model**: Gemma 2B + Gemma 300M (action expert)
- **Training**: Diffusion-based action prediction
- **Input**: Multi-camera RGB + proprioception + language instruction
- **Output**: Future action sequence (50 timesteps)
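Because the model emits a 50-step action chunk per inference call, a common way to run it on a robot is receding-horizon execution: execute the first few actions of each chunk, then re-query the policy with a fresh observation. The sketch below shows the bookkeeping with a stand-in policy; `REPLAN_EVERY` is a hypothetical choice, not something this card prescribes.

```python
import numpy as np

ACTION_HORIZON, ACTION_DIM = 50, 14
REPLAN_EVERY = 25  # hypothetical replan interval, not specified by the card

def policy_infer_stub(observation):
    # Stand-in for policy.infer(...); returns a full [50, 14] action chunk.
    return {"actions": np.zeros((ACTION_HORIZON, ACTION_DIM), dtype=np.float32)}

executed = []
for _ in range(2):  # two inference/replan cycles
    chunk = policy_infer_stub({})["actions"]
    executed.extend(chunk[:REPLAN_EVERY])  # run only the first REPLAN_EVERY steps

print(len(executed))  # 50
```

Executing fewer steps per chunk reacts faster to new observations at the cost of more inference calls.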
## Training Details

- **Framework**: JAX/Flax with OpenPI
- **Optimizer**: AdamW
- **Base Checkpoint**: Pi-0 base model from Physical Intelligence
- **Fine-tuning**: Task-specific fine-tuning on bolt and nut sorting data
- **Normalization**: Dataset-specific state/action normalization
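The dataset-specific normalization mentioned above is typically a per-dimension z-score computed from training-set statistics. The sketch below shows the round trip on synthetic data; the actual statistics for this checkpoint are stored with it and are not reproduced here.

```python
import numpy as np

# Synthetic stand-in for recorded 14-DoF joint states.
rng = np.random.default_rng(0)
states = rng.normal(loc=0.5, scale=2.0, size=(1000, 14)).astype(np.float32)

# Per-dimension statistics, computed once over the training set.
mean = states.mean(axis=0)
std = states.std(axis=0) + 1e-6  # epsilon guards against zero variance

def normalize(x):
    return (x - mean) / std

def unnormalize(x):
    return x * std + mean

x = states[0]
assert np.allclose(unnormalize(normalize(x)), x, atol=1e-4)
```

At inference time the policy input is normalized with the same statistics and the predicted actions are unnormalized before being sent to the robot.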
## License

MIT License
## Citation

If you use this model, please cite:

```bibtex
@article{pi0,
  title={Pi-Zero: A Diffusion-Based Policy for Robot Manipulation},
  author={TODO: Add authors},
  year={2024}
}
```
## Acknowledgments

- Built using the [OpenPI](https://github.com/Physical-Intelligence/openpi) framework
- Based on the Pi-0 architecture
- Training data from bolt and nut sorting demonstrations