# π0.5 - Base
This is a PyTorch version of the PI0.5 pi05_base model, converted from the original JAX/Flax implementation.
## Model Details

- Architecture: PI0.5 (Vision-Language-Action model with discrete state input)
- Model Type: PI0.5
- Domain: Base model (general purpose)
- Precision: 32-bit floating point (fp32)
- Vision Model: PaliGemma (gemma_2b)
- Action Expert: gemma_300m
## Key Features

- Discrete State Input: Uses discrete language tokens for state representation
- Flow Matching: Uses adaRMSNorm for timestep injection in the action expert
- Enhanced Action Modeling: Improved action prediction via a flow-matching approach
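Flow matching trains the action expert to predict a velocity field that transports noise toward actions; at inference time, an action is produced by numerically integrating that field from t=0 to t=1. A minimal toy sketch of the integration step (pure Python; the constant velocity field and function names are illustrative only, not OpenPI's API):

```python
def euler_integrate(x0, velocity_fn, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + velocity_fn(x, t) * dt
    return x

# Toy straight-line flow: move from noise sample x0 toward a target action a.
# For the linear interpolation x_t = (1 - t) * x0 + t * a, the velocity is
# the constant a - x0, so Euler integration recovers a.
x0, a = 0.0, 2.0
action = euler_integrate(x0, lambda x, t: a - x0)
```

In the real model the velocity field is the action expert network conditioned on images, language, and state, and the timestep t is injected via adaRMSNorm.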
## Conversion Details

This model was converted from JAX to PyTorch using the OpenPI conversion script:

```bash
python examples/convert_jax_model_to_pytorch.py \
    --checkpoint_dir /pi05_base \
    --config_name pi05_base \
    --output_path /pi05_base/pytorch/fp32/ \
    --precision float32
```
## Usage

```python
import torch

from openpi.models_pytorch.pi0_pytorch import PI0Pytorch

# Load the model
model = PI0Pytorch.from_pretrained("pepijn223/pi05_base_fp32")

# The model expects inputs in the format:
# - images: torch.Tensor of shape [batch, height, width, channels]
# - text: tokenized text prompts
# - proprioceptive_state: robot state information (if applicable)
```
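To make the expected shapes concrete, here is a sketch of dummy inputs matching the layout described above (NumPy arrays as stand-ins for torch tensors; the 224x224 resolution is an assumption, and the actual preprocessing is handled by OpenPI's policy wrappers):

```python
import numpy as np

# Dummy batch illustrating the documented input shapes.
batch, height, width, channels = 1, 224, 224, 3  # resolution is an assumption
images = np.zeros((batch, height, width, channels), dtype=np.float32)

# Proprioceptive state, matching the model's 32-dim action/state space.
state = np.zeros((batch, 32), dtype=np.float32)
```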
## Model Architecture

The model consists of:

- Vision Encoder: PaliGemma-based vision processing
- Language Encoder: Text prompt understanding
- Action Expert: Specialized network for action prediction
- Integration Layer: Combines multimodal information for action output
## Training Data

This model was trained on robotics datasets appropriate for its domain:

- DROID models: Trained on diverse robot manipulation data
- LIBERO models: Trained on tabletop manipulation scenarios
- Base models: Trained on general robotics datasets
## Limitations

- Model performance depends on the similarity between deployment and training environments
- May require domain-specific fine-tuning for optimal performance
- The action space must match the trained action dimension (32)
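Because the action dimension is fixed at 32, robots with fewer degrees of freedom typically zero-pad their action vectors to that width. A hypothetical illustration of such padding (`pad_action` is not part of OpenPI; its data pipeline handles this internally):

```python
def pad_action(action, target_dim=32, fill=0.0):
    """Zero-pad an action vector to the model's fixed 32-dim action space.

    Hypothetical helper for illustration; rejects actions wider than the model.
    """
    if len(action) > target_dim:
        raise ValueError(f"action has {len(action)} dims, model expects <= {target_dim}")
    return list(action) + [fill] * (target_dim - len(action))

# e.g. a 3-dim end-effector delta padded out to the full 32-dim space
padded = pad_action([0.1, -0.2, 0.3])
```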
## Citation

If you use this model, please cite the original OpenPI work:

```bibtex
@article{openpi2024,
  title={Open-World Robotic Manipulation with Vision-Language-Action Models},
  author={Physical Intelligence},
  year={2024},
  url={https://github.com/Physical-Intelligence/openpi}
}
```
## Original Repository

https://github.com/Physical-Intelligence/openpi
## License

This model follows the same license as the original OpenPI repository.