# PI0.5 Base (PyTorch, 32-bit floating point)

This is a PyTorch version of the PI0.5 `pi05_base` model, converted from the original JAX/Flax implementation.
## Model Details

- **Architecture**: PI0.5 (Vision-Language-Action model with discrete state input)
- **Model Type**: PI0.5
- **Domain**: Base model (general purpose)
- **Precision**: 32-bit floating point (fp32)
- **Action Dimension**: 32
- **Action Horizon**: 50
- **Max Token Length**: 200
- **Vision Model**: PaliGemma (gemma_2b)
- **Action Expert**: gemma_300m
## Key Features

- **Discrete State Input**: Uses discrete language tokens for state representation
- **Flow Matching**: Uses adaRMSNorm to inject the flow-matching timestep into the action expert (see the sketch after this list)
- **Enhanced Action Modeling**: Improved action prediction via a flow-matching approach
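The adaRMSNorm mechanism conditions each normalization layer of the action expert on the flow-matching timestep. Below is a minimal sketch of the general pattern; the module and argument names are illustrative assumptions, not the OpenPI implementation:

```python
import torch
import torch.nn as nn

class AdaRMSNorm(nn.Module):
    """RMSNorm whose gain and shift are predicted from a conditioning
    vector (e.g. a flow-matching timestep embedding) instead of being
    fixed learned parameters."""

    def __init__(self, dim: int, cond_dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.to_scale_shift = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, dim], cond: [batch, cond_dim]
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        x = x * rms
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        # Timestep-dependent modulation, broadcast over the sequence axis.
        return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```

Here `cond` would be the embedded flow-matching timestep, so each denoising step modulates the action expert's activations differently.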
## Conversion Details

This model was converted from JAX to PyTorch using the OpenPI conversion script:

```bash
python examples/convert_jax_model_to_pytorch.py \
    --checkpoint_dir /fsx/pepijn/pi05_base \
    --config_name pi05_base \
    --output_path /fsx/pepijn/pi05_base/pytorch/fp32/ \
    --precision float32
```
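After conversion, it is worth confirming that the exported weights really are fp32. A quick sanity-check sketch, assuming the converted checkpoint is stored as a safetensors file (the `model.safetensors` filename is an assumption; adjust to whatever `--output_path` actually produced):

```python
import torch
from safetensors.torch import load_file

# Hypothetical filename: adjust to the actual conversion output.
state = load_file("/fsx/pepijn/pi05_base/pytorch/fp32/model.safetensors")
bad = {k: v.dtype for k, v in state.items() if v.dtype != torch.float32}
assert not bad, f"non-fp32 tensors found: {bad}"
```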
**Conversion Date**: 2025-09-09

## Usage
```python
from openpi.models_pytorch.pi0_pytorch import PI0Pytorch
import torch

# Load the model
model = PI0Pytorch.from_pretrained("pepijn223/pi05_base_fp32")

# The model expects inputs in the format:
# - images: torch.Tensor of shape [batch, height, width, channels]
# - text: tokenized text prompts
# - proprioceptive_state: robot state information (if applicable)
```
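Under the hood, the model generates an action chunk by integrating a learned flow-matching velocity field from noise to actions. The following is a conceptual, self-contained sketch of that sampling loop, not the OpenPI API; the time direction, step count, and conditioning interface are assumptions:

```python
import torch

@torch.no_grad()
def sample_action_chunk(velocity_fn, batch=1, horizon=50, action_dim=32, steps=10):
    """Euler-integrate a flow-matching velocity field from noise (t=0)
    toward an action chunk (t=1). `velocity_fn(x, t)` stands in for the
    action expert conditioned on vision/language features."""
    x = torch.randn(batch, horizon, action_dim)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((batch,), i * dt)
        x = x + dt * velocity_fn(x, t)           # one Euler step along the flow
    return x                                     # shape [batch, 50, 32]

# Toy velocity field for illustration; a real model uses the action expert.
actions = sample_action_chunk(lambda x, t: -x)
```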
## Model Architecture

The model consists of:

1. **Vision Encoder**: PaliGemma-based vision processing
2. **Language Encoder**: Text prompt understanding
3. **Action Expert**: Specialized network for action prediction
4. **Integration Layer**: Combines multimodal information for action output
## Training Data

This model was trained on robotics datasets appropriate for its domain:

- **DROID models**: Trained on diverse robot manipulation data
- **ALOHA models**: Trained on bimanual manipulation tasks
- **LIBERO models**: Trained on diverse tabletop manipulation scenarios
- **Base models**: Trained on general robotics datasets
## Limitations

- Model performance depends on the similarity between deployment and training environments
- May require domain-specific fine-tuning for optimal performance
- The action space must match the trained action dimension (32); lower-dimensional robots typically pad, as sketched below
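For robots with fewer degrees of freedom, a common convention (used by OpenPI's data transforms, though the exact helpers may differ) is to zero-pad state and action vectors up to the model's action dimension. A minimal sketch:

```python
import torch

def pad_to_dim(x: torch.Tensor, target_dim: int = 32) -> torch.Tensor:
    """Zero-pad the trailing (feature) dimension up to the model's
    action dimension."""
    pad = target_dim - x.shape[-1]
    if pad < 0:
        raise ValueError(f"input dim {x.shape[-1]} exceeds target {target_dim}")
    return torch.nn.functional.pad(x, (0, pad))

actions_14dof = torch.randn(1, 50, 14)  # e.g. a bimanual 14-DoF action chunk
padded = pad_to_dim(actions_14dof)      # -> shape [1, 50, 32]
```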
## Citation

If you use this model, please cite the original OpenPI work:

```bibtex
@article{openpi2024,
  title={Open-World Robotic Manipulation with Vision-Language-Action Models},
  author={Physical Intelligence},
  year={2024},
  url={https://github.com/Physical-Intelligence/openpi}
}
```

## Original Repository

[OpenPI GitHub Repository](https://github.com/Physical-Intelligence/openpi)

## License

This model follows the same license as the original OpenPI repository.