---
license: apache-2.0
tags:
- robotics
- imitation-learning
- reinforcement-learning
- vision-language-action
- pi0
- recap
- robot-learning
- pytorch
datasets:
- lerobot/aloha_sim_transfer_cube_human
language:
- en
library_name: pytorch
pipeline_tag: robotics
---

# OpenPIE-0.6: Open-source Pi0.6 Implementation

**The first fully open-source PyTorch implementation of Physical Intelligence's pi0.6 robot policy model, trained with RECAP.**

## Quick Start

```bash
pip install huggingface_hub safetensors torch
```

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

# Download model files
policy_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="policy.safetensors")
value_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="value_fn.safetensors")
config_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="config.json")

# Load weights
policy_weights = load_file(policy_path)
value_weights = load_file(value_path)

print(f"Policy model: {len(policy_weights)} tensors, {sum(t.numel() for t in policy_weights.values())/1e9:.2f}B params")
print(f"Value function: {len(value_weights)} tensors, {sum(t.numel() for t in value_weights.values())/1e9:.2f}B params")
```

**Output:**
```
Policy model: 812 tensors, 5.91B params
Value function: 638 tensors, 1.31B params
```

## Complete Working Example

The following example downloads the checkpoint, inspects its structure, and runs a shape check on the action projection layers:

```python
import torch
import json
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from safetensors import safe_open

# ============================================================
# Step 1: Download model from HuggingFace
# ============================================================
repo_id = "exla-ai/openpie-0.6"

policy_path = hf_hub_download(repo_id=repo_id, filename="policy.safetensors")
value_path = hf_hub_download(repo_id=repo_id, filename="value_fn.safetensors")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# ============================================================
# Step 2: Load configuration
# ============================================================
with open(config_path) as f:
    config = json.load(f)

print(f"Action dim: {config['action_dim']}")  # 14 (dual 7-DOF arms)
print(f"Action horizon: {config['action_horizon']}")  # 50 steps
print(f"State dim: {config['state_dim']}")  # 14

# ============================================================
# Step 3: Inspect model structure
# ============================================================
# safe_open reads only the file header, so listing keys does not load the tensors
with safe_open(policy_path, framework="pt") as f:
    keys = list(f.keys())

# Group tensors by component (the key prefix before the first ".")
components = {}
for key in keys:
    component = key.split(".")[0]
    if component not in components:
        components[component] = []
    components[component].append(key)

print("\nPolicy model components:")
for comp, comp_keys in sorted(components.items()):
    print(f"  - {comp}: {len(comp_keys)} tensors")

# Output:
#   - action_in_proj: 2 tensors
#   - action_out_proj: 2 tensors
#   - paligemma_with_expert: 804 tensors
#   - time_mlp_in: 2 tensors
#   - time_mlp_out: 2 tensors

# ============================================================
# Step 4: Load weights
# ============================================================
policy_weights = load_file(policy_path)
value_weights = load_file(value_path)

# Key tensor shapes:
print("\nKey tensor shapes:")
print(f"  action_in_proj.weight: {policy_weights['action_in_proj.weight'].shape}")  # [2048, 14]
print(f"  action_out_proj.weight: {policy_weights['action_out_proj.weight'].shape}")  # [14, 2048]

# ============================================================
# Step 5: Use the weights (example with action projection)
# ============================================================
# Note: this chains the two projection layers directly and skips the
# transformer (`paligemma_with_expert`), so it only demonstrates tensor
# shapes and dtypes; the outputs are not meaningful robot actions.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Get action projection layers
action_in = policy_weights["action_in_proj.weight"].to(device).to(torch.bfloat16)
action_out = policy_weights["action_out_proj.weight"].to(device).to(torch.bfloat16)
action_out_bias = policy_weights["action_out_proj.bias"].to(device).to(torch.bfloat16)

# Example: random stand-in for the robot state (current joint positions)
robot_state = torch.randn(1, 14, device=device, dtype=torch.bfloat16)

# Pass the state through the input projection, a GELU, and the output projection
hidden = torch.nn.functional.linear(robot_state, action_in)
hidden = torch.nn.functional.gelu(hidden)
actions = torch.nn.functional.linear(hidden, action_out, action_out_bias)

print(f"\nInput robot state: {robot_state.shape}")  # [1, 14]
print(f"Output actions: {actions.shape}")  # [1, 14]
print(f"  Left arm (7D): {actions[0, :7].cpu().float().numpy().round(3)}")
print(f"  Right arm (7D): {actions[0, 7:].cpu().float().numpy().round(3)}")
```
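
To go from raw tensors to reusable modules, the projection weights can be copied into `torch.nn.Linear` layers. The sketch below is a minimal illustration, not the project's loading code: the helper name `linear_from_tensors` is hypothetical, the bias shape is an assumption, and random tensors with the shapes printed in Step 4 stand in for the real checkpoint entries so the snippet runs without downloading anything.

```python
from typing import Optional

import torch
from torch import nn


def linear_from_tensors(weight: torch.Tensor, bias: Optional[torch.Tensor] = None) -> nn.Linear:
    """Build an nn.Linear whose parameters are copies of the given tensors."""
    out_features, in_features = weight.shape  # nn.Linear stores weight as [out, in]
    layer = nn.Linear(in_features, out_features, bias=bias is not None)
    with torch.no_grad():
        layer.weight.copy_(weight)
        if bias is not None:
            layer.bias.copy_(bias)
    return layer


# Stand-ins for policy_weights["action_in_proj.weight"] and ["action_in_proj.bias"].
# The weight shape [2048, 14] is the one printed in Step 4; the bias shape [2048]
# is assumed from the layer's output width.
fake_weight = torch.randn(2048, 14)
fake_bias = torch.randn(2048)

action_in_proj = linear_from_tensors(fake_weight, fake_bias)
out = action_in_proj(torch.randn(1, 14))
print(out.shape)  # torch.Size([1, 2048])
```

With the real checkpoint, the same helper would be called with the tensors returned by `load_file` in place of the random stand-ins.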

## Model Components

The policy checkpoint (`policy.safetensors`) consists of:

| Component | Tensors | Parameters | Description |
|-----------|---------|------------|-------------|
| `paligemma_with_expert` | 804 | ~5.9B | PaliGemma VLM + Gemma Action Expert |
| `action_in_proj` | 2 | ~31K | Robot state input projection (weight + bias) |
| `action_out_proj` | 2 | ~29K | Action output projection (weight + bias) |
| `time_mlp_in/out` | 4 | 8M | Timestep embedding |
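
Per-component parameter counts like these can be derived from tensor shapes alone. The sketch below groups shapes by the key prefix before the first `.` and multiplies out each shape. The helper name `count_params_by_component` is hypothetical, and the bias shapes in the example are assumed to match each layer's output width; only the two weight shapes come from the example above.

```python
from collections import defaultdict
from math import prod


def count_params_by_component(shapes: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Sum parameter counts per top-level key prefix (the text before the first '.')."""
    totals: dict[str, int] = defaultdict(int)
    for name, shape in shapes.items():
        totals[name.split(".")[0]] += prod(shape)
    return dict(totals)


# Shapes of the two projection layers. Weight shapes are the ones printed in the
# Complete Working Example; bias shapes are assumptions.
example_shapes = {
    "action_in_proj.weight": (2048, 14),
    "action_in_proj.bias": (2048,),
    "action_out_proj.weight": (14, 2048),
    "action_out_proj.bias": (14,),
}

counts = count_params_by_component(example_shapes)
print(counts)  # {'action_in_proj': 30720, 'action_out_proj': 28686}
```

For the real checkpoint, the input could be built as `{k: tuple(v.shape) for k, v in policy_weights.items()}`.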

## What is OpenPIE-0.6?

OpenPIE-0.6 is a **fully open-source reimplementation** of Physical Intelligence's pi0.6 model. Unlike the original closed-source model, OpenPIE-0.6 provides:

- Full PyTorch implementation (no JAX/Flax dependencies)
- Pre-trained weights you can use immediately
- Training code to reproduce or fine-tune on your own data
- Apache 2.0 license for commercial use

## Comparison: OpenPIE-0.6 vs Original pi0.6

| Feature | Original pi0.6 | OpenPIE-0.6 |
|---------|---------------|-------------|
| **Open Source** | No (closed) | **Yes (Apache 2.0)** |
| **Framework** | JAX/Flax | **PyTorch** |
| **Pre-trained Weights** | Not released | **Available** |
| **Training Code** | Not released | **Available** |
| **Fine-tuning** | Not possible | **Fully supported** |
| **Commercial Use** | Restricted | **Allowed** |

### Performance Comparison

| Metric | OpenPIE-0.6 | pi0.6 Paper Reference | Status |
|--------|-------------|----------------------|--------|
| Action MSE | **0.010** | ~0.01 | Match |
| Value Correlation | **0.986** | >0.8 | Exceeds |
| Advantage Gap | **0.070** | >0.05 | Exceeds |
| Throughput | **22 act/s** | ~20 act/s | Exceeds |

## Model Architecture

```
OpenPIE-0.6 (5.91B policy + 1.31B value = 7.22B total)
├── Vision Encoder: SigLIP (384x384 images)
├── Base VLM: PaliGemma (Gemma 2B backbone)
├── Action Expert: Gemma 2B (cross-attention with VLM)
├── Value Function: 1.31B params (distributional, 1024 bins)
└── Action Space: 14D continuous (7 DOF left arm + 7 DOF right arm)
```
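
The value function is described as distributional with 1024 bins. One common way to read a scalar out of such a head is to take the expectation of the bin centers under the softmax of the logits. The sketch below assumes evenly spaced bins over a hypothetical range `[v_min, v_max]`; the actual bin layout and value range used by OpenPIE-0.6 are not stated here, so treat the function name and both defaults as placeholders.

```python
import torch


def expected_value(logits: torch.Tensor, v_min: float = -1.0, v_max: float = 0.0) -> torch.Tensor:
    """Collapse categorical value logits [batch, num_bins] to scalar values [batch]."""
    num_bins = logits.shape[-1]
    # Assumed support: evenly spaced bin centers between v_min and v_max.
    bin_centers = torch.linspace(
        v_min, v_max, num_bins, dtype=logits.dtype, device=logits.device
    )
    probs = torch.softmax(logits, dim=-1)
    return (probs * bin_centers).sum(dim=-1)


logits = torch.zeros(2, 1024)   # first sample: uniform distribution over bins
logits[1, -1] = 50.0            # second sample: almost all mass on the last bin
values = expected_value(logits)
print(values)  # roughly -0.5 for the uniform case and 0.0 for the peaked case
```

Keeping the full distribution around, rather than only this expectation, is what allows a distributional head to express uncertainty about the return.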

## Training Details

OpenPIE-0.6 was trained using the **RECAP algorithm** (RL with Experience and Corrections via Advantage-conditioned Policies):

| Phase | Steps | Description |
|-------|-------|-------------|
| Value Function | 5,000 | Train distributional value predictor |
| Policy Warmup | 10,000 | Standard behavior cloning |
| RECAP Training | 20,000 | Advantage-conditioned policy learning |
| **Total** | **35,000** | ~6 hours on 8x A100 80GB |
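
The three phases hinge on turning value estimates into a conditioning signal for the policy. The following is a minimal sketch of that step, assuming the advantage is the observed return minus the predicted value and that it is binarized with a fixed threshold. Both the formula and the names `advantage_labels` and `threshold` are illustrative assumptions, not taken from the OpenPIE-0.6 training code.

```python
def advantage_labels(returns, values, threshold=0.0):
    """Return (advantages, labels); label 1 marks samples that beat the value estimate."""
    advantages = [r - v for r, v in zip(returns, values)]
    labels = [1 if a > threshold else 0 for a in advantages]
    return advantages, labels


# Toy numbers: observed returns vs. what the value function predicted.
returns = [-0.2, -0.8, -0.5]
values = [-0.4, -0.6, -0.5]

advantages, labels = advantage_labels(returns, values)
print(labels)  # [1, 0, 0]
```

In an advantage-conditioned setup, a label like this is typically given to the policy as an extra input during training, and inference conditions on the positive label to ask for better-than-average behavior.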

### Key Hyperparameters

```yaml
batch_size: 4        # per GPU; x 8 GPUs x 4 accumulation steps = 128 effective
learning_rate: 1e-4
action_horizon: 50   # steps
value_bins: 1024     # distributional value function
dtype: bfloat16
dataset: lerobot/aloha_sim_transfer_cube_human
```
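
The effective batch size of 128 comes from multiplying the per-GPU batch, the GPU count, and the number of gradient accumulation steps. The sketch below shows the accumulation part on a single simulated GPU with a toy model. The `nn.Linear` stand-in, the MSE loss, and the AdamW optimizer are assumptions chosen for illustration; the optimizer and loss used for OpenPIE-0.6 are not stated here.

```python
import torch
from torch import nn

PER_GPU_BATCH, NUM_GPUS, ACCUM_STEPS = 4, 8, 4
effective_batch = PER_GPU_BATCH * NUM_GPUS * ACCUM_STEPS
print(effective_batch)  # 128

torch.manual_seed(0)
model = nn.Linear(14, 14)  # toy stand-in for the policy
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # optimizer choice is an assumption
loss_fn = nn.MSELoss()

# One optimizer step: accumulate gradients over ACCUM_STEPS micro-batches,
# scaling each loss so the summed gradient is an average over micro-batches.
optimizer.zero_grad()
for _ in range(ACCUM_STEPS):
    x = torch.randn(PER_GPU_BATCH, 14)
    target = torch.randn(PER_GPU_BATCH, 14)
    loss = loss_fn(model(x), target) / ACCUM_STEPS
    loss.backward()  # gradients add up in .grad across micro-batches
optimizer.step()
```

In a multi-GPU run, each process would execute this loop on its own shard of data, with gradients averaged across processes before the optimizer step.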

## Files Included

| File | Size | Description |
|------|------|-------------|
| `policy.safetensors` | 12 GB | Main policy model (VLM + Action Expert) |
| `value_fn.safetensors` | 2.5 GB | Distributional value function |
| `config.json` | 1 KB | Model configuration |
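
These file sizes are consistent with storing each parameter in 2 bytes, which matches the `bfloat16` dtype listed under Key Hyperparameters. The arithmetic below is a rough check that assumes every tensor is stored in bfloat16 and ignores the small safetensors header; the helper name `checkpoint_size_gb` is hypothetical. The same numbers give a lower bound on the memory needed just to hold the weights.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


policy_gb = checkpoint_size_gb(5.91e9)  # policy parameter count from Quick Start
value_gb = checkpoint_size_gb(1.31e9)   # value function parameter count from Quick Start
print(f"policy: {policy_gb:.1f} GB, value: {value_gb:.1f} GB")
# policy: 11.8 GB, value: 2.6 GB
```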

## Integration with Your Robot

```python
# Pseudo-code for robot integration.
# `camera` and `robot` are placeholders for your own hardware interfaces.
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


class OpenPIEPolicy:
    def __init__(self):
        # Load model weights
        self.policy_weights = load_file(
            hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="policy.safetensors")
        )
        # ... initialize your model architecture with these weights

    def get_action(self, image, robot_state, instruction):
        """
        Args:
            image: Camera image (384x384 RGB)
            robot_state: Current joint positions (14D for dual arm)
            instruction: Text instruction like "pick up the cube"

        Returns:
            actions: Joint position targets (14D)
        """
        # Your inference code here
        raise NotImplementedError


# Usage
policy = OpenPIEPolicy()
action = policy.get_action(
    image=camera.get_frame(),
    robot_state=robot.get_joint_positions(),
    instruction="pick up the red cube and place it on the plate",
)
robot.execute(action)
```
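
Because the configuration specifies an action horizon of 50 steps, a control loop usually does not need to run the model on every tick. The sketch below shows one way to replay a predicted chunk and re-query the policy when it runs out. It assumes the policy returns a whole chunk of 14D actions per call; the `ChunkedExecutor` class, the `predict_chunk` callable, and the `replan_every` parameter are hypothetical names, and a dummy policy stands in for the real model.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Action = Sequence[float]


class ChunkedExecutor:
    """Replays a predicted action chunk and re-queries the policy when it runs out."""

    def __init__(self, predict_chunk: Callable[[], List[Action]], replan_every: int = 50):
        self.predict_chunk = predict_chunk  # hypothetical: returns a list of 14D actions
        self.replan_every = replan_every    # how many actions to consume from each chunk
        self.queue: Deque[Action] = deque()
        self.num_queries = 0

    def next_action(self) -> Action:
        if not self.queue:
            chunk = self.predict_chunk()
            self.queue.extend(chunk[: self.replan_every])
            self.num_queries += 1
        return self.queue.popleft()


def dummy_predict_chunk() -> List[Action]:
    """Dummy policy: a chunk of 50 actions, each a 14D zero vector."""
    return [[0.0] * 14 for _ in range(50)]


executor = ChunkedExecutor(dummy_predict_chunk, replan_every=25)
actions = [executor.next_action() for _ in range(60)]
print(executor.num_queries)  # 3
```

Consuming only part of each chunk (here 25 of 50 actions) trades extra inference calls for more frequent use of fresh observations.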

## Why OpenPIE-0.6?

1. **Fully Open**: Unlike the original pi0.6, all weights and code are available
2. **PyTorch Native**: No JAX dependencies, works with the standard PyTorch ecosystem
3. **Production Ready**: Weights ship in the safetensors format, which loads quickly and does not execute arbitrary code
4. **Extensible**: Easy to fine-tune on your own robotics data
5. **Well Documented**: Clear examples and integration guides

## Citation

If you use OpenPIE-0.6 in your research, please cite:

```bibtex
@software{openpie_0_6,
  title={OpenPIE-0.6: Open-source Pi0.6 Implementation},
  author={EXLA AI},
  year={2025},
  url={https://huggingface.co/exla-ai/openpie-0.6}
}

@article{pi0_6_paper,
  title={pi0.6: Scaling Robot Policy Learning with RECAP},
  author={Physical Intelligence},
  year={2024}
}
```

## License

Apache 2.0 - Free for commercial and research use.

## Links

- [Training Code](https://github.com/exla-ai/openpie)
- [EXLA AI](https://exla.ai)
- [Original pi0.6 Paper](https://www.physicalintelligence.company/blog/pi0-6)