---
license: apache-2.0
tags:
- robotics
- imitation-learning
- reinforcement-learning
- vision-language-action
- pi0
- recap
- robot-learning
- pytorch
datasets:
- lerobot/aloha_sim_transfer_cube_human
language:
- en
library_name: pytorch
pipeline_tag: robotics
---

# OpenPIE-0.6: Open-source Pi0.6 Implementation

**The first fully open-source PyTorch implementation of Physical Intelligence's pi0.6 robot policy model, trained with RECAP.**

## Quick Start

```bash
pip install huggingface_hub safetensors torch
```

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

# Download model files
policy_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="policy.safetensors")
value_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="value_fn.safetensors")
config_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="config.json")

# Load weights
policy_weights = load_file(policy_path)
value_weights = load_file(value_path)

print(f"Policy model: {len(policy_weights)} tensors, {sum(t.numel() for t in policy_weights.values())/1e9:.2f}B params")
print(f"Value function: {len(value_weights)} tensors, {sum(t.numel() for t in value_weights.values())/1e9:.2f}B params")
```

**Output:**

```
Policy model: 812 tensors, 5.91B params
Value function: 638 tensors, 1.31B params
```

## Complete Working Example

Here's a full example showing how to load and use the model weights:

```python
import json

import torch
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from safetensors.torch import load_file

# ============================================================
# Step 1: Download model from HuggingFace
# ============================================================
repo_id = "exla-ai/openpie-0.6"
policy_path = hf_hub_download(repo_id=repo_id, filename="policy.safetensors")
value_path = hf_hub_download(repo_id=repo_id, filename="value_fn.safetensors")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# ============================================================
# Step 2: Load configuration
# ============================================================
with open(config_path) as f:
    config = json.load(f)

print(f"Action dim: {config['action_dim']}")          # 14 (dual 7-DOF arms)
print(f"Action horizon: {config['action_horizon']}")  # 50 steps
print(f"State dim: {config['state_dim']}")            # 14

# ============================================================
# Step 3: Inspect model structure
# ============================================================
with safe_open(policy_path, framework="pt") as f:
    keys = list(f.keys())

# Group tensors by top-level component
components = {}
for key in keys:
    component = key.split(".")[0]
    components.setdefault(component, []).append(key)

print("\nPolicy model components:")
for comp, comp_keys in sorted(components.items()):
    print(f"  - {comp}: {len(comp_keys)} tensors")
# Output:
#   - action_in_proj: 2 tensors
#   - action_out_proj: 2 tensors
#   - paligemma_with_expert: 804 tensors
#   - time_mlp_in: 2 tensors
#   - time_mlp_out: 2 tensors

# ============================================================
# Step 4: Load weights
# ============================================================
policy_weights = load_file(policy_path)
value_weights = load_file(value_path)

print("\nKey tensor shapes:")
print(f"  action_in_proj.weight: {policy_weights['action_in_proj.weight'].shape}")    # [2048, 14]
print(f"  action_out_proj.weight: {policy_weights['action_out_proj.weight'].shape}")  # [14, 2048]

# ============================================================
# Step 5: Use the weights (example with action projection)
# ============================================================
device = "cuda" if torch.cuda.is_available() else "cpu"

# Get the action projection layers
action_in = policy_weights["action_in_proj.weight"].to(device).to(torch.bfloat16)
action_out = policy_weights["action_out_proj.weight"].to(device).to(torch.bfloat16)
action_out_bias = policy_weights["action_out_proj.bias"].to(device).to(torch.bfloat16)

# Example: push a robot state through the projection layers
robot_state = torch.randn(1, 14, device=device, dtype=torch.bfloat16)  # current joint positions

hidden = torch.nn.functional.linear(robot_state, action_in)
hidden = torch.nn.functional.gelu(hidden)
actions = torch.nn.functional.linear(hidden, action_out, action_out_bias)

print(f"\nInput robot state: {robot_state.shape}")  # [1, 14]
print(f"Output actions: {actions.shape}")           # [1, 14]
print(f"  Left arm (7D):  {actions[0, :7].cpu().float().numpy().round(3)}")
print(f"  Right arm (7D): {actions[0, 7:].cpu().float().numpy().round(3)}")
```
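For convenience, the projection layers exercised in Step 5 can be wrapped in a small `nn.Module`. The sketch below is illustrative only: `ActionProjectionHead` is a hypothetical wrapper (not a class shipped with this repository), it assumes the standard `nn.Linear` weight layout and the GELU structure used in Step 5, and it exercises only the `action_in_proj`/`action_out_proj` tensors. Full OpenPIE-0.6 inference also routes images and language through the PaliGemma backbone and the Gemma action expert.

```python
import torch
import torch.nn as nn


class ActionProjectionHead(nn.Module):
    """Hypothetical wrapper around the checkpoint's action projection layers.

    Illustrative only: the real policy runs the PaliGemma VLM and the Gemma
    action expert between these two projections.
    """

    def __init__(self, weights, state_dim=14, hidden_dim=2048):
        super().__init__()
        self.action_in = nn.Linear(state_dim, hidden_dim)
        self.action_out = nn.Linear(hidden_dim, state_dim)
        with torch.no_grad():
            # `action_in_proj` and `action_out_proj` each hold two tensors
            # (weight + bias), matching the component listing above.
            self.action_in.weight.copy_(weights["action_in_proj.weight"].float())
            self.action_in.bias.copy_(weights["action_in_proj.bias"].float())
            self.action_out.weight.copy_(weights["action_out_proj.weight"].float())
            self.action_out.bias.copy_(weights["action_out_proj.bias"].float())

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.action_out(torch.nn.functional.gelu(self.action_in(state)))


# Usage (after Step 4 above):
# head = ActionProjectionHead(policy_weights).eval()
# out = head(torch.randn(1, 14))  # -> shape [1, 14]
```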
## Model Components

The model consists of:

| Component | Tensors | Parameters | Description |
|-----------|---------|------------|-------------|
| `paligemma_with_expert` | 804 | ~5.9B | PaliGemma VLM + Gemma Action Expert |
| `action_in_proj` | 2 | ~28K | Robot state input projection |
| `action_out_proj` | 2 | ~28K | Action output projection |
| `time_mlp_in` / `time_mlp_out` | 4 | ~8M | Timestep embedding |

## What is OpenPIE-0.6?

OpenPIE-0.6 is a **fully open-source reimplementation** of Physical Intelligence's pi0.6 model. Unlike the original closed-source model, OpenPIE-0.6 provides:

- A full PyTorch implementation (no JAX/Flax dependencies)
- Pre-trained weights you can use immediately
- Training code to reproduce the model or fine-tune it on your own data
- An Apache 2.0 license that permits commercial use

## Comparison: OpenPIE-0.6 vs Original pi0.6

| Feature | Original pi0.6 | OpenPIE-0.6 |
|---------|----------------|-------------|
| **Open Source** | No (closed) | **Yes (Apache 2.0)** |
| **Framework** | JAX/Flax | **PyTorch** |
| **Pre-trained Weights** | Not released | **Available** |
| **Training Code** | Not released | **Available** |
| **Fine-tuning** | Not possible | **Fully supported** |
| **Commercial Use** | Restricted | **Allowed** |

### Performance Comparison

| Metric | OpenPIE-0.6 | pi0.6 Paper Reference | Status |
|--------|-------------|-----------------------|--------|
| Action MSE | **0.010** | ~0.01 | Match |
| Value Correlation | **0.986** | >0.8 | Exceeds |
| Advantage Gap | **0.070** | >0.05 | Exceeds |
| Throughput | **22 act/s** | ~20 act/s | Exceeds |

## Model Architecture

```
OpenPIE-0.6 (5.91B policy + 1.31B value = 7.22B total)
├── Vision Encoder: SigLIP (384x384 images)
├── Base VLM: PaliGemma (Gemma 2B backbone)
├── Action Expert: Gemma 2B (cross-attention with VLM)
├── Value Function: 1.31B params (distributional, 1024 bins)
└── Action Space: 14D continuous (7-DOF left arm + 7-DOF right arm)
```

## Training Details

OpenPIE-0.6 was trained using the **RECAP algorithm** (RL with Experience and Corrections via Advantage-conditioned Policies); a simplified sketch of the advantage-conditioning step follows the table below.

| Phase | Steps | Description |
|-------|-------|-------------|
| Value Function | 5,000 | Train distributional value predictor |
| Policy Warmup | 10,000 | Standard behavior cloning |
| RECAP Training | 20,000 | Advantage-conditioned policy learning |
| **Total** | **35,000** | ~6 hours on 8x A100 80GB |
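To make the advantage-conditioning idea concrete, here is a minimal sketch. It is not the training code from this repository: `advantage_conditioned_targets` is a hypothetical helper, it assumes scalar Monte-Carlo returns and scalar value estimates (the actual value function is distributional over 1024 bins), and it reduces the conditioning signal to a binary indicator.

```python
import torch


def advantage_conditioned_targets(returns, values, threshold=0.0):
    """Simplified sketch of RECAP-style advantage conditioning.

    Assumes `returns` are Monte-Carlo returns and `values` are value
    estimates for the same states (hypothetical inputs; the actual
    OpenPIE-0.6 training code may differ).
    """
    advantages = returns - values  # A(s, a) = R - V(s)
    # Binary advantage indicator: 1 = better than the value baseline.
    # During training the policy is conditioned on this token, so at
    # inference time one can request "positive advantage" behavior.
    indicator = (advantages > threshold).long()
    return advantages, indicator


returns = torch.tensor([1.0, 0.2, 0.9])
values = torch.tensor([0.5, 0.4, 0.6])
adv, ind = advantage_conditioned_targets(returns, values)
print(adv, ind)  # e.g. tensor([ 0.5000, -0.2000,  0.3000]) tensor([1, 0, 1])
```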
### Key Hyperparameters

```yaml
batch_size: 4 (per GPU) x 8 GPUs x 4 accumulation = 128 effective
learning_rate: 1e-4
action_horizon: 50 steps
value_bins: 1024 (distributional)
dtype: bfloat16
dataset: lerobot/aloha_sim_transfer_cube_human
```

## Files Included

| File | Size | Description |
|------|------|-------------|
| `policy.safetensors` | 12 GB | Main policy model (VLM + Action Expert) |
| `value_fn.safetensors` | 2.5 GB | Distributional value function |
| `config.json` | 1 KB | Model configuration |

## Integration with Your Robot

```python
# Pseudo-code for robot integration
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


class OpenPIEPolicy:
    def __init__(self):
        # Load model weights
        self.policy_weights = load_file(
            hf_hub_download("exla-ai/openpie-0.6", "policy.safetensors")
        )
        # ... initialize your model architecture with these weights

    def get_action(self, image, robot_state, instruction):
        """
        Args:
            image: Camera image (384x384 RGB)
            robot_state: Current joint positions (14D for dual arm)
            instruction: Text instruction like "pick up the cube"

        Returns:
            actions: Joint position targets (14D)
        """
        # Your inference code here
        pass


# Usage (`camera` and `robot` are stand-ins for your hardware interface)
policy = OpenPIEPolicy()
action = policy.get_action(
    image=camera.get_frame(),
    robot_state=robot.get_joint_positions(),
    instruction="pick up the red cube and place it on the plate",
)
robot.execute(action)
```

## Why OpenPIE-0.6?

1. **Fully Open**: Unlike the original pi0.6, all weights and code are available
2. **PyTorch Native**: No JAX dependencies; works with the standard PyTorch ecosystem
3. **Production Ready**: Optimized for inference with the safetensors format
4. **Extensible**: Easy to fine-tune on your own robotics data
5. **Well Documented**: Clear examples and integration guides

## Citation

If you use OpenPIE-0.6 in your research, please cite:

```bibtex
@software{openpie_0_6,
  title={OpenPIE-0.6: Open-source Pi0.6 Implementation},
  author={EXLA AI},
  year={2025},
  url={https://huggingface.co/exla-ai/openpie-0.6}
}

@misc{pi0_6_paper,
  title={pi0.6: Scaling Robot Policy Learning with RECAP},
  author={Physical Intelligence},
  year={2025},
  url={https://www.physicalintelligence.company/blog/pi0-6}
}
```

## License

Apache 2.0 - Free for commercial and research use.

## Links

- [Training Code](https://github.com/exla-ai/openpie)
- [EXLA AI](https://exla.ai)
- [Original pi0.6 Blog Post](https://www.physicalintelligence.company/blog/pi0-6)