	---
	license: apache-2.0
	tags:
	- robotics
	- imitation-learning
	- reinforcement-learning
	- vision-language-action
	- pi0
	- recap
	- robot-learning
	- pytorch
	datasets:
	- lerobot/aloha_sim_transfer_cube_human
	language:
	- en
	library_name: pytorch
	pipeline_tag: robotics
	---

	# OpenPIE-0.6: Open-source Pi0.6 Implementation

	The first fully open-source PyTorch implementation of Physical Intelligence's pi0.6 robot policy model, trained with RECAP.

	## Quick Start

	```bash
	pip install huggingface_hub safetensors torch
	```

	```python
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file
	import torch

	# Download model files
	policy_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="policy.safetensors")
	value_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="value_fn.safetensors")
	config_path = hf_hub_download(repo_id="exla-ai/openpie-0.6", filename="config.json")

	# Load weights
	policy_weights = load_file(policy_path)
	value_weights = load_file(value_path)

	print(f"Policy model: {len(policy_weights)} tensors, {sum(t.numel() for t in policy_weights.values())/1e9:.2f}B params")
	print(f"Value function: {len(value_weights)} tensors, {sum(t.numel() for t in value_weights.values())/1e9:.2f}B params")
	```

	Output:
	```
	Policy model: 812 tensors, 5.91B params
	Value function: 638 tensors, 1.31B params
	```

	## Complete Working Example

	Here's a full example showing how to load and use the model weights:

	```python
	import torch
	import json
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file
	from safetensors import safe_open

	# ============================================================
	# Step 1: Download model from HuggingFace
	# ============================================================
	repo_id = "exla-ai/openpie-0.6"

	policy_path = hf_hub_download(repo_id=repo_id, filename="policy.safetensors")
	value_path = hf_hub_download(repo_id=repo_id, filename="value_fn.safetensors")
	config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

	# ============================================================
	# Step 2: Load configuration
	# ============================================================
	with open(config_path) as f:
	    config = json.load(f)

	print(f"Action dim: {config['action_dim']}") # 14 (dual 7-DOF arms)
	print(f"Action horizon: {config['action_horizon']}") # 50 steps
	print(f"State dim: {config['state_dim']}") # 14

	# ============================================================
	# Step 3: Inspect model structure
	# ============================================================
	with safe_open(policy_path, framework="pt") as f:
	    keys = list(f.keys())

	# Group tensors by component (the first segment of each dotted key)
	components = {}
	for key in keys:
	    component = key.split(".")[0]
	    if component not in components:
	        components[component] = []
	    components[component].append(key)

	print("\nPolicy model components:")
	for comp, comp_keys in sorted(components.items()):
	    print(f" - {comp}: {len(comp_keys)} tensors")

	# Output:
	# - action_in_proj: 2 tensors
	# - action_out_proj: 2 tensors
	# - paligemma_with_expert: 804 tensors
	# - time_mlp_in: 2 tensors
	# - time_mlp_out: 2 tensors

	# ============================================================
	# Step 4: Load weights
	# ============================================================
	policy_weights = load_file(policy_path)
	value_weights = load_file(value_path)

	# Key tensor shapes:
	print("\nKey tensor shapes:")
	print(f" action_in_proj.weight: {policy_weights['action_in_proj.weight'].shape}") # [2048, 14]
	print(f" action_out_proj.weight: {policy_weights['action_out_proj.weight'].shape}") # [14, 2048]

	# ============================================================
	# Step 5: Use the weights (shape check with the action projections)
	# ============================================================
	# Note: this chains only the two small projection layers and skips
	# `paligemma_with_expert`, so it verifies shapes and dtypes rather than
	# producing meaningful actions.
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Get action projection layers
	action_in = policy_weights["action_in_proj.weight"].to(device=device, dtype=torch.bfloat16)
	action_out = policy_weights["action_out_proj.weight"].to(device=device, dtype=torch.bfloat16)
	action_out_bias = policy_weights["action_out_proj.bias"].to(device=device, dtype=torch.bfloat16)

	# Example input: a random 14D vector standing in for the current joint positions
	robot_state = torch.randn(1, 14, device=device, dtype=torch.bfloat16)

	# Forward pass through the two projection layers only
	hidden = torch.nn.functional.linear(robot_state, action_in)
	hidden = torch.nn.functional.gelu(hidden)
	actions = torch.nn.functional.linear(hidden, action_out, action_out_bias)

	print(f"\nInput robot state: {robot_state.shape}") # [1, 14]
	print(f"Output actions: {actions.shape}") # [1, 14]
	print(f" Left arm (7D): {actions[0, :7].cpu().float().numpy().round(3)}")
	print(f" Right arm (7D): {actions[0, 7:].cpu().float().numpy().round(3)}")
	```

	## Model Components

	The model consists of:

	| Component | Tensors | Parameters | Description |
	|-----------|---------|------------|-------------|
	| `paligemma_with_expert` | 804 | ~5.9B | PaliGemma VLM + Gemma Action Expert |
	| `action_in_proj` | 2 | 28K | Robot state input projection |
	| `action_out_proj` | 2 | 28K | Action output projection |
	| `time_mlp_in/out` | 4 | 8M | Timestep embedding |
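
The small entries in this table can be checked by hand from the shapes printed earlier. The sketch below is a rough sanity check, not part of the released code: `linear_params` is a helper introduced here, and the square 2048 x 2048 shape of each time MLP layer is an assumption inferred from the 8M figure.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


ACTION_DIM = 14  # from config["action_dim"]
WIDTH = 2048     # from the action_in_proj.weight shape [2048, 14]

# Weight matrices alone: 14 * 2048 = 28,672, which matches the table's "28K".
action_in_weight = linear_params(ACTION_DIM, WIDTH, bias=False)
action_out_weight = linear_params(WIDTH, ACTION_DIM, bias=False)

# Assumption: time_mlp_in and time_mlp_out are each WIDTH x WIDTH with a bias.
time_mlp_total = 2 * linear_params(WIDTH, WIDTH)

print(f"action_in_proj weight:  {action_in_weight:,}")
print(f"action_out_proj weight: {action_out_weight:,}")
print(f"time_mlp_in + out:      {time_mlp_total:,}")
```

Under these assumptions the two time MLP layers come to 8,392,704 parameters, consistent with the table's 8M.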

	## What is OpenPIE-0.6?

	OpenPIE-0.6 is a fully open-source reimplementation of Physical Intelligence's pi0.6 model. Unlike the original closed-source model, OpenPIE-0.6 provides:

	- Full PyTorch implementation (no JAX/Flax dependencies)
	- Pre-trained weights you can use immediately
	- Training code to reproduce or fine-tune on your own data
	- Apache 2.0 license for commercial use

	## Comparison: OpenPIE-0.6 vs Original pi0.6

	| Feature | Original pi0.6 | OpenPIE-0.6 |
	|---------|---------------|-------------|
	| Open Source | No (closed) | Yes (Apache 2.0) |
	| Framework | JAX/Flax | PyTorch |
	| Pre-trained Weights | Not released | Available |
	| Training Code | Not released | Available |
	| Fine-tuning | Not possible | Fully supported |
	| Commercial Use | Restricted | Allowed |

	### Performance Comparison

	| Metric | OpenPIE-0.6 | pi0.6 Paper Reference | Status |
	|--------|-------------|----------------------|--------|
	| Action MSE | 0.010 | ~0.01 | Match |
	| Value Correlation | 0.986 | >0.8 | Exceeds |
	| Advantage Gap | 0.070 | >0.05 | Exceeds |
	| Throughput | 22 act/s | ~20 act/s | Exceeds |
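
This README does not say how these metrics are computed. The sketch below shows one plausible reading, assuming Action MSE is the mean squared error between predicted and demonstrated actions and Value Correlation is the Pearson correlation between predicted values and empirical returns. Both function names and the sample numbers are illustrative.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Made-up numbers, only to exercise the functions.
predicted_actions = [0.0, 0.1, 0.2]
demonstrated_actions = [0.1, 0.1, 0.2]
predicted_values = [-0.9, -0.5, -0.1]
empirical_returns = [-0.8, -0.6, -0.1]

print(f"Action MSE: {mse(predicted_actions, demonstrated_actions):.4f}")
print(f"Value correlation: {pearson(predicted_values, empirical_returns):.3f}")
```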

	## Model Architecture

	```
	OpenPIE-0.6 (5.91B policy + 1.31B value = 7.22B total)
	├── Vision Encoder: SigLIP (384x384 images)
	├── Base VLM: PaliGemma (Gemma 2B backbone)
	├── Action Expert: Gemma 2B (cross-attention with VLM)
	├── Value Function: 1.31B params (distributional, 1024 bins)
	└── Action Space: 14D continuous (7 DOF left arm + 7 DOF right arm)
	```
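
A distributional value function predicts a probability distribution over discretized value bins instead of a single scalar. The sketch below shows how 1024 bin logits could be reduced to a scalar value estimate. It assumes uniformly spaced bins over a value range of [-1, 0]. The range, the function name, and the bin layout are assumptions made for illustration, not details confirmed by this README.

```python
import math
from typing import List

NUM_BINS = 1024  # from the architecture summary


def expected_value(logits: List[float], v_min: float = -1.0, v_max: float = 0.0) -> float:
    """Softmax the bin logits, then average the bin centers under that distribution."""
    # Numerically stable softmax.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Assumption: bins are uniformly spaced over [v_min, v_max].
    width = (v_max - v_min) / len(logits)
    centers = [v_min + (i + 0.5) * width for i in range(len(logits))]
    return sum(p * c for p, c in zip(probs, centers))


# Uniform logits carry no preference, so the estimate is the midpoint of the range.
uniform_estimate = expected_value([0.0] * NUM_BINS)

# A distribution peaked on the last bin yields a value close to v_max.
peaked_logits = [0.0] * (NUM_BINS - 1) + [50.0]
peaked_estimate = expected_value(peaked_logits)

print(f"uniform: {uniform_estimate:.4f}, peaked: {peaked_estimate:.4f}")
```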

	## Training Details

	OpenPIE-0.6 was trained using the RECAP algorithm (RL with Experience and Corrections via Advantage-conditioned Policies):

	| Phase | Steps | Description |
	|-------|-------|-------------|
	| Value Function | 5,000 | Train distributional value predictor |
	| Policy Warmup | 10,000 | Standard behavior cloning |
	| RECAP Training | 20,000 | Advantage-conditioned policy learning |
	| Total | 35,000 | ~6 hours on 8x A100 80GB |
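
The advantage-conditioned phase can be pictured as labeling each training sample by whether its outcome beat the value function's prediction, then conditioning the policy on that label. The sketch below is a simplified illustration, not the training code from this repository: the function names, the threshold, the sample numbers, and the `Advantage: positive` prompt format are all assumptions.

```python
from typing import List, Sequence, Tuple


def advantage_labels(
    returns: Sequence[float], values: Sequence[float], threshold: float = 0.0
) -> Tuple[List[float], List[bool]]:
    """Advantage = observed return minus predicted value; label is advantage > threshold."""
    advantages = [g - v for g, v in zip(returns, values)]
    improved = [a > threshold for a in advantages]
    return advantages, improved


def conditioned_prompt(instruction: str, improved: bool) -> str:
    """Append a hypothetical advantage tag so the policy can be conditioned on it."""
    tag = "positive" if improved else "negative"
    return f"{instruction} Advantage: {tag}"


# Made-up returns and value predictions for three samples.
returns = [-0.2, -0.6, -0.4]
values = [-0.3, -0.5, -0.4]
advantages, improved = advantage_labels(returns, values, threshold=0.05)

for adv, flag in zip(advantages, improved):
    print(f"advantage={adv:+.2f} -> {conditioned_prompt('pick up the cube', flag)}")
```

At inference time, a policy trained this way would be prompted with the positive label to favor the better-than-expected behavior it saw during training.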

	### Key Hyperparameters

	```yaml
	batch_size: 4 (per GPU) x 8 GPUs x 4 accumulation = 128 effective
	learning_rate: 1e-4
	action_horizon: 50 steps
	value_bins: 1024 (distributional)
	dtype: bfloat16
	dataset: lerobot/aloha_sim_transfer_cube_human
	```
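
The effective batch size and the average step time follow directly from the numbers above. This short calculation reproduces them. The step time is an average over all 35,000 steps and assumes the "~6 hours" figure covers the whole run.

```python
per_gpu_batch = 4
num_gpus = 8
grad_accum_steps = 4

# Samples contributing to each optimizer step.
effective_batch = per_gpu_batch * num_gpus * grad_accum_steps

# Steps per phase, from the training table.
phase_steps = {"value_function": 5_000, "policy_warmup": 10_000, "recap": 20_000}
total_steps = sum(phase_steps.values())

# Assumption: the ~6 hour wall-clock figure spans all three phases.
seconds_per_step = 6 * 3600 / total_steps

print(f"Effective batch size: {effective_batch}")
print(f"Total steps: {total_steps:,}")
print(f"Average time per step: {seconds_per_step:.2f} s")
```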

	## Files Included

	| File | Size | Description |
	|------|------|-------------|
	| `policy.safetensors` | 12 GB | Main policy model (VLM + Action Expert) |
	| `value_fn.safetensors` | 2.5 GB | Distributional value function |
	| `config.json` | 1 KB | Model configuration |
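
To estimate the memory needed just to hold the weights, multiply the parameter count by the bytes per parameter. The sketch below assumes the checkpoints are stored in bfloat16 (2 bytes per parameter), which matches the training dtype listed above but is not stated for the files themselves. `checkpoint_gb` is a helper introduced here.

```python
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def checkpoint_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate weight size in decimal gigabytes, ignoring file headers."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


policy_gb = checkpoint_gb(5.91e9)  # policy parameter count from Quick Start
value_gb = checkpoint_gb(1.31e9)   # value function parameter count

print(f"policy.safetensors   ~{policy_gb:.2f} GB")
print(f"value_fn.safetensors ~{value_gb:.2f} GB")
print(f"policy in float32    ~{checkpoint_gb(5.91e9, 'float32'):.2f} GB")
```

The bfloat16 estimates land close to the listed file sizes. Casting the policy to float32 would roughly double its footprint.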

	## Integration with Your Robot

	```python
	# Pseudo-code for robot integration
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file


	class OpenPIEPolicy:
	    def __init__(self):
	        # Load model weights
	        self.policy_weights = load_file(
	            hf_hub_download("exla-ai/openpie-0.6", "policy.safetensors")
	        )
	        # ... initialize your model architecture with these weights

	    def get_action(self, image, robot_state, instruction):
	        """
	        Args:
	            image: Camera image (384x384 RGB)
	            robot_state: Current joint positions (14D for dual arm)
	            instruction: Text instruction like "pick up the cube"

	        Returns:
	            actions: Joint position targets (14D)
	        """
	        # Your inference code here
	        raise NotImplementedError


	# Usage (`camera` and `robot` stand in for your own hardware interfaces)
	policy = OpenPIEPolicy()
	action = policy.get_action(
	    image=camera.get_frame(),
	    robot_state=robot.get_joint_positions(),
	    instruction="pick up the red cube and place it on the plate",
	)
	robot.execute(action)
	```
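
Because the configuration lists an action horizon of 50, one inference call can supply actions for many control steps. The sketch below shows one common way to consume such a chunk: queue the predicted actions and only call the policy again when the queue runs dry. `ChunkedExecutor` and `dummy_predict_chunk` are hypothetical names, and the dummy policy returns zeros in place of real inference.

```python
from collections import deque
from typing import Callable, Deque, List

ACTION_DIM = 14      # 7 DOF per arm, two arms
ACTION_HORIZON = 50  # actions predicted per policy call

Action = List[float]


class ChunkedExecutor:
    """Serves one action per control step and re-queries the policy when the queue is empty."""

    def __init__(self, predict_chunk: Callable[[dict], List[Action]]) -> None:
        self._predict_chunk = predict_chunk
        self._queue: Deque[Action] = deque()

    def next_action(self, observation: dict) -> Action:
        if not self._queue:
            # Queue is empty: run inference once and buffer the whole chunk.
            self._queue.extend(self._predict_chunk(observation))
        return self._queue.popleft()


call_log: List[dict] = []


def dummy_predict_chunk(observation: dict) -> List[Action]:
    """Hypothetical stand-in for real inference: records the call, returns zeros."""
    call_log.append(observation)
    return [[0.0] * ACTION_DIM for _ in range(ACTION_HORIZON)]


executor = ChunkedExecutor(dummy_predict_chunk)
actions = [executor.next_action({"step": t}) for t in range(120)]
print(f"{len(actions)} control steps used {len(call_log)} policy calls")
```

Executing a full chunk open-loop keeps inference cost low but reacts slowly to disturbances. Re-planning after fewer steps trades more compute for responsiveness.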

	## Why OpenPIE-0.6?

	1. Fully Open: Unlike the original pi0.6, all weights and code are available
	2. PyTorch Native: No JAX dependencies, works with standard PyTorch ecosystem
	3. Production Ready: Weights ship in safetensors format, which loads quickly and avoids the arbitrary code execution risk of pickle-based checkpoints
	4. Extensible: Easy to fine-tune on your own robotics data
	5. Well Documented: Clear examples and integration guides

	## Citation

	If you use OpenPIE-0.6 in your research, please cite:

	```bibtex
	@software{openpie_0_6,
	title={OpenPIE-0.6: Open-source Pi0.6 Implementation},
	author={EXLA AI},
	year={2025},
	url={https://huggingface.co/exla-ai/openpie-0.6}
	}

	@article{pi0_6_paper,
	title={pi*0.6: a VLA That Learns From Experience},
	author={Physical Intelligence},
	year={2025}
	}
	```

	## License

	Apache 2.0 - Free for commercial and research use.

	## Links

	- [Training Code](https://github.com/exla-ai/openpie)
	- [EXLA AI](https://exla.ai)
	- [Original pi0.6 Paper](https://www.physicalintelligence.company/blog/pi0-6)