---
language:
- en
library_name: lerobot
pipeline_tag: robotics
tags:
- vision-language-action
- imitation-learning
- lerobot
inference: false
license: gemma
---
# π₀ (Pi0) (LeRobot)
π₀ is a Vision-Language-Action (VLA) foundation model from Physical Intelligence that jointly reasons over vision, language, and actions to control robots, serving as the base architecture that later enabled π₀.₅’s open-world generalization.
**Original paper:** π0: A Vision-Language-Action Flow Model for General Robot Control
**Reference implementation:** https://github.com/Physical-Intelligence/openpi
**LeRobot implementation:** Follows the original reference code for compatibility.
## Model description
- **Inputs:** images (multi-view), proprio/state, optional language instruction
- **Outputs:** continuous actions
- **Training objective:** flow matching
- **Action representation:** continuous
- **Intended use:** Base model to fine-tune on your specific use case
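The flow-matching objective listed above can be illustrated with a minimal, framework-free sketch. It assumes straight-line interpolation between a clean action chunk and Gaussian noise, with t = 0 as the actions and t = 1 as pure noise (a time convention chosen for this sketch; implementations differ), and a hypothetical `velocity_fn(x_t, t)` standing in for the network. It is not LeRobot's or openpi's implementation; `make_oracle` is a toy exact velocity field used only to show that the loss and the Euler sampler agree.

```python
import random
from typing import Callable, List

# Hypothetical stand-in for the network: (noisy actions, time) -> velocity
VelocityFn = Callable[[List[float], float], List[float]]


def flow_matching_loss(actions: List[float], velocity_fn: VelocityFn, rng: random.Random) -> float:
    """Conditional flow-matching loss for one flattened action chunk (sketch)."""
    t = 1.0 - rng.random()  # time in (0, 1]
    noise = [rng.gauss(0.0, 1.0) for _ in actions]
    # Straight-line path: t = 0 is the clean action chunk, t = 1 is pure noise
    x_t = [t * n + (1.0 - t) * a for n, a in zip(noise, actions)]
    # Target velocity d(x_t)/dt along that path
    target = [n - a for n, a in zip(noise, actions)]
    pred = velocity_fn(x_t, t)
    # Mean squared error between predicted and target velocity
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(actions)


def sample_actions(velocity_fn: VelocityFn, dim: int, num_steps: int, rng: random.Random) -> List[float]:
    """Euler integration from noise (t = 1) back to actions (t = 0)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = -1.0 / num_steps
    t = 1.0
    for _ in range(num_steps):
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def make_oracle(target_actions: List[float]) -> VelocityFn:
    """Exact velocity field for a single known action chunk (illustration only)."""
    def velocity(x: List[float], t: float) -> List[float]:
        return [(xi - ai) / t for xi, ai in zip(x, target_actions)]
    return velocity


if __name__ == "__main__":
    rng = random.Random(0)
    demo_actions = [0.5, -1.0, 2.0]
    oracle = make_oracle(demo_actions)
    print(flow_matching_loss(demo_actions, oracle, rng))  # approximately 0
    print(sample_actions(oracle, dim=3, num_steps=10, rng=rng))  # approximately [0.5, -1.0, 2.0]
```

At inference time, a trained network replaces the oracle, and the number of Euler steps trades accuracy against latency.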
## Quick start (inference on a real batch)
### Installation
```bash
pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
```
For full installation details (including optional video dependencies such as ffmpeg for torchcodec), see the official documentation: https://huggingface.co/docs/lerobot/installation
### Load model + dataset, run `select_action`
```python
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.policies.factory import make_pre_post_processors

# Swap this import per policy
from lerobot.policies.pi0 import PI0Policy

# Load a policy
model_id = "lerobot/pi0_base"  # <- swap checkpoint
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
policy = PI0Policy.from_pretrained(model_id).to(device).eval()

preprocess, postprocess = make_pre_post_processors(
    policy.config,
    model_id,
    preprocessor_overrides={"device_processor": {"device": str(device)}},
)

# Load a LeRobotDataset
dataset = LeRobotDataset("lerobot/libero")

# Pick an episode
episode_index = 0

# Each episode corresponds to a contiguous range of frame indices
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]

# Get a single frame from that episode (e.g. the first frame)
frame_index = from_idx
frame = dict(dataset[frame_index])

# Run the policy's preprocessing pipeline on the frame (e.g. normalization, device placement)
batch = preprocess(frame)

with torch.inference_mode():
    pred_action = policy.select_action(batch)

# Run the policy's postprocessing pipeline on the action,
# e.g. to unnormalize or detokenize it
pred_action = postprocess(pred_action)
```
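The episode lookup above relies on each episode occupying a contiguous slice of global frame indices. The sketch below shows that bookkeeping with a plain dictionary standing in for `dataset.meta.episodes` (a hypothetical in-memory table, not the real metadata object). It assumes `dataset_to_index` is exclusive, like the end of a Python `range`; check this against your LeRobot version. `episode_frame_indices` and `episode_of_frame` are hypothetical helpers, not LeRobot functions.

```python
from typing import Dict, List


def episode_frame_indices(episodes: Dict[str, List[int]], episode_index: int) -> range:
    """Global frame indices of one episode, assuming an exclusive end index."""
    from_idx = episodes["dataset_from_index"][episode_index]
    to_idx = episodes["dataset_to_index"][episode_index]
    return range(from_idx, to_idx)


def episode_of_frame(episodes: Dict[str, List[int]], frame_index: int) -> int:
    """Find which episode a global frame index belongs to (linear scan)."""
    starts = episodes["dataset_from_index"]
    ends = episodes["dataset_to_index"]
    for ep, (start, end) in enumerate(zip(starts, ends)):
        if start <= frame_index < end:
            return ep
    raise IndexError(f"frame {frame_index} is not in any episode")


# Hypothetical metadata for three episodes of 120, 95 and 140 frames
toy_episodes = {
    "dataset_from_index": [0, 120, 215],
    "dataset_to_index": [120, 215, 355],
}
frames = episode_frame_indices(toy_episodes, 1)
print(frames.start, frames.stop, len(frames))  # 120 215 95
print(episode_of_frame(toy_episodes, 300))  # 2
```

To replay a whole episode through the policy, you would loop over `range(from_idx, to_idx)` in the snippet above, typically calling `policy.reset()` before the first frame so that no state from a previous episode is carried over.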
## Training step (loss + backward)
If you’re training or fine-tuning, you typically call `forward(...)` to get a loss and then backpropagate it:
```python
policy.train()
batch = dict(dataset[0])
batch = preprocess(batch)  # same preprocessing pipeline as at inference
loss, outputs = policy.forward(batch)  # loss plus auxiliary outputs
loss.backward()
```
> Notes:
>
> - Some policies are called as `policy(**batch)` or return a dict instead of a tuple; adapt this snippet to the API of the policy you use.
> - Use the `lerobot-train` script for full training loops.
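The note above says the return type of `forward` varies between policies. One defensive option is a small helper that accepts a bare loss, a `(loss, extras)` tuple, or a dict with a `"loss"` key. `extract_loss` is hypothetical, not part of LeRobot; it is exercised here with plain Python values so it runs without PyTorch, but the same logic applies to tensors.

```python
from typing import Any, Dict, Tuple


def extract_loss(output: Any) -> Tuple[Any, Dict[str, Any]]:
    """Return (loss, extras) from a policy forward() result of unknown shape."""
    if isinstance(output, tuple):
        loss = output[0]
        extras = output[1] if len(output) > 1 and isinstance(output[1], dict) else {}
        return loss, extras
    if isinstance(output, dict):
        if "loss" not in output:
            raise KeyError("forward() returned a dict without a 'loss' entry")
        extras = {k: v for k, v in output.items() if k != "loss"}
        return output["loss"], extras
    # Otherwise assume forward() returned the loss itself
    return output, {}


print(extract_loss((0.42, {"l2": 0.42})))  # (0.42, {'l2': 0.42})
print(extract_loss({"loss": 1.5, "acc": 0.9}))  # (1.5, {'acc': 0.9})
print(extract_loss(3.0))  # (3.0, {})
```

In a training step this would read `loss, _ = extract_loss(policy.forward(batch))` followed by `loss.backward()`.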
## How to train / fine-tune
```bash
lerobot-train \
--dataset.repo_id=${HF_USER}/<dataset> \
--output_dir=./outputs/[RUN_NAME] \
--job_name=[RUN_NAME] \
--policy.repo_id=${HF_USER}/<desired_policy_repo_id> \
--policy.path=lerobot/[BASE_CHECKPOINT] \
--policy.dtype=bfloat16 \
--policy.device=cuda \
--steps=100000 \
--batch_size=4
```
Policy-specific flags you can add:
- `--policy.chunk_size=...`
- `--policy.n_action_steps=...`
- `--policy.max_action_tokens=...`
- `--policy.gradient_checkpointing=true`
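`chunk_size` and `n_action_steps` follow the general action-chunking pattern: the model predicts a chunk of `chunk_size` future actions, and only the first `n_action_steps` of them are executed before the model is queried again. The sketch below illustrates that queueing logic with a hypothetical `predict_chunk` callable and a hypothetical `ChunkedActionSelector` class; it is a simplified illustration of the idea, not LeRobot's `select_action` code.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Action = List[float]


class ChunkedActionSelector:
    """Serve one action per step from a predicted chunk (illustrative sketch)."""

    def __init__(self, predict_chunk: Callable[[object], Sequence[Action]],
                 chunk_size: int, n_action_steps: int) -> None:
        if not 1 <= n_action_steps <= chunk_size:
            raise ValueError("need 1 <= n_action_steps <= chunk_size")
        self.predict_chunk = predict_chunk  # hypothetical model call
        self.chunk_size = chunk_size
        self.n_action_steps = n_action_steps
        self.queue: Deque[Action] = deque()
        self.model_calls = 0

    def reset(self) -> None:
        """Drop queued actions, e.g. at the start of a new episode."""
        self.queue.clear()

    def select_action(self, observation: object) -> Action:
        if not self.queue:
            chunk = list(self.predict_chunk(observation))
            if len(chunk) != self.chunk_size:
                raise ValueError("model returned a chunk of the wrong length")
            self.model_calls += 1
            # Keep only the first n_action_steps actions of the chunk
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()


# Toy model: a chunk of 4 actions, each tagged with the step that requested it
def toy_predict(step: int) -> List[Action]:
    return [[float(step), float(i)] for i in range(4)]


selector = ChunkedActionSelector(toy_predict, chunk_size=4, n_action_steps=2)
actions = [selector.select_action(step) for step in range(6)]
print(selector.model_calls)  # 3
print(actions[2])  # [2.0, 0.0]
```

A smaller `n_action_steps` re-plans more often (more model calls, more reactive control), while a larger one amortizes the cost of each forward pass over more control steps.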
## Real-World Inference & Evaluation
You can run the [**`lerobot-record`**](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) script with a policy checkpoint as input to run inference and evaluate your policy.
For instance, the following command runs inference and records 10 evaluation episodes:
```bash
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
--robot.id=my_awesome_follower_arm \
--display_data=false \
--dataset.repo_id=${HF_USER}/eval_so100 \
--dataset.single_task="Put lego brick into the transparent box" \
--dataset.num_episodes=10 \
--policy.path=${HF_USER}/my_policy
# Optional: to teleoperate in between episodes, add these flags to the command above:
# --teleop.type=so100_leader \
# --teleop.port=/dev/ttyACM0 \
# --teleop.id=my_awesome_leader_arm
```