π₀-FAST SO101 Pick & Place

A fine-tuned π₀-FAST model for pick-and-place tasks on the SO101 robot arm.

Model Details

  • Base Model: lerobot/pi0fast-base (3B parameters)
  • Training Dataset: gpudad/so101_pick_cube_chunked
    • 10,990 episodes
    • 1,456,443 frames @ 30 FPS
    • 3 cameras: front, overhead, wrist (512x512)
    • 6-DOF action space
  • Training Steps: 10,000 (quick validation run)
  • Final Loss: 2.35
  • Hardware: NVIDIA RTX 5090 (32GB VRAM)
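
For a sense of scale, the dataset figures above imply the averages computed below. This is a back-of-the-envelope sketch that uses only the numbers listed in this card; the variable names are illustrative.

```python
# Figures copied from the Model Details list above.
EPISODES = 10_990
FRAMES = 1_456_443
FPS = 30

frames_per_episode = FRAMES / EPISODES          # average episode length in frames
seconds_per_episode = frames_per_episode / FPS  # average episode length in seconds
total_hours = FRAMES / FPS / 3600               # total recorded time in hours

print(f"{frames_per_episode:.1f} frames per episode")
print(f"{seconds_per_episode:.2f} s per episode")
print(f"{total_hours:.1f} h of data in total")
```

This works out to roughly 132.5 frames (about 4.4 seconds) per episode and about 13.5 hours of data overall.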

Performance

Tested on held-out samples from the dataset:

Metric           Value
Mean MAE         0.079
Relative Error   ~2.6% of action range
Best MAE         0.0085
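
The card does not state exactly how these metrics were computed. One plausible reading, sketched below, is mean absolute error averaged over all action dimensions, with relative error taken as the MAE divided by the span of the ground-truth actions. The function names and the toy values are illustrative assumptions, not taken from the actual evaluation.

```python
def mean_absolute_error(predicted, target):
    """Average |predicted - target| over every element of two equal-shaped 2D lists."""
    diffs = [
        abs(p - t)
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(diffs) / len(diffs)


def relative_error(mae, target):
    """MAE as a fraction of the (max - min) span of the target actions."""
    values = [t for row in target for t in row]
    return mae / (max(values) - min(values))


# Toy example: two timesteps of a 6-DOF action (values are made up).
target = [[0.0, 0.5, -1.0, 1.0, 0.25, 2.0],
          [0.1, 0.4, -0.9, 1.1, 0.30, 1.9]]
predicted = [[0.1, 0.5, -1.1, 1.0, 0.25, 2.0],
             [0.1, 0.5, -0.9, 1.0, 0.30, 2.0]]

mae = mean_absolute_error(predicted, target)
print(f"MAE: {mae:.4f}")
print(f"Relative error: {relative_error(mae, target):.2%}")
```

Under this reading, a mean MAE of 0.079 at roughly 2.6% would correspond to an action span of about 3.0 units, though the definition of "action range" used for the reported numbers may differ.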

Usage

import torch

from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
from lerobot.processor.pipeline import PolicyProcessorPipeline

# Load model
policy = PI0FastPolicy.from_pretrained("gpudad/pi0fast-so101-pick-cube")
policy.to("cuda")
policy.eval()

# Load processors
preprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube", 
    "policy_preprocessor.json"
)
postprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube", 
    "policy_postprocessor.json"
)

# Run inference
observation = {
    "observation.state": state_tensor,
    "observation.images.front": front_image,
    "observation.images.wrist": wrist_image,
    "observation.images.overhead": overhead_image,
    "task": "pick up the object and place it in the target location",
}

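# Preprocess the raw observation into a model-ready batch
batch = preprocessor(observation)
# Cast the language attention mask to bool before calling the policy
batch["observation.language.attention_mask"] = batch["observation.language.attention_mask"].bool()

# Reset the policy state at the start of each episode
policy.reset()
with torch.no_grad():
    action = policy.select_action(batch)

# Postprocess the predicted action before sending it to the robot
result = postprocessor({"action": action})
final_action = result["action"]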

Training Configuration

policy.type: pi0_fast
policy.dtype: bfloat16
policy.gradient_checkpointing: true
policy.chunk_size: 10
policy.n_action_steps: 10
batch_size: 4
optimizer_lr: 2.5e-5
scheduler_warmup_steps: 400
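
With chunk_size and n_action_steps both set to 10, the policy is expected to predict ten actions at a time and hand them out one per control step, so the model only needs to run once every ten steps. The sketch below illustrates that queueing pattern with a stand-in class. `ChunkedActionQueue` and `fake_predict_chunk` are hypothetical names used for illustration only; they are not part of LeRobot, and the real policy's internals may differ.

```python
from collections import deque

N_ACTION_STEPS = 10  # from the training configuration above
FPS = 30             # dataset frame rate listed in Model Details


class ChunkedActionQueue:
    """Hypothetical stand-in: serves one action per call, refilling from a chunk predictor."""

    def __init__(self, predict_chunk, n_action_steps):
        self.predict_chunk = predict_chunk
        self.n_action_steps = n_action_steps
        self.queue = deque()
        self.model_calls = 0

    def reset(self):
        # Drop leftover actions, e.g. at the start of a new episode.
        self.queue.clear()

    def select_action(self, observation):
        if not self.queue:
            # Queue is empty: run the (expensive) model once and buffer its chunk.
            chunk = self.predict_chunk(observation)
            self.queue.extend(chunk[: self.n_action_steps])
            self.model_calls += 1
        return self.queue.popleft()


def fake_predict_chunk(observation):
    # Placeholder for model inference: returns 10 dummy 6-DOF actions.
    return [[float(i)] * 6 for i in range(N_ACTION_STEPS)]


policy = ChunkedActionQueue(fake_predict_chunk, N_ACTION_STEPS)
policy.reset()
actions = [policy.select_action(observation=None) for _ in range(30)]

print(policy.model_calls)    # number of model runs over 30 control steps
print(FPS / N_ACTION_STEPS)  # model runs per second if driven at 30 FPS
```

If the robot is driven at the dataset's 30 FPS, this pattern implies about three model calls per second, which leaves roughly a third of a second per inference before the queue runs dry.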

Citation

If you use this model, please cite the original π₀ paper and LeRobot:

@article{black2024pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and Brown, Noah and Driess, Danny and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}