---
license: apache-2.0
tags:
- robotics
- lerobot
- pi0-fast
- imitation-learning
- vla
datasets:
- gpudad/so101_pick_cube_chunked
base_model:
- lerobot/pi0fast-base
pipeline_tag: robotics
---
# π₀-FAST SO101 Pick & Place
A finetuned [π₀-FAST](https://huggingface.co/lerobot/pi0fast-base) model for pick-and-place tasks on the SO101 robot arm.
## Model Details
- **Base Model:** lerobot/pi0fast-base (3B parameters)
- **Training Dataset:** [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked)
- 10,990 episodes
- 1,456,443 frames @ 30 FPS
- 3 cameras: front, overhead, wrist (512x512)
- 6-DOF action space
- **Training Steps:** 10,000 (quick validation run)
- **Final Loss:** 2.35
- **Hardware:** NVIDIA RTX 5090 (32GB VRAM)
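The dataset stats above imply fairly short episodes; a quick back-of-the-envelope check (illustrative arithmetic only, actual episode lengths vary):

```python
# Derived from the dataset numbers listed above.
episodes = 10_990
frames = 1_456_443
fps = 30

mean_frames_per_episode = frames / episodes          # ~132.5 frames
mean_episode_seconds = mean_frames_per_episode / fps # ~4.4 s

print(f"{mean_frames_per_episode:.1f} frames/episode, ~{mean_episode_seconds:.1f} s each")
```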
## Performance
Tested on held-out samples from the dataset:
| Metric | Value |
|--------|-------|
| Mean MAE | 0.079 |
| Relative Error | ~2.6% of action range |
| Best MAE | 0.0085 |
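For reference, a minimal sketch of how metrics like these can be computed: mean absolute error between predicted and ground-truth actions, expressed relative to the span of the action space. The arrays and `action_range` below are hypothetical stand-ins, not values from the actual evaluation.

```python
import numpy as np

# Dummy predicted vs. ground-truth actions (stand-ins for held-out samples).
pred = np.array([0.10, -0.50, 0.30, 0.80])
true = np.array([0.12, -0.45, 0.25, 0.85])

mae = np.abs(pred - true).mean()
action_range = 3.0  # hypothetical span of the normalized action space
relative_error = mae / action_range

print(f"MAE={mae:.4f}, relative error={relative_error * 100:.2f}% of range")
```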
## Usage
```python
import torch

from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
from lerobot.processor.pipeline import PolicyProcessorPipeline

# Load model
policy = PI0FastPolicy.from_pretrained("gpudad/pi0fast-so101-pick-cube")
policy.to("cuda")
policy.eval()

# Load the pre-/post-processors shipped with the checkpoint
preprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube",
    "policy_preprocessor.json",
)
postprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube",
    "policy_postprocessor.json",
)

# Run inference (state_tensor and the *_image tensors are your robot's
# current observation; see the dataset card for expected shapes)
observation = {
    "observation.state": state_tensor,
    "observation.images.front": front_image,
    "observation.images.wrist": wrist_image,
    "observation.images.overhead": overhead_image,
    "task": "pick up the object and place it in the target location",
}

batch = preprocessor(observation)
batch["observation.language.attention_mask"] = batch["observation.language.attention_mask"].bool()

policy.reset()
with torch.no_grad():
    action = policy.select_action(batch)

result = postprocessor({"action": action})
final_action = result["action"]
```
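If you want to smoke-test the pipeline without a robot, you can build a dummy observation dict with the shapes described in the Model Details section. This is a sketch under stated assumptions: the policy is assumed to take float32 CHW images in [0, 1] and a batched state vector; verify against the preprocessor config before relying on it.

```python
import torch

# Dummy observation matching the dataset description
# (3 cameras @ 512x512, 6-DOF state); shapes are assumptions, not verified.
def make_dummy_observation(device="cpu"):
    def img():
        return torch.rand(1, 3, 512, 512, device=device)  # batch of 1, CHW in [0, 1]
    return {
        "observation.state": torch.zeros(1, 6, device=device),  # 6 joint values
        "observation.images.front": img(),
        "observation.images.wrist": img(),
        "observation.images.overhead": img(),
        "task": "pick up the object and place it in the target location",
    }

obs = make_dummy_observation()
print(obs["observation.state"].shape)  # torch.Size([1, 6])
```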
## Training Configuration
```yaml
policy.type: pi0_fast
policy.dtype: bfloat16
policy.gradient_checkpointing: true
policy.chunk_size: 10
policy.n_action_steps: 10
batch_size: 4
optimizer_lr: 2.5e-5
scheduler_warmup_steps: 400
```
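With `chunk_size` and `n_action_steps` both set to 10, the policy predicts a chunk of 10 actions per forward pass and executes all of them before replanning. At the dataset's 30 FPS control rate, that works out to roughly one inference every third of a second (simple arithmetic, assuming execution at the recorded frame rate):

```python
# Inference cadence implied by the config above.
chunk_size = 10  # actions predicted (and executed) per forward pass
fps = 30         # control rate matching the dataset

seconds_per_chunk = chunk_size / fps        # ~0.33 s between inferences
inferences_per_minute = 60 / seconds_per_chunk

print(f"{seconds_per_chunk:.2f} s per chunk, {inferences_per_minute:.0f} inferences/min")
```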
## Citation
If you use this model, please cite the original π₀ paper and LeRobot:
```bibtex
@article{black2024pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and Brown, Noah and Driess, Danny and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}

@misc{cadene2024lerobot,
  title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
  author={Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Wolf, Thomas},
  howpublished={\url{https://github.com/huggingface/lerobot}},
  year={2024}
}
```