---
license: apache-2.0
tags:
- robotics
- lerobot
- pi0-fast
- imitation-learning
- vla
datasets:
- gpudad/so101_pick_cube_chunked
base_model:
- lerobot/pi0fast-base
pipeline_tag: robotics
---

# π₀-FAST SO101 Pick & Place

A finetuned [π₀-FAST](https://huggingface.co/lerobot/pi0fast-base) model for pick-and-place tasks on the SO101 robot arm.

## Model Details

- **Base Model:** [lerobot/pi0fast-base](https://huggingface.co/lerobot/pi0fast-base) (3B parameters)
- **Training Dataset:** [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked)
  - 10,990 episodes
  - 1,456,443 frames @ 30 FPS
  - 3 cameras: front, overhead, wrist (512×512)
  - 6-DOF action space
- **Training Steps:** 10,000 (quick validation run)
- **Final Loss:** 2.35
- **Hardware:** NVIDIA RTX 5090 (32 GB VRAM)

## Performance

Evaluated on held-out samples from the dataset:

| Metric | Value |
|--------|-------|
| Mean MAE | 0.079 |
| Relative error | ~2.6% of action range |
| Best MAE | 0.0085 |

## Usage

```python
import torch

from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
from lerobot.processor.pipeline import PolicyProcessorPipeline

# Load the model
policy = PI0FastPolicy.from_pretrained("gpudad/pi0fast-so101-pick-cube")
policy.to("cuda")
policy.eval()

# Load the pre- and post-processing pipelines
preprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube", "policy_preprocessor.json"
)
postprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube", "policy_postprocessor.json"
)

# Build an observation: proprioceptive state, the three camera views,
# and a natural-language task instruction
observation = {
    "observation.state": state_tensor,
    "observation.images.front": front_image,
    "observation.images.wrist": wrist_image,
    "observation.images.overhead": overhead_image,
    "task": "pick up the object and place it in the target location",
}

# Run inference
batch = preprocessor(observation)
batch["observation.language.attention_mask"] = batch["observation.language.attention_mask"].bool()
policy.reset()
with torch.no_grad():
    action = policy.select_action(batch)
result = postprocessor({"action": action})
final_action = result["action"]
```

## Training Configuration

```yaml
policy.type: pi0_fast
policy.dtype: bfloat16
policy.gradient_checkpointing: true
policy.chunk_size: 10
policy.n_action_steps: 10
batch_size: 4
optimizer_lr: 2.5e-5
scheduler_warmup_steps: 400
```

## Citation

If you use this model, please cite the original π₀ paper and LeRobot:

```bibtex
@article{black2024pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and Brown, Noah and Driess, Danny and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}
```
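For reference, the MAE figures in the Performance table are mean absolute errors between predicted and ground-truth actions. A minimal sketch of that metric (the `action_mae` helper and the synthetic arrays are illustrative, not part of this repository):

```python
import numpy as np

def action_mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error averaged over all timesteps and action dimensions."""
    return float(np.mean(np.abs(pred - target)))

# Synthetic example: one chunk of 10 six-DOF actions (matching chunk_size: 10)
pred = np.zeros((10, 6))
target = np.full((10, 6), 0.1)
print(action_mae(pred, target))  # 0.1
```

On normalized actions spanning roughly [-1, 1], a mean MAE of 0.079 corresponds to the ~2.6% relative error reported above.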