---
license: apache-2.0
tags:
- robotics
- lerobot
- pi0-fast
- imitation-learning
- vla
datasets:
- gpudad/so101_pick_cube_chunked
base_model:
- lerobot/pi0fast-base
pipeline_tag: robotics
---

# π₀-FAST SO101 Pick & Place

A finetuned [π₀-FAST](https://huggingface.co/lerobot/pi0fast-base) model for pick-and-place tasks on the SO101 robot arm.

## Model Details

- **Base Model:** lerobot/pi0fast-base (3B parameters)
- **Training Dataset:** [gpudad/so101_pick_cube_chunked](https://huggingface.co/datasets/gpudad/so101_pick_cube_chunked)
  - 10,990 episodes
  - 1,456,443 frames @ 30 FPS
  - 3 cameras: front, overhead, wrist (512×512)
  - 6-DOF action space
- **Training Steps:** 10,000 (quick validation run)
- **Final Loss:** 2.35
- **Hardware:** NVIDIA RTX 5090 (32 GB VRAM)
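
The step count is small relative to the dataset. A rough coverage sketch, assuming each optimizer step draws one batch of 4 frames (the batch size from the training configuration below) and ignoring repeated sampling:

```python
# How much of the dataset does the 10,000-step run sample?
# Assumes each step draws batch_size frames; repeats are not accounted for.
steps = 10_000
batch_size = 4
total_frames = 1_456_443

frames_seen = steps * batch_size
coverage = frames_seen / total_frames
print(frames_seen)                       # 40000
print(f"{coverage:.1%} of the dataset")  # 2.7% of the dataset
```

Even counting every sampled frame once, the run touches under 3% of the data, consistent with its "quick validation" framing.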

## Performance

Tested on held-out samples from the dataset:

| Metric | Value |
|--------|-------|
| Mean MAE | 0.079 |
| Relative Error | ~2.6% of action range |
| Best MAE | 0.0085 |
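
The relative-error row is the MAE normalized by the span of the action space. A minimal sketch, where the ~3.0-unit range is an assumption back-solved to match the reported figure (the card does not state the true range):

```python
# Relative error = mean absolute error / span of the action space.
# action_range below is an assumed value, not taken from the card.
mean_mae = 0.079
action_range = 3.0

relative_error = mean_mae / action_range
print(f"{relative_error:.1%} of action range")  # 2.6% of action range
```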

## Usage
```python
import torch

from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
from lerobot.processor.pipeline import PolicyProcessorPipeline

# Load model
policy = PI0FastPolicy.from_pretrained("gpudad/pi0fast-so101-pick-cube")
policy.to("cuda")
policy.eval()

# Load pre/post-processors (normalization, tokenization, etc.)
preprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube",
    "policy_preprocessor.json",
)
postprocessor = PolicyProcessorPipeline.from_pretrained(
    "gpudad/pi0fast-so101-pick-cube",
    "policy_postprocessor.json",
)

# Run inference. state_tensor and the three camera images come from your
# robot driver; see the dataset card for the expected shapes.
observation = {
    "observation.state": state_tensor,
    "observation.images.front": front_image,
    "observation.images.wrist": wrist_image,
    "observation.images.overhead": overhead_image,
    "task": "pick up the object and place it in the target location",
}

batch = preprocessor(observation)
# The tokenizer emits an integer attention mask; the policy expects booleans.
batch["observation.language.attention_mask"] = batch["observation.language.attention_mask"].bool()

policy.reset()
with torch.no_grad():
    action = policy.select_action(batch)

result = postprocessor({"action": action})
final_action = result["action"]
```
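
The snippet above assumes `state_tensor` and the camera images are already policy-ready. As a sketch of the usual raw-frame conversion (NumPy for illustration; the channel-first layout and [0, 1] scaling are assumptions here — the authoritative contract is in `policy_preprocessor.json`):

```python
import numpy as np

# Hypothetical raw camera frame: 512x512 RGB, uint8, HWC layout.
raw = np.zeros((512, 512, 3), dtype=np.uint8)
raw[..., 0] = 255  # pretend the red channel is saturated

# Convert to float CHW in [0, 1] and add a batch dimension — the layout
# vision policies commonly expect (an assumption; verify against the
# preprocessor config before deploying).
img = raw.transpose(2, 0, 1).astype(np.float32) / 255.0
batched = img[None]

print(batched.shape)  # (1, 3, 512, 512)
print(batched.max())  # 1.0
```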

## Training Configuration

```yaml
policy.type: pi0_fast
policy.dtype: bfloat16
policy.gradient_checkpointing: true
policy.chunk_size: 10
policy.n_action_steps: 10
batch_size: 4
optimizer_lr: 2.5e-5
scheduler_warmup_steps: 400
```
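
With `chunk_size` and `n_action_steps` both set to 10, the policy predicts a 10-action chunk per forward pass and executes the whole chunk before re-planning. Against the dataset's 30 FPS control rate, that works out to roughly three model calls per second:

```python
# Replanning cadence implied by the chunking config and the 30 FPS data rate.
chunk_size = 10       # actions predicted per forward pass
n_action_steps = 10   # actions executed before the next forward pass
fps = 30

calls_per_second = fps / n_action_steps
seconds_per_chunk = n_action_steps / fps
print(calls_per_second)             # 3.0 model calls per second
print(round(seconds_per_chunk, 3))  # 0.333 s of actions per call
```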

## Citation

If you use this model, please cite the original π₀ paper and LeRobot:

```bibtex
@article{black2024pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and Brown, Noah and Driess, Danny and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}
```