pi0-FAST YouTube Cleaning Experiment
Thesis: Open-source YouTube videos can fine-tune VLA models to produce valid robot actions.
Key Results
| Metric | Value |
|---|---|
| Base model loss | 39.47 |
| Fine-tuned loss | 12.91 |
| Improvement | 67.3% |
| Joint limit compliance | 100% |
| Training time | 77 min (1 epoch, A100 80GB) |
| Trainable params | 13.3M / 2.9B (0.45% LoRA) |
Pipeline
YouTube cleaning videos β HaMeR 3D hand tracking β VLM labeling β
Franka IK retargeting β LeRobot HDF5 β pi0-FAST LoRA fine-tuning
Repository Structure
βββ EXPERIMENT_SUMMARY.txt # Full thesis, results, deployment path
βββ training/
β βββ training_results.json # All hyperparams, loss curve, eval metrics
β βββ train_pi0fast.py # Training script
β βββ train.log # Raw training log
β βββ adapter_config.json # LoRA adapter config
βββ evaluation/
β βββ eval_results.json # Per-episode evaluation metrics
β βββ eval_*.mp4 # MuJoCo rendering videos (source + Franka)
βββ plots/
β βββ loss_curve.png # Training loss curve
β βββ loss_curve_log.png # Loss curve (log scale)
β βββ lr_schedule.png # Learning rate schedule
β βββ base_vs_finetuned.png # Comparison bar chart
β βββ training_speed.png # Wall clock time vs steps
βββ checkpoint_epoch1/ # LoRA adapter weights (51MB)
βββ adapter_model.safetensors
βββ adapter_config.json
What We Proved
- Valid actions: 100% within Franka joint limits from YouTube-derived data
- Significant learning: 67.3% loss reduction in 1 epoch
- Preserved base knowledge: LoRA (0.45% params) keeps pretrained capabilities
- Minimal data pipeline: 2 YouTube videos β 361 episodes, fully automated
See EXPERIMENT_SUMMARY.txt for full details on deployment path.