pi0-FAST YouTube Cleaning Experiment

Thesis: Open-source YouTube videos can fine-tune VLA models to produce valid robot actions.

Key Results

Metric	Value
Base model loss	39.47
Fine-tuned loss	12.91
Improvement	67.3%
Joint limit compliance	100%
Training time	77 min (1 epoch, A100 80GB)
Trainable params	13.3M / 2.9B (0.45% LoRA)

Pipeline

YouTube cleaning videos → HaMeR 3D hand tracking → VLM labeling →
Franka IK retargeting → LeRobot HDF5 → pi0-FAST LoRA fine-tuning

Repository Structure

├── EXPERIMENT_SUMMARY.txt     # Full thesis, results, deployment path
├── training/
│   ├── training_results.json  # All hyperparams, loss curve, eval metrics
│   ├── train_pi0fast.py       # Training script
│   ├── train.log              # Raw training log
│   └── adapter_config.json    # LoRA adapter config
├── evaluation/
│   ├── eval_results.json      # Per-episode evaluation metrics
│   └── eval_*.mp4             # MuJoCo rendering videos (source + Franka)
├── plots/
│   ├── loss_curve.png         # Training loss curve
│   ├── loss_curve_log.png     # Loss curve (log scale)
│   ├── lr_schedule.png        # Learning rate schedule
│   ├── base_vs_finetuned.png  # Comparison bar chart
│   └── training_speed.png     # Wall clock time vs steps
└── checkpoint_epoch1/         # LoRA adapter weights (51MB)
    ├── adapter_model.safetensors
    └── adapter_config.json

What We Proved

Valid actions: 100% within Franka joint limits from YouTube-derived data
Significant learning: 67.3% loss reduction in 1 epoch
Preserved base knowledge: LoRA (0.45% params) keeps pretrained capabilities
Minimal data pipeline: 2 YouTube videos → 361 episodes, fully automated

See EXPERIMENT_SUMMARY.txt for full details on deployment path.

Downloads last month: -; Downloads are not tracked for this model. How to track

Video Preview

Robotics