
# Training on DROID

Here we describe how to fine-tune the pi0-FAST model on the DROID dataset. This is an approximate open-source reproduction of the pi0-FAST-DROID training pipeline, with small differences in data loading and the action space used.

In contrast to the rest of openpi, which uses LeRobot for data loading, we need to use RLDS as the data format for DROID training (at the moment, LeRobot isn't scalable enough for larger datasets like DROID -- they are working on improving it though). Below, we provide instructions for updating your openpi environment for RLDS data loading and for downloading the DROID dataset.
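For orientation, RLDS stores robot data as episodes of time steps. A minimal pure-Python sketch of that structure (the field names below are typical RLDS conventions, not the exact DROID schema):

```python
# Illustrative sketch of the RLDS episode structure.
# Field names are assumptions, not the exact DROID schema.
episode = {
    "steps": [
        {
            "observation": {
                "image": b"<jpeg bytes>",     # camera frame (placeholder)
                "joint_position": [0.0] * 7,  # proprioceptive state
            },
            "action": [0.0] * 8,              # e.g. 7 joints + gripper (assumed)
            "language_instruction": "pick up the cup",
            "is_first": True,                 # episode boundary markers
            "is_last": False,
        },
        # ... more steps ...
    ]
}

# Training pipelines typically flatten episodes into individual frames:
frames = list(episode["steps"])
print(len(frames))  # 1
```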

## Install

We need a few additional dependencies for RLDS data loading. Run:

```bash
uv sync --group rlds
```

## Download DROID dataset

You can download a (slightly outdated) version of DROID with the following command (after installing the `gsutil` Google Cloud CLI):

```bash
gsutil -m cp -r gs://gresearch/robotics/droid <your_download_path>
```

Note that this version of DROID is slightly outdated: it contains language annotations for only a partial set of episodes (~30k). Please email karl.pertsch@gmail.com to get access to the most up-to-date version of the DROID RLDS dataset, which has language annotations for 75k episodes. (Sorry, we are working on updating the version in the official bucket.)

You will need ~1.8 TB of free disk space to download the DROID RLDS dataset.
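As a quick pre-flight check, you can verify the free space on the target filesystem with the Python standard library (the 1.8 TB figure is from above; the helper name is ours):

```python
import shutil

REQUIRED_BYTES = int(1.8e12)  # ~1.8 TB for the DROID RLDS dataset

def has_enough_space(path: str, required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem containing `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required

if not has_enough_space("."):
    print("Not enough free space for the DROID download.")
```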

## Run

First, set the `rlds_data_dir` path in your `TrainConfig` to the directory into which you downloaded the DROID dataset (see `src/openpi/training/config.py`).
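The change amounts to pointing the config's data directory at your download. A sketch only (field names other than `rlds_data_dir` are assumptions; see `src/openpi/training/config.py` for the real `TrainConfig`, and substitute your actual path for the placeholder):

```python
# Sketch, not the actual openpi config definition.
TrainConfig(
    name="pi0_fast_droid_finetune",
    rlds_data_dir="<your_download_path>",  # directory containing the DROID RLDS dataset
    # ... other fields unchanged ...
)
```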

Then, compute normalization statistics (this will take ~10 minutes):

```bash
uv run --group rlds scripts/compute_norm_stats.py --config-name pi0_fast_droid_finetune --max-frames 10_000_000
```
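For intuition, normalization statistics are per-dimension statistics computed over dataset frames and later used to normalize states and actions. A generic NumPy sketch of the idea (not openpi's actual implementation; shapes are made up):

```python
import numpy as np

# Generic sketch of computing and applying normalization statistics.
rng = np.random.default_rng(0)
actions = rng.normal(size=(10_000, 8))  # hypothetical (num_frames, action_dim)

norm_stats = {"mean": actions.mean(axis=0), "std": actions.std(axis=0)}

# At train time, actions are normalized with these dataset-level statistics:
normalized = (actions - norm_stats["mean"]) / (norm_stats["std"] + 1e-6)
```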

Run training:

```bash
uv run --group rlds scripts/train.py pi0_fast_droid_finetune --exp-name=my_experiment --overwrite
```

Note: The original pi0-FAST-DROID model was trained with joint-velocity actions. Joint-velocity actions are much harder to simulate and thus are not compatible with simulated evaluation environments, so we do not recommend training with them; we use joint-position actions here instead.
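To make the distinction concrete, here is a toy sketch (the control rate and arm dimensions are assumptions) of how a chunk of joint-velocity actions relates to the equivalent absolute joint-position targets:

```python
import numpy as np

# Toy illustration only; the 15 Hz rate and 7-DoF arm are assumptions.
dt = 1.0 / 15.0                          # control period (s)
q0 = np.zeros(7)                         # current joint positions (rad)
velocity_chunk = np.full((10, 7), 0.1)   # joint-velocity action chunk (rad/s)

# Integrating velocities over time yields joint-position targets,
# which is the action space used in this config:
position_chunk = q0 + np.cumsum(velocity_chunk * dt, axis=0)
```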

## Compute Requirements

Our DROID training config requires approximately 2 days on 8x H100 GPUs to converge (100k iterations, batch size 256, roughly 1 epoch). If you start from PaliGemma instead of the pi0 initialization, plan for ~5 days on 8x H100s (240k iterations, i.e. 3 epochs).

We have experimented with LoRA for cheaper fine-tuning, but so far have not found the resulting policies to perform well.