| # Multi-GPU Training | |
| This guide shows you how to train policies on multiple GPUs using [Hugging Face Accelerate](https://huggingface.co/docs/accelerate). | |
| ## Installation | |
| `accelerate` is included in the `training` extra. Install it with: | |
| ```bash | |
| pip install 'lerobot[training]' | |
| ``` | |
| ## Training with Multiple GPUs | |
| You can launch training in two ways: | |
| ### Option 1: Without config (specify parameters directly) | |
| You can specify all parameters directly in the command without running `accelerate config`: | |
| ```bash | |
| accelerate launch \ | |
| --multi_gpu \ | |
| --num_processes=2 \ | |
| $(which lerobot-train) \ | |
| --dataset.repo_id=${HF_USER}/my_dataset \ | |
| --policy.type=act \ | |
| --policy.repo_id=${HF_USER}/my_trained_policy \ | |
| --output_dir=outputs/train/act_multi_gpu \ | |
| --job_name=act_multi_gpu \ | |
| --wandb.enable=true | |
| ``` | |
| **Key accelerate parameters:** | |
| - `--multi_gpu`: Enable multi-GPU training | |
| - `--num_processes=2`: Number of GPUs to use | |
| - `--mixed_precision=fp16`: Optional and not shown in the command above; enables fp16 mixed precision (use `bf16` instead if your hardware supports it) | |
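If you restrict the visible devices with the standard `CUDA_VISIBLE_DEVICES` environment variable, `--num_processes` should normally match the number of listed GPUs. The following is a minimal sketch of one way to derive that number; `count_gpu_ids` is a hypothetical helper name, not part of LeRobot or accelerate, and it assumes a plain comma-separated list of device IDs.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count the entries in a CUDA_VISIBLE_DEVICES-style
# list such as "0,1,3". An empty string yields 0.
count_gpu_ids() {
  local ids="$1"
  if [ -z "$ids" ]; then
    echo 0
    return
  fi
  local IFS=','
  # Split on commas into positional parameters, then count them.
  set -- $ids
  echo "$#"
}

count_gpu_ids "0,1,3"
```

The result could then be passed to the launcher, for example `--num_processes="$(count_gpu_ids "$CUDA_VISIBLE_DEVICES")"`.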
| ### Option 2: Using accelerate config | |
| If you prefer to save your configuration for reuse, configure accelerate for your hardware setup by running: | |
| ```bash | |
| accelerate config | |
| ``` | |
| This interactive setup asks questions about your training environment (number of GPUs, mixed precision settings, etc.) and saves the configuration for future use. For a simple multi-GPU setup on a single machine, you can use these recommended settings: | |
| - Compute environment: This machine | |
| - Number of machines: 1 | |
| - Number of processes: (number of GPUs you want to use) | |
| - GPU ids to use: (leave empty to use all) | |
| - Mixed precision: fp16 or bf16 (recommended for faster training) | |
| Then launch training with: | |
| ```bash | |
| accelerate launch $(which lerobot-train) \ | |
| --dataset.repo_id=${HF_USER}/my_dataset \ | |
| --policy.type=act \ | |
| --policy.repo_id=${HF_USER}/my_trained_policy \ | |
| --output_dir=outputs/train/act_multi_gpu \ | |
| --job_name=act_multi_gpu \ | |
| --wandb.enable=true | |
| ``` | |
| ## How It Works | |
| When you launch training with accelerate: | |
| 1. **Automatic detection**: LeRobot automatically detects whether it is running under accelerate | |
| 2. **Data distribution**: Data is automatically distributed across GPUs, with each process loading its own batches of `--batch_size` samples | |
| 3. **Gradient synchronization**: Gradients are synchronized across GPUs during backpropagation | |
| 4. **Single process logging**: Only the main process logs to wandb and saves checkpoints | |
| ## Learning Rate and Training Steps Scaling | |
| **Important:** LeRobot does **NOT** automatically scale learning rates or training steps based on the number of GPUs. This gives you full control over your training hyperparameters. | |
| ### Why No Automatic Scaling? | |
| Many distributed training frameworks automatically scale the learning rate by the number of GPUs (e.g., `lr = base_lr × num_gpus`). | |
| However, LeRobot keeps the learning rate exactly as you specify it. | |
| ### When and How to Scale | |
| If you want to scale your hyperparameters when using multiple GPUs, you should do it manually: | |
| **Learning Rate Scaling:** | |
| ```bash | |
| # Example: 2 GPUs with linear LR scaling | |
| # Base LR: 1e-4, with 2 GPUs -> 2e-4 | |
| accelerate launch --num_processes=2 $(which lerobot-train) \ | |
| --optimizer.lr=2e-4 \ | |
| --dataset.repo_id=lerobot/pusht \ | |
| --policy.type=act | |
| ``` | |
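The linear scaling rule from the example above (`lr = base_lr × num_gpus`) can be computed in the shell before launching. This is a minimal sketch, assuming `awk` is available for floating-point arithmetic; `scale_lr` is a hypothetical helper name, not a LeRobot or accelerate command.

```bash
#!/usr/bin/env bash

# Hypothetical helper: linear learning-rate scaling.
#   $1 = base learning rate (e.g. 1e-4)
#   $2 = number of GPUs
scale_lr() {
  awk -v lr="$1" -v n="$2" 'BEGIN { printf "%g\n", lr * n }'
}

# Base LR 1e-4 on 2 GPUs.
scale_lr 1e-4 2
```

The printed value can be passed to the training command as `--optimizer.lr="$(scale_lr 1e-4 2)"`. Whether linear scaling is appropriate depends on your policy and optimizer, so treat it as a starting point rather than a rule.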
| **Training Steps Scaling:** | |
| Since the effective batch size grows with the number of GPUs (`batch_size × num_gpus`), you may want to reduce the number of training steps proportionally: | |
| ```bash | |
| # Example: 2 GPUs with effective batch size 2x larger | |
| # Original: batch_size=8, steps=100000 | |
| # With 2 GPUs: batch_size=8 (16 in total), steps=50000 | |
| accelerate launch --num_processes=2 $(which lerobot-train) \ | |
| --batch_size=8 \ | |
| --steps=50000 \ | |
| --dataset.repo_id=lerobot/pusht \ | |
| --policy.type=act | |
| ``` | |
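The arithmetic behind the example above can be sketched with shell integer arithmetic. The function names `effective_batch_size` and `scaled_steps` are hypothetical helpers for illustration; the sketch assumes the step count divides evenly by the number of GPUs (integer division otherwise rounds down).

```bash
#!/usr/bin/env bash

# Hypothetical helper: effective batch size = per-GPU batch size x number of GPUs.
#   $1 = per-GPU batch size, $2 = number of GPUs
effective_batch_size() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: reduce the step count in proportion to the number of GPUs.
#   $1 = single-GPU step count, $2 = number of GPUs
scaled_steps() {
  echo $(( $1 / $2 ))
}

echo "effective batch size: $(effective_batch_size 8 2)"
echo "steps: $(scaled_steps 100000 2)"
```

With `batch_size=8` and 2 GPUs this gives an effective batch size of 16 and 50000 steps, matching the values used in the command above.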
| ## Notes | |
| - The `--policy.use_amp` flag in `lerobot-train` is only used when **not** running with accelerate. When using accelerate, mixed precision is controlled by accelerate's configuration. | |
| - Training logs, checkpoints, and hub uploads are only done by the main process to avoid conflicts. Non-main processes have console logging disabled to prevent duplicate output. | |
| - The effective batch size is `batch_size × num_gpus`. If you use 4 GPUs with `--batch_size=8`, your effective batch size is 32. | |
| - Learning rate scheduling is handled correctly across multiple processes: LeRobot sets `step_scheduler_with_optimizer=False` to prevent accelerate from adjusting scheduler steps based on the number of processes. | |
| - When saving or pushing models, LeRobot automatically unwraps the model from accelerate's distributed wrapper to ensure compatibility. | |
| - WandB integration automatically initializes only on the main process, preventing multiple runs from being created. | |
| For more advanced configurations and troubleshooting, see the [Accelerate documentation](https://huggingface.co/docs/accelerate). To learn more about training on a large number of GPUs, check out the [Ultrascale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook). | |