---
license: mit
---

# ⚡ Hugging Face Accelerate: Effortless Multi-GPU & Distributed Training

---

Welcome to **🤗 Accelerate**, your lightweight, zero-boilerplate training tool for **PyTorch** and **Transformers**. Whether you're working on a single laptop, a multi-GPU machine, or a large-scale cloud cluster, Accelerate abstracts away the complexity of hardware and distributed systems so you can focus on building great models.

---

## ✨ Why Use Accelerate?

- **Single-to-Multi GPU/TPU** with minimal code changes.
- **No code refactor**: use the same script everywhere.
- **Launch-ready**: Easily scale training jobs from dev to prod.
- **Supports DeepSpeed, FSDP, TPU, and multi-node training**.
- **Compatible with 🤗 Transformers, Datasets, PEFT**, and more.
- **Built-in CLI** for quick configuration and debugging.

> Accelerate is a good fit for DevOps, MLOps, and Full Stack AI teams looking to scale training workloads without managing deep infrastructure internals.

---

## Installation

```bash
pip install accelerate
```

Optional: DeepSpeed support can be installed as an extra. FSDP ships with PyTorch and needs no extra package, while TPU training relies on a separately installed `torch_xla` build.

```bash
pip install "accelerate[deepspeed]"
```

---

## Quick Start

### Step 1: Configure

```bash
accelerate config
```

You'll be guided through an interactive setup (or use `accelerate config default` to generate a default config without prompts).

### Step 2: Launch your script

```bash
accelerate launch train.py
```

Based on your config, Accelerate handles device placement, the distributed strategy (DDP, FSDP, or DeepSpeed), and mixed precision. Gradient accumulation is supported too, though a custom training loop has to opt in through `accelerator.accumulate(...)`.

---

## 🧪 Example: Training a Transformer at Scale

**train.py**

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Recent versions of Trainer use Accelerate internally, so no explicit
# Accelerator object is needed here; `accelerate launch` supplies the setup.

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad to a fixed length so the default data collator can stack examples.
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in recent transformers releases
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)

trainer.train()
```

Then just run:

```bash
accelerate launch train.py
```

---

## Advanced Use Cases

Accelerate supports:

- **DeepSpeed**: ZeRO stages and offloading for memory savings
- **FSDP**: Fine-grained model sharding
- **TPUs**: Train on TPU cores
- **Multi-node/multi-GPU**: via SLURM or the CLI

Configure all options interactively, or manually edit `~/.cache/huggingface/accelerate/default_config.yaml`.
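
If you want to check where that config file lives on a given machine before editing it, a small helper can resolve the path. This is a sketch under an assumption that is not stated above: that the cache location follows the `HF_HOME` and `XDG_CACHE_HOME` environment variables and falls back to `~/.cache/huggingface`. The function name `default_accelerate_config_path` is hypothetical and is not part of the Accelerate API.

```python
import os
from pathlib import Path


def default_accelerate_config_path(environ=None):
    """Return the assumed location of Accelerate's default YAML config."""
    env = os.environ if environ is None else environ
    # Assumption: HF_HOME wins, then XDG_CACHE_HOME, then ~/.cache.
    cache_root = env.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = env.get("HF_HOME", os.path.join(cache_root, "huggingface"))
    return Path(os.path.expanduser(hf_home)) / "accelerate" / "default_config.yaml"


if __name__ == "__main__":
    path = default_accelerate_config_path()
    status = "found" if path.is_file() else "not found (run `accelerate config` first)"
    print(f"{path}: {status}")
```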

---

## 🧰 API Highlights

- `Accelerator()`: Handles device placement and mixed precision, and exposes helpers for gradient clipping and experiment tracking.
- `.prepare()`: Wraps the model, optimizer, and dataloader for distributed training.
- `.print()`: Drop-in replacement for `print()` that prints once per machine instead of once per process.
- `.wait_for_everyone()`: Barrier sync in multi-process setups.

Example:

```python
from accelerate import Accelerator

accelerator = Accelerator()
# `model`, `optimizer`, and `dataloader` are ordinary PyTorch objects created beforehand.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

---

## DevOps / MLOps Friendly

- **CI/CD ready**: Use CLI and scripting without touching training code.
- **Built-in logging**: Compatible with 🤗 Hub, WandB, TensorBoard.
- **Cloud scaling**: Easily used with SageMaker, Vertex AI, GCP, Azure.
- **Kubernetes compatible**: Launch jobs with config-driven strategy.

---

## 🧩 Integrates With

- 🤗 Transformers
- 🤗 Datasets
- 🤗 PEFT (for LoRA / adapters)
- 🤗 Diffusers
- DeepSpeed / FSDP / Torch XLA
- PyTorch Lightning (via wrapper)

---

## Learn More

- **Docs**: [https://huggingface.co/docs/accelerate](https://huggingface.co/docs/accelerate)
- **Course**: [Chapter 8: Distributed Training](https://huggingface.co/course/chapter8)
- **Blog**: [How to Train BERT with Accelerate](https://huggingface.co/blog)

---

## Contribute

```bash
git clone https://github.com/huggingface/accelerate
cd accelerate
pip install -e ".[dev]"
```

Check issues, help improve features, or share examples for TPU/FSx setups!

## License

Accelerate is released under the Apache 2.0 License.

> *Accelerate bridges the gap between single-device experimentation and full-scale model training, with zero boilerplate and maximum flexibility.*

---

Made with love by [Hugging Face](https://huggingface.co) and the open-source community.