---
license: mit
---
![image.png](https://cdn-uploads.huggingface.co/production/uploads/68205a944ab07873c714ab38/JaZj4EZaOlQyoUlkhXO9o.png)  
# ⚡ Hugging Face Accelerate: Effortless Multi-GPU & Distributed Training
---
Welcome to **🤗 Accelerate**, a lightweight library for running the same **PyTorch** and **Transformers** training code on any hardware setup. Whether you are working on a single laptop, a multi-GPU machine, or a large cloud cluster, Accelerate handles device placement and distributed setup for you, so you can focus on building great models.

---

## ✨ Why Use Accelerate?

- **Single-to-Multi GPU/TPU** with only *a few added lines* in your training loop.
- **No code refactor** – use the same script everywhere.
- **Launch-ready**: Easily scale training jobs from dev to prod.
- **Supports DeepSpeed, FSDP, TPU, Multi-node**.
- **Compatible with 🤗 Transformers, Datasets, PEFT**, and more.
- **Built-in CLI** for quick configuration and debugging.

> Accelerate suits DevOps, MLOps, and Full Stack AI teams that need to scale training workloads without managing low-level distributed infrastructure.

---

## ⚙️ Installation

```bash
pip install accelerate
```

Optional: install the backends you plan to use. DeepSpeed is available as an extra. FSDP ships with PyTorch itself, and TPU support needs a separately installed `torch_xla` build that matches your PyTorch version.

```bash
pip install "accelerate[deepspeed]"
```

---

## 🚀 Quick Start

### Step 1: Configure

```bash
accelerate config
```

You'll be guided through an interactive setup. Alternatively, run `accelerate config default` to write a basic config without any prompts.

### Step 2: Launch your script

```bash
accelerate launch train.py
```

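Based on your config, `accelerate launch` starts the right number of processes and sets up the chosen backend (DDP, FSDP, or DeepSpeed), device placement, and mixed precision. Options such as gradient accumulation are picked up from the config too, but they only take effect when your script uses the matching `Accelerator` calls.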

---

## 🧪 Example: Training a Transformer at Scale

**train.py**

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Trainer uses Accelerate internally, so no explicit Accelerator() is needed here.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate only; padding is applied per batch by the data collator below.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
# IMDB is a binary (negative / positive) classification task.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    # Shuffle before selecting: the IMDB test split is ordered by label.
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
    data_collator=DataCollatorWithPadding(tokenizer),
)

trainer.train()
```

Then just run:

```bash
accelerate launch train.py
```

---

## ⚙️ Advanced Use Cases

Accelerate supports:

- **DeepSpeed**: ZeRO offloading, memory savings
- **FSDP**: Fine-grained model sharding
- **TPUs**: Train on TPU cores seamlessly
- **Multi-node/multi-GPU**: via SLURM or CLI

Configure all options interactively or manually edit the `~/.cache/huggingface/accelerate/default_config.yaml`.

---

## 🧰 API Highlights

- `Accelerator()`: Handles device placement, mixed precision, and distributed setup, and exposes helpers such as `.backward()`, `.clip_grad_norm_()`, and `.log()`.
- `.prepare()`: Wraps model, dataloader, optimizer for distributed training.
- `.print()`: Drop-in replacement for `print()` that prints once per machine instead of once per process.
- `.wait_for_everyone()`: Barrier sync in multi-process setups.

Example:

```python
accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

---

## 🧠 DevOps / MLOps Friendly

- **CI/CD ready**: Use CLI and scripting without touching training code.
- **Built-in logging**: Works with experiment trackers such as WandB and TensorBoard.
- **Cloud scaling**: Easily used with SageMaker, Vertex AI, GCP, Azure.
- **Kubernetes compatible**: Launch jobs with config-driven strategy.
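In a CI/CD or job-scheduler context, the launch command is often assembled by a script rather than typed by hand. The helper below is a hypothetical sketch (the `build_launch_command` name and its parameters are not part of Accelerate) that builds an argv list from `accelerate launch` flags such as `--num_processes`, `--num_machines`, `--machine_rank`, `--main_process_ip`, `--main_process_port`, `--mixed_precision`, and `--config_file`.

```python
import shlex
from typing import Optional


def build_launch_command(
    script: str,
    num_processes: int = 1,
    num_machines: int = 1,
    machine_rank: int = 0,
    main_process_ip: Optional[str] = None,
    main_process_port: Optional[int] = None,
    mixed_precision: str = "no",
    config_file: Optional[str] = None,
    script_args: Optional[list] = None,
) -> list:
    """Assemble an `accelerate launch` argv list for use with subprocess.run."""
    command = ["accelerate", "launch"]
    if config_file is not None:
        command += ["--config_file", config_file]
    # num_processes is the total across all machines, not per machine.
    command += [
        "--num_processes", str(num_processes),
        "--num_machines", str(num_machines),
        "--mixed_precision", mixed_precision,
    ]
    if num_machines > 1:
        # Multi-node runs need every node to know its rank and the rendezvous address.
        if main_process_ip is None or main_process_port is None:
            raise ValueError("multi-node launches need main_process_ip and main_process_port")
        command += [
            "--machine_rank", str(machine_rank),
            "--main_process_ip", main_process_ip,
            "--main_process_port", str(main_process_port),
        ]
    command.append(script)
    command += list(script_args or [])
    return command


if __name__ == "__main__":
    cmd = build_launch_command("train.py", num_processes=2, mixed_precision="bf16")
    print(shlex.join(cmd))
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`, which keeps the training script itself free of environment-specific logic.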

---

## 🧩 Integrates With

- 🤗 Transformers
- 🤗 Datasets
- 🤗 PEFT (for LoRA / adapters)
- 🤗 Diffusers
- DeepSpeed / FSDP / Torch XLA
- PyTorch Lightning (via wrapper)

---

## 📚 Learn More

- **Docs**: [https://huggingface.co/docs/accelerate](https://huggingface.co/docs/accelerate)
- **Course**: [Chapter 3 – Fine-tuning a pretrained model](https://huggingface.co/course/chapter3) (includes a full training loop with Accelerate)
- **Blog**: [Hugging Face blog](https://huggingface.co/blog)

---

## 🤝 Contribute

```bash
git clone https://github.com/huggingface/accelerate
cd accelerate
pip install -e ".[dev]"
```

Check issues, help improve features, or share examples for TPU/FSx setups!


## License

Accelerate is released under the Apache 2.0 License.

> *Accelerate bridges the gap between single-device experimentation and full-scale model training, with minimal boilerplate and maximum flexibility.*

---

Made with love by [Hugging Face](https://huggingface.co) and the open-source community.