---
license: mit
---
![image.png](https://cdn-uploads.huggingface.co/production/uploads/68205a944ab07873c714ab38/JaZj4EZaOlQyoUlkhXO9o.png)  
⚡ Hugging Face Accelerate – Effortless Multi-GPU & Distributed Training
---
Welcome to **🤗 Accelerate**, a lightweight, zero-boilerplate training tool for **PyTorch** and **Transformers**. Whether you work on a single laptop, a multi-GPU machine, or a large cloud cluster, Accelerate abstracts away the hardware and distributed-systems details so you can focus on building great models.

---

## ✨ Why Use Accelerate?

- **Single-to-Multi GPU/TPU** in *one line*.
- **No code refactor** – use the same script everywhere.
- **Launch-ready**: Easily scale training jobs from dev to prod.
- **Supports DeepSpeed, FSDP, TPU, Multi-node**.
- **Compatible with 🤗 Transformers, Datasets, PEFT**, and more.
- **Built-in CLI** for quick configuration and debugging.

> Accelerate is perfect for DevOps, MLOps, and Full Stack AI teams looking to scale training workloads without managing deep infrastructure internals.

---

## βš™οΈ Installation

```bash
pip install accelerate
```

Optional: for DeepSpeed support, install the matching extra. FSDP ships with PyTorch, and TPU support comes from the separately installed `torch_xla` package, so neither needs an Accelerate extra:

```bash
pip install "accelerate[deepspeed]"
```

---

## 🚀 Quick Start

### Step 1: Configure

```bash
accelerate config
```

This walks you through an interactive setup. To skip the prompts, run `accelerate config default`, which writes a basic config for the current machine.

### Step 2: Launch your script

```bash
accelerate launch train.py
```

Based on your config, Accelerate automatically applies device mapping, the DDP/FSDP strategy, gradient accumulation, and more.

---

## 🧪 Example: Training a Transformer at Scale

**train.py**

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Trainer uses Accelerate internally, so no explicit Accelerator object is needed here.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad to a fixed length so the default collator can stack examples into tensors.
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    # Shuffle before selecting: the IMDB test split is ordered by label.
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
)

trainer.train()
```

Then just run:

```bash
accelerate launch train.py
```
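Because `per_device_train_batch_size` applies to each process, the global batch size grows with the number of processes you launch. The arithmetic can be sketched as below, assuming plain data parallelism; both helper functions are illustrative, and the step count is approximate because the exact value depends on how the last partial batch is handled.

```python
import math


def effective_batch_size(per_device_batch_size: int, num_processes: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Number of examples that contribute to one optimizer step across all processes."""
    return per_device_batch_size * num_processes * gradient_accumulation_steps


def approx_steps_per_epoch(num_examples: int, per_device_batch_size: int,
                           num_processes: int,
                           gradient_accumulation_steps: int = 1) -> int:
    """Approximate optimizer steps needed to see every example once."""
    global_batch = effective_batch_size(
        per_device_batch_size, num_processes, gradient_accumulation_steps
    )
    return math.ceil(num_examples / global_batch)


# The example above trains on 2000 examples with 8 examples per device.
print(effective_batch_size(8, num_processes=1))          # 8
print(effective_batch_size(8, num_processes=4))          # 32
print(approx_steps_per_epoch(2000, 8, num_processes=1))  # 250
print(approx_steps_per_epoch(2000, 8, num_processes=4))  # 63
```

If you scale from one GPU to several, you may want to revisit the learning rate, since each optimizer step now averages over a larger batch.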

---

## βš™οΈ Advanced Use Cases

Accelerate supports:

- **DeepSpeed**: ZeRO offloading, memory savings
- **FSDP**: Fine-grained model sharding
- **TPUs**: Train on TPU cores seamlessly
- **Multi-node/multi-GPU**: via SLURM or CLI

Configure all options interactively with `accelerate config`, or edit `~/.cache/huggingface/accelerate/default_config.yaml` by hand.
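For scripted environments it can help to generate a config file in code and pass it to the launcher with `--config_file`. The sketch below is a hedged illustration: the keys mirror what `accelerate config` typically writes for a single-machine multi-GPU setup, but the exact key set varies between versions, so compare it with a file generated by your own install. The helper name and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML


def write_multi_gpu_config(path: Path, num_gpus: int = 2,
                           mixed_precision: str = "fp16") -> Path:
    """Write a single-machine, multi-GPU Accelerate config to `path`."""
    # Assumed key set; verify against the output of `accelerate config`.
    config = {
        "compute_environment": "LOCAL_MACHINE",
        "distributed_type": "MULTI_GPU",
        "mixed_precision": mixed_precision,
        "num_machines": 1,
        "machine_rank": 0,
        "num_processes": num_gpus,  # one process per GPU
        "use_cpu": False,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(yaml.safe_dump(config, sort_keys=False))
    return path


# Write to a temporary directory so the default config stays untouched.
config_path = write_multi_gpu_config(Path(tempfile.mkdtemp()) / "two_gpu.yaml")
print(f"accelerate launch --config_file {config_path} train.py")
```

Keeping such files under version control gives each environment (dev, CI, production) its own launch settings without changing the training script.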

---

## 🧰 API Highlights

- `Accelerator()`: Handles device placement and mixed precision, and exposes helpers for gradient clipping and experiment tracking.
- `.prepare()`: Wraps model, dataloader, optimizer for distributed training.
- `.print()`: Drop-in replacement for `print()` that prints only once per machine instead of once per process.
- `.wait_for_everyone()`: Barrier sync in multi-process setups.

Example:

```python
accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

---

## 🧠 DevOps / MLOps Friendly

- **CI/CD ready**: Use CLI and scripting without touching training code.
- **Built-in logging**: Compatible with 🤗 Hub, WandB, TensorBoard.
- **Cloud scaling**: Easily used with SageMaker, Vertex AI, GCP, Azure.
- **Kubernetes compatible**: Launch jobs with config-driven strategy.

---

## 🧩 Integrates With

- 🤗 Transformers
- 🤗 Datasets
- 🤗 PEFT (for LoRA / adapters)
- 🤗 Diffusers
- DeepSpeed / FSDP / Torch XLA
- PyTorch Lightning (via wrapper)

---

## 📚 Learn More

- **Docs**: [https://huggingface.co/docs/accelerate](https://huggingface.co/docs/accelerate)
- **Course**: [Chapter 8 – Distributed Training](https://huggingface.co/course/chapter8)
- **Blog**: [How to Train BERT with Accelerate](https://huggingface.co/blog)

---

## 🤝 Contribute

```bash
git clone https://github.com/huggingface/accelerate
cd accelerate
pip install -e ".[dev]"
```

Check issues, help improve features, or share examples for TPU/FSx setups!


## License

Accelerate is released under the Apache 2.0 License.

> *Accelerate bridges the gap between single-device experimentation and full-scale model training, with zero boilerplate and maximum flexibility.*

---

Made with love by [Hugging Face](https://huggingface.co) and the open-source community.