---
license: apache-2.0
---

![image.png](https://cdn-uploads.huggingface.co/production/uploads/68205a944ab07873c714ab38/JaZj4EZaOlQyoUlkhXO9o.png)

# ⚡ Hugging Face Accelerate – Effortless Multi-GPU & Distributed Training

---

Welcome to **🤗 Accelerate**, a lightweight, zero-boilerplate training tool for **PyTorch** and **Transformers**. Whether you're working on a single laptop, a multi-GPU workstation, or a large-scale cloud cluster, Accelerate abstracts away the complexity of hardware and distributed systems so you can focus on building great models.

---

## ✨ Why Use Accelerate?

- **Single-to-multi GPU/TPU** in *one line*.
- **No code refactor** – the same script runs everywhere.
- **Launch-ready**: scale training jobs from dev to prod.
- **Supports DeepSpeed, FSDP, TPUs, and multi-node setups**.
- **Compatible with 🤗 Transformers, Datasets, PEFT**, and more.
- **Built-in CLI** for quick configuration and debugging.

> Accelerate is a good fit for DevOps, MLOps, and full-stack AI teams that want to scale training workloads without managing deep infrastructure internals.

---

## ⚙️ Installation

```bash
pip install accelerate
```

Optional backends are installed separately: DeepSpeed via `pip install deepspeed`, TPUs via the `torch_xla` package. FSDP ships with PyTorch itself, so it needs no extra install.

---

## 🚀 Quick Start

### Step 1: Configure

```bash
accelerate config
```

You'll be guided through an interactive setup (or use `accelerate config default` to auto-generate a configuration).

### Step 2: Launch your script

```bash
accelerate launch train.py
```

Accelerate automatically applies device placement, the DDP/FSDP strategy, gradient accumulation, and more, based on your config.
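For reference, the generated file is plain YAML. A config for a single machine with two GPUs might look like the following (illustrative values only; your generated file will differ depending on your answers and hardware):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 2
gpu_ids: all
mixed_precision: fp16
main_training_function: main
machine_rank: 0
```

Editing this file by hand is equivalent to re-running the interactive `accelerate config` wizard.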
---

## 🧪 Example: Training a Transformer at Scale

**train.py**

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Trainer uses Accelerate under the hood, so no explicit Accelerator object
# is needed here; `accelerate launch` supplies the distributed environment.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in newer transformers releases
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)

trainer.train()
```

Then just run:

```bash
accelerate launch train.py
```

---

## ⚙️ Advanced Use Cases

Accelerate supports:

- **DeepSpeed**: ZeRO offloading and memory savings
- **FSDP**: fine-grained model sharding
- **TPUs**: train on TPU cores seamlessly
- **Multi-node/multi-GPU**: via SLURM or the CLI

Configure all options interactively, or manually edit `~/.cache/huggingface/accelerate/default_config.yaml`.

---

## 🧰 API Highlights

- `Accelerator()`: handles device placement, mixed precision, and gradient clipping.
- `.prepare()`: wraps the model, dataloaders, and optimizer for distributed training.
- `.print()`: drop-in replacement for `print()` that prints only on the main process.
- `.wait_for_everyone()`: barrier synchronization in multi-process setups.

Example:

```python
accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

---

## 🧠 DevOps / MLOps Friendly

- **CI/CD ready**: use the CLI and scripting without touching training code.
- **Built-in logging**: compatible with the 🤗 Hub, Weights & Biases, and TensorBoard.
- **Cloud scaling**: works with AWS SageMaker, GCP Vertex AI, and Azure.
- **Kubernetes compatible**: launch jobs with a config-driven strategy.

---

## 🧩 Integrates With

- 🤗 Transformers
- 🤗 Datasets
- 🤗 PEFT (for LoRA / adapters)
- 🤗 Diffusers
- DeepSpeed / FSDP / Torch XLA
- PyTorch Lightning (via wrapper)

---

## 📚 Learn More

- **Docs**: [https://huggingface.co/docs/accelerate](https://huggingface.co/docs/accelerate)
- **Course**: [Chapter 8 – Distributed Training](https://huggingface.co/course/chapter8)
- **Blog**: [How to Train BERT with Accelerate](https://huggingface.co/blog)

---

## 🤝 Contribute

```bash
git clone https://github.com/huggingface/accelerate
cd accelerate
pip install -e ".[dev]"
```

Check the issues, help improve features, or share examples for TPU/FSDP setups!

## License

Accelerate is released under the Apache 2.0 License.

> *Accelerate bridges the gap between single-device experimentation and full-scale model training – with zero boilerplate and maximum flexibility.*

---

Made with love by [Hugging Face](https://huggingface.co) and the open-source community.