	# Expert parallelism

	[Expert parallelism](https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=expert_parallelism) is a parallelism strategy for [mixture-of-experts (MoE) models](https://huggingface.co/blog/moe). The experts' feedforward layers are distributed across hardware accelerators, so each device holds only a subset of the experts. A router dispatches tokens to the appropriate experts and gathers the results. This approach scales models to far larger parameter counts without a proportional increase in per-token computation because each token activates only a few experts.
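The routing step can be illustrated with a minimal top-k sketch in plain Python. The function name `top_k_route`, the logit values, and the choice to apply a softmax over only the selected experts are assumptions made for illustration; actual MoE models may score and normalize differently.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper: the weights are a softmax over the selected logits
    only, which is one common convention, not necessarily what a given model does.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (shifted for numerical stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# One token scored against 8 illustrative experts; only 2 are activated.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.2, 0.0, 0.7]
selected = top_k_route(logits, k=2)
active_experts = [i for i, _ in selected]
```

Only the experts in `active_experts` run for this token, which is why adding more experts grows the parameter count faster than the per-token compute.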

	## DistributedConfig

	> [!WARNING]
	> The `DistributedConfig` API is experimental and its usage may change in the future.

	Enable expert parallelism with the `DistributedConfig` class and the `enable_expert_parallel` argument.

	```py
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from transformers.distributed.configuration_utils import DistributedConfig

	distributed_config = DistributedConfig(enable_expert_parallel=True)

	model = AutoModelForCausalLM.from_pretrained(
	    "openai/gpt-oss-120b",
	    dtype="auto",
	    distributed_config=distributed_config,
	)
	```

	> [!TIP]
	> Expert parallelism automatically enables [tensor parallelism](./perf_infer_gpu_multi) for attention layers.

	Setting `enable_expert_parallel=True` switches to the `ep_plan` (expert parallel plan) defined in each MoE model's config file. The `GroupedGemmParallel` class splits expert weights so each device loads only its local experts. The `ep_router` routes tokens to experts and an all-reduce operation combines their outputs.
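To see why an all-reduce is enough to combine the outputs, the following toy simulation may help. It is a sketch under stated assumptions, not the `GroupedGemmParallel` or `ep_router` implementation: the contiguous expert layout, the helper names, and the scalar "experts" are all hypothetical, and the all-reduce is stood in for by a plain sum over ranks.

```python
def local_expert_ids(num_experts, world_size, rank):
    """Contiguous block of expert ids owned by `rank` (assumed layout)."""
    per_rank = num_experts // world_size
    return range(rank * per_rank, (rank + 1) * per_rank)

def expert_forward(expert_id, x):
    # Stand-in for an expert's feedforward layer: a distinct scaling per expert.
    return (expert_id + 1) * x

def partial_output(x, routing, num_experts, world_size, rank):
    """Weighted sum over the routed experts that live on this rank."""
    owned = local_expert_ids(num_experts, world_size, rank)
    return sum(w * expert_forward(e, x) for e, w in routing if e in owned)

num_experts, world_size = 8, 4
routing = [(1, 0.75), (6, 0.25)]  # (expert id, router weight) for one token
x = 2.0

# Each rank computes only its local share; summing the shares plays the role of all-reduce.
partials = [partial_output(x, routing, num_experts, world_size, r) for r in range(world_size)]
combined = sum(partials)

# Reference: the same token processed with every expert on a single device.
reference = sum(w * expert_forward(e, x) for e, w in routing)
```

Ranks that own none of the routed experts contribute zero, so the summed result matches the single-device reference.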

	Launch your inference script with [torchrun](https://pytorch.org/docs/stable/elastic/run.html) and specify how many devices to use. The number of devices must evenly divide the total number of experts.

	```zsh
	torchrun --nproc-per-node 8 your_script.py
	```
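Before launching, it can be useful to check that the device count you pass to `--nproc-per-node` evenly divides the expert count. The helpers below are a hypothetical sketch, not part of Transformers, and the value 128 is only an illustrative expert count; read the actual number from your model's configuration.

```python
def experts_per_device(num_experts, num_devices):
    """Return how many experts each device holds, or raise if the split is uneven."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if num_experts % num_devices != 0:
        raise ValueError(
            f"{num_devices} devices cannot evenly share {num_experts} experts"
        )
    return num_experts // num_devices

def valid_device_counts(num_experts, max_devices):
    """Device counts up to `max_devices` that evenly divide `num_experts`."""
    return [n for n in range(1, max_devices + 1) if num_experts % n == 0]

# Illustrative expert count; not read from any real model config.
NUM_EXPERTS = 128
per_device = experts_per_device(NUM_EXPERTS, 8)
candidates = valid_device_counts(NUM_EXPERTS, 8)
```

With these illustrative numbers, 8 devices hold 16 experts each, while a count such as 6 would raise a `ValueError`.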
