
# distributed package

This package contains utilities that finalize model weight gradients on each
rank before the optimizer step. These include a distributed data-parallel
wrapper that all-reduces or reduce-scatters gradients across data-parallel
replicas, and a `finalize_model_grads` method that synchronizes gradients
across the other parallelism modes (e.g., 'tied' layers that live on
different pipeline stages, or gradients for MoE experts placed on different
ranks by expert parallelism).
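
The two data-parallel collectives and the tied-layer synchronization can be illustrated with a minimal single-process simulation. This is a sketch under stated assumptions, not this package's API: Python lists stand in for per-rank gradient tensors, the helper names (`all_reduce_mean`, `reduce_scatter_mean`, `sync_tied_grads`) are hypothetical, and averaging over data-parallel replicas is assumed. Real code would call `torch.distributed` collectives on GPU tensors instead.

```python
from typing import List

# Hypothetical stand-in for one rank's flattened gradient buffer.
Grad = List[float]


def _elementwise_mean(replica_grads: List[Grad]) -> Grad:
    """Average the gradients of all replicas element by element."""
    n = len(replica_grads)
    return [sum(vals) / n for vals in zip(*replica_grads)]


def all_reduce_mean(replica_grads: List[Grad]) -> List[Grad]:
    """Simulated all-reduce: every replica ends up with the full averaged gradient."""
    mean = _elementwise_mean(replica_grads)
    return [list(mean) for _ in replica_grads]


def reduce_scatter_mean(replica_grads: List[Grad]) -> List[Grad]:
    """Simulated reduce-scatter: replica r keeps only shard r of the averaged gradient."""
    n = len(replica_grads)
    length = len(replica_grads[0])
    if length % n != 0:
        raise ValueError("gradient length must be divisible by the number of replicas")
    shard = length // n
    mean = _elementwise_mean(replica_grads)
    return [mean[r * shard:(r + 1) * shard] for r in range(n)]


def sync_tied_grads(first_stage_grad: Grad, last_stage_grad: Grad) -> Grad:
    """A weight shared by two pipeline stages (e.g. tied embeddings) receives the
    sum of both stages' gradient contributions; both stages then apply this same sum."""
    return [a + b for a, b in zip(first_stage_grad, last_stage_grad)]


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]  # two data-parallel replicas
    print(all_reduce_mean(grads))
    print(reduce_scatter_mean(grads))
    print(sync_tied_grads([0.5, 1.0], [0.25, 0.5]))
```

The reduce-scatter variant leaves each replica with only its slice of the averaged gradient, which is why it is typically paired with an optimizer whose state is sharded across data-parallel ranks: each replica then updates only its own slice of the parameters.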