DeepSpeed utilities
DeepSpeedPlugin
get_active_deepspeed_plugin
accelerate.utils.get_active_deepspeed_plugin
Returns the currently active DeepSpeedPlugin.
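This is mainly useful when several DeepSpeedPlugin objects are registered at once. Below is a minimal sketch, assuming DeepSpeed is installed and the script is started with `accelerate launch`; the plugin names and the claim that the first entry is active by default are illustrative assumptions, not documented guarantees.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, get_active_deepspeed_plugin

# Two illustrative plugins; passing a dict of plugins allows switching between them.
deepspeed_plugins = {
    "zero2": DeepSpeedPlugin(zero_stage=2),
    "zero3": DeepSpeedPlugin(zero_stage=3),
}
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugins)

# Query which plugin the AcceleratorState currently considers active.
active = get_active_deepspeed_plugin(accelerator.state)
print(active is deepspeed_plugins["zero2"])  # expected: True (first entry active by default)

# Switch to the other plugin, then query again.
accelerator.state.select_deepspeed_plugin("zero3")
print(get_active_deepspeed_plugin(accelerator.state) is deepspeed_plugins["zero3"])  # True
```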
accelerate.DeepSpeedPlugin
This plugin is used to integrate DeepSpeed; a construction sketch follows the parameter list below.
deepspeed_config_process
accelerate.DeepSpeedPlugin.deepspeed_config_process(prefix='', mismatches=None, config=None, must_match=True, **kwargs)
Source: https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L1388
Process the DeepSpeed config with the values from the kwargs.
Parameters:
hf_ds_config (Any, defaults to None) : Path to DeepSpeed config file or dict or an object of class accelerate.utils.deepspeed.HfDeepSpeedConfig.
gradient_accumulation_steps (int, defaults to None) : Number of steps to accumulate gradients before updating optimizer states. If not set, will use the value from the Accelerator directly.
gradient_clipping (float, defaults to None) : Enable gradient clipping with value.
zero_stage (int, defaults to None) : Possible options are 0, 1, 2, 3. Default will be taken from environment variable.
is_train_batch_min (bool, defaults to True) : If both train & eval dataloaders are specified, this will decide the train_batch_size.
offload_optimizer_device (str, defaults to None) : Possible options are none|cpu|nvme. Only applicable with ZeRO Stages 2 and 3.
offload_param_device (str, defaults to None) : Possible options are none|cpu|nvme. Only applicable with ZeRO Stage 3.
offload_optimizer_nvme_path (str, defaults to None) : Possible options are /nvme|/local_nvme. Only applicable with ZeRO Stage 3.
offload_param_nvme_path (str, defaults to None) : Possible options are /nvme|/local_nvme. Only applicable with ZeRO Stage 3.
zero3_init_flag (bool, defaults to None) : Flag to indicate whether to enable deepspeed.zero.Init for constructing massive models. Only applicable with ZeRO Stage 3.
zero3_save_16bit_model (bool, defaults to None) : Flag to indicate whether to save the 16-bit model. Only applicable with ZeRO Stage 3.
transformer_moe_cls_names (str, defaults to None) : Comma-separated list of Transformers MoE layer class names (case-sensitive). For example, MixtralSparseMoeBlock, Qwen2MoeSparseMoeBlock, JetMoEAttention, JetMoEBlock, etc.
enable_msamp (bool, defaults to None) : Flag to indicate whether to enable MS-AMP backend for FP8 training.
msamp_opt_level (Optional[Literal["O1", "O2"]], defaults to None) : Optimization level for MS-AMP (defaults to 'O1'). Only applicable if enable_msamp is True. Must be one of ['O1', 'O2'].
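A minimal construction sketch, assuming DeepSpeed is installed; the specific values are placeholders, not recommendations:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

plugin = DeepSpeedPlugin(
    zero_stage=2,                    # ZeRO stage 0, 1, 2, or 3
    gradient_accumulation_steps=4,   # taken from the Accelerator if left unset
    gradient_clipping=1.0,
    offload_optimizer_device="cpu",  # only relevant for ZeRO stages 2 and 3
)

# The plugin is consumed by the Accelerator, which builds the DeepSpeed config from it.
accelerator = Accelerator(deepspeed_plugin=plugin, mixed_precision="bf16")
```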
select
Sets the HfDeepSpeedWeakref to use the current DeepSpeed plugin configuration.
DummyScheduler
accelerate.utils.DummyScheduler
Dummy scheduler that presents model parameters or param groups; it is primarily used to follow a conventional training loop when the scheduler config is specified in the DeepSpeed config file. A usage sketch follows the parameter list below.
Parameters:
optimizer (torch.optim.optimizer.Optimizer) : The optimizer to wrap.
total_num_steps (int, optional) : Total number of steps.
warmup_num_steps (int, optional) : Number of steps for warmup.
lr_scheduler_callable (callable, optional) : A callable function that creates an LR Scheduler. It accepts only one argument optimizer.
**kwargs (additional keyword arguments, optional) : Other arguments.
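A usage sketch, assuming the DeepSpeed config file supplied at launch defines a `scheduler` section (the model, optimizer, and step counts are placeholders):

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DummyScheduler

accelerator = Accelerator()  # assumes a DeepSpeed config with a `scheduler` entry
model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Placeholder that keeps the usual training-loop shape; the real schedule is
# built from the DeepSpeed config during `prepare`.
lr_scheduler = DummyScheduler(optimizer, total_num_steps=1000, warmup_num_steps=100)

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
```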
DeepSpeedEngineWrapper
accelerate.utils.DeepSpeedEngineWrapper
Internal wrapper for deepspeed.runtime.engine.DeepSpeedEngine. This is used to follow a conventional training loop.
get_global_grad_norm
accelerate.utils.DeepSpeedEngineWrapper.get_global_grad_norm()
Source: https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/deepspeed.py#L286
Get the global gradient norm from the DeepSpeed engine.
Parameters:
engine (deepspeed.runtime.engine.DeepSpeedEngine) : The DeepSpeed engine to wrap.
DeepSpeedOptimizerWrapper
accelerate.utils.DeepSpeedOptimizerWrapper
Internal wrapper around a DeepSpeed optimizer.
Parameters:
optimizer (torch.optim.optimizer.Optimizer) : The optimizer to wrap.
DeepSpeedSchedulerWrapper
accelerate.utils.DeepSpeedSchedulerWrapper
Internal wrapper around a DeepSpeed scheduler.
Parameters:
scheduler (torch.optim.lr_scheduler.LambdaLR) : The scheduler to wrap.
optimizers (one or a list of torch.optim.Optimizer)
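None of the three wrappers above is meant to be instantiated directly; they are created by Accelerator.prepare when the DeepSpeed integration is active. The sketch below assumes a config file (the `ds_config.json` name is hypothetical) that defines both an `optimizer` and a `scheduler` section, and shows where the wrappers surface in a conventional loop; the isinstance checks reflect a reading of the integration rather than a documented guarantee.

```python
import torch
from accelerate import Accelerator
from accelerate.utils import (
    DeepSpeedOptimizerWrapper,
    DeepSpeedPlugin,
    DeepSpeedSchedulerWrapper,
    DummyOptim,
    DummyScheduler,
)

# Hypothetical config file defining both `optimizer` and `scheduler` sections.
plugin = DeepSpeedPlugin(hf_ds_config="ds_config.json")
accelerator = Accelerator(deepspeed_plugin=plugin)

model = torch.nn.Linear(128, 2)
optimizer = DummyOptim(model.parameters(), lr=3e-5)
lr_scheduler = DummyScheduler(optimizer, total_num_steps=1000, warmup_num_steps=100)

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)

# The dummies come back replaced by the internal wrappers documented above, and
# `accelerator.backward(loss)` is routed through DeepSpeedEngineWrapper internally.
assert isinstance(optimizer, DeepSpeedOptimizerWrapper)
assert isinstance(lr_scheduler, DeepSpeedSchedulerWrapper)
```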
DummyOptim
accelerate.utils.DummyOptim
Dummy optimizer that presents model parameters or param groups; it is primarily used to follow a conventional training loop when the optimizer config is specified in the DeepSpeed config file. A selection sketch follows the parameter list below.
Parameters:
lr (float) : Learning rate.
params (iterable) : Iterable of parameters to optimize or dicts defining parameter groups.
weight_decay (float) : Weight decay.
**kwargs (additional keyword arguments, optional) : Other arguments.
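A selection sketch in the spirit of the Accelerate DeepSpeed guide (details paraphrased, so treat them as assumptions): use DummyOptim only when the DeepSpeed config already supplies an `optimizer` section, and a regular optimizer otherwise.

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DummyOptim

accelerator = Accelerator()  # assumes `accelerate launch` with a DeepSpeed config
model = torch.nn.Linear(128, 2)

plugin = accelerator.state.deepspeed_plugin
optimizer_cls = (
    DummyOptim
    if plugin is not None and "optimizer" in plugin.deepspeed_config
    else torch.optim.AdamW
)
optimizer = optimizer_cls(model.parameters(), lr=3e-5, weight_decay=0.0)

model, optimizer = accelerator.prepare(model, optimizer)
```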