# Fully Sharded Data Parallel utilities
## enable_fsdp_ram_efficient_loading[[accelerate.utils.enable_fsdp_ram_efficient_loading]]
#### accelerate.utils.enable_fsdp_ram_efficient_loading[[accelerate.utils.enable_fsdp_ram_efficient_loading]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L39)
Enables RAM efficient loading of Hugging Face models for FSDP in the environment.
## disable_fsdp_ram_efficient_loading[[accelerate.utils.disable_fsdp_ram_efficient_loading]]
#### accelerate.utils.disable_fsdp_ram_efficient_loading[[accelerate.utils.disable_fsdp_ram_efficient_loading]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L49)
Disables RAM efficient loading of Hugging Face models for FSDP in the environment.
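These helpers toggle an environment flag and are typically called around model loading. A minimal sketch (the loading step itself is illustrative):

```python
from accelerate.utils import (
    disable_fsdp_ram_efficient_loading,
    enable_fsdp_ram_efficient_loading,
)

# Set the environment flag before loading a Hugging Face model that will be
# wrapped with FSDP, so the full weights are not materialized on every rank.
enable_fsdp_ram_efficient_loading()
# ... load the pretrained model here ...
disable_fsdp_ram_efficient_loading()  # restore the default behavior afterwards
```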
## merge_fsdp_weights[[accelerate.utils.merge_fsdp_weights]]
#### accelerate.utils.merge_fsdp_weights[[accelerate.utils.merge_fsdp_weights]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L366)
Merge the weights from sharded FSDP model checkpoints into a single combined checkpoint. Should be used if
`SHARDED_STATE_DICT` was used for the model. Weights will be saved to `{output_path}/model.safetensors` if
`safe_serialization` is `True`, otherwise to `{output_path}/pytorch_model.bin`.
Note: this is a CPU-bound process.
**Parameters:**
checkpoint_dir (`str`) : The directory containing the FSDP checkpoints (can be either the model or optimizer).
output_path (`str`) : The path to save the merged checkpoint.
safe_serialization (`bool`, *optional*, defaults to `True`) : Whether to save the merged weights with safetensors (recommended).
remove_checkpoint_dir (`bool`, *optional*, defaults to `False`) : Whether to remove the checkpoint directory after merging.
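A minimal usage sketch (the paths below are illustrative):

```python
from accelerate.utils import merge_fsdp_weights

# Merge sharded FSDP checkpoint files into a single file; with
# safe_serialization=True the result is written to merged/model.safetensors.
merge_fsdp_weights(
    checkpoint_dir="checkpoints/pytorch_model_fsdp_0",  # illustrative path
    output_path="merged",
    safe_serialization=True,
    remove_checkpoint_dir=False,  # keep the original shards around
)
```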
## FullyShardedDataParallelPlugin[[accelerate.FullyShardedDataParallelPlugin]]
#### accelerate.FullyShardedDataParallelPlugin[[accelerate.FullyShardedDataParallelPlugin]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L1571)
This plugin is used to enable fully sharded data parallelism. A usage sketch follows the parameter list below.
#### set_auto_wrap_policy[[accelerate.FullyShardedDataParallelPlugin.set_auto_wrap_policy]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L2041)
Given `model`, creates an `auto_wrap_policy` based on the passed-in policy and, when possible, the
`transformer_cls_to_wrap` classes.
**Parameters:**
fsdp_version (`int`, defaults to `1`) : The version of FSDP to use. If set to `2`, the launcher expects the config to be converted to FSDP2 format.
sharding_strategy (`Union[str, torch.distributed.fsdp.ShardingStrategy]`, defaults to `'FULL_SHARD'`) : Sharding strategy to use. Should be either a `str` or an instance of `torch.distributed.fsdp.fully_sharded_data_parallel.ShardingStrategy`. Deprecated in favor of `reshard_after_forward`.
reshard_after_forward (`Union[str, torch.distributed.fsdp.ShardingStrategy, bool]`, defaults to `'FULL_SHARD'` for `fsdp_version=1` and `True` for `fsdp_version=2`) : Sharding strategy to use. Should be a `bool` if `fsdp_version` is set to `2`, else a `str` or an instance of `torch.distributed.fsdp.fully_sharded_data_parallel.ShardingStrategy`.
backward_prefetch (`Union[str, torch.distributed.fsdp.BackwardPrefetch]`, defaults to `'NO_PREFETCH'`) : Backward prefetch strategy to use. Should be either a `str` or an instance of `torch.distributed.fsdp.fully_sharded_data_parallel.BackwardPrefetch`.
mixed_precision_policy (`Optional[Union[dict, str, torch.distributed.fsdp.MixedPrecision, torch.distributed.fsdp.MixedPrecisionPolicy]]`, defaults to `None`) : A config to enable mixed precision training with FullyShardedDataParallel. If passing in a `dict`, it should have the keys `param_dtype`, `reduce_dtype`, and `buffer_dtype`; can be an instance of `torch.distributed.fsdp.MixedPrecisionPolicy` if `fsdp_version` is set to `2`. If passing in a `str`, it should be one of `fp8`, `fp16`, `bf16`, or `fp32`, and is used to set `param_dtype`, `reduce_dtype`, and `buffer_dtype`.
auto_wrap_policy (`Optional[Union[Callable, Literal["transformer_based_wrap", "size_based_wrap", "no_wrap"]]]`, defaults to `NO_WRAP`) : A callable or string specifying a policy to recursively wrap layers with FSDP. If a string, it must be one of `transformer_based_wrap`, `size_based_wrap`, or `no_wrap`. See `torch.distributed.fsdp.wrap.size_based_auto_wrap_policy` for a direction on what it should look like.
cpu_offload (`Union[bool, torch.distributed.fsdp.CPUOffload, torch.distributed.fsdp.CPUOffloadPolicy]`, defaults to `False`) : Whether to offload parameters to CPU. Should be either a `bool` or an instance of `torch.distributed.fsdp.fully_sharded_data_parallel.CPUOffload` or `torch.distributed.fsdp.fully_sharded_data_parallel.CPUOffloadPolicy` if `fsdp_version` is set to `2`.
ignored_modules (`Optional[Union[Iterable[torch.nn.Module], str]]`, defaults to `None`) : A list of modules to ignore when wrapping with FSDP. When passing a string, will match the modules by name using regex fullmatch. If `fsdp_version` is set to `2`, the modules are converted to parameters and used.
state_dict_type (`Union[str, torch.distributed.fsdp.StateDictType]`, defaults to `'FULL_STATE_DICT'`) : State dict type to use. If a string, it must be one of `full_state_dict`, `local_state_dict`, or `sharded_state_dict`.
state_dict_config (`Optional[Union[torch.distributed.fsdp.FullStateDictConfig, torch.distributed.fsdp.ShardedStateDictConfig]]`, defaults to `None`) : State dict config to use. Is determined based on the `state_dict_type` if not passed in.
optim_state_dict_config (`Optional[Union[torch.distributed.fsdp.FullOptimStateDictConfig, torch.distributed.fsdp.ShardedOptimStateDictConfig]]`, defaults to `None`) : Optim state dict config to use. Is determined based on the `state_dict_type` if not passed in.
limit_all_gathers (`bool`, defaults to `True`) : Whether to have FSDP explicitly synchronize the CPU thread to prevent too many in-flight all-gathers. This `bool` only affects the sharded strategies that schedule all-gathers. Enabling this can help lower the number of CUDA malloc retries.
use_orig_params (`bool`, defaults to `False`) : Whether to use the original parameters for the optimizer.
param_init_fn (`Optional[Callable[[torch.nn.Module], None]]`, defaults to `None`) : A `Callable[torch.nn.Module] -> None` that specifies how modules that are currently on the meta device should be initialized onto an actual device. Only applicable when `sync_module_states` is `True`. By default, is a `lambda` which calls `to_empty` on the module.
sync_module_states (`bool`, defaults to `False`) : Whether each individually wrapped FSDP unit should broadcast module parameters from rank 0 to ensure they are the same across all ranks after initialization. Defaults to `False` unless `cpu_ram_efficient_loading` is `True`, in which case it is forcibly enabled.
forward_prefetch (`bool`, defaults to `False`) : Whether to have FSDP explicitly prefetch the next upcoming all-gather while executing in the forward pass. Only use with static graphs.
activation_checkpointing (`bool`, defaults to `False`) : A technique to reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Effectively, this trades extra computation time for reduced memory usage.
cpu_ram_efficient_loading (`bool`, defaults to `None`) : If `True`, only the first process loads the pretrained model checkpoint while all other processes have empty weights. Only applicable for Transformers. When using this, `sync_module_states` needs to be `True`.
transformer_cls_names_to_wrap (`Optional[List[str]]`, defaults to `None`) : A list of transformer layer class names to wrap. Only applicable when `auto_wrap_policy` is `transformer_based_wrap`.
min_num_params (`Optional[int]`, defaults to `None`) : The minimum number of parameters a module must have to be wrapped. Only applicable when `auto_wrap_policy` is `size_based_wrap`.
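A minimal construction sketch, assuming a transformer model whose block class is named `GPT2Block` (illustrative; use your own model's layer class) and that the plugin is passed to `Accelerator`. `reshard_after_forward` is used here instead of the deprecated `sharding_strategy`:

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    reshard_after_forward="FULL_SHARD",           # str form for fsdp_version=1
    auto_wrap_policy="transformer_based_wrap",
    transformer_cls_names_to_wrap=["GPT2Block"],  # illustrative layer class name
    state_dict_type="SHARDED_STATE_DICT",
    cpu_ram_efficient_loading=True,               # forcibly enables sync_module_states
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```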
#### set_mixed_precision[[accelerate.FullyShardedDataParallelPlugin.set_mixed_precision]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L2075)
Sets the mixed precision policy for FSDP.
#### set_state_dict_type[[accelerate.FullyShardedDataParallelPlugin.set_state_dict_type]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L1996)
Set the state dict config based on the `StateDictType`.
#### validate_mixed_precision_policy[[accelerate.FullyShardedDataParallelPlugin.validate_mixed_precision_policy]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/dataclasses.py#L2127)
Validates the mixed precision policy, abstracted away to not bring in the imports if not needed.
## fsdp2_load_full_state_dict[[accelerate.utils.fsdp2_load_full_state_dict]]
#### accelerate.utils.fsdp2_load_full_state_dict[[accelerate.utils.fsdp2_load_full_state_dict]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L467)
Loads the full state dict (could be only on rank 0) into the sharded model. This is done by broadcasting the
parameters from rank 0 to all other ranks. This function modifies the model in-place.
**Parameters:**
accelerator (`Accelerator`) : The accelerator instance
model (`torch.nn.Module`) : The model to load the state dict into, expected to be on meta device or a VRAM spike can occur
full_sd (`dict`) : The full state dict to load, can only be on rank 0
cpu_offload (`bool`, defaults to `False`) : If `True`, move sharded parameters to CPU after distribution. Required when FSDP CPU offloading is enabled.
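A hedged sketch of calling this utility, assuming `accelerator` and a meta-device `model` already exist from the surrounding FSDP2 setup, and that non-zero ranks pass an empty dict (the checkpoint path is illustrative):

```python
import torch

from accelerate.utils import fsdp2_load_full_state_dict

# Assumption: only rank 0 holds the full weights; other ranks pass an empty dict.
full_sd = torch.load("model.bin", map_location="cpu") if accelerator.is_main_process else {}
fsdp2_load_full_state_dict(accelerator, model, full_sd, cpu_offload=False)
```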
## fsdp2_switch_optimizer_parameters[[accelerate.utils.fsdp2_switch_optimizer_parameters]]
#### accelerate.utils.fsdp2_switch_optimizer_parameters[[accelerate.utils.fsdp2_switch_optimizer_parameters]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L563)
Switches the parameters of the optimizer to new ones (sharded parameters in the usual case). This function modifies the
optimizer in-place.
**Parameters:**
optimizer (`torch.optim.Optimizer`) : Optimizer instance which contains the original model parameters
mapping (`dict`) : Mapping from the original parameter (specified by `data_ptr`) to the sharded parameter
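A hedged sketch of constructing the mapping, assuming `old_params` and `sharded_params` are aligned sequences produced by the surrounding sharding step and `optimizer` was built over the original parameters:

```python
from accelerate.utils import fsdp2_switch_optimizer_parameters

# Key each original parameter by its data_ptr, pointing at its sharded replacement.
mapping = {old.data_ptr(): new for old, new in zip(old_params, sharded_params)}
fsdp2_switch_optimizer_parameters(optimizer, mapping)  # optimizer now tracks the shards
```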
## fsdp2_prepare_model[[accelerate.utils.fsdp2_prepare_model]]
#### accelerate.utils.fsdp2_prepare_model[[accelerate.utils.fsdp2_prepare_model]]
[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/utils/fsdp_utils.py#L645)
Prepares the model for FSDP2 in-place. Also returns the model to avoid misuse of the original model.
**Parameters:**
accelerator (`Accelerator`) : The accelerator instance
model (`torch.nn.Module`) : The model to prepare
**Returns:**
`torch.nn.Module`
Prepared model
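A minimal usage sketch, assuming `accelerator` and `model` from the surrounding setup; since the model is returned precisely to avoid misuse of the original, always continue with the returned reference:

```python
from accelerate.utils import fsdp2_prepare_model

model = fsdp2_prepare_model(accelerator, model)  # keep working with the returned model
```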
## fsdp2_prepare_auto_wrap_policy