Spaces:
Running on Zero
Running on Zero
| dist\_checkpointing package | |
| =========================== | |
| A library for saving and loading the distributed checkpoints. | |
| A "distributed checkpoint" can have various underlying formats (current default format is based on Zarr) | |
| but has a distinctive property - the checkpoint saved in one parallel configuration (tensor/pipeline/data parallelism) | |
| can be loaded in a different parallel configuration. | |
| Using the library requires defining sharded state_dict dictionaries with functions from *mapping* and *optimizer* modules. | |
| Those state dicts can be saved or loaded with a *serialization* module using strategies from *strategies* module. | |
| Safe Checkpoint Loading | |
| ----------------------- | |
| Since **PyTorch 2.6**, the default behavior of `torch.load` is `weights_only=True`. | |
| This ensures that only tensors and allow-listed classes are loaded, reducing the risk of arbitrary code execution. | |
| If you encounter an error such as: | |
| .. code-block:: bash | |
| WeightsUnpickler error: Unsupported global: GLOBAL argparse.Namespace was not an allowed global by default. | |
| you can fix it by explicitly allow-listing the missing class in your script: | |
| .. code-block:: python | |
| import torch, argparse | |
| torch.serialization.add_safe_globals([argparse.Namespace]) | |
| Subpackages | |
| ----------- | |
| .. toctree:: | |
| :maxdepth: 4 | |
| dist_checkpointing.strategies | |
| Submodules | |
| ---------- | |
| dist\_checkpointing.serialization module | |
| ---------------------------------------- | |
| .. automodule:: core.dist_checkpointing.serialization | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| dist\_checkpointing.mapping module | |
| ---------------------------------- | |
| .. automodule:: core.dist_checkpointing.mapping | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| dist\_checkpointing.optimizer module | |
| ------------------------------------ | |
| .. automodule:: core.dist_checkpointing.optimizer | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| dist\_checkpointing.core module | |
| ------------------------------- | |
| .. automodule:: core.dist_checkpointing.core | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| dist\_checkpointing.dict\_utils module | |
| -------------------------------------- | |
| .. automodule:: core.dist_checkpointing.dict_utils | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| dist\_checkpointing.utils module | |
| -------------------------------- | |
| .. automodule:: core.dist_checkpointing.utils | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |
| Module contents | |
| --------------- | |
| .. automodule:: core.dist_checkpointing | |
| :members: | |
| :undoc-members: | |
| :show-inheritance: | |