# Launchers

Functions for launching training on distributed processes.

## notebook_launcher[[accelerate.notebook_launcher]]

#### accelerate.notebook_launcher[[accelerate.notebook_launcher]]

[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/launchers.py#L41)

Launches a training function, using several processes or multiple nodes if the current environment allows it (for instance, a TPU with multiple cores).

To use this function, absolutely no calls to a device may be made in the notebook session before calling it. If any have been made, you will need to restart the notebook and make sure no cells use any device capability.

Setting `ACCELERATE_DEBUG_MODE="1"` in your environment runs a test before the actual launch to ensure that no such calls have been made.

Example:

```python
# Assume this is defined in a Jupyter Notebook on an instance with two devices
from accelerate import notebook_launcher

def train(*args):
    # Your training function here
    ...

# arg1 and arg2 stand in for whatever your training function expects
notebook_launcher(train, args=(arg1, arg2), num_processes=2, mixed_precision="fp16")
```
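
The pre-launch check mentioned above is driven by an environment variable, so it has to be in place before the launcher is called. A minimal sketch, assuming the variable is read at the time `notebook_launcher` runs; `enable_launch_check` is a hypothetical helper name used for illustration and is not part of Accelerate:

```python
import os


def enable_launch_check() -> None:
    # Hypothetical helper: request the pre-launch test described above.
    os.environ["ACCELERATE_DEBUG_MODE"] = "1"


enable_launch_check()

# Later, in the same notebook session:
# notebook_launcher(train, args=(arg1, arg2), num_processes=2)
```

Setting the variable from Python keeps the switch in the notebook itself, which may be easier to track than a shell-level export.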

**Parameters:**

- function (`Callable`) : The training function to execute. If it accepts arguments, the first argument should be the index of the process run.
- args (`Tuple`) : Tuple of arguments to pass to the function (it will receive `*args`).
- num_processes (`int`, *optional*) : The number of processes to use for training. Defaults to 8 in Colab/Kaggle if a TPU is available, and to the number of available devices otherwise.
- mixed_precision (`str`, *optional*, defaults to `"no"`) : If `fp16` or `bf16`, will use mixed precision training on multi-device.
- use_port (`str`, *optional*, defaults to `"29500"`) : The port to use for communication between processes when launching multi-device training.
- master_addr (`str`, *optional*, defaults to `"127.0.0.1"`) : The address to use for communication between processes.
- node_rank (`int`, *optional*, defaults to 0) : The rank of the current node.
- num_nodes (`int`, *optional*, defaults to 1) : The number of nodes to use for training.
- rdzv_backend (`str`, *optional*, defaults to `"static"`) : The rendezvous method to use, such as `'static'` (the default) or `'c10d'`.
- rdzv_endpoint (`str`, *optional*, defaults to `""`) : The endpoint of the rendezvous sync storage.
- rdzv_conf (`Dict`, *optional*, defaults to `None`) : Additional rendezvous configuration.
- rdzv_id (`str`, *optional*, defaults to `"none"`) : The unique run id of the job.
- max_restarts (`int`, *optional*, defaults to 0) : The maximum number of restarts that the elastic agent will perform on workers before failure.
- monitor_interval (`float`, *optional*, defaults to 0.1) : The interval, in seconds, at which the elastic agent monitors workers.
- log_line_prefix_template (`str`, *optional*, defaults to `None`) : The prefix template for elastic launch logging. Available from PyTorch 2.2.0.
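
For multi-node runs, the `num_nodes`, `node_rank`, `master_addr`, and `use_port` parameters above have to be chosen consistently on every node. The sketch below shows one way to keep them together. `multi_node_kwargs` is a hypothetical helper, the address is a placeholder, and the range check on `node_rank` is an assumption rather than documented behavior:

```python
from typing import Any, Dict


def multi_node_kwargs(
    node_rank: int,
    num_nodes: int,
    num_processes: int,
    master_addr: str,
    use_port: str = "29500",
) -> Dict[str, Any]:
    """Hypothetical helper: collect the launcher keyword arguments for one node."""
    # Assumption: ranks run from 0 to num_nodes - 1.
    if not 0 <= node_rank < num_nodes:
        raise ValueError(f"node_rank must be in [0, {num_nodes}), got {node_rank}")
    return {
        "num_processes": num_processes,
        "num_nodes": num_nodes,
        "node_rank": node_rank,
        "master_addr": master_addr,
        "use_port": use_port,
    }


# Placeholder values: two nodes that share the same address and port settings.
node0 = multi_node_kwargs(node_rank=0, num_nodes=2, num_processes=2, master_addr="10.0.0.1")
node1 = multi_node_kwargs(node_rank=1, num_nodes=2, num_processes=2, master_addr="10.0.0.1")

# In the notebook running on the first node:
# notebook_launcher(train, args=(arg1, arg2), **node0)
# In the notebook running on the second node:
# notebook_launcher(train, args=(arg1, arg2), **node1)
```

Only `node_rank` differs between the two dictionaries, which makes a mismatched address or port across nodes easier to spot.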

## debug_launcher[[accelerate.debug_launcher]]

#### accelerate.debug_launcher[[accelerate.debug_launcher]]

[Source](https://github.com/huggingface/accelerate/blob/vr_4021/src/accelerate/launchers.py#L276)

Launches a training function using several processes on CPU for debugging purposes.

This function is provided for internal testing and debugging and is not intended for real training runs. It will only use the CPU.

**Parameters:**

- function (`Callable`) : The training function to execute.
- args (`Tuple`) : Tuple of arguments to pass to the function (it will receive `*args`).
- num_processes (`int`, *optional*, defaults to 2) : The number of processes to use for training.
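
A short sketch of how `debug_launcher` might be used to smoke-test a function on CPU before moving to real hardware. The `train` body and the `run_debug` wrapper are placeholders invented for illustration. The import is kept inside the wrapper so the cell can be defined even where Accelerate is not installed:

```python
def train(scale, offset):
    # Placeholder body: a real function would build the model and run the loop.
    return scale * 2 + offset


def run_debug():
    # Imported lazily so defining this cell does not require Accelerate.
    from accelerate import debug_launcher

    # The launcher passes args to the function as train(*args) in each of the 2 CPU processes.
    debug_launcher(train, args=(3, 1), num_processes=2)
```

Calling `run_debug()` from a notebook cell starts the CPU processes. Since the launcher is meant for debugging only, the same `train` function would then be handed to `notebook_launcher` for an actual run.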