# Training on Intel CPU
## How it works for CPU training optimization
Accelerate has full support for Intel CPUs; all you need to do is enable it through the config.
**Scenario 1**: Accelerating non-distributed CPU training
Run `accelerate config` on your machine:
```bash
$ accelerate config
-----------------------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
```
This will generate a config file that will be used automatically to properly set the default options when doing
```bash
accelerate launch my_script.py --args_to_my_script
```
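To make the launch command concrete, here is a minimal sketch of what a `my_script.py` could look like; the toy dataset, model, and hyperparameters are illustrative placeholders, not part of the example repo:
```python
# my_script.py -- a minimal, device-agnostic training loop sketch.
# Accelerate picks up the generated config, so the same script runs
# on CPU (with BF16 mixed precision) without any code changes.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator


def main():
    accelerator = Accelerator()  # device and precision come from the config

    # Toy data and model, purely for illustration.
    dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
    dataloader = DataLoader(dataset, batch_size=32)
    model = torch.nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)  # use this instead of loss.backward()
        optimizer.step()


if __name__ == "__main__":
    main()
```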
For instance, here is how you would run the NLP example `examples/nlp_example.py` (from the root of the repo) with the `default_config.yaml` generated by `accelerate config`:
```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true
```
```bash
accelerate launch examples/nlp_example.py
```
> [!CAUTION]
> `accelerator.prepare` can currently only handle simultaneously preparing multiple models (and no optimizer) OR a single model-optimizer pair for training. Other attempts (e.g., two model-optimizer pairs) will raise a verbose error. To work around this limitation, consider separately using `accelerator.prepare` for each model-optimizer pair.
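As a sketch of that workaround (the two linear models and their optimizers below are hypothetical stand-ins), each pair gets its own `prepare` call:
```python
# Workaround sketch: prepare each model-optimizer pair in its own call
# rather than passing both pairs to a single accelerator.prepare call.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model_a = torch.nn.Linear(16, 1)  # illustrative models
model_b = torch.nn.Linear(16, 1)
optimizer_a = torch.optim.AdamW(model_a.parameters(), lr=1e-3)
optimizer_b = torch.optim.AdamW(model_b.parameters(), lr=1e-3)

# One prepare call per pair avoids the multi-pair limitation.
model_a, optimizer_a = accelerator.prepare(model_a, optimizer_a)
model_b, optimizer_b = accelerator.prepare(model_b, optimizer_b)
```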
**Scenario 2**: Accelerating distributed CPU training
We use Intel oneCCL for communication, combined with the Intel® MPI library, to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. You can refer to [this guide](https://huggingface.co/docs/transformers/perf_train_cpu_many) for installation instructions.
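Once installed, a quick sanity check (a sketch, not an official utility) is to confirm the oneCCL bindings import cleanly; importing the package registers the `ccl` backend with `torch.distributed`. The import name below, `oneccl_bindings_for_pytorch`, is the one used by recent releases, while older releases shipped as `torch_ccl`:
```python
# Sanity-check sketch: confirm the oneCCL bindings are importable.
# Importing them registers the "ccl" backend with torch.distributed.
import torch

try:
    import oneccl_bindings_for_pytorch  # noqa: F401 (older releases: torch_ccl)
    print("oneCCL bindings loaded; 'ccl' backend registered")
except ImportError as err:
    print("oneCCL bindings not found:", err)

print("torch version:", torch.__version__)
```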
Run `accelerate config` on your machine (node0):
```bash
$ accelerate config
-----------------------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
multi-CPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 4
-----------------------------------------------------------------------------------------------------------------------------------------------------------
What is the rank of this machine?
0
What is the IP address of the machine that will host the main process? 36.112.23.24
What is the port you will use to communicate with the main process? 29500
Are all the machines on the same local network? Answer `no` if nodes are on the cloud and/or on different network hosts [YES/no]: yes
Do you want accelerate to launch mpirun? [yes/NO]: yes
Please enter the path to the hostfile to use with mpirun [~/hostfile]: ~/hostfile
Enter the number of oneCCL worker threads [1]: 1
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
How many processes should be used for distributed training? [1]:16
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
```
For instance, here is how you would run the NLP example `examples/nlp_example.py` (from the root of the repo) for distributed CPU training, with the `default_config.yaml` generated by `accelerate config`:
```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_CPU
downcast_bf16: 'no'
machine_rank: 0
main_process_ip: 36.112.23.24
main_process_port: 29500
main_training_function: main
mixed_precision: bf16
mpirun_config:
  mpirun_hostfile: /home/user/hostfile
num_machines: 4
num_processes: 16
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true
```
In `node0`, you need to create a configuration file that contains the IP addresses of each node (for example, `hostfile`) and pass that configuration file's path as an argument.
If you selected to have Accelerate launch `mpirun`, ensure that the location of your hostfile matches the path in the config.
```bash
$ cat hostfile
xxx.xxx.xxx.xxx #node0 ip
xxx.xxx.xxx.xxx #node1 ip
xxx.xxx.xxx.xxx #node2 ip
xxx.xxx.xxx.xxx #node3 ip
```
```bash
accelerate launch examples/nlp_example.py
```
You can also launch distributed training directly with the `mpirun` command. Set the `MASTER_ADDR` environment variable and run the following command on node0; 16 DDP processes will be enabled across node0, node1, node2, and node3 with BF16 mixed precision. When using this method, the Python script, Python environment, and accelerate config file need to be available on all of the machines used for multi-CPU training.
```bash
export MASTER_ADDR=xxx.xxx.xxx.xxx #node0 ip
mpirun -f hostfile -n 16 -ppn 4 accelerate launch examples/nlp_example.py
```
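To confirm that all 16 processes actually come up across the four nodes, a small sketch like the following can be dropped at the top of the training script; it only uses standard `Accelerator` attributes:
```python
# Drop-in sketch: print each process's rank so you can verify that
# 16 processes are spread across the 4 nodes (4 per node).
from accelerate import Accelerator

accelerator = Accelerator()
print(
    f"process {accelerator.process_index}/{accelerator.num_processes} "
    f"(local rank {accelerator.local_process_index}) on {accelerator.device}"
)
accelerator.wait_for_everyone()  # sync before training starts
```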