---
title: "Multi-GPU"
format:
  html:
    toc: true
    toc-depth: 3
    number-sections: true
    code-tools: true
execute:
  enabled: false
---

This guide covers advanced training configurations for multi-GPU setups using Axolotl.

## Overview

Axolotl supports several methods for multi-GPU training:

- DeepSpeed (recommended)
- FSDP (Fully Sharded Data Parallel)
- FSDP + QLoRA

## DeepSpeed

DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.

### Configuration

Add the following to your YAML config:

```{.yaml}
deepspeed: deepspeed_configs/zero1.json
```

### Usage

```{.bash}
# DeepSpeed settings are read from the YAML config
axolotl train config.yml

# Or point at a DeepSpeed config on the command line
axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
```
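
To pin a run to specific devices, the standard `CUDA_VISIBLE_DEVICES` environment variable works as usual (this is generic CUDA behavior, not an Axolotl-specific flag); a sketch:

```{.bash}
# Train on GPUs 0 and 1 only
CUDA_VISIBLE_DEVICES=0,1 axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
```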

### Available Configurations

We provide default configurations for:

- ZeRO Stage 1 (`zero1.json`)
- ZeRO Stage 2 (`zero2.json`)
- ZeRO Stage 3 (`zero3.json`)

Choose a stage based on your memory requirements and performance needs: higher stages shard more optimizer, gradient, and parameter state across GPUs, saving memory at the cost of extra communication.
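
For example, switching the earlier config to ZeRO Stage 3 is a one-line change (assuming the default config paths above):

```{.yaml}
# ZeRO Stage 3: shards optimizer state, gradients, and parameters
deepspeed: deepspeed_configs/zero3.json
```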

## FSDP

An example FSDP configuration:

```{.yaml}
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: true
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```

### FSDP + QLoRA

For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).

## Liger Kernel Integration

Please see the [Liger integration docs](custom_integrations.qmd#liger) for more info.

## Troubleshooting

### NCCL Issues

For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd).

### Common Issues

::: {.panel-tabset}

#### Out of Memory

If you hit out-of-memory errors:

- Reduce `micro_batch_size`
- Reduce `eval_batch_size`
- Adjust `gradient_accumulation_steps`
- Consider using a higher ZeRO stage

#### Training Instability

If loss diverges or behaves erratically:

- Start with DeepSpeed ZeRO-2
- Monitor loss values
- Check learning rates

:::
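
The batch-size levers above interact: the effective global batch size is `micro_batch_size × gradient_accumulation_steps × num_gpus`, so halving `micro_batch_size` while doubling `gradient_accumulation_steps` keeps it constant. A quick sanity check with hypothetical values:

```{.bash}
# Hypothetical settings: per-GPU micro batch 2, 4 accumulation steps, 8 GPUs
micro_batch_size=2
gradient_accumulation_steps=4
num_gpus=8

# Effective global batch size
echo $((micro_batch_size * gradient_accumulation_steps * num_gpus))  # prints 64
```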

For more detailed troubleshooting, see our [debugging guide](debugging.qmd).