	# Examples of using peft with trl to finetune 8-bit models with Low Rank Adaptation (LoRA)

	The notebooks and scripts in these examples show how to use Low Rank Adaptation (LoRA) to fine-tune models in a memory-efficient manner. Most of the PEFT methods supported by the `peft` library can be used, but note that some PEFT methods, such as prompt tuning, are not supported.
	For more information on LoRA, see the [original paper](https://huggingface.co/papers/2106.09685).
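As a rough illustration of the idea in the paper, the plain-Python sketch below computes a LoRA-adapted linear layer: the frozen weight `W` is left untouched and a low-rank update `B (A x)`, scaled by `alpha / r`, is added on top. `B` starts at zero (as in the paper), so the adapted layer initially behaves exactly like the base layer. The helper names and the small matrices are illustrative assumptions, not part of `peft`.

```python
# Pure-Python sketch of a LoRA-adapted linear layer: h = W x + (alpha / r) * B (A x).
# W (d_out x d_in) is frozen; only the low-rank factors A (r x d_in) and B (d_out x r) are trained.


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r, alpha = 4, 3, 2, 4
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.1] * d_in for _ in range(r)]   # illustrative values (the paper uses a random Gaussian init)
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the adapter is a no-op at first
x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(W, A, B, x, alpha, r))  # [1.0, 2.0, 3.0]
```

Only `A` and `B` would receive gradients, which is where the memory savings during fine-tuning come from.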

	## Installation

	Note: PEFT is under active development and relies on the latest version of transformers, so we install transformers directly from its GitHub repository.

	```bash
	pip install "trl[peft]"
	pip install bitsandbytes loralib
	pip install git+https://github.com/huggingface/transformers.git@main
	# optional: wandb
	pip install wandb
	```

	Note: if you don't want to log with `wandb`, remove `log_with="wandb"` from the scripts/notebooks. You can also replace it with your favourite experiment tracker that's [supported by `accelerate`](https://huggingface.co/docs/accelerate/usage_guides/tracking).

	## How to use it?

	Simply declare a [PeftConfig](https://huggingface.co/docs/peft/main/en/package_reference/config#peft.PeftConfig) object in your script and pass it to `.from_pretrained` to load the TRL+PEFT model.

	```python
	from peft import LoraConfig
	from trl import AutoModelForCausalLMWithValueHead

	model_id = "edbeeching/gpt-neo-125M-imdb"
	lora_config = LoraConfig(
	    r=16,
	    lora_alpha=32,
	    lora_dropout=0.05,
	    bias="none",
	    task_type="CAUSAL_LM",
	)

	model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    model_id,
	    peft_config=lora_config,
	)
	```
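To see what the `LoraConfig` above implies, the sketch below counts the trainable parameters LoRA adds to a single weight matrix and computes the scaling factor `lora_alpha / r` that PEFT applies to the low-rank update by default. The 768 x 768 shape is an assumption chosen to match GPT-Neo-125M's hidden size; adjust it for your model.

```python
# Back-of-the-envelope count of LoRA parameters for one weight matrix,
# using r=16 and lora_alpha=32 from the LoraConfig above.
# Assumption: a square 768 x 768 projection (hidden size 768, as in GPT-Neo-125M).


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r)
    return r * d_in + d_out * r


d = 768
r, lora_alpha = 16, 32
full = d * d
lora = lora_param_count(d, d, r)
print(full, lora, f"{lora / full:.1%}")  # 589824 24576 4.2%
print("scaling factor alpha / r =", lora_alpha / r)  # scaling factor alpha / r = 2.0
```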

	To load the model in 8-bit precision:

	```python
	pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    model_id,
	    load_in_8bit=True,
	    peft_config=lora_config,
	)
	```

	... or in 4-bit precision:

	```python
	pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    model_id,
	    peft_config=lora_config,
	    load_in_4bit=True,
	)
	```
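To get a feel for why 8-bit and 4-bit loading matter, the sketch below estimates the memory needed to store the weights alone at each precision. It is deliberately rough: it ignores activations, optimizer states, the value head and quantization overhead, and the 7B parameter count is only an illustrative example (1 GB is taken as 1e9 bytes).

```python
# Rough estimate of the memory needed just to hold model weights at different precisions.
# Ignores activations, optimizer states and quantization overhead (e.g. scaling constants).

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"7B params @ {precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```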

	## Launch scripts

	The `trl` library is powered by `accelerate`. As such, it is best to configure and launch training runs with the following commands:

	```bash
	accelerate config  # will prompt you to define the training configuration
	accelerate launch examples/scripts/ppo.py --use_peft  # launches training
	```

	## Using `trl` + `peft` and Data Parallelism

	You can scale up to as many GPUs as you want, as long as the whole training process fits on a single device. The only tweak you need to apply is to load the model as follows:

	```python
	from peft import LoraConfig
	...

	lora_config = LoraConfig(
	    r=16,
	    lora_alpha=32,
	    lora_dropout=0.05,
	    bias="none",
	    task_type="CAUSAL_LM",
	)

	pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    config.model_name,
	    peft_config=lora_config,
	)
	```

	To load the model in 8-bit precision:

	```python
	pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    config.model_name,
	    peft_config=lora_config,
	    load_in_8bit=True,
	)
	```

	... or in 4-bit precision:

	```python
	pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    config.model_name,
	    peft_config=lora_config,
	    load_in_4bit=True,
	)
	```

	Finally, make sure that the rewards are computed on the correct device as well; you can use `ppo_trainer.model.current_device` for that.
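The single-device requirement comes from how data parallelism works: every process keeps a full copy of the trainable model, processes its own shard of the batch, and the gradients are averaged across processes. The toy example below (a one-parameter linear model with a mean squared error loss and made-up numbers, not TRL code) checks that averaging per-shard gradients reproduces the full-batch gradient when the shards have equal size.

```python
# Toy illustration of data parallelism with model y_hat = w * x and mean squared error.
# Each "process" computes the gradient on its own shard; the results are then averaged.


def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, world_size):
    shard = len(xs) // world_size  # assumes the batch divides evenly across processes
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(world_size)
    ]
    return sum(grads) / world_size  # plays the role of the all-reduce average


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(grad_mse(1.0, xs, ys), data_parallel_grad(1.0, xs, ys, world_size=2))  # -15.0 -15.0
```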

	## Multi-Adapter RL Training

	You can use a single base model with multiple PEFT adapters for the entire PPO algorithm, including retrieving reference logits, computing active logits, and calculating rewards. This approach is useful for memory-efficient RL training.

	> [!WARNING]
	> This feature is experimental and convergence has not been extensively tested. We encourage the community to share feedback and report any issues.
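The memory saving can be estimated with some rough arithmetic. The sketch below assumes a LLaMA-7B-like base (32 layers, hidden size 4096, roughly 7B parameters) with LoRA r=16 on two square projections per layer; these numbers are illustrative assumptions, and small heads such as the value and score heads are ignored. It also assumes that the reference logits come from the same base model (presumably with the adapters disabled) rather than from a separate copy.

```python
# Rough parameter arithmetic behind the multi-adapter approach (illustrative assumptions only).
NUM_LAYERS, HIDDEN, R, MODULES_PER_LAYER = 32, 4096, 16, 2
BASE_PARAMS = 7e9

# LoRA adds r * (d_in + d_out) parameters per adapted square projection
adapter_params = NUM_LAYERS * MODULES_PER_LAYER * R * (HIDDEN + HIDDEN)
print(adapter_params)  # 8388608

# Policy, reference and reward model as three separate full models...
separate = 3 * BASE_PARAMS
# ...versus one shared base plus one PPO adapter and one reward adapter
shared = BASE_PARAMS + 2 * adapter_params
print(f"{separate / 1e9:.1f}B vs {shared / 1e9:.3f}B parameters")  # 21.0B vs 7.017B parameters
```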

	### Requirements

	Install PEFT and optionally bitsandbytes for 8-bit models:

	```bash
	pip install peft bitsandbytes
	```

	### Training Workflow

	The multi-adapter approach requires three stages:

	1. Supervised Fine-Tuning (SFT): Train a base model on your target domain (e.g., IMDB dataset) using `SFTTrainer`
	2. Reward Model Training: Train a reward model adapter using PEFT and `RewardTrainer` (see [reward modeling example](https://github.com/huggingface/trl/tree/main/examples/scripts/reward_modeling.py))
	3. PPO Training: Fine-tune new adapters using PPO with the reward adapter

	> [!IMPORTANT]
	> Use the same base model (architecture and weights) for stages 2 & 3.

	### Basic Usage

	After training your reward adapter and pushing it to the Hub:

	```python
	from peft import LoraConfig
	from trl import AutoModelForCausalLMWithValueHead, PPOTrainer

	model_name = "huggyllama/llama-7b"
	rm_adapter_id = "trl-lib/llama-7b-hh-rm-adapter"

	# Configure PPO adapter
	lora_config = LoraConfig(
	    r=16,
	    lora_alpha=32,
	    lora_dropout=0.05,
	    bias="none",
	    task_type="CAUSAL_LM",
	)

	# Load model with reward adapter
	model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    model_name,
	    peft_config=lora_config,
	    reward_adapter=rm_adapter_id,
	)

	trainer = PPOTrainer(model=model, ...)
	```

	In your training loop, compute rewards using:

	```python
	rewards = trainer.model.compute_reward_score(**inputs)
	```
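Depending on your TRL version, `compute_reward_score` may return one score per token rather than one per sequence, so check the shape of `rewards`. In that case a common choice is to keep the score at the last non-padding token of each sequence. The plain-Python sketch below shows that selection with lists standing in for tensors; `last_token_rewards` is a hypothetical helper, not a TRL function.

```python
# Hypothetical post-processing sketch: one scalar reward per sequence, taken at the
# last position where attention_mask == 1. Lists stand in for tensors here.


def last_token_rewards(scores, attention_mask):
    rewards = []
    for seq_scores, mask in zip(scores, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)  # index of the last real token
        rewards.append(seq_scores[last])
    return rewards


scores = [[0.1, 0.4, 0.9, 0.0], [0.2, -0.5, 0.0, 0.0]]
attention_mask = [[1, 1, 1, 0], [1, 1, 0, 0]]  # right padding
print(last_token_rewards(scores, attention_mask))  # [0.9, -0.5]
```

Using the attention mask instead of a fixed `-1` index keeps the selection correct for both left- and right-padded batches.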

	### Advanced Features

	#### Multiple Policy Adapters

	You can train multiple adapters on the same base model for different policies. Control which adapter to activate using the `ppo_adapter_name` argument:

	```python
	adapter_name_policy_1 = "policy_1"
	rewards = trainer.model.compute_reward_score(**inputs, ppo_adapter_name=adapter_name_policy_1)
	```
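Conceptually, `ppo_adapter_name` names the policy adapter that should be active again once the reward adapter has been used for scoring. The toy class below mimics that switch-and-restore pattern with scalar "weights"; it is a hypothetical illustration, not TRL's implementation, and its class, method defaults and adapter names are made up.

```python
# Toy model of switch-and-restore adapter handling (hypothetical, not TRL's implementation).


class ToyMultiAdapterModel:
    def __init__(self, base_weight, adapters, reward_adapter="reward_adapter"):
        self.base_weight = base_weight  # shared, frozen base "weight"
        self.adapters = adapters        # adapter name -> additive delta
        self.reward_adapter = reward_adapter
        self.active = None

    def set_adapter(self, name):
        self.active = name

    def forward(self, x):
        return (self.base_weight + self.adapters[self.active]) * x

    def compute_reward_score(self, x, ppo_adapter_name="default"):
        self.set_adapter(self.reward_adapter)  # score with the reward adapter
        score = self.forward(x)
        self.set_adapter(ppo_adapter_name)     # hand control back to the chosen policy
        return score


model = ToyMultiAdapterModel(
    base_weight=1.0,
    adapters={"default": 0.1, "policy_1": 0.5, "reward_adapter": -0.5},
)
print(model.compute_reward_score(2.0, ppo_adapter_name="policy_1"), model.active)  # 1.0 policy_1
```

Restoring the policy adapter inside the scoring call keeps the reward adapter from silently staying active for the next policy forward pass.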

	#### Quantized Base Models

	For memory-efficient training, load the base model in 8-bit or 4-bit while keeping adapters in float32:

	```python
	from transformers import BitsAndBytesConfig

	model = AutoModelForCausalLMWithValueHead.from_pretrained(
	    model_name,
	    peft_config=lora_config,
	    reward_adapter=rm_adapter_id,
	    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
	)
	```

	## Naive pipeline parallelism (NPP) for large models (>60B models)

	The `trl` library also supports naive pipeline parallelism (NPP) for large models (>60B parameters), a simple way to parallelize a model across multiple GPUs. The model and the adapters are loaded across multiple GPUs, and the activations and gradients are naively communicated between the GPUs. This supports `int8` models as well as models in other `dtype`s.
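The sketch below is a toy simulation of what "naive" means here: consecutive layers are assigned to GPUs in contiguous blocks, and during a forward pass the activations are handed from one device to the next. Device names and layer counts are illustrative, and no real tensors are moved.

```python
# Toy simulation of naive pipeline parallelism: contiguous layer blocks per device,
# with an activation transfer every time the forward pass crosses a block boundary.


def split_layers(num_layers, num_devices):
    per_device = -(-num_layers // num_devices)  # ceiling division
    return [i // per_device for i in range(num_layers)]


def forward_trace(num_layers, num_devices):
    placement = split_layers(num_layers, num_devices)
    transfers = []
    for i in range(1, num_layers):
        if placement[i] != placement[i - 1]:
            transfers.append((f"cuda:{placement[i - 1]}", f"cuda:{placement[i]}"))
    return placement, transfers


placement, transfers = forward_trace(num_layers=8, num_devices=3)
print(placement)  # [0, 0, 0, 1, 1, 1, 2, 2]
print(transfers)  # [('cuda:0', 'cuda:1'), ('cuda:1', 'cuda:2')]
```

Because each block has to wait for the activations of the previous one, only one GPU is busy at any given moment; the benefit is fitting a model that is too large for a single device, not speed.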

	![NPP](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl-npp.png)

	### How to use NPP?

	Simply load your model with a custom `device_map` argument in `from_pretrained` to split it across multiple devices. Check out this [nice tutorial](https://github.com/huggingface/blog/blob/main/accelerate-large-models.md) on how to properly create a `device_map` for your model.

	Also make sure that the `lm_head` module is on the first GPU device, as an error may be thrown otherwise. At the time of writing, you need to install the `main` branch of both `accelerate` (`pip install git+https://github.com/huggingface/accelerate.git@main`) and `peft` (`pip install git+https://github.com/huggingface/peft.git@main`).
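As a starting point, the hypothetical helper below builds a `device_map` dictionary that keeps the embeddings and `lm_head` on GPU 0 and splits the transformer blocks across all GPUs in contiguous chunks. It assumes GPT-style module names (`transformer.wte`, `transformer.wpe`, `transformer.h.{i}`, `transformer.ln_f`, `lm_head`); check the module names of your own model (for example via `model.named_modules()`) before using it. The resulting dictionary could then be passed as `device_map=` to `from_pretrained`.

```python
# Hypothetical device_map builder for a GPT-style model (module names are assumptions).


def build_device_map(num_layers: int, num_gpus: int) -> dict:
    per_gpu = -(-num_layers // num_gpus)  # ceiling division: blocks per GPU
    # embeddings and lm_head pinned to the first GPU
    device_map = {"transformer.wte": 0, "transformer.wpe": 0, "lm_head": 0}
    for i in range(num_layers):
        device_map[f"transformer.h.{i}"] = i // per_gpu
    # keep the final layer norm next to the last block
    device_map["transformer.ln_f"] = device_map[f"transformer.h.{num_layers - 1}"]
    return device_map


print(build_device_map(num_layers=4, num_gpus=2))
```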

	### Launch scripts

	Although the `trl` library is powered by `accelerate`, you should run your training script in a single process. Note that Data Parallelism is not yet supported together with NPP.

	```bash
	python PATH_TO_SCRIPT
	```

	## Fine-tuning Llama-2 model

	You can easily fine-tune a Llama 2 model using `SFTTrainer` and the official script. For example, to fine-tune Llama-2-7b on the Guanaco dataset, run the following (tested on a single NVIDIA T4 16GB GPU):

	```bash
	python trl/scripts/sft.py \
	    --output_dir sft_openassistant-guanaco \
	    --model_name_or_path meta-llama/Llama-2-7b-hf \
	    --dataset_name timdettmers/openassistant-guanaco \
	    --load_in_4bit \
	    --use_peft \
	    --per_device_train_batch_size 4 \
	    --gradient_accumulation_steps 2
	```
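For reference, the effective batch size of this command (samples contributing to each optimizer step) is the per-device batch size times the gradient accumulation steps times the number of processes; with the single GPU used in the tested setup that gives:

```python
# Effective batch size for the SFT command above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_processes = 1  # single T4 GPU, as in the tested setup

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_processes
print(effective_batch_size)  # 8
```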


