qwenillustrious / peft /docs /source /developer_guides /low_level_api.md

Add files using upload-large-folder tool

bd33eac verified 4 months ago

4.95 kB

	<!--Copyright 2023 The HuggingFace Team. All rights reserved.

	Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
	an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
	specific language governing permissions and limitations under the License.

	⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
	rendered properly in your Markdown viewer.

	-->

	# Adapter injection

	With PEFT, you can inject trainable adapters into any `torch` module which allows you to use adapter methods without relying on the modeling classes in PEFT. Currently, PEFT supports injecting [LoRA](../conceptual_guides/adapter#low-rank-adaptation-lora), [AdaLoRA](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora), and [IA3](../conceptual_guides/ia3) into models because for these adapters, inplace modification of the model is sufficient for finetuning it.

	Check the table below to see when you should inject adapters.

	\| Pros \| Cons \|
	\|---\|---\|
	\| the model is modified inplace, keeping all the original attributes and methods \| manually write the `from_pretrained` and `save_pretrained` utility functions from Hugging Face to save and load adapters \|
	\| works for any `torch` module and modality \| doesn't work with any of the utility methods provided by `PeftModel` such as disabling and merging adapters \|

	## Creating a new PEFT model

	To perform the adapter injection, use the [`inject_adapter_in_model`] method. This method takes 3 arguments, the PEFT config, the model, and an optional adapter name. You can also attach multiple adapters to the model if you call [`inject_adapter_in_model`] multiple times with different adapter names.

	For example, to inject LoRA adapters into the `linear` submodule of the `DummyModel` module:

	```python
	import torch
	from peft import inject_adapter_in_model, LoraConfig

	class DummyModel(torch.nn.Module):
	def __init__(self):
	super().__init__()
	self.embedding = torch.nn.Embedding(10, 10)
	self.linear = torch.nn.Linear(10, 10)
	self.lm_head = torch.nn.Linear(10, 10)

	def forward(self, input_ids):
	x = self.embedding(input_ids)
	x = self.linear(x)
	x = self.lm_head(x)
	return x


	lora_config = LoraConfig(
	lora_alpha=16,
	lora_dropout=0.1,
	r=64,
	bias="none",
	target_modules=["linear"],
	)

	model = DummyModel()
	model = inject_adapter_in_model(lora_config, model)

	dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
	dummy_outputs = model(dummy_inputs)
	```

	Print the model to see that the adapters have been correctly injected.

	```bash
	DummyModel(
	(embedding): Embedding(10, 10)
	(linear): Linear(
	in_features=10, out_features=10, bias=True
	(lora_dropout): ModuleDict(
	(default): Dropout(p=0.1, inplace=False)
	)
	(lora_A): ModuleDict(
	(default): Linear(in_features=10, out_features=64, bias=False)
	)
	(lora_B): ModuleDict(
	(default): Linear(in_features=64, out_features=10, bias=False)
	)
	(lora_embedding_A): ParameterDict()
	(lora_embedding_B): ParameterDict()
	)
	(lm_head): Linear(in_features=10, out_features=10, bias=True)
	)
	```

	## Saving the model

	To only save the adapter, use the [`get_peft_model_state_dict`] function:

	```python
	from peft import get_peft_model_state_dict

	peft_state_dict = get_peft_model_state_dict(model)
	print(peft_state_dict)
	```

	Otherwise, `model.state_dict()` returns the full state dict of the model.

	## Loading the model

	After loading the saved `state_dict`, it can be applied using the [`set_peft_model_state_dict`] function:

	```python
	from peft import set_peft_model_state_dict

	model = DummyModel()
	model = inject_adapter_in_model(lora_config, model)
	outcome = set_peft_model_state_dict(model, peft_state_dict)
	# check that there were no wrong keys
	print(outcome.unexpected_keys)
	```

	If injecting the adapter is slow or you need to load a large number of adapters, you may use an optimization that allows to create an "empty" adapter on meta device and only fills the weights with real weights when the [`set_peft_model_state_dict`] is called. To do this, pass `low_cpu_mem_usage=True` to both [`inject_adapter_in_model`] and [`set_peft_model_state_dict`].

	```python
	model = DummyModel()
	model = inject_adapter_in_model(lora_config, model, low_cpu_mem_usage=True)

	print(model.linear.lora_A["default"].weight.device.type == "meta") # should be True
	set_peft_model_state_dict(model, peft_state_dict, low_cpu_mem_usage=True)
	print(model.linear.lora_A["default"].weight.device.type == "cpu") # should be True
	```