Helper methods
A collection of helper functions for PEFT.
Checking if a model is a PEFT model[[peft.helpers.check_if_peft_model]]
peft.helpers.check_if_peft_model[[peft.helpers.check_if_peft_model]]
Check if the model is a PEFT model.
Parameters:
model_name_or_path (str) : Model id to check, can be local or on the Hugging Face Hub.
Returns:
bool
True if the model is a PEFT model, False otherwise.
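To make the idea concrete, here is a simplified, local-only sketch of what such a check amounts to: a PEFT adapter directory contains an `adapter_config.json` file. This is an illustration, not the actual `check_if_peft_model` implementation (which also resolves model ids on the Hugging Face Hub); the function name `is_local_peft_model` is invented for this example.

```python
import json
import os
import tempfile


def is_local_peft_model(model_path: str) -> bool:
    """Simplified local-only check: a PEFT adapter directory ships an adapter_config.json."""
    return os.path.isfile(os.path.join(model_path, "adapter_config.json"))


# Demo with a temporary directory standing in for a saved model.
with tempfile.TemporaryDirectory() as tmp:
    print(is_local_peft_model(tmp))  # False: no adapter config present
    with open(os.path.join(tmp, "adapter_config.json"), "w") as f:
        json.dump({"peft_type": "LORA"}, f)  # minimal placeholder config
    print(is_local_peft_model(tmp))  # True: adapter config found
```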
Temporarily Rescaling Adapter Scale in LoraLayer Modules[[peft.helpers.rescale_adapter_scale]]
peft.helpers.rescale_adapter_scale[[peft.helpers.rescale_adapter_scale]]
Context manager to temporarily rescale the scaling of the LoRA adapter in a model.
The original scaling values are restored when the context manager exits. This context manager works with the transformers and diffusers models that have directly loaded LoRA adapters.
For LoRA, applying this context manager with multiplier in [0, 1] is strictly equivalent to applying wise-ft (see #1940 for details). It can improve the performance of the model if there is a distribution shift between the training data used for fine-tuning and the test data used during inference.
Warning: It has been reported that when using Apple's MPS backend for PyTorch, it is necessary to add a short sleep time after exiting the context before the scales are fully restored.
Example:
>>> model = ModelWithLoraLayer()
>>> multiplier = 0.5
>>> with rescale_adapter_scale(model, multiplier):
... outputs = model(**inputs) # Perform operations with the scaled model
>>> outputs = model(**inputs) # The original scaling values are restored here
Parameters:
model : The model containing LoraLayer modules whose scaling is to be adjusted.
multiplier (float or int) : The multiplier that rescales the scaling attribute. Must be of type float or int.
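The save-multiply-restore pattern behind this context manager can be sketched with a toy layer. This is not the PEFT implementation: `ToyLoraLayer` and `toy_rescale_adapter_scale` are invented names, and only the per-adapter `scaling` dict mirrors the real `LoraLayer` attribute.

```python
from contextlib import contextmanager


class ToyLoraLayer:
    """Stand-in for a LoRA layer: holds a per-adapter scaling dict."""

    def __init__(self):
        self.scaling = {"default": 2.0}


@contextmanager
def toy_rescale_adapter_scale(layers, multiplier):
    """Multiply each layer's scaling values; restore the originals on exit, even on error."""
    originals = [dict(layer.scaling) for layer in layers]
    try:
        for layer in layers:
            for name in layer.scaling:
                layer.scaling[name] *= multiplier
        yield
    finally:
        for layer, orig in zip(layers, originals):
            layer.scaling.update(orig)


layers = [ToyLoraLayer()]
with toy_rescale_adapter_scale(layers, 0.5):
    print(layers[0].scaling["default"])  # 1.0 inside the context
print(layers[0].scaling["default"])  # 2.0 restored afterwards
```

The `finally` block is what guarantees the original scaling values come back even if the forward pass raises.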
Context manager to disable input dtype casting in the forward method of LoRA layers[[peft.helpers.disable_input_dtype_casting]]
peft.helpers.disable_input_dtype_casting[[peft.helpers.disable_input_dtype_casting]]
Context manager that disables casting of the input to the dtype of the weight.
Parameters:
model (nn.Module) : The model containing PEFT modules whose input dtype casting is to be adjusted.
active (bool) : Whether the context manager is active (default) or inactive.
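Such a context manager can be sketched as a flag toggle on the affected layers. The sketch below is illustrative only: `ToyLayer`, its `cast_input_dtype` attribute, and `toy_disable_input_dtype_casting` are hypothetical names, not the PEFT internals.

```python
from contextlib import contextmanager


class ToyLayer:
    """Stand-in for a PEFT layer with a hypothetical flag controlling input dtype casting."""

    def __init__(self):
        self.cast_input_dtype = True  # hypothetical attribute name


@contextmanager
def toy_disable_input_dtype_casting(layers, active=True):
    """Turn the flag off for the duration of the context, then restore the saved values."""
    if not active:  # inactive context manager: do nothing
        yield
        return
    saved = [layer.cast_input_dtype for layer in layers]
    try:
        for layer in layers:
            layer.cast_input_dtype = False
        yield
    finally:
        for layer, value in zip(layers, saved):
            layer.cast_input_dtype = value


layers = [ToyLayer()]
with toy_disable_input_dtype_casting(layers):
    print(layers[0].cast_input_dtype)  # False inside the context
print(layers[0].cast_input_dtype)  # True restored on exit
```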
Context manager to enable DoRA caching (faster at inference time but requires more memory)[[peft.helpers.DoraCaching]]
peft.helpers.DoraCaching[[peft.helpers.DoraCaching]]
Context manager to enable DoRA caching, which improves speed of DoRA inference at the expense of memory.
With active caching, the materialized LoRA weight (B @ A) and the weight norm (base weight + LoRA weight) are cached.
Even within the caching context, if the model is in training mode, caching is disabled. When the model switches to training mode, the cache will be cleared.
Example:
>>> from peft.helpers import DoraCaching
>>> model.eval() # put the model in eval mode for caching to work
>>> with DoraCaching(): # use as a context manager
... output = model(inputs)
>>> dora_caching = DoraCaching()
>>> dora_caching(enabled=True) # permanently enable caching
>>> output = model(inputs)
>>> dora_caching(enabled=False) # permanently disable caching
>>> output = model(inputs)
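The speed/memory trade-off described above boils down to memoizing an expensive derived quantity while the model is in eval mode, and invalidating the cache when it switches to training. The toy layer below illustrates that pattern; `ToyDoraLayer` and its members are invented for this sketch and do not mirror PEFT's actual DoRA internals.

```python
class ToyDoraLayer:
    """Toy layer that caches an expensive derived value while caching is enabled."""

    def __init__(self):
        self.caching = False
        self.training = False
        self._cache = None
        self.compute_count = 0  # how many times the expensive path actually ran

    def delta_weight(self):
        """Stand-in for materializing B @ A and the weight norm."""
        if self.caching and not self.training and self._cache is not None:
            return self._cache  # cache hit: skip the expensive computation
        self.compute_count += 1
        value = 42  # placeholder for the expensive result
        if self.caching and not self.training:
            self._cache = value
        return value

    def train(self):
        """Switching to training mode clears the cache, matching the behavior above."""
        self.training = True
        self._cache = None


layer = ToyDoraLayer()
layer.caching = True
layer.delta_weight()
layer.delta_weight()
print(layer.compute_count)  # 1: the second call hit the cache
layer.train()
layer.delta_weight()
print(layer.compute_count)  # 2: training mode bypasses and clears the cache
```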