Model Utilities

get_act_offloading_ctx_manager[[trl.models.get_act_offloading_ctx_manager]]

trl.models.get_act_offloading_ctx_manager[[trl.models.get_act_offloading_ctx_manager]]

Returns the activation offloading context manager for the model. In every step, all activations are offloaded except those of the last output Linear layer.

If activation offloading is enabled, the OffloadActivations context manager is returned. If activation offloading is disabled, a NoOpManager context manager is returned instead.

Parameters:

model (nn.Module) : Model to wrap with the activation offloading context manager.

use_pin_memory (bool, optional, defaults to True) : Whether the offloaded Tensor will be placed in pinned memory on the CPU. Pinned memory allows the Tensor to be moved back onto the GPU more quickly, but it is a limited resource.

use_streams (bool, optional, defaults to True) : Whether to use streams for performance optimization where the communications get overlapped with the computation. Requires a torch build after torch-2.5.0.

min_offload_size (int, optional, defaults to 1024) : Minimum number of bytes a Tensor must be in order to qualify for offloading. If the tensor is too small, we do not want to waste bandwidth and resources moving it to CPU and back.

max_fwd_stash_size (int, optional, defaults to 5) : Maximum size of the forward stash, or the maximum number of consecutive activations to keep alive during the forward pass. This number must be at least 1. Keeping alive more activations will potentially allow more overlap between the communication and compute streams at the cost of increasing memory usage. Keeping alive fewer activations will conserve memory, but may cause poor overlap between the streams, increasing runtime.

warn_if_no_head (bool, optional, defaults to True) : Whether to warn if no output head is detected. If set to False, no warning will be raised if no output head is detected.

Returns:

contextlib.ContextDecorator

Activation offloading context manager for the model.
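The `min_offload_size` rule described above can be illustrated without any GPU. The following is a minimal sketch, not TRL's implementation: the helper name `qualifies_for_offload` is hypothetical, and it assumes that a tensor's size is its element count times the bytes per element and that a tensor exactly at the threshold qualifies.

```python
from math import prod


def qualifies_for_offload(shape, bytes_per_element, min_offload_size=1024):
    """Hypothetical helper: is a tensor of this shape large enough to offload?

    Assumption: size in bytes = number of elements * bytes per element, and a
    tensor whose size equals the threshold counts as large enough.
    """
    num_bytes = prod(shape) * bytes_per_element
    return num_bytes >= min_offload_size


# A 16x16 float32 activation occupies 16 * 16 * 4 = 1024 bytes.
large_enough = qualifies_for_offload((16, 16), bytes_per_element=4)

# An 8x8 float32 activation occupies 8 * 8 * 4 = 256 bytes.
too_small = qualifies_for_offload((8, 8), bytes_per_element=4)

print(large_enough, too_small)
```

Raising `min_offload_size` keeps more small tensors on the GPU, which avoids spending transfer bandwidth on activations that free little memory.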

disable_gradient_checkpointing[[trl.models.utils.disable_gradient_checkpointing]]

trl.models.utils.disable_gradient_checkpointing[[trl.models.utils.disable_gradient_checkpointing]]

Temporarily disable gradient checkpointing, restoring the previous state afterward.

Parameters:

model (PreTrainedModel) : Model for which to temporarily disable gradient checkpointing.

gradient_checkpointing_kwargs (dict or None, optional) : Additional keyword arguments passed when gradient checkpointing is enabled again.
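The "disable, then restore the previous state" behavior can be sketched as a small context manager. This is an illustration under stated assumptions, not the library's source: `ToyModel` is a hypothetical stand-in that mimics the `is_gradient_checkpointing`, `gradient_checkpointing_enable`, and `gradient_checkpointing_disable` members of a Transformers `PreTrainedModel`, and `disable_checkpointing_sketch` is a hypothetical name.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a model with gradient checkpointing toggles."""

    def __init__(self):
        self.is_gradient_checkpointing = False
        self.last_kwargs = None

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        self.is_gradient_checkpointing = True
        self.last_kwargs = gradient_checkpointing_kwargs

    def gradient_checkpointing_disable(self):
        self.is_gradient_checkpointing = False


@contextmanager
def disable_checkpointing_sketch(model, gradient_checkpointing_kwargs=None):
    # Remember the current state so it can be restored afterward.
    was_enabled = model.is_gradient_checkpointing
    if was_enabled:
        model.gradient_checkpointing_disable()
    try:
        yield
    finally:
        # Re-enable only if checkpointing was on before entering the block.
        if was_enabled:
            model.gradient_checkpointing_enable(gradient_checkpointing_kwargs)


model = ToyModel()
model.gradient_checkpointing_enable({"use_reentrant": False})

with disable_checkpointing_sketch(model, {"use_reentrant": False}):
    inside = model.is_gradient_checkpointing  # checkpointing is off here

after = model.is_gradient_checkpointing  # previous state is restored
```

The `try`/`finally` ensures the previous state is restored even if the body of the `with` block raises an exception.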

create_reference_model[[trl.create_reference_model]]

trl.create_reference_model[[trl.create_reference_model]]

Creates a static reference copy of a model. Note that the returned model will be in .eval() mode.

Parameters:

model (nn.Module) : The model to be copied.

num_shared_layers (int, optional) : The number of initial layers that are shared between both models and kept frozen.

pattern (str, optional) : String pattern used to select the shared layers (e.g. "transformer.h.{layer}" for GPT2). If a custom pattern is necessary, it can be passed here.

Returns:

nn.Module
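To make the interaction between `num_shared_layers` and `pattern` concrete, here is a minimal sketch that splits parameter names into shared and unshared groups. It is an illustration only, not TRL's implementation: the function `split_parameter_names` and the example parameter names are hypothetical, and it assumes layers are numbered from 0 and that names are listed in model order.

```python
def split_parameter_names(parameter_names, num_shared_layers,
                          pattern="transformer.h.{layer}"):
    """Hypothetical helper: split names into (shared, unshared) lists.

    Assumption: every parameter listed before the first layer that matches
    the formatted pattern belongs to the shared (frozen) part of the model.
    """
    # With layers numbered from 0, layer index `num_shared_layers` is the
    # first layer that is NOT shared.
    boundary = pattern.format(layer=num_shared_layers)
    shared, unshared = [], []
    reached_boundary = False
    for name in parameter_names:
        if boundary in name:
            reached_boundary = True
        (unshared if reached_boundary else shared).append(name)
    return shared, unshared


# Illustrative GPT2-style parameter names (made up for this example).
names = [
    "transformer.wte.weight",
    "transformer.h.0.attn.weight",
    "transformer.h.1.attn.weight",
    "transformer.h.2.attn.weight",
    "lm_head.weight",
]

shared, unshared = split_parameter_names(names, num_shared_layers=2)
print(shared)
print(unshared)
```

For a model whose layers are named differently, a custom `pattern` containing the `{layer}` placeholder would be passed instead of the GPT2-style default used in this sketch.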
