	# Model Utilities

	## clone_chat_template[[trl.clone_chat_template]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>trl.clone_chat_template</name><anchor>trl.clone_chat_template</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/trl/models/utils.py#L164</source><parameters>[{"name": "model", "val": ": PreTrainedModel"}, {"name": "tokenizer", "val": ": PreTrainedTokenizer"}, {"name": "source_tokenizer_path", "val": ": str"}, {"name": "resize_to_multiple_of", "val": ": int \| None = 64"}]</parameters><paramsdesc>- model ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel)) --
	Model to update.
	- tokenizer ([PreTrainedTokenizer](https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer)) --
	Tokenizer to update.
	- source_tokenizer_path (`str`) --
	Path or identifier of the pretrained tokenizer to clone from.
	- resize_to_multiple_of (`int` or `None`, optional, defaults to `64`) --
	The embedding layer will be resized to the new vocabulary size. If this is not `None`, it will round up the
	new vocabulary size to the nearest multiple of this value.</paramsdesc><paramgroups>0</paramgroups><rettype>model ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel))</rettype><retdesc>Updated model with resized token embeddings and EOS token configured.
	tokenizer ([PreTrainedTokenizer](https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer)):
	Updated tokenizer with the chat template and special tokens applied.
	added_tokens (`list[int]`):
	List of tokens that were added to the tokenizer from the source tokenizer.</retdesc></docstring>

	Clones a chat template from a source tokenizer to the target tokenizer and updates the model accordingly.

	This function:
	- Copies the chat template from a source tokenizer to the target tokenizer.
	- Adds any new tokens from the source tokenizer to the target tokenizer.
	- Sets and synchronizes the EOS token across the tokenizer and model.
	- Resizes the model's token embeddings to match the new vocabulary size, optionally rounding it up to a multiple of
	a specified value. In such cases, dummy tokens are added to the tokenizer to ensure the vocabulary size matches
	the embedding dimensions.
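
The rounding step in the last bullet can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper name `round_up_to_multiple` and the vocabulary size used below are hypothetical.

```python
from typing import Optional


def round_up_to_multiple(vocab_size: int, multiple: Optional[int] = 64) -> int:
    """Round vocab_size up to the nearest multiple (hypothetical helper)."""
    if multiple is None:
        # No rounding requested: the embedding size equals the vocabulary size.
        return vocab_size
    # Ceiling division, then scale back up to a multiple.
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Hypothetical vocabulary size after new tokens have been added.
new_vocab_size = 151_669
padded_size = round_up_to_multiple(new_vocab_size, 64)

# Number of dummy tokens needed so the tokenizer matches the embedding matrix.
num_dummy_tokens = padded_size - new_vocab_size
```

With these assumed numbers, the embedding matrix would have 151,680 rows and 11 dummy tokens would fill the gap between the tokenizer and the embeddings.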







	<ExampleCodeBlock anchor="trl.clone_chat_template.example">

	Example:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from trl import clone_chat_template

	model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
	model, tokenizer, added_tokens = clone_chat_template(model, tokenizer, "Qwen/Qwen3-0.6B")
	```

	</ExampleCodeBlock>


	</div>

	## get_act_offloading_ctx_manager[[trl.models.get_act_offloading_ctx_manager]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>trl.models.get_act_offloading_ctx_manager</name><anchor>trl.models.get_act_offloading_ctx_manager</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/trl/models/activation_offloading.py#L578</source><parameters>[{"name": "model", "val": ": Module"}, {"name": "use_pin_memory", "val": ": bool = True"}, {"name": "use_streams", "val": ": bool = True"}, {"name": "min_offload_size", "val": ": int = 1024"}, {"name": "max_fwd_stash_size", "val": ": int = 5"}, {"name": "warn_if_no_head", "val": ": bool = True"}]</parameters><paramsdesc>- model (`nn.Module`) --
	Model to wrap with the activation offloading context manager.
	- use_pin_memory (`bool`, optional, defaults to `True`) --
	Whether the offloaded Tensor will be placed in pinned memory on the CPU. Pinned memory allows the Tensor to
	be moved back onto the GPU more quickly, but it is a limited resource.
	- use_streams (`bool`, optional, defaults to `True`) --
	Whether to use streams for performance optimization where the communications get overlapped with the
	computation. Requires a torch build after torch-2.5.0.
	- min_offload_size (`int`, optional, defaults to `1024`) --
	Minimum number of bytes a Tensor must be in order to qualify for offloading. If the tensor is too small, we
	do not want to waste bandwidth and resources moving it to CPU and back.
	- max_fwd_stash_size (`int`, optional, defaults to `5`) --
	Maximum size of the forward stash, or the maximum number of consecutive activations to keep alive during
	the forward pass. This number must be at least 1. Keeping alive more activations will potentially allow
	more overlap between the communication and compute streams at the cost of increasing memory usage. Keeping
	alive fewer activations will conserve memory, but may cause poor overlap between the streams, increasing
	runtime.
	- warn_if_no_head (`bool`, optional, defaults to `True`) --
	Whether to warn if no output head is detected. If set to `False`, no warning will be raised if no output
	head is detected.</paramsdesc><paramgroups>0</paramgroups><rettype>`contextlib.ContextDecorator`</rettype><retdesc>Activation offloading context manager for the model.</retdesc></docstring>

	Returns the activation offloading context manager for the model. In every step, all activations except those of
	the last output Linear layer will be offloaded.

	If activation offloading is enabled, we return the `OffloadActivations` context manager. If activation offloading
	is disabled, we return a `NoOpManager` context manager.
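
The `min_offload_size` threshold can be illustrated with plain arithmetic. The sketch below is a simplified assumption about how such a size check might work, not the library's code: the helper names `tensor_num_bytes` and `qualifies_for_offload` are hypothetical, and any other criteria the real context manager applies are ignored.

```python
from math import prod
from typing import Sequence


def tensor_num_bytes(shape: Sequence[int], element_size: int) -> int:
    """Bytes occupied by a dense tensor of the given shape (hypothetical helper)."""
    return prod(shape) * element_size


def qualifies_for_offload(
    shape: Sequence[int], element_size: int, min_offload_size: int = 1024
) -> bool:
    """Assumed rule: a tensor qualifies once it reaches min_offload_size bytes."""
    return tensor_num_bytes(shape, element_size) >= min_offload_size


# A 16x16 float32 activation (4 bytes per element) is exactly 1024 bytes.
large_enough = qualifies_for_offload((16, 16), element_size=4)

# An 8x8 float32 activation is only 256 bytes, so moving it would waste bandwidth.
too_small = qualifies_for_offload((8, 8), element_size=4)
```

Raising `min_offload_size` keeps more small activations on the GPU, which trades some memory for fewer host-device transfers.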








	</div>
