| # Chat template utilities | |
| ## clone_chat_template[[trl.clone_chat_template]] | |
| #### trl.clone_chat_template[[trl.clone_chat_template]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_4949/trl/chat_template_utils.py#L18) | |
| Clones a chat template from a source tokenizer to the target tokenizer and updates the model accordingly. | |
| This function: | |
| - Copies the chat template from a source tokenizer to the target tokenizer. | |
| - Adds any new tokens from the source tokenizer to the target tokenizer. | |
| - Sets and synchronizes the EOS token across the tokenizer and model. | |
| - Resizes the model's token embeddings to match the new vocabulary size, optionally rounding it up to a multiple of | |
| a specified value. In such cases, dummy tokens are added to the tokenizer to ensure the vocabulary size matches | |
| the embedding dimensions. | |
| Example: | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| from trl import clone_chat_template | |
| model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") | |
| tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B") | |
| model, tokenizer, added_tokens = clone_chat_template(model, tokenizer, "Qwen/Qwen3-0.6B") | |
| ``` | |
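The rounding behaviour described for the embedding resize can be sketched in plain Python. This is only an illustration of the arithmetic, not TRL's implementation; the helper name `round_up_to_multiple` and the vocabulary size used below are hypothetical.

```python
from typing import Optional


def round_up_to_multiple(vocab_size: int, multiple: Optional[int]) -> int:
    """Return the embedding size after optionally rounding up to a multiple."""
    if multiple is None:
        # No rounding requested: the embedding size equals the vocabulary size.
        return vocab_size
    # Integer ceiling division, then scale back up to the multiple.
    return ((vocab_size + multiple - 1) // multiple) * multiple


new_vocab_size = 151_669  # hypothetical vocabulary size after adding tokens
padded_size = round_up_to_multiple(new_vocab_size, 64)

# Number of dummy tokens needed so the tokenizer matches the padded size.
num_dummy_tokens = padded_size - new_vocab_size
print(padded_size, num_dummy_tokens)
```

Under these assumed numbers, the gap between the padded size and the real vocabulary is what the dummy tokens fill.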
| **Parameters:** | |
| model ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel)) : Model to update. | |
| tokenizer (`PreTrainedTokenizer`) : Tokenizer to update. | |
| source_tokenizer_path (`str`) : Path or identifier of the pretrained tokenizer to clone from. | |
| resize_to_multiple_of (`int` or `None`, *optional*, defaults to `64`) : The embedding layer will be resized to the new vocabulary size. If this is not `None`, it will round up the new vocabulary size to the nearest multiple of this value. | |
| **Returns:** | |
| model ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel)): | |
| Updated model with resized token embeddings and EOS token configured. | |
| tokenizer (`PreTrainedTokenizer`): | |
| Updated tokenizer with the chat template and special tokens applied. | |
| added_tokens (`list[int]`): | |
| List of token IDs that were added to the tokenizer from the source tokenizer. | |
| ## add_response_schema[[trl.add_response_schema]] | |
| #### trl.add_response_schema[[trl.add_response_schema]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_4949/trl/chat_template_utils.py#L238) | |
| Adds the appropriate response schema to the given tokenizer based on its chat template. | |
| At the time of initial implementation, most tokenizers do not have built-in support for response schemas. While | |
| waiting for broader adoption, we provide this utility function to manually set the response schema for known chat | |
| templates. | |
| Examples: | |
| ```python | |
| >>> from trl.chat_template_utils import add_response_schema | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B") | |
| >>> tokenizer = add_response_schema(tokenizer) | |
| >>> assistant_text = '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call>' | |
| >>> tokenizer.parse_response(assistant_text) | |
| {'role': 'assistant', 'content': '', 'tool_calls': [{'type': 'function', 'function': {'name': 'multiply', 'arguments': {'a': 3, 'b': 4}}}]} | |
| ``` | |
| **Parameters:** | |
| tokenizer (`PreTrainedTokenizer`) : Tokenizer to which the response schema will be added. | |
| **Returns:** | |
| ``PreTrainedTokenizer`` | |
| Tokenizer with the added response schema. | |
| ## is_chat_template_prefix_preserving[[trl.chat_template_utils.is_chat_template_prefix_preserving]] | |
| #### trl.chat_template_utils.is_chat_template_prefix_preserving[[trl.chat_template_utils.is_chat_template_prefix_preserving]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_4949/trl/chat_template_utils.py#L278) | |
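| Check whether the tokenizer's chat template is prefix-preserving, meaning that rendering the first turns of a conversation yields a prefix of the rendering of the full conversation. | |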
| **Parameters:** | |
| tokenizer (`PreTrainedTokenizer`) : Tokenizer instance to check. | |
| **Returns:** | |
| ``bool`` | |
| `True` if the chat template preserves prefixes, `False` otherwise. | |
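The prefix-preserving property can be illustrated with two toy renderers. This is a sketch of the idea only and does not reproduce how TRL performs the check; `render_stable`, `render_unstable`, and `is_prefix_preserving` are hypothetical names, and the `[think]` marker stands in for template behaviour that depends on a message's position.

```python
def render_stable(messages):
    """Toy template: each message renders the same way in every position."""
    return "".join(f"<{m['role']}>{m['content']}</{m['role']}>\n" for m in messages)


def render_unstable(messages):
    """Toy template: only a final assistant message keeps a reasoning marker."""
    parts = []
    for i, m in enumerate(messages):
        is_last = i == len(messages) - 1
        marker = "[think]" if m["role"] == "assistant" and is_last else ""
        parts.append(f"<{m['role']}>{marker}{m['content']}</{m['role']}>\n")
    return "".join(parts)


def is_prefix_preserving(render, messages):
    """Check that every truncated conversation renders to a prefix of the full one."""
    full = render(messages)
    return all(full.startswith(render(messages[:k])) for k in range(1, len(messages)))


conversation = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
    {"role": "user", "content": "And at night?"},
]
print(is_prefix_preserving(render_stable, conversation))
print(is_prefix_preserving(render_unstable, conversation))
```

The second renderer fails because the assistant turn is rendered differently once a later message follows it, so the two-message rendering is no longer a prefix of the three-message one.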
| ## get_training_chat_template[[trl.get_training_chat_template]] | |
| #### trl.get_training_chat_template[[trl.get_training_chat_template]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_4949/trl/chat_template_utils.py#L401) | |
| Get a prefix-preserving chat template for training, if needed. | |
| If the tokenizer's template isn't prefix-preserving, returns a training-compatible template (currently only Qwen3 | |
| is supported). Otherwise, returns `None`. | |
| Example: | |
| ```python | |
| >>> from trl.chat_template_utils import get_training_chat_template | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B") | |
| >>> messages1 = [ | |
| ... {"role": "user", "content": "What color is the sky?"}, | |
| ... {"role": "assistant", "content": "It is blue."}, | |
| ... ] | |
| >>> messages2 = [ | |
| ... {"role": "user", "content": "What color is the sky?"}, | |
| ... {"role": "assistant", "content": "It is blue."}, | |
| ... {"role": "user", "content": "And at night?"}, | |
| ... ] | |
| >>> tokenizer.apply_chat_template(messages1, tokenize=False) | |
| '<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nIt is blue.<|im_end|>\n' | |
| >>> tokenizer.apply_chat_template(messages2, tokenize=False) | |
| '<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\nIt is blue.<|im_end|>\n<|im_start|>user\nAnd at night?<|im_end|>\n' | |
| >>> # The think tags are missing from the assistant turn in the output above. | |
| >>> chat_template = get_training_chat_template(tokenizer) | |
| >>> tokenizer.apply_chat_template(messages1, tokenize=False, chat_template=chat_template) | |
| '<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nIt is blue.<|im_end|>\n' | |
| >>> tokenizer.apply_chat_template(messages2, tokenize=False, chat_template=chat_template) | |
| '<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nIt is blue.<|im_end|>\n<|im_start|>user\nAnd at night?<|im_end|>\n' | |
| ``` | |
| **Parameters:** | |
| tokenizer (`PreTrainedTokenizer`) : Tokenizer instance to check. | |
| **Returns:** | |
| `str` or `None` | |
| Training-compatible chat template, or `None` if no patching is needed. | |
| ## parse_response[[trl.chat_template_utils.parse_response]] | |
| #### trl.chat_template_utils.parse_response[[trl.chat_template_utils.parse_response]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_4949/trl/chat_template_utils.py#L460) | |
| Parse a token sequence into a structured response dictionary, with fallback handling. | |
| Attempts to parse the sequence using `tokenizer.parse_response()`. If parsing fails (e.g., due to a malformed tool | |
| call such as `<tool_call>{"type":"function"</tool_call>`), falls back to decoding the sequence as plain text. | |
| Also removes incorrectly appended EOS tokens from tool call content when present. | |
| Example: | |
| ```python | |
| >>> from trl.chat_template_utils import parse_response, add_response_schema | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B") | |
| >>> tokenizer = add_response_schema(tokenizer) # temporary until built-in support | |
| >>> text = '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call>' | |
| >>> ids = tokenizer(text)["input_ids"] | |
| >>> parse_response(tokenizer, ids) | |
| {'role': 'assistant', 'content': '', 'tool_calls': [{'type': 'function', 'function': {'name': 'multiply', 'arguments': {'a': 3, 'b': 4}}}]} | |
| ``` | |
| **Parameters:** | |
| tokenizer (`PreTrainedTokenizer`) : Tokenizer with a `parse_response()` method. | |
| ids (`list[int]`) : Token IDs of the sequence to parse. | |
| **Returns:** | |
| ``dict`` | |
| Response dictionary. | |
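The fallback behaviour described for `parse_response` follows a common try-then-decode pattern. The sketch below illustrates that pattern with a stand-in object; `StubTokenizer` and `parse_with_fallback` are hypothetical and are not part of TRL or transformers, and the shape of the fallback dictionary is an assumption.

```python
import json


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; not a transformers class."""

    def __init__(self, vocab):
        self.vocab = vocab  # maps token id -> text piece

    def decode(self, ids):
        return "".join(self.vocab[i] for i in ids)

    def parse_response(self, ids):
        # Raises json.JSONDecodeError when the tool call is malformed.
        call = json.loads(self.decode(ids))
        return {"role": "assistant", "content": "", "tool_calls": [call]}


def parse_with_fallback(tokenizer, ids):
    """Try structured parsing; on failure, return the decoded text as content."""
    try:
        return tokenizer.parse_response(ids)
    except Exception:
        # Assumed fallback shape: a plain assistant message holding the raw text.
        return {"role": "assistant", "content": tokenizer.decode(ids)}


stub = StubTokenizer({0: '{"name": "multiply"', 1: "}"})
well_formed = parse_with_fallback(stub, [0, 1])  # valid JSON, parsed into a tool call
malformed = parse_with_fallback(stub, [0])  # missing closing brace, decoded as text
print(well_formed)
print(malformed)
```

Catching the parsing error at this level means a single malformed generation yields a plain-text message rather than an exception.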