| <!--Copyright 2023 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
| rendered properly in your Markdown viewer. | |
| --> | |
| # Templates for Chat Models | |
| ## Introduction | |
| An increasingly common use case for LLMs is **chat**. In a chat context, rather than continuing a single string | |
| of text (as is the case with a standard language model), the model instead continues a conversation that consists | |
| of one or more **messages**, each of which includes a **role**, like "user" or "assistant", as well as message text. | |
| Much like tokenization, different models expect very different input formats for chat. This is the reason we added | |
| **chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, | |
| represented as lists of messages, into a single tokenizable string in the format that the model expects. | |
| Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default | |
| template, which mostly just adds whitespace between rounds of dialogue: | |
| ```python | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") | |
| >>> chat = [ | |
| ... {"role": "user", "content": "Hello, how are you?"}, | |
| ... {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, | |
| ... {"role": "user", "content": "I'd like to show off how chat templating works!"}, | |
| ... ] | |
| >>> tokenizer.apply_chat_template(chat, tokenize=False) | |
| " Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!</s>" | |
| ``` | |
| Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting, | |
| that string will also be tokenized for us. To see a more complex template in action, though, let's use the | |
| `mistralai/Mistral-7B-Instruct-v0.1` model. | |
| ```python | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") | |
| >>> chat = [ | |
| ... {"role": "user", "content": "Hello, how are you?"}, | |
| ... {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, | |
| ... {"role": "user", "content": "I'd like to show off how chat templating works!"}, | |
| ... ] | |
| >>> tokenizer.apply_chat_template(chat, tokenize=False) | |
| "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]" | |
| ``` | |
Note that this time, the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of
| user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not. | |
| ## How do I use chat templates? | |
| As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` | |
| and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that, | |
| you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea | |
| to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts). | |
| Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model: | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| checkpoint = "HuggingFaceH4/zephyr-7b-beta" | |
| tokenizer = AutoTokenizer.from_pretrained(checkpoint) | |
| model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here | |
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
| tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt") | |
| print(tokenizer.decode(tokenized_chat[0])) | |
| ``` | |
| This will yield a string in the input format that Zephyr expects. | |
| ```text | |
| <|system|> | |
| You are a friendly chatbot who always responds in the style of a pirate</s> | |
| <|user|> | |
| How many helicopters can a human eat in one sitting?</s> | |
| <|assistant|> | |
| ``` | |
| Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question: | |
| ```python | |
| outputs = model.generate(tokenized_chat, max_new_tokens=128) | |
| print(tokenizer.decode(outputs[0])) | |
| ``` | |
| This will yield: | |
| ```text | |
| <|system|> | |
| You are a friendly chatbot who always responds in the style of a pirate</s> | |
| <|user|> | |
| How many helicopters can a human eat in one sitting?</s> | |
| <|assistant|> | |
| Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all. | |
| ``` | |
| Arr, 'twas easy after all! | |
| ## Is there an automated pipeline for chat? | |
| Yes, there is! Our text generation pipelines support chat inputs, which makes it easy to use chat models. In the past, | |
we used a dedicated `ConversationalPipeline` class, but it has since been deprecated and its functionality
| has been merged into the [`TextGenerationPipeline`]. Let's try the `Zephyr` example again, but this time using | |
| a pipeline: | |
| ```python | |
| from transformers import pipeline | |
| pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta") | |
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
| print(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1]) # Print the assistant's response | |
| ``` | |
| ```text | |
| {'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."} | |
| ``` | |
| The pipeline will take care of all the details of tokenization and calling `apply_chat_template` for you - | |
| once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages! | |
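Because the pipeline returns the whole conversation, continuing a multi-turn chat is just a matter of appending to the
message list. Here's a minimal sketch of that pattern - the follow-up question is purely illustrative:

```python
# Append the assistant's reply to the conversation, add a new user turn, and generate again
messages.append(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1])
messages.append({"role": "user", "content": "And how many can a parrot eat?"})
print(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1])  # Print the new assistant response
```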
| ## What are "generation prompts"? | |
| You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells | |
| the template to add tokens that indicate the start of a bot response. For example, consider the following chat: | |
| ```python | |
messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"}
]
| ``` | |
Here's what this will look like without a generation prompt, for a model that uses a ChatML-style template:
| ```python | |
| tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False) | |
| """<|im_start|>user | |
| Hi there!<|im_end|> | |
| <|im_start|>assistant | |
| Nice to meet you!<|im_end|> | |
| <|im_start|>user | |
| Can I ask a question?<|im_end|> | |
| """ | |
| ``` | |
| And here's what it looks like **with** a generation prompt: | |
| ```python | |
| tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | |
| """<|im_start|>user | |
| Hi there!<|im_end|> | |
| <|im_start|>assistant | |
| Nice to meet you!<|im_end|> | |
| <|im_start|>user | |
| Can I ask a question?<|im_end|> | |
| <|im_start|>assistant | |
| """ | |
| ``` | |
| Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model | |
| generates text it will write a bot response instead of doing something unexpected, like continuing the user's | |
| message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a | |
| special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're | |
| supposed to be doing. | |
| Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any | |
| special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact | |
| effect that `add_generation_prompt` has will depend on the template being used. | |
| ## Can I use chat templates in training? | |
| Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you | |
| can simply continue like any other language model training task. When training, you should usually set | |
| `add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during | |
| training. Let's see an example: | |
| ```python | |
| from transformers import AutoTokenizer | |
| from datasets import Dataset | |
| tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta") | |
chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]
chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]
| dataset = Dataset.from_dict({"chat": [chat1, chat2]}) | |
| dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)}) | |
| print(dataset['formatted_chat'][0]) | |
| ``` | |
| And we get: | |
| ```text | |
| <|user|> | |
| Which is bigger, the moon or the sun?</s> | |
| <|assistant|> | |
| The sun.</s> | |
| ``` | |
| From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column. | |
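As a rough sketch of what that next step might look like, you could tokenize the `formatted_chat` column and hand the
result to a standard causal language modeling setup (the column names and `max_length` below are just illustrative):

```python
def tokenize_chats(examples):
    # Tokenize the pre-formatted chat strings; the truncation length is an arbitrary example value
    return tokenizer(examples["formatted_chat"], truncation=True, max_length=2048)

tokenized_dataset = dataset.map(tokenize_chats, batched=True, remove_columns=["chat", "formatted_chat"])
# tokenized_dataset now contains input_ids ready for causal LM training, e.g. with
# DataCollatorForLanguageModeling(tokenizer, mlm=False) as the data collator.
```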
| ## Advanced: How do chat templates work? | |
| The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the | |
| default template for that model class is used instead. Let's take a look at the template for `BlenderBot`: | |
| ```python | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") | |
| >>> tokenizer.default_chat_template | |
| "{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}" | |
| ``` | |
| That's kind of intimidating. Let's add some newlines and indentation to make it more readable. Note that the first | |
| newline after each block as well as any preceding whitespace before a block are ignored by default, using the | |
| Jinja `trim_blocks` and `lstrip_blocks` flags. However, be cautious - although leading whitespace on each | |
| line is stripped, spaces between blocks on the same line are not. We strongly recommend checking that your template | |
| isn't printing extra spaces where it shouldn't be! | |
| ``` | |
{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ ' ' }}
    {% endif %}
    {{ message['content'] }}
    {% if not loop.last %}
        {{ '  ' }}
    {% endif %}
{% endfor %}
{{ eos_token }}
| ``` | |
| If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/). | |
| Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and | |
| syntax resembles Python. In pure Python, this template would look something like this: | |
| ```python | |
for idx, message in enumerate(messages):
    if message['role'] == 'user':
        print(' ')
    print(message['content'])
    if not idx == len(messages) - 1:  # Don't add trailing whitespace after the final message
        print('  ')
print(eos_token)
| ``` | |
| Effectively, the template does three things: | |
| 1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing. | |
2. Add the message content.
| 3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token. | |
| This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which | |
| are a common way to give the model directives about how it should behave in the subsequent conversation. | |
| But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs | |
| similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system | |
| messages and slightly different system message handling in general - don't use this one in your actual code!) | |
| ``` | |
{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ bos_token + '[INST] ' + message['content'] + ' [/INST]' }}
    {% elif message['role'] == 'system' %}
        {{ '<<SYS>>\\n' + message['content'] + '\\n<</SYS>>\\n\\n' }}
    {% elif message['role'] == 'assistant' %}
        {{ ' ' + message['content'] + ' ' + eos_token }}
    {% endif %}
{% endfor %}
| ``` | |
| Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based | |
| on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly | |
| distinguishable to the model because of the tokens they're wrapped in. | |
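If you want to experiment with a template like this before committing to it, one option (assuming your installed
version of `transformers` supports the `chat_template` override argument of `apply_chat_template`) is to pass the
template string directly, without modifying the tokenizer at all:

```python
# A collapsed, one-line version of the LLaMA-style template above, for experimentation only
llama_style_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + message['content'] + ' [/INST]' }}"
    "{% elif message['role'] == 'system' %}{{ '<<SYS>>\\n' + message['content'] + '\\n<</SYS>>\\n\\n' }}"
    "{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)
print(tokenizer.apply_chat_template(chat, chat_template=llama_style_template, tokenize=False))
```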
| ## Advanced: Adding and editing chat templates | |
| ### How do I create a chat template? | |
Simply write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an
| existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template | |
| above and add "[ASST]" and "[/ASST]" to assistant messages: | |
| ``` | |
{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
    {% elif message['role'] == 'system' %}
        {{ '<<SYS>>\\n' + message['content'].strip() + '\\n<</SYS>>\\n\\n' }}
    {% elif message['role'] == 'assistant' %}
        {{ '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }}
    {% endif %}
{% endfor %}
| ``` | |
| Now, simply set the `tokenizer.chat_template` attribute. Next time you use [`~PreTrainedTokenizer.apply_chat_template`], it will | |
| use your new template! This attribute will be saved in the `tokenizer_config.json` file, so you can use | |
| [`~utils.PushToHubMixin.push_to_hub`] to upload your new template to the Hub and make sure everyone's using the right | |
| template for your model! | |
| ```python | |
| template = tokenizer.chat_template | |
| template = template.replace("SYS", "SYSTEM") # Change the system token | |
| tokenizer.chat_template = template # Set the new template | |
| tokenizer.push_to_hub("model_name") # Upload your new template to the Hub! | |
| ``` | |
The method [`~PreTrainedTokenizer.apply_chat_template`], which uses your chat template, is called by the [`TextGenerationPipeline`] class, so
| once you set the correct chat template, your model will automatically become compatible with [`TextGenerationPipeline`]. | |
| <Tip> | |
| If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat | |
| control tokens as special tokens in the tokenizer. Special tokens are never split, | |
| ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You | |
| should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your | |
| template. This will ensure that text generation tools can correctly figure out when to stop generating text. | |
| </Tip> | |
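For example, a minimal sketch of that setup might look like this - the token strings here are placeholders, and whether
you need to resize the model's embeddings depends entirely on your tokenizer and template:

```python
# Hypothetical control tokens used by your chat template
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})
tokenizer.eos_token = "<|im_end|>"  # the token your template uses to end assistant turns
# If these tokens are genuinely new, the embedding matrix of the model you're fine-tuning
# needs to grow to match the enlarged vocabulary:
model.resize_token_embeddings(len(tokenizer))
```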
| ### What are "default" templates? | |
| Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards | |
| compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a | |
| model does not have a chat template set, but there is a default template for its model class, the `TextGenerationPipeline` | |
| class and methods like `apply_chat_template` will use the class template instead. You can find out what the default | |
| template for your tokenizer is by checking the `tokenizer.default_chat_template` attribute. | |
| This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when | |
| the class template is appropriate for your model, we strongly recommend overriding the default template by | |
| setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured | |
| for chat, and to future-proof in case the default templates are ever altered or deprecated. | |
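If the class default is already what you want, a minimal way to pin it explicitly might be:

```python
# Copy the class-level default into an explicit, per-model template and save it
tokenizer.chat_template = tokenizer.default_chat_template
tokenizer.push_to_hub("your-model-name")  # hypothetical repo name
```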
| ### What template should I use? | |
| When setting the template for a model that's already been trained for chat, you should ensure that the template | |
| exactly matches the message formatting that the model saw during training, or else you will probably experience | |
| performance degradation. This is true even if you're training the model further - you will probably get the best | |
| performance if you keep the chat tokens constant. This is very analogous to tokenization - you generally get the | |
| best performance for inference or fine-tuning when you precisely match the tokenization used during training. | |
| If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand, | |
| you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different | |
| input formats. Our default template for models that don't have a class-specific template follows the | |
| [ChatML format](https://github.com/openai/openai-python/blob/main/chatml.md), and this is a good, flexible choice for many use-cases. It looks like this: | |
| ``` | |
{% for message in messages %}
    {{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}
{% endfor %}
| ``` | |
| If you like this one, here it is in one-liner form, ready to copy into your code. The one-liner also includes | |
| handy support for [generation prompts](#what-are-generation-prompts), but note that it doesn't add BOS or EOS tokens! | |
| If your model expects those, they won't be added automatically by `apply_chat_template` - in other words, the | |
| text will be tokenized with `add_special_tokens=False`. This is to avoid potential conflicts between the template and | |
| the `add_special_tokens` logic. If your model expects special tokens, make sure to add them to the template! | |
| ```python | |
| tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}" | |
| ``` | |
| This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which | |
| allows for flexibility in the roles you train with. The output looks like this: | |
| ```text | |
| <|im_start|>system | |
| You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|> | |
| <|im_start|>user | |
| How are you?<|im_end|> | |
| <|im_start|>assistant | |
| I'm doing great!<|im_end|> | |
| ``` | |
| The "user", "system" and "assistant" roles are the standard for chat, and we recommend using them when it makes sense, | |
| particularly if you want your model to operate well with [`TextGenerationPipeline`]. However, you are not limited | |
| to these roles - templating is extremely flexible, and any string can be a role. | |
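For instance, nothing stops you from defining conversations with nonstandard roles, as long as your template (like the
ChatML-style one above) simply writes the role string verbatim:

```python
# Hypothetical custom roles - the ChatML-style template wraps whatever role string you provide,
# e.g. <|im_start|>narrator ... <|im_end|>, without needing any changes
messages = [
    {"role": "narrator", "content": "The scene opens on a stormy night."},
    {"role": "hero", "content": "I light a lantern and step outside."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```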
| ### I want to add some chat templates! How should I get started? | |
| If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using | |
| [`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're | |
| not the model owner - if you're using a model with an empty chat template, or one that's still using the default class | |
| template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly! | |
| Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that | |
| model, which means it is also automatically supported in places like `TextGenerationPipeline`! | |
| By ensuring that models have this attribute, we can make sure that the whole community gets to use the full power of | |
| open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long - | |
| it's time to put an end to them! | |
| ## Advanced: Template writing tips | |
| If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first | |
| write a short Python script that formats messages the way you want, and then convert that script into a template. | |
| Remember that the template handler will receive the conversation history as a variable called `messages`. Each | |
| message is a dictionary with two keys, `role` and `content`. You will be able to access `messages` in your template | |
| just like you can in Python, which means you can loop over it with `{% for message in messages %}` or access | |
| individual messages with, for example, `{{ messages[0] }}`. | |
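As a tiny illustration, this snippet prints just the role and content of the first message:

```
{{ messages[0]['role'] }}: {{ messages[0]['content'] }}
```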
| You can also use the following tips to convert your code to Jinja: | |
| ### For loops | |
| For loops in Jinja look like this: | |
| ``` | |
{% for message in messages %}
    {{ message['content'] }}
{% endfor %}
| ``` | |
Note that whatever's inside the `{{ expression block }}` will be printed to the output. You can use operators like
| `+` to combine strings inside expression blocks. | |
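For example, this loop uses `+` to prefix each message with its role:

```
{% for message in messages %}
    {{ message['role'] + ': ' + message['content'] }}
{% endfor %}
```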
| ### If statements | |
| If statements in Jinja look like this: | |
| ``` | |
{% if message['role'] == 'user' %}
    {{ message['content'] }}
{% endif %}
| ``` | |
Note that where Python uses whitespace to mark the beginnings and ends of `for` and `if` blocks, Jinja requires you
to explicitly end them with `{% endfor %}` and `{% endif %}`.
| ### Special variables | |
| Inside your template, you will have access to the list of `messages`, but you can also access several other special | |
| variables. These include special tokens like `bos_token` and `eos_token`, as well as the `add_generation_prompt` | |
| variable that we discussed above. You can also use the `loop` variable to access information about the current loop | |
| iteration, for example using `{% if loop.last %}` to check if the current message is the last message in the | |
| conversation. Here's an example that puts these ideas together to add a generation prompt at the end of the | |
conversation if `add_generation_prompt` is `True`:
| ``` | |
{% if loop.last and add_generation_prompt %}
    {{ bos_token + 'Assistant:\n' }}
{% endif %}
| ``` | |
| ### Notes on whitespace | |
As much as possible, we've tried to get Jinja to ignore whitespace outside of `{{ expressions }}`. However, be aware
| that Jinja is a general-purpose templating engine, and it may treat whitespace between blocks on the same line | |
| as significant and print it to the output. We **strongly** recommend checking that your template isn't printing extra | |
| spaces where it shouldn't be before you upload it! |
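If you do find stray whitespace creeping in, one option is Jinja's built-in whitespace control: adding a `-` just
inside a block or expression tag (for example `{%-`, `-%}`, `{{-` or `-}}`) strips the whitespace on that side of
the tag. A minimal sketch:

```
{%- for message in messages -%}
    {{- message['content'] -}}
{%- endfor -%}
```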