prompt template

#33

by maymo - opened 28 days ago

I fine-tuned Gemma-3 and Gemma-4 for machine translation, and the results of Gemma-4-31-it are much worse than those of Gemma-3-12B.
I am wondering whether the cause is the chat_template. I apply the chat template of the Gemma-4 tokenizer to a list of messages like this:

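```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)  # Gemma-4 checkpoint

msg = [
    {
        "role": "user",
        "content": "Translate the text below from English to German:\nRegulation",
    },
    {"role": "assistant", "content": "Regulierung"},
]
# add_generation_prompt=True appends an empty model turn after the last message
tokenizer.apply_chat_template(msg, add_generation_prompt=True, enable_thinking=False, tokenize=False)
```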

It adds the assistant (model) turn twice.
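To illustrate why the model turn can appear twice, here is a toy renderer in the spirit of a chat template. It is a minimal sketch, assuming Gemma-3-style `<start_of_turn>`/`<end_of_turn>` markers; Gemma-4's real special tokens differ, and `render_chat` is a hypothetical helper, not the tokenizer's API. What it shows: a generation prompt is simply appended after whatever messages are already in the list, so a conversation that already ends with an assistant reply gets a second, empty model header.

```python
# Hypothetical toy renderer; the markers are Gemma-3-style placeholders,
# not Gemma-4's actual special tokens.
def render_chat(messages, add_generation_prompt):
    parts = []
    for m in messages:
        # Gemma-style templates name the assistant role "model".
        role = "model" if m["role"] == "assistant" else m["role"]
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Opens a new, empty model turn for the model to fill in.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


msg = [
    {"role": "user", "content": "Translate the text below from English to German:\nRegulation"},
    {"role": "assistant", "content": "Regulierung"},
]

# Full training conversation without a generation prompt: one model turn.
train_text = render_chat(msg, add_generation_prompt=False)
# Generation prompt after an existing assistant reply: a second model header.
doubled = render_chat(msg, add_generation_prompt=True)

print(train_text.count("<start_of_turn>model"))  # 1
print(doubled.count("<start_of_turn>model"))     # 2
```

With the real tokenizer, the analogous setting for a complete user/assistant pair is `add_generation_prompt=False`; `True` is intended for prompts that end with a user message.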

For inference, I have the same issue. For example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

msg = [
    {
        "role": "user",
        "content": "Translate the text below from English to German:\nRegulation",
    },
]
prompt = tokenizer.apply_chat_template(
    msg,
    add_generation_prompt=True,
    enable_thinking=False,
    tokenize=False,
)
```

The model then generates garbage, but if I remove `<|channel>thought\n<channel|>` from the prompt, it generates text that makes sense.
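A minimal sketch of the workaround described above: render the prompt as a string, drop the empty thought block, then tokenize the result. It assumes the block appears in the rendered prompt exactly as quoted in the post, with `\n` being a real newline; check the actual template output before relying on it. `strip_empty_thought` is a hypothetical helper, and the bracketed turn markers in the example are placeholders, not Gemma-4 tokens.

```python
# Empty thought block as quoted in the post (assumed: "\n" is a real newline).
EMPTY_THOUGHT = "<|channel>thought\n<channel|>"


def strip_empty_thought(prompt: str) -> str:
    """Remove every occurrence of the empty thought block from a rendered prompt."""
    return prompt.replace(EMPTY_THOUGHT, "")


# Hypothetical rendered prompt; "[user turn]" and "[model turn]" are placeholders.
prompt = "[user turn]Regulation[model turn]" + EMPTY_THOUGHT
cleaned = strip_empty_thought(prompt)
print(cleaned)  # [user turn]Regulation[model turn]
```

If the template already inserts the BOS token, the cleaned string should be tokenized with `tokenizer(cleaned, add_special_tokens=False)` so BOS is not added a second time.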

I am wondering whether the chat template is what makes my fine-tuned Gemma-4 results so much worse than Gemma-3's.
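One way to test that suspicion is to check that the inference prompt is a prefix of the text the model was trained on. If the template renders training conversations without the empty thought block but inserts it in the generation prompt (or the reverse), the fine-tuned model sees a format at inference that it never saw during training. This sketch assumes both strings come from `apply_chat_template(..., tokenize=False)`, with `add_generation_prompt=False` for the full conversation and `True` for the prompt alone; `prompt_mismatch` is a hypothetical helper, and the bracketed markers are placeholders.

```python
def prompt_mismatch(train_text: str, infer_prompt: str):
    """Return None if infer_prompt is a prefix of train_text,
    otherwise the index where the two renderings first diverge."""
    if train_text.startswith(infer_prompt):
        return None
    for i, (a, b) in enumerate(zip(train_text, infer_prompt)):
        if a != b:
            return i
    # Shared part agrees, but infer_prompt runs past the end of train_text.
    return min(len(train_text), len(infer_prompt))


# Placeholder renderings of the same example.
train = "[user]Regulation[model]Regulierung[end]"
infer_ok = "[user]Regulation[model]"
infer_bad = "[user]Regulation[model]<|channel>thought\n<channel|>"

print(prompt_mismatch(train, infer_ok))   # None
print(prompt_mismatch(train, infer_bad))  # 23
```

A non-None result marks where the training and inference renderings diverge; printing a few characters around that index shows whether the thought block is the difference.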

maymo changed discussion status to closed 28 days ago
