Chat template for vLLM compatibility?

by vrdn23 - opened May 11

May 11

Hey folks,
Thanks for the model! Is there a chat_template.json/jinja to be added to make it vLLM compatible?
cc @RyanMullins

Currently, deployment start-up succeeds, but inference fails with this error:

curl -i http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "9.11 and 9.8, which is greater?" }
    ],
    "max_tokens": 250,
    "reasoning_effort": "low",
    "extra_body": {
      "thinking_token_budget": 10
    }
  }'
HTTP/1.1 400 Bad Request
date: Mon, 11 May 2026 21:59:40 GMT
server: uvicorn
content-length: 216
content-type: application/json

{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}

thnamratha

Google org May 12

Hi @vrdn23 ,

Thanks for raising this issue!
The error message you are seeing, "As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.", is related to transformers. To resolve this, could you please upgrade to the latest version (transformers >= 5.5.0), which supports Gemma4, and let us know if the issue persists? Also, could you please clarify which specific model you are using: the pretrained (base) version or the IT (instruction-tuned) model?
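For anyone who wants to check the installed version against the 5.5.0 minimum from Python rather than the shell, here is a minimal sketch using only the standard library. The helper names `parse_release` and `meets_minimum` are hypothetical, and the comparison only looks at the numeric major.minor.patch parts, so it is an approximation and not a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text, width=3):
    """Turn '5.6.2' into (5, 6, 2), padding with zeros and ignoring suffixes."""
    parts = []
    for piece in text.split(".")[:width]:
        if not piece.isdigit():
            break  # stop at parts such as '0rc1' or 'dev0'
        parts.append(int(piece))
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        # 5.5.0 is the minimum quoted in this thread, not an independently verified value.
        print(installed, "meets minimum:", meets_minimum(installed, "5.5.0"))
```

If this reports that the minimum is met and the 400 error still appears, the version is probably not the cause.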

vrdn23

May 13

Hi @thnamratha ,

Thanks for responding! I don't think the issue is caused by an out-of-date transformers version, though. I am using vLLM 0.19.1 (which has transformers >=5 support: https://github.com/vllm-project/vllm/releases/tag/v0.19.1), and the output below shows that the installed transformers version is higher than that.

root@llm-gemma-4-e2b-599895b557-znhw2:/app# uv pip show transformers
Name: transformers
Version: 5.6.2
Location: /app/.venv/lib/python3.12/site-packages
Requires: huggingface-hub, numpy, packaging, pyyaml, regex, safetensors, tokenizers, tqdm, typer
Required-by: compressed-tensors, vllm, xgrammar
root@llm-gemma-4-e2b-599895b557-znhw2:/app# curl -i http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{ "role": "user", "content": "9.11 and 9.8, which is greater?" }], "max_tokens": 250,"reasoning_effort": "low","extra_body": {"thinking_token_budget": 10}}'
HTTP/1.1 400 Bad Request
date: Wed, 13 May 2026 23:42:54 GMT
server: uvicorn
content-length: 216
content-type: application/json

{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}
root@llm-gemma-4-e2b-599895b557-znhw2:/app#

I am using the model provided in this repo as-is. It seems to me that, since the transformers v5.0 update, every model that supports chat completions needs a chat_template.jinja or chat_template.json file to be present in order to serve the request. Please let me know if there is something I've misunderstood here. Thank you again for the quick response!
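To confirm whether a downloaded model directory actually ships a chat template, a minimal sketch like the following could be run against the local snapshot. The function name `find_chat_template` is hypothetical, and the sketch assumes the template would live in one of three places: a `chat_template.jinja` file, a `chat_template.json` file, or a `chat_template` key inside `tokenizer_config.json`.

```python
import json
import sys
from pathlib import Path


def find_chat_template(model_dir):
    """Return where a chat template was found in model_dir, or None."""
    root = Path(model_dir)
    # Standalone template files that may sit next to the tokenizer files.
    for name in ("chat_template.jinja", "chat_template.json"):
        if (root / name).is_file():
            return name
    # Otherwise the template may be embedded in the tokenizer config.
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if config.get("chat_template"):
            return "tokenizer_config.json (chat_template key)"
    return None


if __name__ == "__main__":
    # Pass the local model directory as the first argument (defaults to ".").
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print(find_chat_template(model_dir) or "no chat template found")
```

If nothing is found, one possible workaround is to start the server with vLLM's `--chat-template` option pointing at a Jinja template file, although which template suits this model is not something this thread establishes.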

thnamratha

Google org May 22

Hi @vrdn23 ,

Thanks for the information. Could you please confirm whether you are using the E2B version? If so, the E2B model does not come with a chat template, so could you please try using the IT (instruction-tuned) models instead?