Chat template for vLLM compatibility?

#8
by vrdn23 - opened

Hey folks,
Thanks for the model! Is there a chat_template.json or chat_template.jinja file that needs to be added to make it vLLM-compatible?
cc @RyanMullins

Currently, the deployment starts up successfully, but inference fails with this error:

curl -i http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "9.11 and 9.8, which is greater?" }
    ], "max_tokens": 250,"reasoning_effort": "low",
    "extra_body": {
      "thinking_token_budget": 10
    }
  }'
HTTP/1.1 400 Bad Request
date: Mon, 11 May 2026 21:59:40 GMT
server: uvicorn
content-length: 216
content-type: application/json

{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}
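For reference, `extra_body` is an argument of the OpenAI Python client, which merges its keys into the top level of the JSON request body. In a raw HTTP request such as the curl call above, those fields would sit at the top level rather than under a literal `extra_body` key. The sketch below builds the equivalent payload with the standard library only. It assumes the server accepts `reasoning_effort` and `thinking_token_budget` as top-level fields (both taken from the request above); `build_payload` is a hypothetical helper, and the request is constructed but never sent.

```python
import json
import urllib.request


def build_payload(messages, max_tokens, reasoning_effort, extra_fields):
    """Hypothetical helper: flatten extra fields into the top-level JSON body,
    mirroring what the OpenAI Python client does with `extra_body`."""
    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }
    payload.update(extra_fields)  # no nested "extra_body" key on the wire
    return payload


payload = build_payload(
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    max_tokens=250,
    reasoning_effort="low",
    extra_fields={"thinking_token_budget": 10},  # assumed server-side field
)

# Build (but do not send) the POST request for the local vLLM server.
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

This only changes how the request is shaped. A missing chat template would still produce the same 400 error.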
Google org

Hi @vrdn23,

Thanks for raising this issue!
The error message "As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one." that you are seeing comes from transformers. To resolve it, could you please upgrade transformers to the latest version (>= 5.5.0), which supports Gemma4, and let us know whether the issue persists? Also, could you please clarify which model you are using: the pretrained (base) version or the IT (instruction-tuned) model?
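To confirm which transformers version the serving interpreter actually sees, a small stdlib-only check can be run inside the same environment. This is a sketch: the 5.5.0 minimum comes from the reply above, and `release_tuple` and `meets_minimum` are hypothetical helpers that compare only the numeric release segments (suffixes such as `rc1` are ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '5.6.2' -> (5, 6, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments, padding the shorter tuple with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        print(installed, "OK" if meets_minimum(installed, "5.5.0") else "too old")
```

For example, `meets_minimum("5.6.2", "5.5.0")` returns `True`, while `meets_minimum("4.44.0", "5.5.0")` returns `False`.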

Hi @thnamratha,

Thanks for getting back to me! However, I don't think the issue is caused by an outdated transformers version. I am using vLLM 0.19.1, which supports transformers >= 5 (https://github.com/vllm-project/vllm/releases/tag/v0.19.1), and as the output below shows, the installed transformers version (5.6.2) is newer than 5.5.0.

root@llm-gemma-4-e2b-599895b557-znhw2:/app# uv pip show transformers
Name: transformers
Version: 5.6.2
Location: /app/.venv/lib/python3.12/site-packages
Requires: huggingface-hub, numpy, packaging, pyyaml, regex, safetensors, tokenizers, tqdm, typer
Required-by: compressed-tensors, vllm, xgrammar
root@llm-gemma-4-e2b-599895b557-znhw2:/app# curl -i http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{ "role": "user", "content": "9.11 and 9.8, which is greater?" }], "max_tokens": 250,"reasoning_effort": "low","extra_body": {"thinking_token_budget": 10}}'
HTTP/1.1 400 Bad Request
date: Wed, 13 May 2026 23:42:54 GMT
server: uvicorn
content-length: 216
content-type: application/json

{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}

I am using the model provided in this repo as-is. It seems to me that, since the transformers v5.0 update, every model served through chat completions needs a chat_template.jinja or chat_template.json file to be present in order to serve requests. Please let me know if I've misunderstood something here. Thanks again for the quick response!
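One way to narrow this down is to check whether the downloaded model directory defines a chat template at all. The sketch below assumes a local snapshot directory and looks in the places transformers commonly reads a template from: a standalone `chat_template.jinja`, a `chat_template.json` with a `chat_template` key, or a `chat_template` key in `tokenizer_config.json`. `find_chat_template_sources` is a hypothetical helper, and this list of locations is an assumption rather than a complete description of how transformers or vLLM resolve templates.

```python
import json
from pathlib import Path


def find_chat_template_sources(model_dir: str) -> list[str]:
    """Hypothetical helper: list the files in a local model snapshot that
    appear to define a chat template (locations assumed, not exhaustive)."""
    root = Path(model_dir)
    sources = []
    # Standalone Jinja template file.
    if (root / "chat_template.jinja").is_file():
        sources.append("chat_template.jinja")
    # JSON file holding the template under a "chat_template" key.
    json_path = root / "chat_template.json"
    if json_path.is_file():
        data = json.loads(json_path.read_text(encoding="utf-8"))
        if data.get("chat_template"):
            sources.append("chat_template.json")
    # Template embedded in the tokenizer config.
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if config.get("chat_template"):
            sources.append("tokenizer_config.json")
    return sources
```

An empty list would be consistent with the error above. In that case, vLLM's `--chat-template` option can point `vllm serve` at a template file explicitly until one ships with the model.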
