Chat template to make vLLM deployment work?

#4
by vrdn23 - opened

Hey folks,
Thanks for the model! Is there a chat_template.json/jinja to be added to make it vLLM compatible?
cc @RyanMullins

Currently, deployment start-up succeeds, but inference fails with this error:

curl -i http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "9.11 and 9.8, which is greater?" }
    ], "max_tokens": 250,"reasoning_effort": "low",
    "extra_body": {
      "thinking_token_budget": 10
    }
  }'
HTTP/1.1 400 Bad Request
date: Mon, 11 May 2026 21:59:40 GMT
server: uvicorn
content-length: 216
content-type: application/json

{"error":{"message":"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.","type":"BadRequestError","param":null,"code":400}}

Hello, this is a base model without instruction tuning, so its tokenizer does not define a chat template. For general chat use, I think you should use google/gemma-4-E4B-it instead; that repository includes a chat_template.jinja file.
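
If you still want to query the base model as deployed, one option is vLLM's plain-text /v1/completions endpoint, which takes a raw prompt and does not apply a chat template. Below is a minimal sketch, assuming the same server address as in the question. The helper names build_payload and complete are hypothetical, and the printf-based JSON building does not escape quotes or backslashes in the prompt. Keep in mind that a base model continues the text rather than answering like a chat assistant.

```shell
# Assumed server address, taken from the question; adjust to your deployment.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Hypothetical helper: build a /v1/completions payload from a raw prompt.
# No chat template is involved because the prompt is sent as plain text.
# Note: the prompt is not JSON-escaped, so avoid quotes and backslashes.
build_payload() {
  printf '{"prompt": "%s", "max_tokens": %d}' "$1" "$2"
}

# Hypothetical helper: send the request. Call it only with the server running.
complete() {
  curl -s "$BASE_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "${2:-250}")"
}
```

For example, complete "9.11 and 9.8, which is greater?" 250 would send the prompt from the question as a text completion. Alternatively, vLLM's server accepts a --chat-template option pointing at a Jinja file, but a template alone does not make a base model follow instructions.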
