FoolDev and Claude Opus 4.7 committed
Commit 6711c39 · 1 parent: c61756c

Document Ollama tool-calling gap honestly

Ollama refuses tool-call requests with "<model> does not support tools"
on both /api/chat and /v1/chat/completions because our Modelfile has no
TEMPLATE directive exposing the Qwen 3.6 tool-jinja blocks — Ollama
falls back to the trivial {{ .Prompt }} template and its capability
detector reports completion only.
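
The capability report described above can be inspected directly. The following is a minimal sketch, assuming a recent Ollama build whose `/api/show` response includes a `capabilities` list and a daemon on the default local port; the host constant and the helper names `supports_tools` and `fetch_show` are illustrative and not part of this repository.

```python
import json
from urllib import request

# Assumption: default address of a locally running Ollama daemon.
OLLAMA_HOST = "http://localhost:11434"


def supports_tools(show_response: dict) -> bool:
    """Return True if an /api/show response lists the "tools" capability."""
    return "tools" in show_response.get("capabilities", [])


def fetch_show(model: str, host: str = OLLAMA_HOST) -> dict:
    """POST /api/show for `model` and return the decoded JSON body."""
    payload = json.dumps({"model": model}).encode("utf-8")
    req = request.Request(
        f"{host}/api/show",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `supports_tools(fetch_show("<model>"))` against the Modelfile from this commit would be expected to return `False`, matching the "completion only" report; older daemons may omit the `capabilities` field entirely, in which case the helper also returns `False`.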

README "Tool / function calling" and Known limitations now reflect
reality. examples/ollama_chat.py:tool_round_trip keeps the function as
a reference for the request shape but adds a docstring warning, and the
demo wraps the call in try/except so it prints [skip] instead of
crashing with HTTPError 400.

Plain chat, streaming, system overrides, and the OpenAI-compat endpoint
all still work — verified end-to-end against the running daemon.
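
One way to reproduce that check is to send the same chat request with and without a `tools` array and compare the outcomes. The sketch below uses only the standard library rather than the `requests` calls in `examples/ollama_chat.py`; the host constant, the `WEATHER_TOOL` schema, and the helper names are assumptions made for illustration.

```python
import json
from urllib import error, request

# Assumption: default address of a locally running Ollama daemon.
OLLAMA_HOST = "http://localhost:11434"

# Hypothetical tool schema, used only to populate the `tools` array.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string"},
            },
            "required": ["city"],
        },
    },
}


def build_chat_payload(model: str, prompt: str, with_tools: bool) -> dict:
    """Build a non-streaming /api/chat body, optionally with a tools array."""
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    if with_tools:
        payload["tools"] = [WEATHER_TOOL]
    return payload


def is_tools_rejection(status: int, body: str) -> bool:
    """Match the specific 400 response quoted in the commit message."""
    return status == 400 and "does not support tools" in body


def probe(model: str, prompt: str, with_tools: bool,
          host: str = OLLAMA_HOST) -> str:
    """Return "ok" or "tools-rejected"; re-raise any other HTTP error."""
    data = json.dumps(build_chat_payload(model, prompt, with_tools)).encode("utf-8")
    req = request.Request(
        f"{host}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=120):
            return "ok"
    except error.HTTPError as exc:
        body = exc.read().decode("utf-8", "replace")
        if is_tools_rejection(exc.code, body):
            return "tools-rejected"
        raise
```

With the Modelfile as committed, `probe(model, prompt, with_tools=False)` should return `"ok"` and the same call with `with_tools=True` should return `"tools-rejected"`.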

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (3)
  1. CHANGELOG.md +10 -0
  2. README.md +18 -4
  3. examples/ollama_chat.py +17 -2
CHANGELOG.md CHANGED
@@ -7,6 +7,16 @@ and documentation**, not the underlying base model.
 
 ## [Unreleased]
 
+### Documented
+- Tool calling via Ollama is currently rejected with `does not support
+tools` because the Modelfile has no `TEMPLATE` directive exposing the
+Qwen 3.6 tool-jinja blocks. Verified against both `/api/chat` and
+`/v1/chat/completions`. README "Tool / function calling" + Known
+limitations updated to reflect reality. `examples/ollama_chat.py:
+tool_round_trip` keeps the helper as a reference shape but adds a
+docstring warning, and the demo wraps the call in a try/except so it
+prints `[skip]` instead of crashing.
+
 ### Fixed
 - `Modelfile`: added explicit `PARAMETER stop` directives for `<|im_end|>`,
 `<|endoftext|>`, and `<|im_start|>`. Ollama was only picking up
README.md CHANGED
@@ -295,7 +295,7 @@ client doesn't.
 
 #### Tool / function calling
 
-The embedded template uses Qwen's XML format:
+Qwen 3.6's chat template uses Qwen's XML format:
 
 ```text
 <tool_call>
@@ -306,15 +306,29 @@ The embedded template uses Qwen's XML format:
 </tool_call>
 ```
 
-Most OpenAI-compatible servers (Ollama, LM Studio, vLLM) translate
-between this and the JSON `tool_calls` shape automatically. See
-`examples/ollama_chat.py:tool_round_trip` for a working round-trip.
+> **Tool calling via Ollama is currently disabled for this Modelfile.**
+> Both `/api/chat` and `/v1/chat/completions` reject requests with
+> `"<model> does not support tools"` because Ollama's tool-capability
+> detection requires an explicit Modelfile `TEMPLATE` directive
+> containing tool-jinja blocks, and we currently fall back to the
+> trivial `{{ .Prompt }}` template (the GGUF's embedded jinja isn't
+> picked up by Ollama's detector). Plain chat, streaming, and
+> system-prompt overrides all work — only the `tools` array is
+> rejected.
+>
+> If you need tool calling, use **llama.cpp** / **llama-cpp-python**
+> directly (they read the GGUF's embedded chat template), or write a
+> Modelfile `TEMPLATE` mirroring the official Qwen 3.6 chat template.
+> The reference Python helper `examples/ollama_chat.py:tool_round_trip`
+> is shipped for documentation but raises `HTTPError 400` against
+> Ollama until the above is fixed.
 
 ## Known limitations
 
 - **Slower per token than the 35B-A3B sibling.** Dense 27B beats sparse 35B/3B-active on steps-per-second benchmarks because every parameter contributes; if you optimize for tokens-per-second, the MoE wins.
 - **No mmproj in this release**, and **vision via Ollama is broken upstream** (qwen35/qwen35moe arch entries missing from Ollama's vendored llama.cpp fork — see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
 - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
+- **Tool calling via Ollama is currently disabled** because the Modelfile has no `TEMPLATE` directive exposing the Qwen 3.6 tool-jinja blocks; Ollama returns `does not support tools` for any request with a `tools` array. Use llama.cpp directly for tool calling, or contribute a Modelfile `TEMPLATE`. See [Tool / function calling](#tool--function-calling).
 - **No formal evaluation in this card.** Numbers above are estimates.
 
 ## Related models
examples/ollama_chat.py CHANGED
@@ -109,7 +109,15 @@ def fake_weather(city: str, unit: str) -> str:
 
 
 def tool_round_trip(prompt: str) -> str:
-    """Single-shot tool call: model -> tool -> model -> final answer."""
+    """Single-shot tool call: model -> tool -> model -> final answer.
+
+    NOTE: Currently fails against Ollama with HTTPError 400
+    "<model> does not support tools" because the project Modelfile has
+    no TEMPLATE directive exposing the Qwen 3.6 tool-jinja blocks. The
+    function is shipped as a reference for the request shape — wire it
+    against llama-cpp-python or a custom-templated Modelfile to actually
+    run it. See README "Tool / function calling".
+    """
     history: list[dict[str, Any]] = [{"role": "user", "content": prompt}]
     r = requests.post(
         f"{HOST}/api/chat",
@@ -183,7 +191,14 @@ def _demo() -> None:
     print()
 
     print("\n=== 3. tool round-trip ===")
-    print(tool_round_trip("What is the weather in Paris in celsius?"))
+    try:
+        print(tool_round_trip("What is the weather in Paris in celsius?"))
+    except requests.HTTPError as e:
+        if e.response is not None and "does not support tools" in e.response.text:
+            print("[skip] Ollama refuses tools for this Modelfile (no TEMPLATE).")
+            print(" See README 'Tool / function calling' for context.")
+        else:
+            raise
 
     print("\n=== 4. OpenAI-compat ===")
     print(openai_chat("Say 'OpenAI endpoint OK' and nothing else."))