FoolDev Claude Opus 4.7 committed on
Commit 80f4494 · 1 Parent(s): ab19d26

Modelfile: enable Ollama tool calling via TEMPLATE directive

Ollama's tool-capability detector reads the Modelfile TEMPLATE for
.Tools / .ToolCalls references. Without one, /api/chat and
/v1/chat/completions reject any request carrying a `tools` array with
"<model> does not support tools" — even though the GGUF's embedded
jinja handles tool calls fine.
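
As a rough illustration of the check described above, the sketch below scans a template string for `.Tools` / `.ToolCalls` references inside Go-template actions. It is an approximation written for this note, not Ollama's detector: `template_references` and `looks_tool_capable` are hypothetical names, and the assumption that a `.Tools` reference is what enables the capability comes from the description above rather than from Ollama's source.

```python
import re

# Matches Go-template actions such as {{ if .Tools }} or {{- range .ToolCalls }}.
ACTION = re.compile(r"\{\{-?\s*(.*?)\s*-?\}\}", re.DOTALL)


def template_references(template: str, field: str) -> bool:
    """Return True if any template action mentions `.field` as a whole word."""
    pattern = re.compile(r"\." + re.escape(field) + r"\b")
    return any(pattern.search(action) for action in ACTION.findall(template))


def looks_tool_capable(template: str) -> bool:
    # Assumption: a .Tools reference inside an action is the signal that matters.
    return template_references(template, "Tools")
```

Text outside `{{ ... }}` is ignored on purpose, so a literal `.Tools` in prompt prose does not count as a reference.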

Ship the same Qwen 3.6 ChatML (Go-template form) the 35B sibling uses,
including the JSON-in-<tool_call> envelope Ollama's parser understands.
After `make build`, `ollama show janus-27b` lists `tools` and
`thinking` under Capabilities, and the existing
`examples/ollama_chat.py:tool_round_trip` helper now completes the
model -> tool -> model loop end-to-end against Ollama.
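
For readers who want to see the JSON-in-`<tool_call>` envelope handled outside Ollama, here is a minimal sketch of extracting calls from raw model text. `extract_tool_calls` is a hypothetical helper, not part of this repo or of Ollama; it assumes each envelope holds a single JSON object with `name` and `arguments` keys, as in the template's instructions.

```python
import json
import re
from typing import Any

# One JSON object wrapped in <tool_call>...</tool_call>, possibly spanning lines.
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict[str, Any]]:
    """Return the well-formed tool calls found in `text`, in order."""
    calls: list[dict[str, Any]] = []
    for raw in TOOL_CALL.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip envelopes whose body is not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls
```

Malformed envelopes are dropped rather than raised, which is a design choice for the sketch; a stricter caller might prefer to surface them.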

Drop the now-stale "[skip]" path from the demo, the docstring warning
on the helper, the README "Known limitations" bullet, and the
matching CHANGELOG entry that documented the gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (4)
  1. CHANGELOG.md +11 -10
  2. Modelfile +62 -4
  3. README.md +22 -22
  4. examples/ollama_chat.py +2 -17
CHANGELOG.md CHANGED
@@ -7,17 +7,18 @@ and documentation**, not the underlying base model.
 
 ## [Unreleased]
 
-### Documented
-- Tool calling via Ollama is currently rejected with `does not support
-  tools` because the Modelfile has no `TEMPLATE` directive exposing the
-  Qwen 3.6 tool-jinja blocks. Verified against both `/api/chat` and
-  `/v1/chat/completions`. README "Tool / function calling" + Known
-  limitations updated to reflect reality. `examples/ollama_chat.py:
-  tool_round_trip` keeps the helper as a reference shape but adds a
-  docstring warning, and the demo wraps the call in a try/except so it
-  prints `[skip]` instead of crashing.
-
 ### Fixed
+- `Modelfile`: ship a `TEMPLATE` directive mirroring Qwen 3.6 ChatML in
+  Ollama Go-template form, so Ollama's tool-capability detector sees
+  `.Tools` / `.ToolCalls` references. After `make build`, `ollama show
+  janus-27b` now lists `tools` and `thinking` under Capabilities, and
+  both `/api/chat` and `/v1/chat/completions` accept a `tools` array
+  (previously rejected with `does not support tools`). Same template as
+  the 35B sibling — both share the Qwen 3.6 chat format. Verified
+  end-to-end with `examples/ollama_chat.py:tool_round_trip` (model
+  emits a `<tool_call>`, helper executes the stub, model produces final
+  answer). README "Tool / function calling" rewritten and the
+  corresponding Known-limitations bullet removed.
 - `Modelfile`: added explicit `PARAMETER stop` directives for `<|im_end|>`,
   `<|endoftext|>`, and `<|im_start|>`. Ollama was only picking up
   `<|im_end|>` from the GGUF metadata, so when the model emitted
Modelfile CHANGED
@@ -1,9 +1,9 @@
 # Janus-27B — Ollama wrapper around Qwen 3.6 27B (dense)
 #
-# Text-only. Vision via Ollama is currently broken for this architecture
-# (ollama/ollama#15898 — the vendored llama.cpp fork is missing the
-# qwen35 arch entries). Use llama.cpp directly for image input, or wait
-# for the fix. See the Vision section in README.md.
+# Text + tool calling. Vision via Ollama is currently broken for this
+# architecture (ollama/ollama#15898 — the vendored llama.cpp fork is
+# missing the qwen35 arch entries). Use llama.cpp directly for image
+# input, or wait for the fix. See the Vision section in README.md.
 #
 # This repo does not redistribute weights. Edit the FROM line below to
 # point at a local Qwen 3.6 27B GGUF, then:
@@ -20,6 +20,64 @@
 
 FROM ./Qwen3.6-27B-Q4_K_M.gguf
 
+# Chat template — Qwen 3.6 ChatML in Ollama Go-template form, with the
+# tool-calling blocks Ollama's capability detector looks for. Without a
+# TEMPLATE that references .Tools and .ToolCalls, /api/chat and
+# /v1/chat/completions reject any request carrying a `tools` array with
+# `<model> does not support tools`. Same template as the 35B sibling —
+# both share the Qwen 3.6 chat format.
+TEMPLATE """{{- $lastUserIdx := -1 -}}
+{{- range $idx, $msg := .Messages -}}
+{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+{{- end }}
+{{- if or .System .Tools }}<|im_start|>system
+{{ if .System }}{{ .System }}
+
+{{ end }}
+{{- if .Tools }}# Tools
+
+You may call one or more functions to assist with the user query.
+
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{{- range .Tools }}
+{"type": "function", "function": {{ .Function }}}
+{{- end }}
+</tools>
+
+For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+<tool_call>
+{"name": <function-name>, "arguments": <args-json-object>}
+</tool_call>
+{{- end -}}<|im_end|>
+{{ end }}
+{{- range $i, $_ := .Messages }}
+{{- $last := eq (len (slice $.Messages $i)) 1 -}}
+{{- if eq .Role "user" }}<|im_start|>user
+{{ .Content }}<|im_end|>
+{{ else if eq .Role "assistant" }}<|im_start|>assistant
+{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+<think>{{ .Thinking }}</think>
+{{ end -}}
+{{ if .Content }}{{ .Content }}{{ end }}
+{{- if .ToolCalls }}
+{{- range .ToolCalls }}
+<tool_call>
+{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+</tool_call>
+{{- end }}
+{{- end }}{{ if not $last }}<|im_end|>
+{{ end }}
+{{- else if eq .Role "tool" }}<|im_start|>user
+<tool_response>
+{{ .Content }}
+</tool_response><|im_end|>
+{{ end }}
+{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+<think>
+{{ end }}
+{{- end }}"""
+
 # Sampling tuned for reasoning + general use. See README "Recommended sampling"
 # for creative/RP alternatives.
 PARAMETER temperature 0.6
README.md CHANGED
@@ -306,40 +306,40 @@ client doesn't.
 
 #### Tool / function calling
 
-Qwen 3.6's chat template uses Qwen's XML format:
+The Modelfile ships with a `TEMPLATE` directive that exposes Qwen 3.6's
+tool-calling blocks to Ollama. After `make build`, `ollama show
+janus-27b` lists `tools` (and `thinking`) under **Capabilities**, and
+both `/api/chat` and `/v1/chat/completions` accept a `tools` array.
+
+The template prompts the model to emit tool calls as JSON inside
+`<tool_call>` XML tags — the format Ollama's tool-call extractor parses:
 
 ```text
 <tool_call>
-<function=get_current_weather>
-<parameter=city>Paris</parameter>
-<parameter=unit>celsius</parameter>
-</function>
+{"name": "get_current_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
 </tool_call>
 ```
 
-> **Tool calling via Ollama is currently disabled for this Modelfile.**
-> Both `/api/chat` and `/v1/chat/completions` reject requests with
-> `"<model> does not support tools"` because Ollama's tool-capability
-> detection requires an explicit Modelfile `TEMPLATE` directive
-> containing tool-jinja blocks, and we currently fall back to the
-> trivial `{{ .Prompt }}` template (the GGUF's embedded jinja isn't
-> picked up by Ollama's detector). Plain chat, streaming, and
-> system-prompt overrides all work — only the `tools` array is
-> rejected.
->
-> If you need tool calling, use **llama.cpp** / **llama-cpp-python**
-> directly (they read the GGUF's embedded chat template), or write a
-> Modelfile `TEMPLATE` mirroring the official Qwen 3.6 chat template.
-> The reference Python helper `examples/ollama_chat.py:tool_round_trip`
-> is shipped for documentation but raises `HTTPError 400` against
-> Ollama until the above is fixed.
+(The Qwen 3.6 base was trained on a more verbose XML form with
+`<function=...>` / `<parameter=...>` blocks; the JSON-in-XML envelope
+above is what Ollama's parser understands and what the sibling 35B
+Modelfile uses in production.)
+
+End-to-end exercise:
+
+```bash
+python examples/ollama_chat.py # section 3 runs a real round-trip now
+```
+
+If you'd rather drive llama.cpp / llama-cpp-python directly (no Ollama
+in the loop) they read the GGUF's embedded jinja and accept either
+format.
 
 ## Known limitations
 
 - **Slower per token than the 35B-A3B sibling.** Dense 27B beats sparse 35B/3B-active on steps-per-second benchmarks because every parameter contributes; if you optimize for tokens-per-second, the MoE wins.
 - **No mmproj in this release**, and **vision via Ollama is broken upstream** (qwen35/qwen35moe arch entries missing from Ollama's vendored llama.cpp fork — see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
 - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
-- **Tool calling via Ollama is currently disabled** because the Modelfile has no `TEMPLATE` directive exposing the Qwen 3.6 tool-jinja blocks; Ollama returns `does not support tools` for any request with a `tools` array. Use llama.cpp directly for tool calling, or contribute a Modelfile `TEMPLATE`. See [Tool / function calling](#tool--function-calling).
 - **No formal evaluation in this card.** Numbers above are estimates.
 
 ## Related models
examples/ollama_chat.py CHANGED
@@ -109,15 +109,7 @@ def fake_weather(city: str, unit: str) -> str:
 
 
 def tool_round_trip(prompt: str) -> str:
-    """Single-shot tool call: model -> tool -> model -> final answer.
-
-    NOTE: Currently fails against Ollama with HTTPError 400
-    "<model> does not support tools" because the project Modelfile has
-    no TEMPLATE directive exposing the Qwen 3.6 tool-jinja blocks. The
-    function is shipped as a reference for the request shape — wire it
-    against llama-cpp-python or a custom-templated Modelfile to actually
-    run it. See README "Tool / function calling".
-    """
+    """Single-shot tool call: model -> tool -> model -> final answer."""
     history: list[dict[str, Any]] = [{"role": "user", "content": prompt}]
     r = requests.post(
         f"{HOST}/api/chat",
@@ -191,14 +183,7 @@ def _demo() -> None:
     print()
 
     print("\n=== 3. tool round-trip ===")
-    try:
-        print(tool_round_trip("What is the weather in Paris in celsius?"))
-    except requests.HTTPError as e:
-        if e.response is not None and "does not support tools" in e.response.text:
-            print("[skip] Ollama refuses tools for this Modelfile (no TEMPLATE).")
-            print(" See README 'Tool / function calling' for context.")
-        else:
-            raise
+    print(tool_round_trip("What is the weather in Paris in celsius?"))
 
     print("\n=== 4. OpenAI-compat ===")
     print(openai_chat("Say 'OpenAI endpoint OK' and nothing else."))
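
The diff shows only the head of `tool_round_trip`, so its body is not reproduced here. The sketch below is one way the model -> tool -> model loop could look, assuming the `/api/chat` message shape in which the assistant reply carries a `tool_calls` list of `{"function": {"name", "arguments"}}` entries and results are sent back as `role: "tool"` messages. `round_trip`, `chat`, and `registry` are hypothetical names; `chat` is injected so the loop can be exercised without a running server.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]
# Hypothetical transport: takes (history, tools) and returns the assistant message.
ChatFn = Callable[[list[Message], list[dict[str, Any]]], Message]


def round_trip(
    prompt: str,
    chat: ChatFn,
    tools: list[dict[str, Any]],
    registry: dict[str, Callable[..., Any]],
) -> str:
    """Ask once, run any requested tools, then ask again for the final answer."""
    history: list[Message] = [{"role": "user", "content": prompt}]
    reply = chat(history, tools)
    calls = reply.get("tool_calls") or []
    if not calls:
        return reply.get("content", "")  # the model answered directly
    history.append(reply)
    for call in calls:
        fn = call["function"]
        args = fn.get("arguments") or {}
        if isinstance(args, str):  # some servers return arguments as a JSON string
            args = json.loads(args)
        result = registry[fn["name"]](**args)
        history.append({"role": "tool", "content": str(result)})
    return chat(history, tools).get("content", "")
```

In real use, `chat` would wrap a `requests.post` to `/api/chat` with `stream` disabled and return the `message` field of the JSON response; that wiring is left out because the helper's actual request code is not visible in this diff.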