FoolDev committed
Commit 33458f7 · 1 Parent(s): 70c2f62

Add HF Ollama bridge files (template/system/params) + fix mmproj filename collision

The HF Ollama bridge does NOT read Modelfile (per docs at
https://huggingface.co/docs/hub/en/ollama). When users do
'ollama run hf.co/FoolDev/janus-27b' the bridge generates a manifest
from three root-level files: 'template' (Ollama Go format), 'system'
(plain text), and 'params' (JSON). Without those files HF auto-converts
the GGUF's embedded jinja chat template to Ollama Go format, and that
conversion is buggy: produces '{{ if .Prompt }} .Prompt }}<|im_end|>'
(missing user-role wrapper, malformed value substitution), corrupted
stop tokens including the literal string '.Prompt }}<|im_end|>', and
no .Tools/.ToolCalls blocks — so 'ollama show hf.co/FoolDev/janus-27b'
reports only the 'completion' capability and rejects any /api/chat or
/v1/chat/completions request carrying a tools array.
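
A minimal sketch of the kind of request that was being rejected, assuming a local Ollama daemon on its default address (`http://localhost:11434`). The `get_weather` tool and both helper names are hypothetical and exist only to exercise the `tools` path; nothing here is taken from the repo.

```python
import json
import urllib.request


def build_tool_chat_request(model: str, prompt: str) -> dict:
    """Build an /api/chat body that carries a `tools` array."""
    # Hypothetical tool definition, used only to exercise tool calling.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "stream": False,
    }


def post_chat(body: dict, host: str = "http://localhost:11434") -> dict:
    """POST the body to Ollama's /api/chat and return the decoded JSON reply."""
    request = urllib.request.Request(
        host + "/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_tool_chat_request(
        "hf.co/FoolDev/janus-27b", "What is the weather in Oslo?"
    )
    print(json.dumps(body, indent=2))
    # Call post_chat(body) against a running daemon to see whether the
    # model accepts the tools array or returns an error.
```

Building the body separately from sending it keeps the payload easy to inspect before any network call is made.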

Added template, system, params at repo root (mirrors the Modelfile's
TEMPLATE/SYSTEM/PARAMETER directives). Both routes now wire .Tools
and tool calling works end-to-end on either path.
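
Because the three files mirror the Modelfile, a drift check may be useful. The sketch below assumes the Modelfile uses one `PARAMETER <key> <value>` directive per line with repeated `stop` lines; the Modelfile itself is not shown in this commit, and the function names are illustrative.

```python
import json


def parse_modelfile_parameters(modelfile_text: str) -> dict:
    """Collect PARAMETER directives; repeated `stop` lines become a list."""
    params: dict = {}
    for raw_line in modelfile_text.splitlines():
        parts = raw_line.strip().split(None, 2)
        if len(parts) != 3 or parts[0].upper() != "PARAMETER":
            continue
        key, value = parts[1], parts[2].strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key == "stop":
            params.setdefault("stop", []).append(value)
            continue
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value  # keep non-numeric values as strings
    return params


def diff_params(modelfile_text: str, params_json_text: str) -> list:
    """Return the sorted keys whose values differ between the two sources."""
    from_modelfile = parse_modelfile_parameters(modelfile_text)
    from_json = json.loads(params_json_text)
    keys = set(from_modelfile) | set(from_json)
    return sorted(k for k in keys if from_modelfile.get(k) != from_json.get(k))


if __name__ == "__main__":
    with open("Modelfile", encoding="utf-8") as fh:
        modelfile_text = fh.read()
    with open("params", encoding="utf-8") as fh:
        params_text = fh.read()
    print(diff_params(modelfile_text, params_text) or "in sync")
```

Stop tokens are compared as ordered lists, so reordering them in one file counts as a mismatch under this sketch.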

Also renamed scripts/fetch_mmproj.sh -> scripts/fetch_vision.sh: HF's
Ollama bridge was filename-pattern-matching mmproj* anywhere in the
repo and shipping the 2028-byte bash script as the
application/vnd.ollama.image.projector layer. When Ollama tried to
load that 'projector' as a GGUF it failed the magic-bytes check —
'Error: invalid file magic' on every ollama show / ollama run.
Renaming breaks the pattern match; projector layer drops from the
manifest. Updated Makefile mmproj target and README references to
point at the new name.
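
A guard against a repeat of this collision could look like the sketch below. It assumes the bridge matches any file whose name contains `mmproj` (the exact pattern is not documented in this commit) and relies on GGUF files starting with the four ASCII bytes `GGUF`; the function names are illustrative.

```python
import os

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def has_gguf_magic(path: str) -> bool:
    """Return True when the file starts with the GGUF magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(4) == GGUF_MAGIC


def find_projector_impostors(repo_root: str) -> list:
    """List files named like a projector (assumed: 'mmproj' in the name)
    that are not actually GGUF files."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip git internals in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if "mmproj" not in name.lower():
                continue
            full_path = os.path.join(dirpath, name)
            if not has_gguf_magic(full_path):
                hits.append(os.path.relpath(full_path, repo_root))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_projector_impostors("."):
        print(f"not a GGUF but named like a projector: {path}")
```

Run from the repo root, this would have flagged `scripts/fetch_mmproj.sh` before the rename and should print nothing afterwards.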

README updated: 'What's here' table lists template/system/params;
'Quick start' / 'Local apps' / 'Chat template' sections corrected to
say HF bridge uses the three files (not Modelfile).

Files changed (7)
  1. CHANGELOG.md +28 -0
  2. Makefile +1 -1
  3. README.md +25 -14
  4. params +12 -0
  5. scripts/{fetch_mmproj.sh → fetch_vision.sh} +3 -3
  6. system +10 -0
  7. template +51 -0
CHANGELOG.md CHANGED
@@ -7,6 +7,34 @@ and documentation**, not the underlying base model.
 
  ## [Unreleased]
 
+ ### Added
+ - Root-level `template`, `system`, and `params` files for HF's Ollama
+ bridge. The bridge generates Ollama manifests at request time from
+ these three files (NOT from `Modelfile` — confirmed against
+ https://huggingface.co/docs/hub/en/ollama). Without them, `ollama
+ run hf.co/FoolDev/janus-27b` got an auto-generated manifest with
+ the broken `{{ if .Prompt }} .Prompt }}<|im_end|>` template
+ (Ollama's faulty Go-template conversion of the GGUF's embedded
+ jinja), corrupted stop tokens (`".Prompt }}<|im_end|>"` bleed),
+ and no `.Tools` / `.ToolCalls` blocks — so the published Ollama
+ tag advertised `completion` only, rejected any request with a
+ `tools` array, and was actually broken to load (see "Fixed" below
+ re: the projector layer). The three files mirror the `Modelfile`'s
+ `TEMPLATE` / `SYSTEM` / `PARAMETER` directives; both routes wire
+ tool calling correctly. Edit them together when changing one.
+
+ ### Fixed
+ - Renamed `scripts/fetch_mmproj.sh` → `scripts/fetch_vision.sh`. HF's
+ Ollama bridge was filename-pattern-matching `mmproj*` anywhere in
+ the repo and shipping `scripts/fetch_mmproj.sh` (a 2028-byte bash
+ script) as the `application/vnd.ollama.image.projector` layer. When
+ Ollama tried to load that "projector" as a GGUF, it failed the
+ magic-bytes check and `ollama show` / `ollama run` produced
+ `Error: invalid file magic`. Renaming the script breaks the pattern
+ match and the projector layer is no longer added to the manifest.
+ Updated `Makefile` (`mmproj` target) and README references to point
+ at the new name.
+
  ### Changed
  - README "Tool / function calling" section: split into explicit
  Ollama-path and embedded-jinja-path subsections. The two loader
Makefile CHANGED
@@ -46,7 +46,7 @@ bench: ## Measure tok/s using Ollama's eval timing (3 prompts).
  MODEL=$(MODEL) ./scripts/bench.sh
 
  mmproj: ## Fetch the vision projector for llama.cpp (Ollama vision is broken upstream).
- ./scripts/fetch_mmproj.sh $(PRECISION)
+ ./scripts/fetch_vision.sh $(PRECISION)
 
  check: ## Lint shell + python files; block dot-pattern footgun.
  ./scripts/check.sh
README.md CHANGED
@@ -108,12 +108,13 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
  | File | Use |
  |---|---|
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
- | `Modelfile` | Ollama wrapper around the upstream Qwen 3.6 27B GGUF (Q4_K_M) |
+ | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF used by `make build` / `ollama create` for **local** builds |
+ | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/janus-27b` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
  | `scripts/build.sh` | One-shot helper: pulls a GGUF and runs `ollama create` for you |
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, and asserts no chat-template tokens leak into the response |
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
- | `scripts/fetch_mmproj.sh` | Pulls the vision projector for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)) |
+ | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
  | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep |
  | `scripts/install-hooks.sh` | Installs `check.sh` as a git pre-commit hook |
  | `Makefile` | Convenience wrapper — `make help` lists targets |
@@ -133,8 +134,9 @@ ollama run hf.co/FoolDev/janus-27b:Q3_K_S # tighter quant
 
  For other quants or local builds, pull from
  [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
- and `make build QUANT=...` the Modelfile here is the same one Ollama
- applies in either path.
+ and `make build QUANT=...`. The local-build path applies this repo's
+ `Modelfile`; the `hf.co/...` path applies the root-level `template`,
+ `system`, and `params` files (kept in sync with the `Modelfile`).
 
  If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
 
@@ -193,7 +195,7 @@ local app — point it at this repo and pick a quant.
 
  | App | How to load this model |
  |---|---|
- | **Ollama** | `ollama run hf.co/FoolDev/janus-27b` (or `:Q3_K_S`). Pulls the GGUF + Modelfile (TEMPLATE, sampling, stop tokens, tool calling) in one step. |
+ | **Ollama** | `ollama run hf.co/FoolDev/janus-27b` (or `:Q3_K_S`). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For local builds, `make build` uses `Modelfile`, which is kept in sync. |
  | **LM Studio** | Search → `FoolDev/janus-27b` → pick `Janus-27B.Q4_K_M.gguf` or `Janus-27B.Q3_K_S.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
  | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/janus-27b`. Same template behavior as LM Studio. |
  | **llama.cpp** | `hf download FoolDev/janus-27b Janus-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Janus-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
@@ -201,10 +203,12 @@ local app — point it at this repo and pick a quant.
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
 
  For the full Vision (image input) loader matrix, see [Vision](#vision).
- Tool calling currently works in **Ollama** (via this repo's Modelfile
- TEMPLATE) and **llama.cpp / llama-cpp-python** (via the GGUF's embedded
- jinja). Other apps' tool-calling support depends on whether they read
- the embedded template or require an external schema.
+ Tool calling currently works in **Ollama** (via the root-level
+ `template` file when pulling from `hf.co/...`, or via the `Modelfile`
+ TEMPLATE when building locally) and **llama.cpp / llama-cpp-python**
+ (via the GGUF's embedded jinja). Other apps' tool-calling support
+ depends on whether they read the embedded template or require an
+ external schema.
 
  ### Inference (OpenAI-compatible)
 
@@ -322,11 +326,18 @@ templates directly (llama.cpp, llama-cpp-python, LM Studio) handle the
  plain-conversation formatting automatically.
 
  Ollama is the exception: its conversion of the embedded jinja loses the
- `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so
- the `Modelfile` in this repo overrides the template with an Ollama-Go
- version that wires tool calling correctly. Use the bundled `Modelfile`
- (via `make build` or `ollama run hf.co/FoolDev/janus-27b`) and tools
- will work end-to-end on `/api/chat` and `/v1/chat/completions`.
+ `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
+ Two paths fix this, depending on how you pull the model:
+
+ - **`ollama run hf.co/FoolDev/janus-27b`** HF's Ollama bridge applies
+ the root-level `template` / `system` / `params` files in this repo
+ (the bridge does **not** read `Modelfile`).
+ - **`make build` / `ollama create janus-27b -f Modelfile`** — uses the
+ `Modelfile`'s `TEMPLATE` block.
+
+ Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
+ `/api/chat` and `/v1/chat/completions`. The two configurations are
+ kept in sync: edit them together if you change one.
 
  #### Plain conversation
 
params ADDED
@@ -0,0 +1,12 @@
+ {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "repeat_penalty": 1.05,
+ "num_ctx": 16384,
+ "stop": [
+ "<|im_end|>",
+ "<|endoftext|>",
+ "<|im_start|>"
+ ]
+ }
scripts/{fetch_mmproj.sh → fetch_vision.sh} RENAMED
@@ -8,9 +8,9 @@
  # it (see README Vision section, ollama/ollama#15898).
  #
  # Usage:
- # ./scripts/fetch_mmproj.sh # default: F16, ~927 MB
- # ./scripts/fetch_mmproj.sh BF16 # ~931 MB
- # ./scripts/fetch_mmproj.sh F32 # ~1.8 GB
+ # ./scripts/fetch_vision.sh # default: F16, ~927 MB
+ # ./scripts/fetch_vision.sh BF16 # ~931 MB
+ # ./scripts/fetch_vision.sh F32 # ~1.8 GB
  #
  # Requires: huggingface-cli (or hf).
  set -euo pipefail
system ADDED
@@ -0,0 +1,10 @@
+ You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+
+ Behavior rules:
+ - Answer the user's actual request directly.
+ - Be accurate, complete, and structured.
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+ - Finish with a usable answer, not just planning.
template ADDED
@@ -0,0 +1,51 @@
+ {{- $lastUserIdx := -1 -}}
+ {{- range $idx, $msg := .Messages -}}
+ {{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+ {{- end }}
+ {{- if or .System .Tools }}<|im_start|>system
+ {{ if .System }}{{ .System }}
+
+ {{ end }}
+ {{- if .Tools }}# Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {{- range .Tools }}
+ {"type": "function", "function": {{ .Function }}}
+ {{- end }}
+ </tools>
+
+ For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+ <tool_call>
+ {"name": <function-name>, "arguments": <args-json-object>}
+ </tool_call>
+ {{- end -}}<|im_end|>
+ {{ end }}
+ {{- range $i, $_ := .Messages }}
+ {{- $last := eq (len (slice $.Messages $i)) 1 -}}
+ {{- if eq .Role "user" }}<|im_start|>user
+ {{ .Content }}<|im_end|>
+ {{ else if eq .Role "assistant" }}<|im_start|>assistant
+ {{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+ <think>{{ .Thinking }}</think>
+ {{ end -}}
+ {{ if .Content }}{{ .Content }}{{ end }}
+ {{- if .ToolCalls }}
+ {{- range .ToolCalls }}
+ <tool_call>
+ {"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+ </tool_call>
+ {{- end }}
+ {{- end }}{{ if not $last }}<|im_end|>
+ {{ end }}
+ {{- else if eq .Role "tool" }}<|im_start|>user
+ <tool_response>
+ {{ .Content }}
+ </tool_response><|im_end|>
+ {{ end }}
+ {{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+ <think>
+ {{ end }}
+ {{- end }}
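
For readers less familiar with Go templates, the following Python sketch approximates the ChatML layout the template above appears to produce for plain turns. It is an approximation under stated assumptions: tools, tool calls, and `<think>` replay are left out, Go's whitespace trimming is reproduced by hand and may differ in edge cases, and `render_chatml` is an illustrative name rather than anything in the repo.

```python
def render_chatml(messages: list, system: str = "") -> str:
    """Approximate the prompt layout for system/user/assistant/tool turns.

    Simplified: no tool definitions, no tool calls, no thinking replay.
    """
    out = []
    if system:
        # System block: prompt text, a blank line, then the end marker.
        out.append("<|im_start|>system\n" + system + "\n\n<|im_end|>\n")
    last_index = len(messages) - 1
    for i, msg in enumerate(messages):
        is_last = i == last_index
        role = msg["role"]
        content = msg.get("content", "")
        if role == "user":
            out.append("<|im_start|>user\n" + content + "<|im_end|>\n")
        elif role == "assistant":
            out.append("<|im_start|>assistant\n" + content)
            # A trailing assistant turn is left open so the model continues it.
            if not is_last:
                out.append("<|im_end|>\n")
        elif role == "tool":
            # Tool results are wrapped and sent back under the user role.
            out.append(
                "<|im_start|>user\n<tool_response>\n"
                + content
                + "\n</tool_response><|im_end|>\n"
            )
        if role != "assistant" and is_last:
            # Generation prompt: open an assistant turn and a think block.
            out.append("<|im_start|>assistant\n<think>\n")
    return "".join(out)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "Hello"}], system="You are Janus."))
```

Comparing this output with what Ollama actually sends (for example via its debug logging) is the safer way to confirm the real layout.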