FoolDev Claude Opus 4.7 committed on
Commit 4208793 · 1 Parent(s): bec5589

README/examples: tighten Ollama-vision failure mode description

Previously implied that `ollama create` itself returns the
`unknown model architecture` error. Empirically that's wrong on
Ollama 0.22 against the dense qwen35 27B + mmproj-F16: `ollama create`
succeeds, `ollama show` reports the `vision` capability with a CLIP
projector attached, and the architecture error only fires from the
runner on the first inference request, at which point it blocks text
inference too (matches the upstream issue's "blocks ALL inference"
phrasing).
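
The mismatch described above (metadata looks healthy, the first inference fails) can be checked with a small probe. This is a minimal sketch, not part of the commit. It assumes Ollama's default address `http://localhost:11434`, the `janus-27b` model name from the examples table, that `/api/show` returns a `capabilities` list, and that the runner's architecture error is passed through in the `error` field of the `/api/generate` response. The helper names `post_json`, `classify`, and `probe` are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default listen address
MODEL = "janus-27b"  # assumption: model created from the project Modelfile


def post_json(path, payload, timeout=120):
    """POST a JSON payload to the Ollama HTTP API; return (status, parsed body)."""
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses are expected to carry a JSON object with an
        # "error" field; fall back to the raw text if the body is not JSON.
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, {"error": raw}


def classify(show_body, generate_status, generate_body):
    """Compare what `show` advertises with what the first inference call did."""
    capabilities = show_body.get("capabilities") or []
    advertises_vision = "vision" in capabilities
    error = ""
    if isinstance(generate_body, dict):
        error = generate_body.get("error") or ""
    if generate_status == 200 and not error:
        return "ok"
    # Assumption: the runner's message reaches the HTTP client verbatim.
    if "unknown model architecture" in error:
        return "arch-error-despite-vision" if advertises_vision else "arch-error"
    return "other-error"


def probe(model=MODEL):
    """Query model metadata, then attempt one text-only generation."""
    _, show_body = post_json("/api/show", {"model": model})
    status, gen_body = post_json(
        "/api/generate",
        {"model": model, "prompt": "Say hi.", "stream": False},
    )
    return classify(show_body, status, gen_body)
```

With `ollama serve` running, `print(probe())` would report `arch-error-despite-vision` for the situation this commit describes, if the assumptions above hold. `classify` is kept separate from the network calls so it can be exercised without a daemon.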

Reworded both the loader-compat table row and the examples/README
"Why not Ollama?" note to describe what actually happens. Also flipped
the now-stale "Text only" qualifier on examples/ollama_chat.py to
"Text + tool calling" to match the Modelfile.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2)
  1. README.md +1 -1
  2. examples/README.md +7 -4
README.md CHANGED
@@ -223,7 +223,7 @@ This repo intentionally does not redistribute either.
  |---|---|---|---|
  | **llama.cpp** (`llama-mtmd-cli`, `llama-server --mmproj`) | ✅ | ✅ | Reference path. Upstream has the `qwen35`/`qwen35moe` arch entries. |
  | **llama-cpp-python** | ✅ | ✅ | See `examples/llama_cpp_vision.py`. |
- | **Ollama 0.22** | ✅ | ❌ | Vendored llama.cpp fork is missing the architecture entries. Attaching `mmproj` via `FROM` *or* `ADAPTER` returns `unknown model architecture: 'qwen35moe'` (and the same for the dense `qwen35`). See [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898). Will work once that PR lands. |
+ | **Ollama 0.22** | ✅ | ❌ | Vendored llama.cpp fork is missing the `qwen35` / `qwen35moe` architecture entries. `ollama create` accepts a dual-`FROM` (text + mmproj) and `ollama show` even reports `vision` capability, but the **first inference request** fails with `error loading model architecture: unknown model architecture: 'qwen35'` (or `'qwen35moe'`), and once mmproj is attached this blocks text inference too. See [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898). Will work once that PR lands. |
  | **LM Studio** | ✅ | ✅ (last tested) | Uses upstream llama.cpp directly. |
  
  ### Vision via llama.cpp
examples/README.md CHANGED
@@ -4,7 +4,7 @@ Three minimal entry points. Pick the one that matches how you run models.
  
  | File | Backend | When to use |
  |---|---|---|
- | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `janus-27b` model created from the project `Modelfile`. **Text only**; vision via Ollama is broken upstream for this arch. |
+ | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `janus-27b` model created from the project `Modelfile`. **Text + tool calling**; vision via Ollama is broken upstream for this arch. |
  | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
  | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
@@ -63,8 +63,11 @@ python llama_cpp_vision.py \
  --prompt "Describe this image."
  ```
  
- Why not Ollama? Ollama 0.22's vendored llama.cpp is missing the `qwen35`
- architecture entries needed to attach an mmproj; `FROM` and `ADAPTER`
- both fail with `unknown model architecture: 'qwen35moe'`. Tracked in
+ Why not Ollama? Ollama 0.22's vendored llama.cpp is missing the
+ `qwen35` / `qwen35moe` architecture entries. `ollama create` accepts
+ the dual-`FROM` and `ollama show` reports `vision` capability, but the
+ first inference call fails with `error loading model architecture:
+ unknown model architecture: 'qwen35'` (verified empirically against
+ the dense 27B + `mmproj-F16.gguf`). Tracked in
  [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898).
  Until that's fixed, llama.cpp / llama-cpp-python is the working path.