FoolDev and Claude Opus 4.7 committed on
Commit 124302d · 1 parent: e191854

fix(bench.sh): case-insensitive model lookup + Q3_K_S Vulkan run 2 (11.7 tok/s)

Surfaced by running `make build QUANT=Q3_K_S` end-to-end for the
first time tonight to validate the friction-free path. Two real
findings:

1. **bench.sh case-sensitivity bug.** `make bench MODEL=thanatos-27b`
errored with "Model 'thanatos-27b' not found" even though
`ollama show thanatos-27b` resolved cleanly and `ollama run
thanatos-27b` worked. Root cause: Ollama 0.24 lists the tag
in `/api/tags` as `Thanatos-27B:latest` (capitalised) regardless
of the case passed to `ollama create`, but bench.sh's jq check
used `startswith($m)`, which is case-sensitive. smoke_test.sh
already uses `ascii_downcase` for the same check (line 54);
bench.sh was the lone holdout. Aligned bench.sh to the same
pattern, with an inline comment explaining the Ollama 0.24
quirk.

2. **Q3_K_S Vulkan run 2 data point.** First fresh Q3_K_S bench
since the original Modelfile reference: 11.70 tok/s aggregate
(8009 tokens / 684.0 s; 12.23 / 12.12 / 11.66 short/medium/
long). That is 4.9% below run 1's 12.31, within the ±20% noise
band the README hardware section warns about. Longer per-prompt
outputs this run (8009 vs 6182 tokens, roughly 30% more) plus
late-in-session thermal pressure on the Strix Halo iGPU likely
explain the gap. Confirms `make build QUANT=Q3_K_S` →
unsloth/Qwen3.6-27B-GGUF → ollama create → bench is a working
end-to-end path on this box.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2)
  1. Modelfile +11 -1
  2. scripts/bench.sh +6 -1
Modelfile CHANGED
```diff
@@ -136,8 +136,18 @@ Behavior rules:
 # Radeon 8060S iGPU, 32 GB unified, gfx1151, OLLAMA_FLASH_ATTENTION=1,
 # OLLAMA_KV_CACHE_TYPE=q8_0, num_ctx 16384, 3-prompt mix):
 # Vulkan (OLLAMA_VULKAN=1):
-# Q3_K_S → 12.31 tok/s aggregate
+# Q3_K_S → 12.31 tok/s aggregate (run 1)
 # (6182 tokens / 501.9 s; 12.67 / 12.55 / 12.25 short/medium/long)
+# Q3_K_S → 11.70 tok/s aggregate (run 2, 2026-05-19 evening)
+# (8009 tokens / 684.0 s; 12.23 / 12.12 / 11.66 short/medium/long)
+# Second run measured against `thanatos-27b:latest` built via
+# `make build QUANT=Q3_K_S` — i.e. unsloth/Qwen3.6-27B-GGUF's
+# qwen35-stamped Q3_K_S, the friction-free path the README
+# points users at. Aggregate is 4.9% below run 1 (within
+# the ±20% noise band) — slightly longer per-prompt outputs
+# this run (8009 vs 6182 tokens) likely contribute the
+# difference, plus late-in-session thermal pressure on the
+# Strix Halo iGPU. The friction-free unsloth path works.
 # Q4_K_M → 9.31 tok/s aggregate (run 1)
 # (5356 tokens / 574.9 s; 9.48 / 9.43 / 9.28 short/medium/long)
 # Q4_K_M → 9.19 tok/s aggregate (run 2, 2026-05-19 afternoon)
```
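The throughput figures in this hunk can be sanity-checked from their raw totals. A minimal sketch, assuming the aggregate is total generated tokens divided by total wall-clock seconds across the three prompts (the quoted numbers are consistent with that reading). `agg` is a hypothetical helper, not part of bench.sh; only bash and awk are needed.

```shell
#!/usr/bin/env bash
# Sketch: recompute aggregate throughput from raw totals.
# Assumption: aggregate tok/s = total tokens / total seconds.
set -euo pipefail

# agg TOKENS SECONDS -> tokens per second, printed to 3 decimals.
agg() {
  awk -v t="$1" -v s="$2" 'BEGIN { printf "%.3f\n", t / s }'
}

echo "Q3_K_S run 1: $(agg 6182 501.9) tok/s"
echo "Q3_K_S run 2: $(agg 8009 684.0) tok/s"

# Relative change between the two quoted Q3_K_S aggregates, in percent.
awk 'BEGIN { printf "delta: %.2f%%\n", (11.70 - 12.31) / 12.31 * 100 }'

# How much more output run 2 generated than run 1.
awk 'BEGIN { printf "tokens ratio: %.3f\n", 8009 / 6182 }'
```

It prints 12.317 and 11.709 tok/s, which match the quoted 12.31 and 11.70 when cut to two decimals, a delta of -4.96%, and a token ratio of 1.296, so run 2 generated roughly 30% more tokens than run 1.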
scripts/bench.sh CHANGED
```diff
@@ -37,7 +37,12 @@ if ! TAGS="$(curl -fsS "${HOST}/api/tags")"; then
 red "[!] Ollama not reachable at ${HOST}"
 exit 1
 fi
-if ! jq -e --arg m "${MODEL}" '.models[] | select(.name | startswith($m))' >/dev/null <<<"${TAGS}"; then
+# Match case-insensitively: Ollama 0.24's API tag list preserves the
+# case of whatever `general.name` it inferred at create time, which
+# can differ from the case the user passed to `ollama create` / typed
+# into `ollama run`. Both `ollama show <lower>` and `ollama show
+# <Mixed>` resolve to the same model, so the bench check should too.
+if ! jq -e --arg m "${MODEL}" '.models[] | select(.name | ascii_downcase | startswith($m | ascii_downcase))' >/dev/null <<<"${TAGS}"; then
 red "[!] Model '${MODEL}' not found. Build it first: ./scripts/build.sh"
 exit 1
 fi
```
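To see the two filters from this hunk side by side, here is a minimal sketch that wraps the old and new checks in two hypothetical helper functions (`model_present_old`, `model_present_new`; bench.sh inlines the check instead) and runs both against a hand-written `/api/tags`-style payload trimmed to the `name` field. It assumes bash and a jq that provides `ascii_downcase` are on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: old vs new bench.sh model check, wrapped in hypothetical helpers.
set -euo pipefail

# Old check: startswith() compares exactly, so the case must match.
model_present_old() {
  local model="$1" tags="$2"
  jq -e --arg m "${model}" \
    '.models[] | select(.name | startswith($m))' >/dev/null <<<"${tags}"
}

# New check: lower-case both the tag name and the requested model first.
model_present_new() {
  local model="$1" tags="$2"
  jq -e --arg m "${model}" \
    '.models[] | select(.name | ascii_downcase | startswith($m | ascii_downcase))' \
    >/dev/null <<<"${tags}"
}

# Hypothetical payload mimicking the capitalised tag Ollama 0.24 reported.
TAGS='{"models":[{"name":"Thanatos-27B:latest"}]}'

# jq -e exits non-zero when select() yields nothing, so each helper's
# exit status says whether any tag matched.
for check in model_present_old model_present_new; do
  if "${check}" "thanatos-27b" "${TAGS}"; then
    echo "${check}: found"
  else
    echo "${check}: not found"
  fi
done
```

The old helper reports `not found` and the new one `found`. The check stays a prefix match, so `thanatos-27b` still matches the `:latest` tag without the user typing the suffix; `ascii_downcase` only folds ASCII letters, which covers Ollama-style tag names like these.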