Committed by FoolDev and Claude Opus 4.7
Commit 2b2ba03 (1 parent: 25d5454)

docs: fix four README defects surfaced by fresh-eyes audit

1. **"Local apps" llama.cpp row recipe was broken.** The cell told
users to `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf`
then `llama-server -m Thanatos-27B.Q4_K_M.gguf`, but that bundle
has been qwen36-stamped since 973d7ef, so llama-server immediately
errors with `unknown model architecture: 'qwen36'`. The intro
paragraph above the table noted the qwen36 caveat, but the recipe
itself didn't honor it. Rewrote it to either rebadge the file first
via `scripts/rename_arch.py` OR (cleaner) pull the qwen35-stamped
`Qwen3.6-27B-Q4_K_M.gguf` from unsloth directly.

2. **History date wrong.** Said `v0.6.0 (e1f78fa, 2026-05-18)` but
`git log e1f78fa` shows 2026-05-19 14:38 UTC. Corrected.

3. **History flip count wrong.** Said "flipped between the two
stamps three times"; the actual count is five stamp changes (three
landings on qwen36 at e1f78fa / 07fa120 / 973d7ef, two on
qwen35 at 964e418 / 72259c1). Split the round-trip bullet into
its two constituent flips and corrected the lede.

4. **examples/README.md heal-hf framing stale.** Said "If you
pulled before commit `964e418`", but that text predated the 3rd
round trip (973d7ef). Current state: every fresh `ollama pull`
of this repo's bundle needs `make heal-hf`. Rewrote to say so
and point at the main README's Stamp choice section.

Bug class: stale text the round-trip residue cleanup (c1c4dfd) and
the qwen36 re-alignment (cee14f4) didn't fully sweep. None blocked
working users, but the llama.cpp recipe in particular would have
sent anyone trying that path straight into the qwen36 error.
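To see which stamp a local GGUF carries before handing it to `llama-server`, the `general.architecture` key can be read straight from the file header. The following is an illustrative Python reader written against the public GGUF layout (magic, version, tensor count, KV count, then typed key/value pairs). It handles GGUF v2+ headers only, and `read_architecture` is a hypothetical helper, not part of this repo or of llama.cpp.

```python
import struct
from typing import BinaryIO

# GGUF fixed-width value types: type id -> size in bytes.
_FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, n: int) -> bytes:
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated GGUF header")
    return data


def _read_string(f: BinaryIO) -> str:
    # GGUF strings are a little-endian uint64 length followed by UTF-8 bytes.
    (length,) = struct.unpack("<Q", _read(f, 8))
    return _read(f, length).decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    if vtype in _FIXED_SIZES:
        _read(f, _FIXED_SIZES[vtype])
    elif vtype == _STRING:
        _read_string(f)
    elif vtype == _ARRAY:
        # Arrays: uint32 element type, uint64 count, then the elements.
        elem_type, count = struct.unpack("<IQ", _read(f, 12))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> str:
    """Return the general.architecture string from a GGUF (v2+) stream."""
    if _read(f, 4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version, _tensor_count, kv_count = struct.unpack("<IQQ", _read(f, 20))
    if version < 2:
        raise ValueError("GGUF v1 headers are not handled by this sketch")
    for _ in range(kv_count):
        key = _read_string(f)
        (vtype,) = struct.unpack("<I", _read(f, 4))
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    raise KeyError("general.architecture not found")
```

Passing an `open(path, "rb")` handle for the bundled `Thanatos-27B.Q4_K_M.gguf` should, per the text above, report `qwen36`, while unsloth's `Qwen3.6-27B-Q4_K_M.gguf` should report `qwen35`. Array values such as tokenizer vocabularies are skipped element by element, so the call can take a moment if the key appears late in the header.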

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2)
  1. README.md +21 -20
  2. examples/README.md +8 -4
README.md CHANGED
@@ -251,26 +251,27 @@ the tensor data is byte-identical across both stamps.

  ### History

- The bundle has now flipped between the two stamps three times,
- each time after weighing the friction-vs-honesty tradeoff anew:
-
- - **v0.6.0 (e1f78fa, 2026-05-18):** initial qwen35 → qwen36
- stamp, on the theory that qwen35 was a loader stand-in
- awaiting proper Qwen 3.6 support. Upstream audit later
- showed that theory was mistaken (see above).
- - **2026-05-19 morning (964e418):** flipped back to qwen35
  after daily friction outweighed version-specificity for that
- iteration; doc workaround narrative collapsed
- (`83022eb`).
- - **2026-05-19 evening (07fa120, reverted `72259c1`):** brief
- ~1-hour re-flip to qwen36 during a fresh-pull integration
- test; reverted because the live friction was worse than the
- doc prose suggested.
- - **2026-05-19 evening, again (`973d7ef`):** flipped to qwen36
- one more time, after the upstream-evidence audit had been
- shipped and the friction was a known quantity. Project owner
- prefers the version-specific stamp despite the audit
- conclusion. **This is the current state.**

  Tensor data was byte-identical across all stamps; only the
  `general.architecture` KV (and namespaced KV keys) flipped.


  ### History

+ The bundle has now changed stamps five times (three landings on
+ qwen36, two on qwen35), each time after weighing the
+ friction-vs-honesty tradeoff anew:
+
+ - **v0.6.0-era (`e1f78fa`, 2026-05-19 14:38 UTC):** initial qwen35
+ → qwen36 stamp, on the theory that qwen35 was a loader stand-in
+ awaiting proper Qwen 3.6 support. Upstream audit later showed
+ that theory was mistaken (see above).
+ - **2026-05-19 afternoon (`964e418`):** flipped back to qwen35
  after daily friction outweighed version-specificity for that
+ iteration; doc workaround narrative collapsed (`83022eb`).
+ - **2026-05-19 evening (`07fa120`):** brief re-flip to qwen36
+ during a fresh-pull integration test on Strix Halo.
+ - **2026-05-19 evening (`72259c1`, ~1 hour later):** reverted to
+ qwen35 again because the live friction was worse than the doc
+ prose suggested.
+ - **2026-05-19 evening (`973d7ef`):** flipped to qwen36 one more
+ time, after the upstream-evidence audit had been shipped and
+ the friction was a known quantity. Project owner prefers the
+ version-specific stamp despite the audit conclusion. **This
+ is the current state.**

  Tensor data was byte-identical across all stamps; only the
  `general.architecture` KV (and namespaced KV keys) flipped.

@@ -350,7 +351,7 @@ caveat applies to every row until that's done.
  | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`); fails on first inference, then `make heal-hf` rebadges the cached blob. For other quants, or to bypass the qwen36 block entirely, `make build QUANT=Q3_K_S` downloads from unsloth (qwen35-stamped) and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
  | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
  | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
- | **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path: point at the GGUF, use the embedded chat template. |


  | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`); fails on first inference, then `make heal-hf` rebadges the cached blob. For other quants, or to bypass the qwen36 block entirely, `make build QUANT=Q3_K_S` downloads from unsloth (qwen35-stamped) and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
  | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
  | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
+ | **llama.cpp** | The bundled GGUF is qwen36-stamped, so `llama-server -m Thanatos-27B.Q4_K_M.gguf` errors with `unknown model architecture: 'qwen36'`. Either rebadge first (`python3 scripts/rename_arch.py --from-arch qwen36 --to-arch qwen35 Thanatos-27B.Q4_K_M.gguf Thanatos-27B.Q4_K_M.qwen35.gguf`), or (cleaner) `hf download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir .` to get the qwen35-stamped GGUF directly, then `llama-server -m Qwen3.6-27B-Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path: point at the GGUF, use the embedded chat template. |
 
examples/README.md CHANGED
@@ -29,10 +29,14 @@ pip install requests
  MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
  ```

- If you pulled before commit `964e418` (the qwen35 re-stamp) and
- still have the broken qwen36 blob in your Ollama store, run
- `cd .. && make heal-hf` once to rebadge it in place. Fresh pulls
- after the re-stamp go straight through.

  For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
  `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`

  MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
  ```

+ The bundled GGUF is currently `qwen36`-stamped (HF commit
+ `973d7ef`), so the `ollama pull` above fails on first inference
+ with `unable to load model`. Run `cd .. && make heal-hf` once to
+ rebadge the cached blob in place (qwen36 → qwen35, metadata-only,
+ ~5 s); the same tag then loads. Every fresh `ollama pull` of
+ this repo's bundle needs the heal step until the project flips
+ back to qwen35 (see the main README's "Stamp choice" section for
+ why this repo stamps qwen36 deliberately).

  For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
  `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
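The "metadata-only" rebadge described above touches two things: the `general.architecture` value and every key namespaced under the old architecture name. The sketch below models that rename on an in-memory dict. It is an assumption about what `scripts/rename_arch.py` and `make heal-hf` change, not their actual implementation; `rebadge_metadata` and the example keys and values are hypothetical placeholders.

```python
def rebadge_metadata(metadata: dict, from_arch: str, to_arch: str) -> dict:
    """Return a copy of GGUF-style metadata restamped from from_arch to to_arch.

    Hypothetical helper: renames keys under the "<from_arch>." namespace and
    rewrites general.architecture; every other key and value is copied as-is.
    """
    if metadata.get("general.architecture") != from_arch:
        raise ValueError(f"expected architecture {from_arch!r}")
    prefix = from_arch + "."
    out = {}
    for key, value in metadata.items():
        if key.startswith(prefix):
            key = to_arch + "." + key[len(prefix):]
        out[key] = value
    out["general.architecture"] = to_arch
    return out


# Placeholder metadata, not read from the real bundle.
meta = {"general.architecture": "qwen36", "general.name": "example", "qwen36.block_count": 1}
print(rebadge_metadata(meta, "qwen36", "qwen35"))
# {'general.architecture': 'qwen35', 'general.name': 'example', 'qwen35.block_count': 1}
```

Because `qwen36` and `qwen35` have the same length, this particular rename leaves every key and string value the same size, so the header length and the tensor-data offsets would not move; that fits the statement that tensor data stayed byte-identical across stamps.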