FoolDev Claude Opus 4.7 commited on
Commit
c336f44
Β·
1 Parent(s): 7063e20

docs(examples): wire heal-hf into the Ollama setup path

Browse files

`examples/README.md`'s "easiest path" told users to `ollama pull
hf.co/FoolDev/Thanatos-27B` then run `ollama_chat.py` against the
pulled tag. That path is broken today: the pull succeeds, but the
first inference call returns the qwen36 500 (`unable to load
model: <blob>`) because the bundle is qwen36-stamped β€” exactly the
failure mode `make heal-hf` now repairs.

Section rewritten:

- Lead with the pull + `cd .. && make heal-hf && cd examples`
recovery (one extra command after pull, rebadges the pulled
blob in store so the same `hf.co/...` tag becomes loadable).
- Add the `make load-bundle` alternative that bypasses the HF pull
entirely by loading this repo's bundled GGUF directly and
building a local `thanatos-27b` tag β€” relevant for users who'd
rather not heal-after-pull.
- Add the `make build QUANT=...` alternative for non-bundled quants
(Q3_K_S, Q5_K_M, etc.) that downloads from
`unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads today) and
builds the same local tag.

All three paths set MODEL explicitly so `ollama_chat.py` runs
against the correct tag (`hf.co/...` for the heal path,
`thanatos-27b` for the load-bundle / build paths).

Closes the last user-facing place where the heal-hf workaround
wasn't documented β€” main README, Architecture, Quick start, and now
examples/README all point at it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2) hide show
  1. CHANGELOG.md +11 -0
  2. examples/README.md +22 -6
CHANGELOG.md CHANGED
@@ -44,6 +44,17 @@ and documentation**, not the underlying base model.
44
  README warns about. Modelfile hardware notes updated.
45
 
46
  ### Changed
 
 
 
 
 
 
 
 
 
 
 
47
  - README "Architecture" section + Quick start option A:
48
  - Architecture body now notes that neither `ggml-org/llama.cpp`
49
  nor `ollama/ollama` has an open PR or tracking issue for a
 
44
  README warns about. Modelfile hardware notes updated.
45
 
46
  ### Changed
47
+ - `examples/README.md` Ollama setup section: the previous "easiest
48
+ path" told users to `ollama pull hf.co/FoolDev/Thanatos-27B` then
49
+ run `ollama_chat.py` against the pulled tag β€” which fails today
50
+ because the bundle is qwen36-stamped (the pull succeeds; the
51
+ first inference call returns the qwen36 500). Section rewritten
52
+ to lead with the pull + `make heal-hf` recovery (one extra
53
+ command after pull, rebadges the blob in place so the same
54
+ `hf.co/...` tag loads), and adds the `make load-bundle` /
55
+ `make build QUANT=...` alternatives as the bypass-the-bundle
56
+ paths that don't touch the HF pull at all. All three set MODEL
57
+ explicitly so the right tag is used.
58
  - README "Architecture" section + Quick start option A:
59
  - Architecture body now notes that neither `ggml-org/llama.cpp`
60
  nor `ollama/ollama` has an open PR or tracking issue for a
examples/README.md CHANGED
@@ -19,11 +19,16 @@ from the `Modelfile` / bridge files.
19
 
20
  ### Ollama
21
 
22
- Easiest path β€” pull straight from HF (gets the bundled Q4_K_M GGUF +
23
- this repo's Modelfile in one step):
 
 
 
 
24
 
25
  ```bash
26
  ollama pull hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M (only bundled quant)
 
27
  pip install requests
28
  MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
29
  ```
@@ -31,12 +36,23 @@ MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
31
  For the smaller-footprint Q3_K_S (~12 GB) or other quants, build
32
  locally instead β€” see the parent repo's `make build QUANT=...` flow.
33
 
34
- Or build locally from this repo (uses the bundled `Thanatos-27B.Q4_K_M.gguf`,
35
- no edits required):
 
 
36
 
37
  ```bash
38
- cd .. && make build && cd examples
39
- python ollama_chat.py
 
 
 
 
 
 
 
 
 
40
  ```
41
 
42
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
 
19
 
20
  ### Ollama
21
 
22
+ Pull straight from HF (gets the bundled Q4_K_M GGUF + this repo's
23
+ root-level `template` / `system` / `params` files via HF's Ollama
24
+ bridge), then heal the bundle's `qwen36` arch stamp so Ollama can
25
+ actually load it. The pull itself succeeds; the first inference
26
+ fails with `unable to load model` until heal β€” see parent README's
27
+ [Architecture](../README.md#architecture) for why:
28
 
29
  ```bash
30
  ollama pull hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M (only bundled quant)
31
+ cd .. && make heal-hf && cd examples # rebadges the pulled blob qwen36 -> qwen35 in store
32
  pip install requests
33
  MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
34
  ```
 
36
  For the smaller-footprint Q3_K_S (~12 GB) or other quants, build
37
  locally instead β€” see the parent repo's `make build QUANT=...` flow.
38
 
39
+ Or build locally from this repo without going through the HF pull
40
+ (uses the bundled `Thanatos-27B.Q4_K_M.gguf` via `make load-bundle`,
41
+ which handles the qwen36 β†’ qwen35 rebadge in one shot and creates a
42
+ local `thanatos-27b` tag):
43
 
44
  ```bash
45
+ cd .. && make load-bundle && cd examples
46
+ MODEL=thanatos-27b python ollama_chat.py
47
+ ```
48
+
49
+ For a non-bundled quant (e.g. Q5_K_M, Q3_K_S), `make build QUANT=...`
50
+ downloads a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and
51
+ creates the same local tag β€” bypasses the bundle and the heal entirely:
52
+
53
+ ```bash
54
+ cd .. && make build QUANT=Q5_K_M && cd examples
55
+ MODEL=thanatos-27b python ollama_chat.py
56
  ```
57
 
58
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will