FoolDev Claude Opus 4.7 committed on
Commit 7197abd · 1 Parent(s): 73e905b

Rename back: Thanatos-27B-Heretic → Thanatos-27B (HF repo also renamed)

The HF repo was renamed back to FoolDev/Thanatos-27B via the HF UI
(HF serves a 307 redirect from the prior -Heretic name). With the
base also reverted to vanilla Qwen/Qwen3.6-27B in 73e905b, the
-Heretic suffix has no remaining justification.
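
One way to confirm the redirect described above is to look at the response headers for the old URL. The following is a minimal sketch, not part of this commit: `parse_redirect` is a hypothetical helper, and the sample headers in the demo are made up for illustration rather than captured from HF.

```shell
#!/bin/sh
# Print "<status> <location>" from raw HTTP response headers on stdin.
# Only the first status line and the first Location header are used.
parse_redirect() {
  tr -d '\r' | awk '
    NR == 1 { status = $2 }                                  # e.g. "HTTP/2 307"
    tolower($1) == "location:" && loc == "" { loc = $2 }     # header names are case-insensitive
    END { print status, loc }
  '
}

# Demo on canned headers (hypothetical values, no network access needed).
printf 'HTTP/2 307\r\nlocation: /FoolDev/Thanatos-27B\r\n\r\n' | parse_redirect

# Against the live site this could be run as (assumes curl is installed):
#   curl -sI https://huggingface.co/FoolDev/Thanatos-27B-Heretic | parse_redirect
```

Parsing the headers separately from fetching them keeps the check usable offline and makes it easy to assert on in a script.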

- Bulk-renamed Thanatos-27B-Heretic → Thanatos-27B and
  thanatos-27b-heretic → thanatos-27b across README, Modelfile,
  scripts, examples, CITATION.cff, Makefile, .gitignore.
- banner.svg: dropped the -HERETIC tspan, leaving the THANATOS-27B
  wordmark. banner.png re-rasterized at 2× via rsvg-convert.
- README "Note on the name" callout removed (name and base are
  aligned again).
- CITATION.cff: dropped the trailing parenthetical about the
  reverted Heretic swap.
- Makefile clean target deduped (the sed had produced two identical
  Thanatos-27B.*.qwen[0-9]*.gguf entries).
- Git remote re-pointed to git@hf.co:FoolDev/Thanatos-27B.
- CHANGELOG history retained as-is; entries below the new top
  one still reference Thanatos-27B-Heretic as a record of work
  done under that name.
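
The bulk rename and the Makefile dedupe listed above can be sketched roughly as follows. The exact commands used for this commit are not recorded, so this is an assumption-laden illustration: `rename_in_files` and `dedupe_words` are hypothetical helpers, and the demo runs on a throwaway file rather than the real repo.

```shell
#!/bin/sh
# Rewrite both casings of the old name in the given files, in place.
# A temp file is used instead of `sed -i`, whose syntax differs between
# GNU and BSD sed.
rename_in_files() {
  for f in "$@"; do
    tmp=$(mktemp)
    sed -e 's/Thanatos-27B-Heretic/Thanatos-27B/g' \
        -e 's/thanatos-27b-heretic/thanatos-27b/g' "$f" > "$tmp"
    cat "$tmp" > "$f"   # overwrite contents, keep the original file's mode
    rm -f "$tmp"
  done
}

# Drop repeated whitespace-separated words from each input line,
# keeping the first occurrence of each word.
dedupe_words() {
  awk '{
    split("", seen)     # reset the lookup table for every line
    out = ""
    for (i = 1; i <= NF; i++)
      if (!seen[$i]++) out = out (out == "" ? "" : " ") $i
    print out
  }'
}

# Demo on a scratch file.
work=$(mktemp -d)
printf 'TAG ?= thanatos-27b-heretic\n# Thanatos-27B-Heretic convenience Makefile.\n' > "$work/sample.mk"
rename_in_files "$work/sample.mk"
cat "$work/sample.mk"

# After the rename, the two glob entries in the clean target become
# identical; dedupe_words collapses them to one.
echo './Thanatos-27B.*.qwen[0-9]*.gguf ./Thanatos-27B-Heretic.*.qwen[0-9]*.gguf' \
  | sed 's/Thanatos-27B-Heretic/Thanatos-27B/g' | dedupe_words
rm -rf "$work"
```

Note that `dedupe_words` also drops leading whitespace, so it suits an argument list rather than a whole tab-indented Makefile recipe line.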

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CHANGELOG.md CHANGED
@@ -7,6 +7,29 @@ and documentation**, not the underlying base model.
7
 
8
  ## [Unreleased]
9
10
  ### Reverted (base swap to Heretic v2 — name kept, base back to vanilla Qwen)
11
  - **Undone the `Qwen/Qwen3.6-27B` → `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`
12
  base swap** that shipped in `16e1ddd` and was polished in
 
7
 
8
  ## [Unreleased]
9
 
10
+ ### Changed (project name reverted: Thanatos-27B-Heretic → Thanatos-27B)
11
+ - **HF repo renamed back to `FoolDev/Thanatos-27B`** via the HF UI
12
+ (HF serves a 307 redirect from `FoolDev/Thanatos-27B-Heretic` to
13
+ the canonical name). Now that the base is also reverted to
14
+ vanilla `Qwen/Qwen3.6-27B`, the `-Heretic` suffix had no
15
+ remaining justification.
16
+ - **Bulk-renamed `Thanatos-27B-Heretic` → `Thanatos-27B`** and
17
+ `thanatos-27b-heretic` → `thanatos-27b` across README,
18
+ Modelfile, scripts, examples, CITATION.cff, Makefile,
19
+ .gitignore. CHANGELOG history entries below this one retain
20
+ their original wording (they document actions taken under the
21
+ Heretic name, before this revert).
22
+ - **Banner wordmark** — dropped the `-HERETIC` tspan from
23
+ banner.svg, leaving `THANATOS-27B`. banner.png re-rasterized at
24
+ 2× via rsvg-convert.
25
+ - **README "Note on the name" callout removed** — no longer
26
+ applicable since name and base are aligned again.
27
+ - **CITATION.cff abstract** — dropped the trailing parenthetical
28
+ about the reverted Heretic swap.
29
+ - **Local git remote re-pointed** from
30
+ `git@hf.co:FoolDev/Thanatos-27B-Heretic` to
31
+ `git@hf.co:FoolDev/Thanatos-27B`.
32
+
33
  ### Reverted (base swap to Heretic v2 — name kept, base back to vanilla Qwen)
34
  - **Undone the `Qwen/Qwen3.6-27B` → `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`
35
  base swap** that shipped in `16e1ddd` and was polished in
CITATION.cff CHANGED
@@ -1,14 +1,14 @@
1
  cff-version: 1.2.0
2
- title: "Thanatos-27B-Heretic: A Dense Distillation Wrapper for Qwen 3.6 27B"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
6
  - name: FoolDev
7
  website: "https://huggingface.co/FoolDev"
8
- repository-code: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
9
- url: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
10
  abstract: >-
11
- Thanatos-27B-Heretic is a personal repackaging of the dense Qwen 3.6 27B base
12
  model with Claude Opus 4.7 in the reasoning teacher slot. The
13
  repository ships an Ollama Modelfile, sampling defaults, usage
14
  examples, and a single ready-to-run GGUF (Q4_K_M ~17 GB) so the HF
@@ -16,9 +16,6 @@ abstract: >-
16
  quants (Q3_K_S, Q5_K_M, Q6_K, etc.) and the upstream safetensors
17
  (Qwen/Qwen3.6-27B) are pulled from upstream
18
  (unsloth/Qwen3.6-27B-GGUF) on demand rather than redistributed.
19
- (The repo carries the `-Heretic` suffix from a prior swap to
20
- llmfan46/Qwen3.6-27B-uncensored-heretic-v2 that was reverted;
21
- current base is vanilla Qwen 3.6 27B.)
22
  keywords:
23
  - qwen
24
  - qwen3.6
 
1
  cff-version: 1.2.0
2
+ title: "Thanatos-27B: A Dense Distillation Wrapper for Qwen 3.6 27B"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
6
  - name: FoolDev
7
  website: "https://huggingface.co/FoolDev"
8
+ repository-code: "https://huggingface.co/FoolDev/Thanatos-27B"
9
+ url: "https://huggingface.co/FoolDev/Thanatos-27B"
10
  abstract: >-
11
+ Thanatos-27B is a personal repackaging of the dense Qwen 3.6 27B base
12
  model with Claude Opus 4.7 in the reasoning teacher slot. The
13
  repository ships an Ollama Modelfile, sampling defaults, usage
14
  examples, and a single ready-to-run GGUF (Q4_K_M ~17 GB) so the HF
 
16
  quants (Q3_K_S, Q5_K_M, Q6_K, etc.) and the upstream safetensors
17
  (Qwen/Qwen3.6-27B) are pulled from upstream
18
  (unsloth/Qwen3.6-27B-GGUF) on demand rather than redistributed.
19
  keywords:
20
  - qwen
21
  - qwen3.6
Makefile CHANGED
@@ -1,11 +1,11 @@
1
- # Thanatos-27B-Heretic convenience Makefile.
2
  #
3
  # All work is delegated to scripts/* — this file just gives common
4
  # operations short, discoverable names.
5
  #
6
  # Variables you can override on the command line:
7
  # QUANT GGUF quant suffix (default: Q4_K_M)
8
- # TAG Ollama model tag (default: thanatos-27b-heretic)
9
  # GGUF_PATH path to existing GGUF (skip the download)
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
@@ -19,7 +19,7 @@
19
  # make clean
20
 
21
  QUANT ?= Q4_K_M
22
- TAG ?= thanatos-27b-heretic
23
  MODEL ?= $(TAG)
24
 
25
  .DEFAULT_GOAL := help
@@ -43,7 +43,7 @@ build: ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (lo
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
44
  TAG=$(TAG) ./scripts/load_bundle.sh
45
 
46
- heal-hf: ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B-Heretic tag in-store (rebadge blob + manifest digest).
47
  ./scripts/heal_hf_pull.sh
48
 
49
  smoke: ## Verify the model is reachable and round-trips.
@@ -69,6 +69,6 @@ hooks: ## Install scripts/check.sh as the git pre-commit hook.
69
 
70
  clean: ## Remove local GGUF copies and ephemeral caches in this repo.
71
  @echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
72
- @rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-27B.*.qwen[0-9]*.gguf ./Thanatos-27B-Heretic.*.qwen[0-9]*.gguf
73
  @rm -rf ./.cache __pycache__ examples/__pycache__
74
  @echo "[+] clean"
 
1
+ # Thanatos-27B convenience Makefile.
2
  #
3
  # All work is delegated to scripts/* — this file just gives common
4
  # operations short, discoverable names.
5
  #
6
  # Variables you can override on the command line:
7
  # QUANT GGUF quant suffix (default: Q4_K_M)
8
+ # TAG Ollama model tag (default: thanatos-27b)
9
  # GGUF_PATH path to existing GGUF (skip the download)
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
 
19
  # make clean
20
 
21
  QUANT ?= Q4_K_M
22
+ TAG ?= thanatos-27b
23
  MODEL ?= $(TAG)
24
 
25
  .DEFAULT_GOAL := help
 
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
44
  TAG=$(TAG) ./scripts/load_bundle.sh
45
 
46
+ heal-hf: ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B tag in-store (rebadge blob + manifest digest).
47
  ./scripts/heal_hf_pull.sh
48
 
49
  smoke: ## Verify the model is reachable and round-trips.
 
69
 
70
  clean: ## Remove local GGUF copies and ephemeral caches in this repo.
71
  @echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
72
+ @rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-27B.*.qwen[0-9]*.gguf
73
  @rm -rf ./.cache __pycache__ examples/__pycache__
74
  @echo "[+] clean"
Modelfile CHANGED
@@ -1,4 +1,4 @@
1
- # Thanatos-27B-Heretic — Ollama wrapper around Qwen 3.6 27B (dense)
2
  #
3
  # Text + tool calling. Vision via Ollama is currently broken for this
4
  # architecture (ollama/ollama#15898 — the qwen35 arch entries are in
@@ -10,7 +10,7 @@
10
  # stamped `general.architecture: 'qwen35'` — the upstream-canonical
11
  # arch entry every released llama.cpp / Ollama loads under for the
12
  # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
13
- # thanatos-27b-heretic -f Modelfile && ollama run thanatos-27b-heretic` loads it
14
  # directly. See README "Architecture" for the full stamp history
15
  # (eight flips between qwen35 and qwen36, settled on qwen35 at
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
 
1
+ # Thanatos-27B — Ollama wrapper around Qwen 3.6 27B (dense)
2
  #
3
  # Text + tool calling. Vision via Ollama is currently broken for this
4
  # architecture (ollama/ollama#15898 — the qwen35 arch entries are in
 
10
  # stamped `general.architecture: 'qwen35'` — the upstream-canonical
11
  # arch entry every released llama.cpp / Ollama loads under for the
12
  # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
13
+ # thanatos-27b -f Modelfile && ollama run thanatos-27b` loads it
14
  # directly. See README "Architecture" for the full stamp history
15
  # (eight flips between qwen35 and qwen36, settled on qwen35 at
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
README.md CHANGED
@@ -45,7 +45,7 @@ library_name: transformers
45
  pipeline_tag: image-text-to-text
46
  ---
47
 
48
- <img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/banner.svg" alt="Thanatos-27B-Heretic banner" width="100%" />
49
 
50
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
51
  [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
@@ -53,7 +53,7 @@ pipeline_tag: image-text-to-text
53
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
54
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
55
 
56
- # Thanatos-27B-Heretic
57
 
58
  > **Dense Reasoning. Friendlier Footprint.**
59
  > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
@@ -62,11 +62,6 @@ pipeline_tag: image-text-to-text
62
 
63
  A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
64
 
65
- > **Note on the name.** The repo carries the `-Heretic` suffix from a
66
- > prior swap to `llmfan46/Qwen3.6-27B-uncensored-heretic-v2` that was
67
- > reverted. The current base is the vanilla `Qwen/Qwen3.6-27B`; the
68
- > name string and HF repo URL are kept for continuity.
69
-
70
  ## TL;DR
71
 
72
  One-liner via Hugging Face (pulls a GGUF + this repo's root-level
@@ -75,7 +70,7 @@ template — HF's Ollama bridge ingests those three files, not
75
  `Modelfile`):
76
 
77
  ```bash
78
- ollama run hf.co/FoolDev/Thanatos-27B-Heretic # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
79
  ```
80
 
81
  If you pulled the bundle during any of the qwen36 windows on the
@@ -96,7 +91,7 @@ The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only
96
 
97
  The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
98
 
99
- | | Thanatos-27B-Heretic (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
100
  |---|---|---|
101
  | Architecture | Dense transformer | MoE 256 experts, 8 active |
102
  | Total params | 27 B | 35 B |
@@ -118,11 +113,11 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
118
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
119
  | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
120
  | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
121
- | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
122
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
123
  | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
124
  | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
125
- | `scripts/heal_hf_pull.sh` | Legacy recovery for users who pulled `hf.co/FoolDev/Thanatos-27B-Heretic` (or the pre-rename `FoolDev/Thanatos-27B`) *before* the latest qwen35 re-stamp and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls don't need it. |
126
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
127
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
128
  | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
@@ -138,7 +133,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
138
  For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
139
  downloads the smaller ~12 GB Q3_K_S quant from
140
  `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
141
- creates a local `thanatos-27b-heretic` Ollama tag. Does not redistribute
142
  via this repo. For other quants use `make build QUANT=...`. The
143
  local-build path applies this repo's `Modelfile`; the `hf.co/...`
144
  path applies the root-level `template`, `system`, and `params`
@@ -149,7 +144,7 @@ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-2
149
  ## Architecture
150
 
151
  <p align="left">
152
- <img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
153
  </p>
154
 
155
  - Qwen 3.6 dense, 27B parameters, 64 transformer layers
@@ -206,7 +201,7 @@ There is no PR or tracking issue for a `qwen36` arch entry in
206
  `qwen35` already loads the model the upstream code path was
207
  designed to load.
208
 
209
- `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` and `llama-server -m
210
  Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
211
  loaders.
212
 
@@ -280,21 +275,21 @@ Three paths:
280
  ```bash
281
  # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
282
  # root-level template / system / params files in one step):
283
- ollama run hf.co/FoolDev/Thanatos-27B-Heretic # 17 GB Q4_K_M, qwen35-stamped
284
 
285
- # B. Build a local `thanatos-27b-heretic` tag from THIS repo's bundle
286
  # (LFS smudge if needed, then `ollama create`). Useful if you
287
  # want a bare local tag rather than the `hf.co/...` path:
288
- make load-bundle # creates local tag thanatos-27b-heretic
289
- ollama run thanatos-27b-heretic
290
 
291
  # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
292
  # and build locally. Loads on every current llama.cpp / Ollama.
293
- make build # Q4_K_M -> thanatos-27b-heretic
294
  make build QUANT=Q3_K_S # 12 GB smaller quant
295
  make build QUANT=Q5_K_M # 20 GB higher quality
296
  make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf # skip download
297
- ollama run thanatos-27b-heretic
298
  ```
299
 
300
  Under the hood, `make build` calls `scripts/build.sh`, which downloads the
@@ -302,7 +297,7 @@ GGUF if missing (set `GGUF_PATH` to point at one you already have) and
302
  runs `ollama create` with the matching `Modelfile`.
303
 
304
  If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
305
- run `ollama create thanatos-27b-heretic -f Modelfile && ollama run thanatos-27b-heretic`.
306
 
307
  Confirm everything works:
308
 
@@ -317,10 +312,10 @@ python examples/ollama_chat.py # full demo: chat, streaming, tools, OpenAI-
317
 
318
  | App | How to load this model |
319
  |---|---|
320
- | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
321
- | **LM Studio** | Search → `FoolDev/Thanatos-27B-Heretic` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
322
- | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B-Heretic`. Same template behavior as LM Studio. |
323
- | **llama.cpp** | `hf download FoolDev/Thanatos-27B-Heretic Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
324
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
325
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
326
 
@@ -338,7 +333,7 @@ external schema.
338
  curl -s http://localhost:11434/v1/chat/completions \
339
  -H 'Content-Type: application/json' \
340
  -d '{
341
- "model": "thanatos-27b-heretic",
342
  "messages": [
343
  {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
344
  {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
@@ -472,10 +467,10 @@ Ollama is the exception: its conversion of the embedded jinja loses the
472
  `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
473
  Two paths fix this, depending on how you pull the model:
474
 
475
- - **`ollama run hf.co/FoolDev/Thanatos-27B-Heretic`** — HF's Ollama bridge applies
476
  the root-level `template` / `system` / `params` files in this repo
477
  (the bridge does **not** read `Modelfile`).
478
- - **`make build` / `ollama create thanatos-27b-heretic -f Modelfile`** — uses the
479
  `Modelfile`'s `TEMPLATE` block.
480
 
481
  Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
@@ -518,7 +513,7 @@ the model adapts to whichever shape the system prompt prescribes.
518
  **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
519
  prompts the model to emit JSON-in-XML, the form Ollama's tool-call
520
  extractor parses into a structured `tool_calls` array. After
521
- `make build`, `ollama show thanatos-27b-heretic` lists `tools` and `thinking`
522
  under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
523
  accept a `tools` array.
524
 
 
45
  pipeline_tag: image-text-to-text
46
  ---
47
 
48
+ <img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/banner.svg" alt="Thanatos-27B banner" width="100%" />
49
 
50
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
51
  [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
 
53
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
54
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
55
 
56
+ # Thanatos-27B
57
 
58
  > **Dense Reasoning. Friendlier Footprint.**
59
  > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
 
62
 
63
  A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
64
 
65
  ## TL;DR
66
 
67
  One-liner via Hugging Face (pulls a GGUF + this repo's root-level
 
70
  `Modelfile`):
71
 
72
  ```bash
73
+ ollama run hf.co/FoolDev/Thanatos-27B # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
74
  ```
75
 
76
  If you pulled the bundle during any of the qwen36 windows on the
 
91
 
92
  The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
93
 
94
+ | | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
95
  |---|---|---|
96
  | Architecture | Dense transformer | MoE 256 experts, 8 active |
97
  | Total params | 27 B | 35 B |
 
113
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
114
  | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
115
  | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
116
+ | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
117
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
118
  | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
119
  | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
120
+ | `scripts/heal_hf_pull.sh` | Legacy recovery for users who pulled `hf.co/FoolDev/Thanatos-27B` (or the pre-rename `FoolDev/Thanatos-27B`) *before* the latest qwen35 re-stamp and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls don't need it. |
121
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
122
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
123
  | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
 
133
  For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
134
  downloads the smaller ~12 GB Q3_K_S quant from
135
  `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
136
+ creates a local `thanatos-27b` Ollama tag. Does not redistribute
137
  via this repo. For other quants use `make build QUANT=...`. The
138
  local-build path applies this repo's `Modelfile`; the `hf.co/...`
139
  path applies the root-level `template`, `system`, and `params`
 
144
  ## Architecture
145
 
146
  <p align="left">
147
+ <img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
148
  </p>
149
 
150
  - Qwen 3.6 dense, 27B parameters, 64 transformer layers
 
201
  `qwen35` already loads the model the upstream code path was
202
  designed to load.
203
 
204
+ `ollama run hf.co/FoolDev/Thanatos-27B` and `llama-server -m
205
  Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
206
  loaders.
207
 
 
275
  ```bash
276
  # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
277
  # root-level template / system / params files in one step):
278
+ ollama run hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M, qwen35-stamped
279
 
280
+ # B. Build a local `thanatos-27b` tag from THIS repo's bundle
281
  # (LFS smudge if needed, then `ollama create`). Useful if you
282
  # want a bare local tag rather than the `hf.co/...` path:
283
+ make load-bundle # creates local tag thanatos-27b
284
+ ollama run thanatos-27b
285
 
286
  # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
287
  # and build locally. Loads on every current llama.cpp / Ollama.
288
+ make build # Q4_K_M -> thanatos-27b
289
  make build QUANT=Q3_K_S # 12 GB smaller quant
290
  make build QUANT=Q5_K_M # 20 GB higher quality
291
  make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf # skip download
292
+ ollama run thanatos-27b
293
  ```
294
 
295
  Under the hood, `make build` calls `scripts/build.sh`, which downloads the
 
297
  runs `ollama create` with the matching `Modelfile`.
298
 
299
  If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
300
+ run `ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b`.
301
 
302
  Confirm everything works:
303
 
 
312
 
313
  | App | How to load this model |
314
  |---|---|
315
+ | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
316
+ | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
317
+ | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
318
+ | **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
319
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
320
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
321
 
 
333
  curl -s http://localhost:11434/v1/chat/completions \
334
  -H 'Content-Type: application/json' \
335
  -d '{
336
+ "model": "thanatos-27b",
337
  "messages": [
338
  {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
339
  {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
 
467
  `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
468
  Two paths fix this, depending on how you pull the model:
469
 
470
+ - **`ollama run hf.co/FoolDev/Thanatos-27B`** — HF's Ollama bridge applies
471
  the root-level `template` / `system` / `params` files in this repo
472
  (the bridge does **not** read `Modelfile`).
473
+ - **`make build` / `ollama create thanatos-27b -f Modelfile`** — uses the
474
  `Modelfile`'s `TEMPLATE` block.
475
 
476
  Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
 
513
  **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
514
  prompts the model to emit JSON-in-XML, the form Ollama's tool-call
515
  extractor parses into a structured `tool_calls` array. After
516
+ `make build`, `ollama show thanatos-27b` lists `tools` and `thinking`
517
  under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
518
  accept a `tools` array.
519
 
banner.png CHANGED
banner.svg CHANGED
examples/README.md CHANGED
@@ -1,10 +1,10 @@
1
- # Thanatos-27B-Heretic examples
2
 
3
  Four minimal entry points. Pick the one that matches how you run models.
4
 
5
  | File | Backend | When to use |
6
  |---|---|---|
7
- | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b-heretic` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
8
  | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
  | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
@@ -24,9 +24,9 @@ root-level `template` / `system` / `params` files via HF's Ollama
24
  bridge):
25
 
26
  ```bash
27
- ollama pull hf.co/FoolDev/Thanatos-27B-Heretic # 17 GB Q4_K_M (only bundled quant)
28
  pip install requests
29
- MODEL=hf.co/FoolDev/Thanatos-27B-Heretic python ollama_chat.py
30
  ```
31
 
32
  If you pulled before the latest qwen35 re-stamp (HF commit
@@ -38,11 +38,11 @@ through.
38
 
39
  For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
40
  `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
41
- and creates a local `thanatos-27b-heretic` tag:
42
 
43
  ```bash
44
  cd .. && make build QUANT=Q3_K_S && cd examples
45
- MODEL=thanatos-27b-heretic python ollama_chat.py
46
  ```
47
 
48
  Or build a local tag from this repo's bundled GGUF without going
@@ -50,7 +50,7 @@ through the HF pull:
50
 
51
  ```bash
52
  cd .. && make load-bundle && cd examples
53
- MODEL=thanatos-27b-heretic python ollama_chat.py
54
  ```
55
 
56
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
 
1
+ # Thanatos-27B examples
2
 
3
  Four minimal entry points. Pick the one that matches how you run models.
4
 
5
  | File | Backend | When to use |
6
  |---|---|---|
7
+ | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
8
  | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
  | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
 
24
  bridge):
25
 
26
  ```bash
27
+ ollama pull hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M (only bundled quant)
28
  pip install requests
29
+ MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
30
  ```
31
 
32
  If you pulled before the latest qwen35 re-stamp (HF commit
 
38
 
39
  For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
40
  `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
41
+ and creates a local `thanatos-27b` tag:
42
 
43
  ```bash
44
  cd .. && make build QUANT=Q3_K_S && cd examples
45
+ MODEL=thanatos-27b python ollama_chat.py
46
  ```
47
 
48
  Or build a local tag from this repo's bundled GGUF without going
 
50
 
51
  ```bash
52
  cd .. && make load-bundle && cd examples
53
+ MODEL=thanatos-27b python ollama_chat.py
54
  ```
55
 
56
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
examples/llama_cpp_quickstart.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — llama-cpp-python quickstart.
4
 
5
  Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
6
  Useful for batch jobs, CI, or environments where you don't want a daemon.
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — llama-cpp-python quickstart.
4
 
5
  Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
6
  Useful for batch jobs, CI, or environments where you don't want a daemon.
examples/llama_cpp_vision.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — vision (image-text-to-text) via llama-cpp-python.
4
 
5
  Why this script exists:
6
  Ollama's Go engine has the qwen35 / qwen35moe arch entries (text
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — vision (image-text-to-text) via llama-cpp-python.
4
 
5
  Why this script exists:
6
  Ollama's Go engine has the qwen35 / qwen35moe arch entries (text
examples/ollama_chat.py CHANGED
@@ -1,17 +1,17 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — Ollama chat examples.
4
 
5
  Prerequisites (pick one):
6
 
7
  A. From the bundled GGUFs (default flow):
8
  $ make build # uses Thanatos-27B.Q4_K_M.gguf
9
  # or:
10
- $ ollama create thanatos-27b-heretic -f ../Modelfile
11
 
12
  B. Pull straight from HF (Q4_K_M is the only bundled quant):
13
- $ ollama run hf.co/FoolDev/Thanatos-27B-Heretic
14
- # then set MODEL=hf.co/FoolDev/Thanatos-27B-Heretic below
15
 
16
  Then:
17
  $ ollama serve # usually already running
@@ -39,7 +39,7 @@ from typing import Any, Iterator
39
 
40
  import requests
41
 
42
- MODEL = os.environ.get("MODEL", "thanatos-27b-heretic")
43
  HOST = os.environ.get("HOST", "http://localhost:11434")
44
 
45
  _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — Ollama chat examples.
4
 
5
  Prerequisites (pick one):
6
 
7
  A. From the bundled GGUFs (default flow):
8
  $ make build # uses Thanatos-27B.Q4_K_M.gguf
9
  # or:
10
+ $ ollama create thanatos-27b -f ../Modelfile
11
 
12
  B. Pull straight from HF (Q4_K_M is the only bundled quant):
13
+ $ ollama run hf.co/FoolDev/Thanatos-27B
14
+ # then set MODEL=hf.co/FoolDev/Thanatos-27B below
15
 
16
  Then:
17
  $ ollama serve # usually already running
 
39
 
40
  import requests
41
 
42
+ MODEL = os.environ.get("MODEL", "thanatos-27b")
43
  HOST = os.environ.get("HOST", "http://localhost:11434")
44
 
45
  _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
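
The hunk shows only the definition of `_THINK_RE`, not its call sites. The sketch below shows what the pattern does when applied with `sub`, assuming it is used to drop `<think>…</think>` reasoning blocks from a reply before display; `strip_thinking` is a hypothetical helper name, not one confirmed by the diff.

```python
import re

# Same pattern as in ollama_chat.py: non-greedy match across newlines,
# plus any whitespace that trails the closing tag.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text):
    """Remove every complete <think>...</think> block from text."""
    return _THINK_RE.sub("", text)
```

Because the match is non-greedy, two separate blocks are removed independently instead of swallowing the text between them. An unterminated `<think>` is left untouched, since the pattern requires the closing tag.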
examples/transformers_quickstart.py CHANGED
@@ -1,9 +1,9 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — Hugging Face Transformers quickstart.
4
 
5
  Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
6
- chat turn using its embedded chat template. Thanatos-27B-Heretic is a
7
  *wrapper* around that base, so for the transformers route there is nothing
8
  to download from this repo — point at Qwen/Qwen3.6-27B and apply the same
9
  system prompt the Modelfile uses.
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — Hugging Face Transformers quickstart.
4
 
5
  Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
6
+ chat turn using its embedded chat template. Thanatos-27B is a
7
  *wrapper* around that base, so for the transformers route there is nothing
8
  to download from this repo — point at Qwen/Qwen3.6-27B and apply the same
9
  system prompt the Modelfile uses.
scripts/bench.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — tok/s benchmark via Ollama.
3
  #
4
  # Reads timing from Ollama's /api/chat response metadata (eval_count and
5
  # eval_duration are authoritative — no client-side stopwatch noise) and
@@ -7,14 +7,14 @@
7
  # number generalises a bit beyond a single shape.
8
  #
9
  # Usage:
10
- # ./scripts/bench.sh # uses MODEL=thanatos-27b-heretic
11
- # MODEL=thanatos-27b-heretic ./scripts/bench.sh
12
  # HOST=http://localhost:11434 ./scripts/bench.sh
13
  #
14
  # Requires: curl, jq, a running Ollama daemon with the model created.
15
  set -euo pipefail
16
 
17
- MODEL="${MODEL:-thanatos-27b-heretic}"
18
  HOST="${HOST:-http://localhost:11434}"
19
 
20
  red() { printf "\033[31m%s\033[0m\n" "$*" >&2; }
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — tok/s benchmark via Ollama.
3
  #
4
  # Reads timing from Ollama's /api/chat response metadata (eval_count and
5
  # eval_duration are authoritative — no client-side stopwatch noise) and
 
7
  # number generalises a bit beyond a single shape.
8
  #
9
  # Usage:
10
+ # ./scripts/bench.sh # uses MODEL=thanatos-27b
11
+ # MODEL=thanatos-27b ./scripts/bench.sh
12
  # HOST=http://localhost:11434 ./scripts/bench.sh
13
  #
14
  # Requires: curl, jq, a running Ollama daemon with the model created.
15
  set -euo pipefail
16
 
17
+ MODEL="${MODEL:-thanatos-27b}"
18
  HOST="${HOST:-http://localhost:11434}"
19
 
20
  red() { printf "\033[31m%s\033[0m\n" "$*" >&2; }
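
The header comment says the benchmark trusts `eval_count` and `eval_duration` from the `/api/chat` response. As a rough sketch of that arithmetic: Ollama reports `eval_duration` in nanoseconds, so tokens per second is the count divided by the duration in seconds. The function names are illustrative, and the pooling across runs is an assumption, since the diff does not show how the script combines its prompts.

```python
def tokens_per_second(response):
    """Generation speed for one /api/chat response (eval_duration is in ns)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


def aggregate_tokens_per_second(responses):
    """Pool several runs: total generated tokens over total generation time."""
    total_tokens = sum(r["eval_count"] for r in responses)
    total_ns = sum(r["eval_duration"] for r in responses)
    if total_ns <= 0:
        raise ValueError("total eval_duration must be positive")
    return total_tokens / (total_ns / 1e9)
```

Pooling totals weights long generations more heavily than averaging the per-run rates would, which is one reasonable choice when the prompts differ in length.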
scripts/build.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
3
  #
4
  # Usage:
5
  # ./scripts/build.sh # default: Q4_K_M
@@ -28,7 +28,7 @@ ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
28
  GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
29
 
30
  MODELFILE="${ROOT}/Modelfile"
31
- TAG="${TAG:-thanatos-27b-heretic}"
32
 
33
  echo "[*] repo: ${REPO_ID}"
34
  echo "[*] quant: ${QUANT}"
@@ -96,4 +96,4 @@ ollama create "${TAG}" -f "${TMP_MODELFILE}"
96
  echo
97
  echo "[+] Done. Try it:"
98
  echo " ollama run ${TAG}"
99
- echo " python ${ROOT}/examples/ollama_chat.py # update MODEL constant if not 'thanatos-27b-heretic'"
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
3
  #
4
  # Usage:
5
  # ./scripts/build.sh # default: Q4_K_M
 
28
  GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
29
 
30
  MODELFILE="${ROOT}/Modelfile"
31
+ TAG="${TAG:-thanatos-27b}"
32
 
33
  echo "[*] repo: ${REPO_ID}"
34
  echo "[*] quant: ${QUANT}"
 
96
  echo
97
  echo "[+] Done. Try it:"
98
  echo " ollama run ${TAG}"
99
+ echo " python ${ROOT}/examples/ollama_chat.py # update MODEL constant if not 'thanatos-27b'"
scripts/check.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — repo-local sanity checks.
3
  #
4
  # Runs everything that's cheap and catches a real-world bug we've already hit:
5
  #
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — repo-local sanity checks.
3
  #
4
  # Runs everything that's cheap and catches a real-world bug we've already hit:
5
  #
scripts/check_bridge_sync.py CHANGED
@@ -1,13 +1,13 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — verify Modelfile and HF Ollama bridge files stay in sync.
4
 
5
  The repo ships two parallel Ollama configurations:
6
 
7
  - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
8
  It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
9
  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
10
- Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B-Heretic`` directly. HF
11
  does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
12
 
13
  If the two configurations drift apart, ``hf.co/...`` users and ``make build``
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — verify Modelfile and HF Ollama bridge files stay in sync.
4
 
5
  The repo ships two parallel Ollama configurations:
6
 
7
  - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
8
  It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
9
  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
10
+ Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B`` directly. HF
11
  does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
12
 
13
  If the two configurations drift apart, ``hf.co/...`` users and ``make build``
scripts/fetch_vision.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — fetch the vision projector (mmproj) for image input.
3
  #
4
  # Why this is separate from build.sh:
5
  # build.sh is for the Ollama text path. The mmproj is only useful for
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — fetch the vision projector (mmproj) for image input.
3
  #
4
  # Why this is separate from build.sh:
5
  # build.sh is for the Ollama text path. The mmproj is only useful for
scripts/heal_hf_pull.sh CHANGED
@@ -1,10 +1,10 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — heal a previously pulled HF-bridge tag whose bundled
3
  # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
4
  # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
5
  # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
6
  #
7
- # Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` now get the
8
  # qwen35-stamped bundle and load directly — this script is the
9
  # recovery path for users who pulled a qwen36-stamped blob into
10
  # their local Ollama store during one of the qwen36 windows
@@ -13,7 +13,7 @@
13
  # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
14
  # qwen35, metadata-only, byte-identical tensors) and rewrites the
15
  # manifest's model-layer digest to point at the new blob. After
16
- # running, the cached `hf.co/FoolDev/Thanatos-27B-Heretic` tag loads.
17
  #
18
  # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
19
  # The current bundle is qwen35-stamped so this script is a no-op for
@@ -22,13 +22,13 @@
22
  #
23
  # Usage:
24
  # ./scripts/heal_hf_pull.sh # default tag
25
- # TAG=hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M ./scripts/heal_hf_pull.sh
26
  #
27
  # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
28
  set -euo pipefail
29
 
30
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
31
- TAG="${TAG:-hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M}"
32
  OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
33
 
34
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
@@ -50,7 +50,7 @@ done
50
 
51
  # `ollama show --modelfile` writes a FROM line with the absolute blob path.
52
  # Reliable regardless of which case variant the user pulled with
53
- # (hf.co's 307 lets `Thanatos-27B-Heretic` and `thanatos-27b-heretic` both resolve to the
54
  # canonical repo, and ollama stores the manifest under whichever case
55
  # was first registered).
56
  #
@@ -79,8 +79,8 @@ blue "[*] blob: ${MODEL_BLOB}"
79
  # referenced from exactly one tag in the heal scenario — fresh HF pull
80
  # of a single :Q4_K_M tag — but if someone has multiple tags pointing
81
  # at the same blob, we filter down to the one matching ${TAG}.
82
- TAG_PATH="${TAG#hf.co/}" # FoolDev/Thanatos-27B-Heretic:Q4_K_M
83
- NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B-Heretic
84
  TAG_FILE="${TAG_PATH##*:}" # Q4_K_M
85
 
86
  MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — heal a previously pulled HF-bridge tag whose bundled
3
  # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
4
  # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
5
  # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
6
  #
7
+ # Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-27B` now get the
8
  # qwen35-stamped bundle and load directly — this script is the
9
  # recovery path for users who pulled a qwen36-stamped blob into
10
  # their local Ollama store during one of the qwen36 windows
 
13
  # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
14
  # qwen35, metadata-only, byte-identical tensors) and rewrites the
15
  # manifest's model-layer digest to point at the new blob. After
16
+ # running, the cached `hf.co/FoolDev/Thanatos-27B` tag loads.
17
  #
18
  # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
19
  # The current bundle is qwen35-stamped so this script is a no-op for
 
22
  #
23
  # Usage:
24
  # ./scripts/heal_hf_pull.sh # default tag
25
+ # TAG=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/heal_hf_pull.sh
26
  #
27
  # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
28
  set -euo pipefail
29
 
30
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
31
+ TAG="${TAG:-hf.co/FoolDev/Thanatos-27B:Q4_K_M}"
32
  OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
33
 
34
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
 
50
 
51
  # `ollama show --modelfile` writes a FROM line with the absolute blob path.
52
  # Reliable regardless of which case variant the user pulled with
53
+ # (hf.co's 307 lets `Thanatos-27B` and `thanatos-27b` both resolve to the
54
  # canonical repo, and ollama stores the manifest under whichever case
55
  # was first registered).
56
  #
 
79
  # referenced from exactly one tag in the heal scenario — fresh HF pull
80
  # of a single :Q4_K_M tag — but if someone has multiple tags pointing
81
  # at the same blob, we filter down to the one matching ${TAG}.
82
+ TAG_PATH="${TAG#hf.co/}" # FoolDev/Thanatos-27B:Q4_K_M
83
+ NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B
84
  TAG_FILE="${TAG_PATH##*:}" # Q4_K_M
85
 
86
  MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
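
The three parameter expansions above (`${TAG#hf.co/}`, `${TAG_PATH%:*}`, `${TAG_PATH##*:}`) are terse, so here is an equivalent sketch in Python for readers less familiar with the Bash forms. `split_hf_tag` is a hypothetical name; `str.removeprefix` needs Python 3.9 or later.

```python
def split_hf_tag(tag):
    """Mirror the Bash expansions used to derive TAG_PATH, NAMESPACE_PATH, TAG_FILE."""
    # ${TAG#hf.co/}: drop a leading "hf.co/" if present.
    tag_path = tag.removeprefix("hf.co/")
    # ${TAG_PATH%:*} and ${TAG_PATH##*:}: split at the last colon.
    namespace_path, sep, tag_file = tag_path.rpartition(":")
    if not sep:
        # With no colon, both Bash expansions leave the string unchanged.
        namespace_path = tag_file = tag_path
    return tag_path, namespace_path, tag_file
```

For the default tag this yields `FoolDev/Thanatos-27B:Q4_K_M`, `FoolDev/Thanatos-27B` and `Q4_K_M`, matching the trailing comments on the shell lines.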
scripts/install-hooks.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — install scripts/check.sh as a git pre-commit hook.
3
  #
4
  # Idempotent. Re-runs are safe.
5
  set -euo pipefail
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — install scripts/check.sh as a git pre-commit hook.
3
  #
4
  # Idempotent. Re-runs are safe.
5
  set -euo pipefail
scripts/load_bundle.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — load this repo's bundle into Ollama as a local tag.
3
  #
4
  # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
5
  # loads directly on stock llama.cpp / Ollama. This script is the
@@ -15,13 +15,13 @@
15
  # 3. Run `ollama create <tag> -f <temp Modelfile pointing at the
16
  # resolved bundle>`.
17
  #
18
- # Useful if you want a bare local tag (`thanatos-27b-heretic`) rather than
19
- # the `hf.co/FoolDev/Thanatos-27B-Heretic` path. The legacy qwen36 rebadge
20
  # branch is kept for anyone working from a pre-e03e10e checkout.
21
  #
22
  # Usage:
23
- # ./scripts/load_bundle.sh # default tag: thanatos-27b-heretic
24
- # TAG=thanatos-27b-heretic-bundle ./scripts/load_bundle.sh
25
  # BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
26
  #
27
  # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
@@ -30,8 +30,8 @@ set -euo pipefail
30
 
31
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
32
  BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
33
- TAG="${TAG:-thanatos-27b-heretic}"
34
- REPO_ID="${REPO_ID:-FoolDev/Thanatos-27B-Heretic}"
35
  MODELFILE="${ROOT}/Modelfile"
36
 
37
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — load this repo's bundle into Ollama as a local tag.
3
  #
4
  # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
5
  # loads directly on stock llama.cpp / Ollama. This script is the
 
15
  # 3. Run `ollama create <tag> -f <temp Modelfile pointing at the
16
  # resolved bundle>`.
17
  #
18
+ # Useful if you want a bare local tag (`thanatos-27b`) rather than
19
+ # the `hf.co/FoolDev/Thanatos-27B` path. The legacy qwen36 rebadge
20
  # branch is kept for anyone working from a pre-e03e10e checkout.
21
  #
22
  # Usage:
23
+ # ./scripts/load_bundle.sh # default tag: thanatos-27b
24
+ # TAG=thanatos-27b-bundle ./scripts/load_bundle.sh
25
  # BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
26
  #
27
  # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
 
30
 
31
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
32
  BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
33
+ TAG="${TAG:-thanatos-27b}"
34
+ REPO_ID="${REPO_ID:-FoolDev/Thanatos-27B}"
35
  MODELFILE="${ROOT}/Modelfile"
36
 
37
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
scripts/smoke_test.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B-Heretic — smoke test against a running Ollama daemon.
3
  #
4
  # Verifies:
5
  # 1. The Ollama server is reachable.
@@ -14,11 +14,11 @@
14
  # Usage:
15
  # ./scripts/smoke_test.sh # fast checks only
16
  # TOOLS_TEST=1 ./scripts/smoke_test.sh # add tool-call round-trip
17
- # MODEL=hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M ./scripts/smoke_test.sh
18
  # HOST=http://localhost:11434 ./scripts/smoke_test.sh
19
  set -euo pipefail
20
 
21
- MODEL="${MODEL:-thanatos-27b-heretic}"
22
  HOST="${HOST:-http://localhost:11434}"
23
  PROMPT="${PROMPT:-Reply with the single word: OK}"
24
 
@@ -46,9 +46,9 @@ green "[+] server reachable"
46
 
47
  # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
48
  # model names at lookup but preserves whatever case was first registered
49
- # on disk (e.g. `make load-bundle` may produce `Thanatos-27B-Heretic:latest`
50
- # even when invoked with TAG=thanatos-27b-heretic, if an earlier session left a
51
- # Thanatos-27B-Heretic manifest dir behind). The exact tag the user typed is
52
  # still valid for `ollama run` — the comparison just needs to be
53
  # case-folded to match.
54
  if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-27B — smoke test against a running Ollama daemon.
3
  #
4
  # Verifies:
5
  # 1. The Ollama server is reachable.
 
14
  # Usage:
15
  # ./scripts/smoke_test.sh # fast checks only
16
  # TOOLS_TEST=1 ./scripts/smoke_test.sh # add tool-call round-trip
17
+ # MODEL=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/smoke_test.sh
18
  # HOST=http://localhost:11434 ./scripts/smoke_test.sh
19
  set -euo pipefail
20
 
21
+ MODEL="${MODEL:-thanatos-27b}"
22
  HOST="${HOST:-http://localhost:11434}"
23
  PROMPT="${PROMPT:-Reply with the single word: OK}"
24
 
 
46
 
47
  # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
48
  # model names at lookup but preserves whatever case was first registered
49
+ # on disk (e.g. `make load-bundle` may produce `Thanatos-27B:latest`
50
+ # even when invoked with TAG=thanatos-27b, if an earlier session left a
51
+ # Thanatos-27B manifest dir behind). The exact tag the user typed is
52
  # still valid for `ollama run` — the comparison just needs to be
53
  # case-folded to match.
54
  if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then
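
The `jq` filter above is the case-folded prefix match the comment describes. A Python sketch of the same check against a decoded `/api/tags` response follows; `model_present` is a hypothetical name, and `str.lower()` stands in for `ascii_downcase`, which differs only for non-ASCII names.

```python
def model_present(tags_response, model):
    """True if any listed model name starts with `model`, ignoring case."""
    wanted = model.lower()
    return any(
        entry["name"].lower().startswith(wanted)
        for entry in tags_response.get("models", [])
    )
```

The prefix match is what lets `thanatos-27b` find a stored `Thanatos-27B:latest`: the `:latest` suffix is ignored and the case difference is folded away. It also means a shorter name matches any longer tag that begins with it.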
scripts/verify_arch.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B-Heretic — verify the README "Architecture" forward-pass bullets
4
  against the actual GGUF metadata.
5
 
6
  Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
@@ -69,8 +69,8 @@ def main() -> int:
69
  return 2
70
  root = Path(__file__).resolve().parent.parent
71
  default_paths = [
72
- root / "Thanatos-27B-Heretic.Q4_K_M.qwen35.gguf",
73
- root / "Thanatos-27B-Heretic.Q4_K_M.qwen36.gguf",
74
  root / "Thanatos-27B.Q4_K_M.gguf",
75
  ]
76
  if len(sys.argv) == 2:
@@ -78,7 +78,7 @@ def main() -> int:
78
  else:
79
  path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
80
  if path is None:
81
- print("[!] no Thanatos-27B-Heretic GGUF found in repo root; pass a path explicitly", file=sys.stderr)
82
  return 2
83
 
84
  print(f"[*] reading: {path}")
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-27B — verify the README "Architecture" forward-pass bullets
4
  against the actual GGUF metadata.
5
 
6
  Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
 
69
  return 2
70
  root = Path(__file__).resolve().parent.parent
71
  default_paths = [
72
+ root / "Thanatos-27B.Q4_K_M.qwen35.gguf",
73
+ root / "Thanatos-27B.Q4_K_M.qwen36.gguf",
74
  root / "Thanatos-27B.Q4_K_M.gguf",
75
  ]
76
  if len(sys.argv) == 2:
 
78
  else:
79
  path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
80
  if path is None:
81
+ print("[!] no Thanatos-27B GGUF found in repo root; pass a path explicitly", file=sys.stderr)
82
  return 2
83
 
84
  print(f"[*] reading: {path}")
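
The path selection in the `else` branch can be pulled out and tried in isolation. This is a sketch, not the script's actual structure: `pick_gguf` is a hypothetical helper. The 1024-byte floor comes from the diff; its purpose is not stated there, and one plausible reading (an assumption) is that it skips tiny placeholder files such as un-fetched LFS pointers.

```python
from pathlib import Path


def pick_gguf(candidates, min_size=1024):
    """Return the first candidate that exists and is larger than min_size bytes."""
    return next(
        (p for p in candidates if p.exists() and p.stat().st_size > min_size),
        None,
    )


def default_candidates(root):
    """Candidate bundle names in the order the script tries them."""
    root = Path(root)
    return [
        root / "Thanatos-27B.Q4_K_M.qwen35.gguf",
        root / "Thanatos-27B.Q4_K_M.qwen36.gguf",
        root / "Thanatos-27B.Q4_K_M.gguf",
    ]
```

Order matters here: the first qualifying file wins, so a `.qwen35.gguf` stamp in the repo root takes precedence over the plain bundle name.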