FoolDev Claude Opus 4.7 committed on
Commit 16e1ddd · 1 Parent(s): 9cf363e

Rename to Thanatos-Heretic-27B and swap base to llmfan46 Heretic v2

Project rename Thanatos-27B -> Thanatos-Heretic-27B (Ollama tag
thanatos-heretic-27b) and immediate-base swap from Qwen/Qwen3.6-27B
to llmfan46/Qwen3.6-27B-uncensored-heretic-v2 (an uncensored Heretic
abliteration of the same Qwen 3.6 27B dense arch).

Docs + Modelfile + scripts only -- bundled Thanatos-27B.Q4_K_M.gguf
LFS pointer unchanged. The blob is still the legacy pre-Heretic
Qwen quant; README "Bundled blob status" callout + Known Limitations
warn users until the rebundle ships.

- scripts/build.sh: REPO_ID -> llmfan46 Heretic GGUF, filename
pattern Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf, default
TAG thanatos-heretic-27b. Q3_K_S replaced by Q3_K_M throughout
(Heretic repo doesn't publish Q3_K_S).
- scripts/fetch_vision.sh: PRECISION=BF16, REPO_ID -> llmfan46,
FILE_NAME=Qwen3.6-27B-mmproj-BF16.gguf. Unsloth's mmproj-F16.gguf
documented as a reference fallback.
- README: tagline, base_model frontmatter, badge, Vision section,
Related models, Credits, hardware/quick-start tables all flipped
to the Heretic lineage. Architecture section unchanged -- Heretic
v2 is qwen35-stamped like vanilla Qwen 3.6 27B.
- CHANGELOG: top entry documents the rename + base swap; historical
entries below intentionally left referring to Thanatos-27B as
they happened on the old repo identity.

HF repo migration (new FoolDev/Thanatos-Heretic-27B repo + remote
re-point + old-repo migration notice) and Heretic re-quantization
rebundle are separate follow-ups.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
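
The filename pattern and quant substitution described in the commit message can be sketched as follows. This is a minimal Python illustration, not the contents of `scripts/build.sh` (which is a shell script and may behave differently). The helper name `heretic_gguf_filename` and the `SUBSTITUTES` mapping are hypothetical; only the repo id, the pattern, and the missing `Q3_K_S` quant come from the message.

```python
# Hypothetical sketch; names below are illustrative, not taken from build.sh.
REPO_ID = "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF"
FILENAME_PATTERN = "Qwen3.6-27B-uncensored-heretic-v2-{quant}.gguf"

# The commit message states the Heretic repo publishes no Q3_K_S and that
# Q3_K_M replaces it. Silently substituting is an assumption of this sketch.
SUBSTITUTES = {"Q3_K_S": "Q3_K_M"}


def heretic_gguf_filename(quant: str = "Q4_K_M") -> str:
    """Return the GGUF filename for a quant, swapping unpublished quants."""
    quant = SUBSTITUTES.get(quant, quant)
    return FILENAME_PATTERN.format(quant=quant)


for requested in ("Q4_K_M", "Q3_K_S", "Q5_K_M"):
    print(f"{requested} -> {REPO_ID}/{heretic_gguf_filename(requested)}")
```

Keeping the substitution in a small table makes it easy to drop once the upstream repo publishes the missing quant.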

CHANGELOG.md CHANGED
@@ -7,6 +7,85 @@ and documentation**, not the underlying base model.
7
 
8
  ## [Unreleased]
9
 
10
+ ### Changed (project rename + base swap to Heretic v2)
11
+ - **Renamed project `Thanatos-27B` → `Thanatos-Heretic-27B`** and
12
+ **swapped immediate base from `Qwen/Qwen3.6-27B` (vanilla) →
13
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`** (an uncensored
14
+ Heretic-style abliteration of the dense Qwen 3.6 27B base).
15
+ README, Modelfile preamble, `CITATION.cff`, all scripts, and
16
+ all examples now refer to `Thanatos-Heretic-27B` /
17
+ `thanatos-heretic-27b` (lowercase Ollama tag) and pull GGUFs
18
+ from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`.
19
+ Architecture is unchanged (still Qwen 3.6 dense 27B,
20
+ `qwen35`-stamped, hybrid SSM+attention stack) -- only the
21
+ weights' finetune lineage moves.
22
+ - **`base_model:` frontmatter** flipped to
23
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`;
24
+ `base_model_relation: finetune` added; `heretic` and
25
+ `uncensored` tags appended. `library_name: transformers` stays
26
+ for HF Hub placement (snippet trap accepted as before;
27
+ `config.json` is still intentionally absent).
28
+ - **`scripts/build.sh`** now points `REPO_ID` at
29
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and uses the
30
+ filename pattern `Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf`.
31
+ Default `TAG` is `thanatos-heretic-27b`. Note: no `Q3_K_S` in
32
+ the Heretic GGUF repo β€” use `Q3_K_M` for the smallest practical
33
+ quant (`Modelfile` preamble and README hardware/quick-start
34
+ tables updated accordingly).
35
+ - **`scripts/fetch_vision.sh`** defaults flipped to
36
+ `PRECISION=BF16` and
37
+ `REPO_ID=llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`
38
+ (`Qwen3.6-27B-mmproj-BF16.gguf`, ~931 MB). Unsloth's
39
+ `mmproj-F16.gguf` is documented as a reference fallback for
40
+ users who want the F16/F32 variants.
41
+ - **Bundled blob status:** the in-repo
42
+ `Thanatos-27B.Q4_K_M.gguf` LFS pointer is unchanged -- still the
43
+ legacy pre-Heretic Qwen 3.6 27B Q4_K_M quant
44
+ (`5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0`).
45
+ Behaves identically to vanilla Qwen 3.6 27B for now. Heretic v2
46
+ re-quantization + rebundle (file rename to
47
+ `Thanatos-Heretic-27B.Q4_K_M.gguf` + LFS swap) is a separate
48
+ follow-up; users wanting actual Heretic behavior today should
49
+ use the local-build path (`make build`).
50
+ - **HF repo migration:** the local git remote still points at
51
+ `huggingface.co/FoolDev/Thanatos-27B`. A new HF repo at
52
+ `FoolDev/Thanatos-Heretic-27B` needs to be created and the
53
+ remote re-pointed before the next push. Migration notice on the
54
+ old `FoolDev/Thanatos-27B` model card is pending.
55
+ - **CHANGELOG history left intact:** entries below this one still
56
+ reference `Thanatos-27B` and the bundled-blob saga as they
57
+ happened on the old repo identity. Historical, not retconned.
58
+
59
+ ### Changed (HF tag-surface cleanup -- `general.tags` strip + `config.json` drop)
60
+ - **Stripped `general.tags` KV from the bundled GGUF** (`9cc78e7`,
61
+ 2026-05-20). Drops the upstream-baked `unsloth` and
62
+ `image-text-to-text` tags that `llama.cpp`'s converter copies
63
+ into GGUFs from `unsloth/Qwen3.6-27B-GGUF`; both surfaced on
64
+ the HF model page and obscured this card's positioning.
65
+ Tensors byte-identical; only the `general.tags` KV is gone.
66
+ - **Dropped `config.json`** (`5302d10`, 2026-05-20) to suppress
67
+ HF's tag auto-detector surfacing `qwen3_5` in the repo header
68
+ -- the detector reads `architectures` from `config.json`.
69
+ Consequence: `AutoModelForCausalLM.from_pretrained(
70
+ "FoolDev/Thanatos-27B")` no longer works on its own.
71
+ `examples/transformers_quickstart.py` and the README
72
+ transformers note now point users at upstream
73
+ `Qwen/Qwen3.6-27B` directly (tensors byte-identical, so the
74
+ result is the same model). `library_name: transformers` stays
75
+ in the model-card metadata for Hub placement.
76
+
77
+ ### Reverted (safetensors mirror experiment)
78
+ - **Mirrored Qwen/Qwen3.6-27B's safetensors set into this repo
79
+ (`b420378`, 2026-05-20), reverted within the day** (`50f6684`
80
+ + `9cf363e`, 2026-05-21). 15 sharded `.safetensors` + tokenizer
81
+ + processor configs (~58 GB) were briefly added so users
82
+ wanting GGUF + safetensors in one place could skip a second
83
+ `hf download`; reverted on reflection. Transformers users
84
+ continue to pull from upstream `Qwen/Qwen3.6-27B`. `.gitignore`
85
+ whitelist for the Qwen sharded naming pattern (`0c5bee4`) was
86
+ removed alongside the mirror; `*.safetensors` block rule is
87
+ back to baseline.
88
+
89
  ### Changed (5th round trip -- qwen36 → qwen35, retested next-day)
90
  - **Bundle re-stamped `general.architecture: 'qwen36'` → `'qwen35'`**
91
  in `hf upload` commit `e03e10e` (HF), 2026-05-20 midday -- 8
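
As a rough illustration of the "Bundled blob status" entry in the changelog diff above, the sketch below parses a Git LFS pointer file and compares its `oid` with the digest quoted there. It assumes the quoted hex string is the pointer's SHA-256 `oid`. The function names and the sample `size` value are hypothetical and only serve the example.

```python
# Digest quoted in the changelog entry; assumed here to be the LFS sha256 oid.
LEGACY_OID = "5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0"


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its 'key value' fields."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


def is_legacy_blob(pointer_text: str, expected_oid: str = LEGACY_OID) -> bool:
    """True when the pointer still references the pre-Heretic quant."""
    return parse_lfs_pointer(pointer_text).get("oid") == f"sha256:{expected_oid}"


# Illustrative pointer text; the size is a placeholder, not the real file size.
sample_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{LEGACY_OID}\n"
    "size 1\n"
)
print(is_legacy_blob(sample_pointer))
```

A check like this could run before `ollama create` to warn that the bundled file is still the vanilla Qwen quant rather than a Heretic rebundle.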
CITATION.cff CHANGED
@@ -1,21 +1,22 @@
1
  cff-version: 1.2.0
2
- title: "Thanatos-27B: A Dense Distillation Wrapper for Qwen 3.6 27B"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
6
  - name: FoolDev
7
  website: "https://huggingface.co/FoolDev"
8
- repository-code: "https://huggingface.co/FoolDev/Thanatos-27B"
9
- url: "https://huggingface.co/FoolDev/Thanatos-27B"
10
  abstract: >-
11
- Thanatos-27B is a personal repackaging of the dense Qwen 3.6 27B base model
12
- with Claude Opus 4.7 in the reasoning teacher slot. The repository ships
13
- an Ollama Modelfile, sampling defaults, usage examples, and a single
14
- ready-to-run GGUF (Q4_K_M ~17 GB) so the HF "Use this model" widget
15
- surfaces a one-liner Ollama snippet. Other quants (Q3_K_S, Q5_K_M,
16
- Q6_K, etc.) and the upstream safetensors (Qwen/Qwen3.6-27B) are
17
- pulled from upstream (unsloth/Qwen3.6-27B-GGUF) on demand rather
18
- than redistributed.
 
19
  keywords:
20
  - qwen
21
  - qwen3.6
@@ -23,10 +24,17 @@ keywords:
23
  - distillation
24
  - reasoning
25
  - llm
 
 
26
  license: Apache-2.0
27
  references:
28
  - type: software
29
- title: "Qwen3.6-27B"
 
 
 
 
 
30
  authors:
31
  - name: Alibaba Qwen Team
32
  url: "https://huggingface.co/Qwen/Qwen3.6-27B"
 
1
  cff-version: 1.2.0
2
+ title: "Thanatos-Heretic-27B: A Dense Distillation Wrapper for llmfan46's Qwen 3.6 27B Uncensored Heretic v2"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
6
  - name: FoolDev
7
  website: "https://huggingface.co/FoolDev"
8
+ repository-code: "https://huggingface.co/FoolDev/Thanatos-Heretic-27B"
9
+ url: "https://huggingface.co/FoolDev/Thanatos-Heretic-27B"
10
  abstract: >-
11
+ Thanatos-Heretic-27B is a personal repackaging of llmfan46's uncensored
12
+ Heretic v2 finetune of Qwen 3.6 27B (dense), with Claude Opus 4.7 in
13
+ the reasoning teacher slot. The repository ships an Ollama Modelfile,
14
+ sampling defaults, usage examples, and a single ready-to-run GGUF
15
+ (Q4_K_M ~17 GB) so the HF "Use this model" widget surfaces a one-liner
16
+ Ollama snippet. Other quants (Q3_K_M, Q5_K_M, Q6_K, etc.) and the
17
+ Heretic safetensors are pulled from upstream
18
+ (llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF and the matching
19
+ non-GGUF repo) on demand rather than redistributed.
20
  keywords:
21
  - qwen
22
  - qwen3.6
 
24
  - distillation
25
  - reasoning
26
  - llm
27
+ - heretic
28
+ - uncensored
29
  license: Apache-2.0
30
  references:
31
  - type: software
32
+ title: "Qwen3.6-27B-uncensored-heretic-v2 (immediate base)"
33
+ authors:
34
+ - name: llmfan46
35
+ url: "https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2"
36
+ - type: software
37
+ title: "Qwen3.6-27B (upstream base)"
38
  authors:
39
  - name: Alibaba Qwen Team
40
  url: "https://huggingface.co/Qwen/Qwen3.6-27B"
Makefile CHANGED
@@ -1,11 +1,11 @@
1
- # Thanatos-27B convenience Makefile.
2
  #
3
  # All work is delegated to scripts/* -- this file just gives common
4
  # operations short, discoverable names.
5
  #
6
  # Variables you can override on the command line:
7
  # QUANT GGUF quant suffix (default: Q4_K_M)
8
- # TAG Ollama model tag (default: thanatos-27b)
9
  # GGUF_PATH path to existing GGUF (skip the download)
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
@@ -19,7 +19,7 @@
19
  # make clean
20
 
21
  QUANT ?= Q4_K_M
22
- TAG ?= thanatos-27b
23
  MODEL ?= $(TAG)
24
 
25
  .DEFAULT_GOAL := help
@@ -43,7 +43,7 @@ build: ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (lo
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
44
  TAG=$(TAG) ./scripts/load_bundle.sh
45
 
46
- heal-hf: ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B tag in-store (rebadge blob + manifest digest).
47
  ./scripts/heal_hf_pull.sh
48
 
49
  smoke: ## Verify the model is reachable and round-trips.
@@ -69,6 +69,6 @@ hooks: ## Install scripts/check.sh as the git pre-commit hook.
69
 
70
  clean: ## Remove local GGUF copies and ephemeral caches in this repo.
71
  @echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
72
- @rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-27B.*.qwen[0-9]*.gguf
73
  @rm -rf ./.cache __pycache__ examples/__pycache__
74
  @echo "[+] clean"
 
1
+ # Thanatos-Heretic-27B convenience Makefile.
2
  #
3
  # All work is delegated to scripts/* -- this file just gives common
4
  # operations short, discoverable names.
5
  #
6
  # Variables you can override on the command line:
7
  # QUANT GGUF quant suffix (default: Q4_K_M)
8
+ # TAG Ollama model tag (default: thanatos-heretic-27b)
9
  # GGUF_PATH path to existing GGUF (skip the download)
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
 
19
  # make clean
20
 
21
  QUANT ?= Q4_K_M
22
+ TAG ?= thanatos-heretic-27b
23
  MODEL ?= $(TAG)
24
 
25
  .DEFAULT_GOAL := help
 
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
44
  TAG=$(TAG) ./scripts/load_bundle.sh
45
 
46
+ heal-hf: ## Heal an already-pulled hf.co/FoolDev/Thanatos-Heretic-27B tag in-store (rebadge blob + manifest digest).
47
  ./scripts/heal_hf_pull.sh
48
 
49
  smoke: ## Verify the model is reachable and round-trips.
 
69
 
70
  clean: ## Remove local GGUF copies and ephemeral caches in this repo.
71
  @echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
72
+ @rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-Heretic-27B.*.qwen[0-9]*.gguf
73
  @rm -rf ./.cache __pycache__ examples/__pycache__
74
  @echo "[+] clean"
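
The `?=` assignments in the Makefile diff above only apply when a variable is not already defined, so values from the environment or the command line take precedence. The Python sketch below models that resolution for the three variables shown. It is an approximation of GNU Make's behaviour, not part of the repository, and `resolve_build_vars` is a hypothetical name.

```python
import os


def resolve_build_vars(env=None) -> dict:
    """Mimic `QUANT ?= ...`, `TAG ?= ...`, `MODEL ?= $(TAG)` from the Makefile.

    A variable that is present in `env` wins, even when it is empty, which
    matches how `?=` treats an already-defined variable.
    """
    env = os.environ if env is None else env
    quant = env.get("QUANT", "Q4_K_M")
    tag = env.get("TAG", "thanatos-heretic-27b")
    model = env.get("MODEL", tag)  # MODEL follows TAG unless set explicitly
    return {"QUANT": quant, "TAG": tag, "MODEL": model}


print(resolve_build_vars({}))
print(resolve_build_vars({"QUANT": "Q5_K_M", "TAG": "my-local-tag"}))
```

Because `MODEL` defaults to `$(TAG)`, overriding only `TAG` also changes the tag that the smoke test targets.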
Modelfile CHANGED
@@ -1,4 +1,4 @@
1
- # Thanatos-27B -- Ollama wrapper around Qwen 3.6 27B (dense)
2
  #
3
  # Text + tool calling. Vision via Ollama is currently broken for this
4
  # architecture (ollama/ollama#15898 -- the qwen35 arch entries are in
@@ -10,21 +10,22 @@
10
  # stamped `general.architecture: 'qwen35'` -- the upstream-canonical
11
  # arch entry every released llama.cpp / Ollama loads under for the
12
  # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
13
- # thanatos-27b -f Modelfile && ollama run thanatos-27b` loads it
14
  # directly. See README "Architecture" for the full stamp history
15
  # (eight flips between qwen35 and qwen36, settled on qwen35 at
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
17
  # re-tested in a fresh next-day session).
18
  #
19
- # For other quants (Q3_K_S, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_S`
20
- # downloads the chosen quant from unsloth/Qwen3.6-27B-GGUF and patches
21
- # FROM in a temp Modelfile copy. The Q3_K_S used to ship in this repo;
22
- # it was removed so HF's Ollama bridge picks Q4_K_M as the default
23
- # `:latest` tag instead of Q3_K_S (alphabetically-first heuristic).
24
  #
25
  # Other GGUF sources (use with `make build GGUF_PATH=...`):
26
- # https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
27
- # https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
 
28
 
29
  FROM ./Thanatos-27B.Q4_K_M.gguf
30
 
@@ -140,14 +141,14 @@ Behavior rules:
140
  # (6182 tokens / 501.9 s; 12.67 / 12.55 / 12.25 short/medium/long)
141
  # Q3_K_S → 11.70 tok/s aggregate (run 2, 2026-05-19 evening)
142
  # (8009 tokens / 684.0 s; 12.23 / 12.12 / 11.66 short/medium/long)
143
- # Second run measured against `thanatos-27b:latest` built via
144
- # `make build QUANT=Q3_K_S` -- i.e. unsloth/Qwen3.6-27B-GGUF's
145
- # qwen35-stamped Q3_K_S, the friction-free path the README
146
- # points users at. Aggregate is 4.9% below run 1 (within
147
- # the ±20% noise band) -- slightly longer per-prompt outputs
148
- # this run (8009 vs 6182 tokens) likely contribute the
149
- # difference, plus late-in-session thermal pressure on the
150
- # Strix Halo iGPU. The friction-free unsloth path works.
151
  # Q4_K_M → 9.31 tok/s aggregate (run 1)
152
  # (5356 tokens / 574.9 s; 9.48 / 9.43 / 9.28 short/medium/long)
153
  # Q4_K_M → 9.19 tok/s aggregate (run 2, 2026-05-19 afternoon)
 
1
+ # Thanatos-Heretic-27B -- Ollama wrapper around Qwen 3.6 27B (dense)
2
  #
3
  # Text + tool calling. Vision via Ollama is currently broken for this
4
  # architecture (ollama/ollama#15898 -- the qwen35 arch entries are in
 
10
  # stamped `general.architecture: 'qwen35'` -- the upstream-canonical
11
  # arch entry every released llama.cpp / Ollama loads under for the
12
  # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
13
+ # thanatos-heretic-27b -f Modelfile && ollama run thanatos-heretic-27b` loads it
14
  # directly. See README "Architecture" for the full stamp history
15
  # (eight flips between qwen35 and qwen36, settled on qwen35 at
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
17
  # re-tested in a fresh next-day session).
18
  #
19
+ # For other quants (Q3_K_M, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_M`
20
+ # downloads the chosen quant from llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF
21
+ # (filename pattern Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf) and
22
+ # patches FROM in a temp Modelfile copy. Note: no Q3_K_S in this repo;
23
+ # use Q3_K_M for the smallest practical quant.
24
  #
25
  # Other GGUF sources (use with `make build GGUF_PATH=...`):
26
+ # https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF # primary (this repo's default)
27
+ # https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF # MTP head preserved
28
+ # https://huggingface.co/unsloth/Qwen3.6-27B-GGUF # vanilla Qwen 3.6 (pre-Heretic)
29
 
30
  FROM ./Thanatos-27B.Q4_K_M.gguf
31
 
 
141
  # (6182 tokens / 501.9 s; 12.67 / 12.55 / 12.25 short/medium/long)
142
  # Q3_K_S → 11.70 tok/s aggregate (run 2, 2026-05-19 evening)
143
  # (8009 tokens / 684.0 s; 12.23 / 12.12 / 11.66 short/medium/long)
144
+ # Second run measured against a `thanatos-27b:latest` (pre-rename)
145
+ # built via `make build QUANT=Q3_K_S` against the then-current
146
+ # unsloth/Qwen3.6-27B-GGUF source. Aggregate is 4.9% below
147
+ # run 1 (within the ±20% noise band) -- slightly longer
148
+ # per-prompt outputs this run (8009 vs 6182 tokens) likely
149
+ # contribute the difference, plus late-in-session thermal
150
+ # pressure on the Strix Halo iGPU.
151
+ # (Heretic v2 base is not benched here yet; rebundle pending.)
152
  # Q4_K_M → 9.31 tok/s aggregate (run 1)
153
  # (5356 tokens / 574.9 s; 9.48 / 9.43 / 9.28 short/medium/long)
154
  # Q4_K_M → 9.19 tok/s aggregate (run 2, 2026-05-19 afternoon)
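
The aggregate figures in the benchmark comments above are total generated tokens divided by total generation time, and the "4.9% below run 1" remark is a relative difference between two such aggregates. The short Python sketch below reproduces that arithmetic from the token and second counts quoted in the comments. The function names are illustrative and not taken from `scripts/bench.sh`.

```python
def tok_per_s(tokens: int, seconds: float) -> float:
    """Aggregate throughput: total tokens over total generation time."""
    return tokens / seconds


def pct_below(baseline: float, value: float) -> float:
    """How far `value` sits below `baseline`, as a percentage of baseline."""
    return (baseline - value) / baseline * 100.0


# Token and time totals as quoted in the Modelfile benchmark comments.
q3_run1 = tok_per_s(6182, 501.9)
q3_run2 = tok_per_s(8009, 684.0)
q4_run1 = tok_per_s(5356, 574.9)

print(f"Q3_K_S run 1: {q3_run1:.2f} tok/s")
print(f"Q3_K_S run 2: {q3_run2:.2f} tok/s")
print(f"Q4_K_M run 1: {q4_run1:.2f} tok/s")
print(f"run 2 vs run 1: {pct_below(q3_run1, q3_run2):.1f}% lower")
```

The recomputed values may differ from the quoted ones in the last digit, which is consistent with the comments truncating rather than rounding.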
README.md CHANGED
@@ -1,7 +1,8 @@
1
  ---
2
  license: apache-2.0
3
  base_model:
4
- - Qwen/Qwen3.6-27B
 
5
  datasets:
6
  - crownelius/Creative_Writing_ShareGPT_Enhanced
7
  - microsoft/rStar-Coder
@@ -40,26 +41,28 @@ tags:
40
  - agent
41
  - gguf
42
  - ollama
 
 
43
  library_name: transformers
44
  pipeline_tag: image-text-to-text
45
  ---
46
 
47
- <img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/banner.svg" alt="Thanatos-27B banner" width="100%" />
48
 
49
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
50
- [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
51
  [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
52
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
53
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
54
 
55
- # Thanatos-27B
56
 
57
- > **Dense Reasoning. Friendlier Footprint.**
58
- > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
59
 
60
- **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled LLM`
61
 
62
- A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
63
 
64
  ## TL;DR
65
 
@@ -69,18 +72,28 @@ template -- HF's Ollama bridge ingests those three files, not
69
  `Modelfile`):
70
 
71
  ```bash
72
- ollama run hf.co/FoolDev/Thanatos-27B # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
73
  ```
74
 
75
- If you pulled the bundle during any of the qwen36 windows on
76
- 2026-05-19/20 (most recently between `ae67ed1` and `e03e10e`)
77
- the load will 500 on that stale blob -- `make heal-hf` rebadges
78
- it in place. Fresh pulls after the latest qwen35 re-stamp
79
- (`e03e10e`) go straight through.
80
-
81
- For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
 
 
 
 
 
 
 
 
 
82
  QUANT=...` is the simplest path. See [Quick start](#quick-start)
83
- below for the full matrix.
 
84
 
85
  For image input use llama.cpp directly -- Ollama vision is broken for
86
  this architecture upstream (see [Vision](#vision)).
@@ -89,9 +102,9 @@ this architecture upstream (see [Vision](#vision)).
89
 
90
  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** -- the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
91
 
92
- The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B -- on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) -- but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
93
 
94
- | | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
95
  |---|---|---|
96
  | Architecture | Dense transformer | MoE 256 experts, 8 active |
97
  | Total params | 27 B | 35 B |
@@ -99,7 +112,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
99
  | Layers | 64 | 40 |
100
  | Hidden size | 5120 | 2048 |
101
  | Q4_K_M GGUF size | ~17 GB (bundled) | ~19 GB (bundled) |
102
- | Q3_K_S GGUF size | ~12 GB (build locally via `make build QUANT=Q3_K_S`) | n/a |
103
  | Min host memory @ Q4 / 8K ctx | ~22 GB | ~38 GB |
104
  | Multimodal (text path) | Yes | Yes |
105
  | Multimodal (vision via Ollama) | Broken upstream -- see below | Broken upstream |
@@ -111,15 +124,15 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
111
  | File | Use |
112
  |---|---|
113
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
114
- | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF -- used by `make build` / `ollama create` for **local** builds |
115
- | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B` directly (the bridge does **not** read `Modelfile` -- see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
116
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
117
- | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
118
- | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy v0.6.0-era / 3rd-round-trip-era checkouts -- no-op on the current qwen35-stamped bundle. |
119
- | `scripts/heal_hf_pull.sh` | Recovery for users who pulled `hf.co/FoolDev/Thanatos-27B` *before* the latest qwen35 re-stamp (`978798f`) and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 -- fresh pulls don't need it. |
120
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
121
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
122
- | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream -- see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
123
  | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep, plus `Modelfile`-vs-bridge-files sync check |
124
  | `scripts/check_bridge_sync.py` | Verifies the `Modelfile` `TEMPLATE` / `SYSTEM` / `PARAMETER` directives stay in sync with the root-level `template` / `system` / `params` files. Run as part of `make check`; called from the pre-commit hook. |
125
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
@@ -129,21 +142,22 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
129
  | `CHANGELOG.md` | Versioned tooling/docs changes |
130
  | `README.md` | This file |
131
 
132
- For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
133
- downloads the smaller ~12 GB Q3_K_S quant from
134
- `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
135
- creates a local `thanatos-27b` Ollama tag. Does not redistribute
136
- via this repo. For other quants use `make build QUANT=...`. The
137
- local-build path applies this repo's `Modelfile`; the `hf.co/...`
138
- path applies the root-level `template`, `system`, and `params`
139
- files (kept in sync with the `Modelfile`).
 
140
 
141
- If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
142
 
143
  ## Architecture
144
 
145
  <p align="left">
146
- <img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
147
  </p>
148
 
149
  - Qwen 3.6 dense, 27B parameters, 64 transformer layers
@@ -154,23 +168,30 @@ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-2
154
  - Vocab 248,320 (shared with 35B-A3B sibling)
155
  - 262 144 native context, extensible to ~1 M with YaRN
156
  - Vision + video supported by the **base architecture** via a separate
157
- `mmproj` projector (not redistributed here; pull `mmproj-F16.gguf`
158
- from `unsloth/Qwen3.6-27B-GGUF`). See [Vision](#vision) below for
159
- current loader compatibility.
 
 
 
160
  - Multi-token prediction (MTP) head trained for speculative decoding --
161
  present in the upstream `Qwen/Qwen3.6-27B` safetensors and usable via
162
  vLLM (`qwen3_next_mtp`) or SGLang (`--speculative-algo NEXTN`).
163
  **Not usable via llama.cpp / Ollama today**: the GGUF converter
164
  (`convert_hf_to_gguf.py`) explicitly skips MTP tensors for the
165
  `qwen35` / `qwen35moe` arch family ("MTP tensors are not used at
166
- inference yet"), so the bundled GGUF and the unsloth GGUFs ship with
167
- 851 tensors and no MTP head. llama.cpp's MTP support (PR #22673,
168
- merged 2026-05-16) currently covers other architectures only;
169
- tracking that PR's follow-up work for when qwen35 / qwen35moe
170
- consumer support lands. (Earlier README versions claimed MTP was
171
- available without this caveat -- confirmed empirically via
172
- `gguf.GGUFReader` on both this bundle and `unsloth/Qwen3.6-27B-GGUF`,
173
- 2026-05-19.)
 
 
 
 
174
 
175
  **The bundled GGUF declares `general.architecture: 'qwen35'`** -- not a
176
  workaround for an unimplemented `qwen36` arch, but the canonical
@@ -186,9 +207,11 @@ stack:
186
  exists in `transformers`; Qwen reuses the 3.5 class names.
187
  - **llama.cpp's converter.** `convert_hf_to_gguf.py` registers
188
  `Qwen3_5ForCausalLM` → `MODEL_ARCH.QWEN35` and
189
- `Qwen3_5MoeForCausalLM` → `MODEL_ARCH.QWEN35MOE`. The unsloth
190
- GGUFs this repo pulls from (`unsloth/Qwen3.6-27B-GGUF`,
191
- `unsloth/Qwen3.6-35B-A3B-GGUF`) inherit those stamps.
 
 
192
  - **llama.cpp's model code.** `src/models/qwen35.cpp` has an
193
  explicit `case 64: type = LLM_TYPE_27B` branch for this model;
194
  `qwen35moe.cpp` has `case 40: type = LLM_TYPE_35B_A3B` for the
@@ -200,7 +223,7 @@ There is no PR or tracking issue for a `qwen36` arch entry in
200
  `qwen35` already loads the model the upstream code path was
201
  designed to load.
202
 
203
- `ollama run hf.co/FoolDev/Thanatos-27B` and `llama-server -m
204
  Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
205
  loaders.
206
 
@@ -257,7 +280,8 @@ the legacy qwen36 → qwen35 in-store rebadge (used by `make
257
  heal-hf` and `make load-bundle`) and any future arch flip:
258
 
259
  ```bash
260
- # qwen36 -> qwen35 (the legacy recovery direction)
 
261
  python3 scripts/rename_arch.py \
262
  --from-arch qwen36 --to-arch qwen35 \
263
  Thanatos-27B.Q4_K_M.qwen36.gguf \
@@ -273,21 +297,23 @@ Three paths:
273
  ```bash
274
  # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
275
  # root-level template / system / params files in one step):
276
- ollama run hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M, qwen35-stamped
277
 
278
- # B. Build a local `thanatos-27b` tag from THIS repo's bundle
279
  # (LFS smudge if needed, then `ollama create`). Useful if you
280
  # want a bare local tag rather than the `hf.co/...` path:
281
- make load-bundle # creates local tag thanatos-27b
282
- ollama run thanatos-27b
283
-
284
- # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
285
- # and build locally. Loads on every current llama.cpp / Ollama.
286
- make build # Q4_K_M -> thanatos-27b
287
- make build QUANT=Q3_K_S # 12 GB smaller quant
 
 
288
  make build QUANT=Q5_K_M # 20 GB higher quality
289
- make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf # skip download
290
- ollama run thanatos-27b
291
  ```
292
 
293
  Under the hood, `make build` calls `scripts/build.sh`, which downloads the
@@ -295,7 +321,7 @@ GGUF if missing (set `GGUF_PATH` to point at one you already have) and
295
  runs `ollama create` with the matching `Modelfile`.
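
The Modelfile comments in this commit describe `make build` as patching `FROM` in a temporary Modelfile copy before calling `ollama create`. A minimal Python sketch of that patching step follows. It is an illustration under assumptions, not the logic of `scripts/build.sh`: `patch_from` is a hypothetical helper, and the sample Modelfile text (including the `PARAMETER` line) is made up for the example.

```python
import re


def patch_from(modelfile_text: str, gguf_path: str) -> str:
    """Return a copy of the Modelfile text with its first FROM line repointed."""
    patched, count = re.subn(
        r"(?m)^FROM\s+.*$",
        # A callable avoids backslash escapes in paths being interpreted.
        lambda _match: f"FROM {gguf_path}",
        modelfile_text,
        count=1,
    )
    if count == 0:
        raise ValueError("Modelfile has no FROM line to patch")
    return patched


# Illustrative Modelfile content; only the FROM line mirrors the repo's file.
sample = "# header comment\nFROM ./Thanatos-27B.Q4_K_M.gguf\nPARAMETER num_ctx 8192\n"
print(patch_from(sample, "/models/Qwen3.6-27B-uncensored-heretic-v2-Q5_K_M.gguf"))
```

Writing the patched text to a temporary file, rather than editing `Modelfile` in place, keeps the committed file pointing at the bundled GGUF.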
296
 
297
  If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
298
- run `ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b`.
299
 
300
  Confirm everything works:
301
 
@@ -310,10 +336,10 @@ python examples/ollama_chat.py # full demo: chat, streaming, tools, OpenAI-
310
 
311
  | App | How to load this model |
312
  |---|---|
313
- | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
314
- | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
315
- | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
316
- | **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
317
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
318
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path: point at the GGUF and use the embedded chat template. |
319
 
@@ -331,7 +357,7 @@ external schema.
331
  curl -s http://localhost:11434/v1/chat/completions \
332
  -H 'Content-Type: application/json' \
333
  -d '{
334
- "model": "thanatos-27b",
335
  "messages": [
336
  {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
337
  {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
@@ -369,17 +395,21 @@ Behavior rules:
369
 
370
  ## Vision
371
 
372
- The Qwen 3.6 base supports image (and video) input via a separate
373
- `mmproj` projector. The full multimodal stack is:
 
374
 
375
  ```
376
- Qwen3.6-27B-Q4_K_M.gguf (~17 GB, the text decoder)
377
- mmproj-F16.gguf (~927 MB, the vision projector)
378
  ```
379
 
380
  Both files are at
381
- [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF).
382
- This repo intentionally does not redistribute either.
 
 
 
383
 
384
  ### Loader compatibility: the honest table
385
 
@@ -397,10 +427,11 @@ Three flavors, in order of build-time effort:
397
  ```bash
398
  # A. HTTP via llama-server (always built; the easiest path).
399
  # Reconfirmed working 2026-05-19 against llama.cpp 389ff61 + Vulkan
400
- # on a Ryzen AI Max+ 395 / Radeon 8060S iGPU.
 
401
  llama-server \
402
- -m Qwen3.6-27B-Q4_K_M.gguf \
403
- --mmproj mmproj-F16.gguf \
404
  --host 127.0.0.1 --port 8765 -c 8192 -ngl 99
405
  # then POST OpenAI-style chat completions with an image_url content
406
  # block, e.g. {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}}
@@ -413,15 +444,15 @@ llama-server \
413
  # produce it; a plain `cmake --build build` will. If yours didn't,
414
  # run `cmake --build build --target llama-mtmd-cli`.
415
  llama-mtmd-cli \
416
- -m Qwen3.6-27B-Q4_K_M.gguf \
417
- --mmproj mmproj-F16.gguf \
418
  --image photo.jpg \
419
  -p "Describe this image."
420
 
421
  # C. Python via llama-cpp-python:
422
  python examples/llama_cpp_vision.py \
423
- --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
424
- --mmproj /path/to/mmproj-F16.gguf \
425
  --image /path/to/photo.jpg \
426
  --prompt "What is in this image?"
427
  ```
@@ -439,19 +470,22 @@ The dense 27B is the lighter sibling to Janus-35B and the easier of the two to d
439
  | RTX 3090 / 4090 24 GB | Works, full Q4 offload, ~25-40 tok/s |
440
  | RTX 5090 32 GB | Works, full offload at higher quant (Q5/Q6), ~30-50 tok/s |
441
  | Mac Studio M2/M3 32 GB+ unified | Works, ~15-25 tok/s |
442
- | 32 GB unified-memory laptops (Mac M-series, Ryzen AI Max+, etc.) | Borderline at Q4. `make build QUANT=Q3_K_S` (~12 GB) and trim `num_ctx` for headroom. |
443
 
444
  Most numbers in this table are estimates from comparable models; the
445
  gradient is right but the absolute values will move ±20% with prompt
446
  shape, KV cache type, and parallel-request count. Measure your own
447
  machine with `make bench` (3-prompt mix, reports tok/s from Ollama's
448
  `eval_count` / `eval_duration` so it's not stopwatch-noisy). Reference
449
- data points on a Ryzen AI Max+ 395 / Radeon 8060S iGPU under Vulkan:
 
 
450
  **~12.3 tok/s at Q3_K_S** and **~9.3 tok/s at Q4_K_M** (3-prompt mix,
451
  steady across short / medium / long prompts), sitting between CPU-only
452
  and a 24 GB discrete card as expected. An earlier ROCm snapshot of the
453
  same Q3_K_S bench gave ~10.1 tok/s; Vulkan was the clear winner on
454
- this hardware.
 
455
 
456
  ## Chat template
457
 
@@ -465,10 +499,10 @@ Ollama is the exception: its conversion of the embedded jinja loses the
465
  `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
466
  Two paths fix this, depending on how you pull the model:
467
 
468
- - **`ollama run hf.co/FoolDev/Thanatos-27B`**: HF's Ollama bridge applies
469
  the root-level `template` / `system` / `params` files in this repo
470
  (the bridge does **not** read `Modelfile`).
471
- - **`make build` / `ollama create thanatos-27b -f Modelfile`**: uses the
472
  `Modelfile`'s `TEMPLATE` block.
473
 
474
  Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
@@ -511,7 +545,7 @@ the model adapts to whichever shape the system prompt prescribes.
511
  **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
512
  prompts the model to emit JSON-in-XML, the form Ollama's tool-call
513
  extractor parses into a structured `tool_calls` array. After
514
- `make build`, `ollama show thanatos-27b` lists `tools` and `thinking`
515
  under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
516
  accept a `tools` array.
517
 
@@ -552,19 +586,25 @@ python examples/ollama_chat.py # section 3 runs a real round-trip
552
  - **No mmproj in this release**, and **vision via Ollama is broken upstream** (the qwen35/qwen35moe arch entries are present in Ollama's Go engine but missing from the C++ llama.cpp fallback Ollama uses when mmproj is attached; see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
553
  - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
554
  - **No formal evaluation in this card.** Numbers above are estimates.
 
 
555
 
556
  ## Related models
557
 
558
  | Model | Notes |
559
  |---|---|
560
- | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors |
561
- | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
 
 
 
562
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
563
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
564
 
565
  ## Credits
566
 
567
- - Base model: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) (Alibaba)
 
568
  - Reasoning teacher: Claude Opus 4.7 (Anthropic)
569
  - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
570
 
 
1
  ---
2
  license: apache-2.0
3
  base_model:
4
+ - llmfan46/Qwen3.6-27B-uncensored-heretic-v2
5
+ base_model_relation: finetune
6
  datasets:
7
  - crownelius/Creative_Writing_ShareGPT_Enhanced
8
  - microsoft/rStar-Coder
 
41
  - agent
42
  - gguf
43
  - ollama
44
+ - heretic
45
+ - uncensored
46
  library_name: transformers
47
  pipeline_tag: image-text-to-text
48
  ---
49
 
50
+ <img src="https://huggingface.co/FoolDev/Thanatos-Heretic-27B/resolve/main/banner.svg" alt="Thanatos-Heretic-27B banner" width="100%" />
51
 
52
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
53
+ [![Base Model](https://img.shields.io/badge/Base-Heretic_v2-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2)
54
  [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
55
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
56
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
57
 
58
+ # Thanatos-Heretic-27B
59
 
60
+ > **Dense Reasoning. Friendlier Footprint. Uncensored.**
61
+ > *llmfan46's Heretic v2 abliteration of Qwen 3.6 27B (dense), repackaged with Claude Opus 4.7 in the teacher slot.*
62
 
63
+ **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Base:`** `Heretic v2 (llmfan46)` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled + Abliterated LLM`
64
 
65
+ A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2) (an uncensored Heretic-style abliteration of the dense [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base) instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises, and refusal-trained behavior is dialed back at the base layer.
66
 
67
  ## TL;DR
68
 
 
72
  `Modelfile`):
73
 
74
  ```bash
75
+ ollama run hf.co/FoolDev/Thanatos-Heretic-27B # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
76
  ```
77
 
78
+ > **Bundled blob status:** the GGUF currently bundled in this repo
79
+ > is the legacy pre-Heretic Qwen 3.6 27B Q4_K_M quant from before
80
+ > the rename. Behaves identically to vanilla Qwen 3.6 27B for now;
81
+ > the Heretic v2 rebundle (from
82
+ > `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`) is pending;
83
+ > see the top entry of [CHANGELOG](CHANGELOG.md). If you want the
84
+ > Heretic behavior today, use the local-build path below
85
+ > (`make build`), which pulls the Heretic GGUF directly.
86
+
87
+ If you pulled the bundle during any of the qwen36 windows on the
88
+ pre-rename `FoolDev/Thanatos-27B` repo (2026-05-19/20) and still
89
+ have a qwen36-stamped blob in your local Ollama store, `make
90
+ heal-hf` rebadges it in place. Fresh pulls of the new
91
+ `Thanatos-Heretic-27B` repo go straight through.
92
+
93
+ For other quants (Q3_K_M ~13 GB, Q5_K_M ~20 GB, etc.), `make build
94
  QUANT=...` is the simplest path. See [Quick start](#quick-start)
95
+ below for the full matrix. Note: no Q3_K_S in the Heretic GGUF
96
+ repo β€” use Q3_K_M for the smallest practical quant.
97
 
98
  For image input use llama.cpp directly; Ollama vision is broken for
99
  this architecture upstream (see [Vision](#vision)).
 
102
 
103
  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time**: the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
104
 
105
+ The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B: on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix, measured against the pre-rename Qwen 3.6 bundle; Heretic v2 inherits the same architecture so per-step cost should match). In exchange, the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
106
 
107
+ | | Thanatos-Heretic-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
108
  |---|---|---|
109
  | Architecture | Dense transformer | MoE 256 experts, 8 active |
110
  | Total params | 27 B | 35 B |
 
112
  | Layers | 64 | 40 |
113
  | Hidden size | 5120 | 2048 |
114
  | Q4_K_M GGUF size | ~17 GB (bundled) | ~19 GB (bundled) |
115
+ | Q3_K_M GGUF size | ~13 GB (build locally via `make build QUANT=Q3_K_M`) | n/a |
116
  | Min host memory @ Q4 / 8K ctx | ~22 GB | ~38 GB |
117
  | Multimodal (text path) | Yes | Yes |
118
  | Multimodal (vision via Ollama) | Broken upstream (see below) | Broken upstream |
 
124
  | File | Use |
125
  |---|---|
126
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
127
+ | `Modelfile` | Ollama wrapper around the bundled GGUF (currently the legacy pre-Heretic Qwen 3.6 27B Q4_K_M; Heretic v2 rebundle pending); used by `make build` / `ollama create` for **local** builds |
128
+ | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-Heretic-27B` directly (the bridge does **not** read `Modelfile`; see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
129
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
130
+ | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`). This is the path that gets you actual Heretic behavior until the bundled blob is rebundled. |
131
+ | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts; no-op on the current qwen35-stamped bundle. |
132
+ | `scripts/heal_hf_pull.sh` | Legacy recovery for users migrating from the pre-rename `FoolDev/Thanatos-27B` repo who still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35; fresh pulls of `Thanatos-Heretic-27B` don't need it. |
133
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
134
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
135
+ | `scripts/fetch_vision.sh` | Pulls the vision projector for llama.cpp (`Qwen3.6-27B-mmproj-BF16.gguf` from the Heretic GGUF repo, or the reference `mmproj-F16.gguf` from unsloth). Ollama vision is broken upstream; see [Vision](#vision). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
136
  | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep, plus `Modelfile`-vs-bridge-files sync check |
137
  | `scripts/check_bridge_sync.py` | Verifies the `Modelfile` `TEMPLATE` / `SYSTEM` / `PARAMETER` directives stay in sync with the root-level `template` / `system` / `params` files. Run as part of `make check`; called from the pre-commit hook. |
138
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
 
142
  | `CHANGELOG.md` | Versioned tooling/docs changes |
143
  | `README.md` | This file |
144
 
145
+ For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_M`
146
+ downloads the smaller ~13 GB Q3_K_M quant from
147
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` (qwen35-stamped,
148
+ loads directly) and creates a local `thanatos-heretic-27b` Ollama
149
+ tag. Does not redistribute via this repo. For other quants use
150
+ `make build QUANT=...`. The local-build path applies this repo's
151
+ `Modelfile`; the `hf.co/...` path applies the root-level
152
+ `template`, `system`, and `params` files (kept in sync with the
153
+ `Modelfile`).
154
 
155
+ If you want the Heretic safetensors for `transformers`, fetch them from [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2). For the vanilla pre-Heretic Qwen 3.6 27B base, use [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
156
 
157
  ## Architecture
158
 
159
  <p align="left">
160
+ <img src="https://huggingface.co/FoolDev/Thanatos-Heretic-27B/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
161
  </p>
162
 
163
  - Qwen 3.6 dense, 27B parameters, 64 transformer layers
 
168
  - Vocab 248,320 (shared with 35B-A3B sibling)
169
  - 262 144 native context, extensible to ~1 M with YaRN
170
  - Vision + video supported by the **base architecture** via a separate
171
+ `mmproj` projector (not redistributed here; pull
172
+ `Qwen3.6-27B-mmproj-BF16.gguf` from
173
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`, or
174
+ `mmproj-F16.gguf` from `unsloth/Qwen3.6-27B-GGUF` as a reference
175
+ alternative). See [Vision](#vision) below for current loader
176
+ compatibility.
177
  - Multi-token prediction (MTP) head trained for speculative decoding,
178
  present in the upstream `Qwen/Qwen3.6-27B` safetensors and usable via
179
  vLLM (`qwen3_next_mtp`) or SGLang (`--speculative-algo NEXTN`).
180
  **Not usable via llama.cpp / Ollama today**: the GGUF converter
181
  (`convert_hf_to_gguf.py`) explicitly skips MTP tensors for the
182
  `qwen35` / `qwen35moe` arch family ("MTP tensors are not used at
183
+ inference yet"), so the standard GGUFs (this bundle, unsloth's,
184
+ llmfan46's Heretic v2) ship with 851 tensors and no MTP head.
185
+ llmfan46 also publishes a separate
186
+ `Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF` repo
187
+ that keeps the MTP tensors for vLLM/SGLang users who want both
188
+ Heretic v2 + MTP. llama.cpp's MTP support (PR #22673, merged
189
+ 2026-05-16) currently covers other architectures only; tracking
190
+ that PR's follow-up work for when qwen35 / qwen35moe consumer
191
+ support lands. (Earlier README versions claimed MTP was available
192
+ via llama.cpp without this caveat; confirmed empirically via
193
+ `gguf.GGUFReader` on both this bundle and
194
+ `unsloth/Qwen3.6-27B-GGUF`, 2026-05-19.)
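
The tensor-count check described in the MTP bullet can be reproduced roughly as follows. This is a minimal sketch, assuming the `gguf` Python package (llama.cpp's gguf-py) is installed. The helper names and the `mtp` / `nextn` substrings used to spot MTP tensors are illustrative guesses, not the converter's confirmed naming, so adjust them to whatever your checkpoint actually uses.

```python
def mtp_like_names(tensor_names, markers=("mtp", "nextn")):
    """Return tensor names containing any marker substring (case-insensitive).

    The default markers are assumptions for illustration only.
    """
    lowered = tuple(m.lower() for m in markers)
    return [n for n in tensor_names if any(m in n.lower() for m in lowered)]


def summarize(path):
    """Return (tensor_count, mtp_like_tensor_names) for a GGUF file."""
    # Imported lazily so the pure helper above works without the package.
    from gguf import GGUFReader

    reader = GGUFReader(path)
    names = [t.name for t in reader.tensors]
    return len(names), mtp_like_names(names)
```

Calling `summarize("Thanatos-27B.Q4_K_M.gguf")` on a standard GGUF should, per the bullet above, report 851 tensors and an empty list. A non-empty list would suggest an MTP-preserving build.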
195
 
196
  **The bundled GGUF declares `general.architecture: 'qwen35'`**: not a
197
  workaround for an unimplemented `qwen36` arch, but the canonical
 
207
  exists in `transformers`; Qwen reuses the 3.5 class names.
208
  - **llama.cpp's converter.** `convert_hf_to_gguf.py` registers
209
  `Qwen3_5ForCausalLM` → `MODEL_ARCH.QWEN35` and
210
+ `Qwen3_5MoeForCausalLM` → `MODEL_ARCH.QWEN35MOE`. The Heretic
211
+ GGUFs this repo pulls from
212
+ (`llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`) inherit those
213
+ stamps, as do the upstream unsloth GGUFs (`unsloth/Qwen3.6-27B-GGUF`,
214
+ `unsloth/Qwen3.6-35B-A3B-GGUF`).
215
  - **llama.cpp's model code.** `src/models/qwen35.cpp` has an
216
  explicit `case 64: type = LLM_TYPE_27B` branch for this model;
217
  `qwen35moe.cpp` has `case 40: type = LLM_TYPE_35B_A3B` for the
 
223
  `qwen35` already loads the model the upstream code path was
224
  designed to load.
225
 
226
+ `ollama run hf.co/FoolDev/Thanatos-Heretic-27B` and `llama-server -m
227
  Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
228
  loaders.
229
 
 
280
  heal-hf` and `make load-bundle`) and any future arch flip:
281
 
282
  ```bash
283
+ # qwen36 -> qwen35 (the legacy recovery direction, for blobs
284
+ # pulled from the pre-rename FoolDev/Thanatos-27B repo)
285
  python3 scripts/rename_arch.py \
286
  --from-arch qwen36 --to-arch qwen35 \
287
  Thanatos-27B.Q4_K_M.qwen36.gguf \
 
297
  ```bash
298
  # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
299
  # root-level template / system / params files in one step):
300
+ ollama run hf.co/FoolDev/Thanatos-Heretic-27B # 17 GB Q4_K_M, qwen35-stamped
301
 
302
+ # B. Build a local `thanatos-heretic-27b` tag from THIS repo's bundle
303
  # (LFS smudge if needed, then `ollama create`). Useful if you
304
  # want a bare local tag rather than the `hf.co/...` path:
305
+ make load-bundle # creates local tag thanatos-heretic-27b
306
+ ollama run thanatos-heretic-27b
307
+
308
+ # C. Bypass the bundle: download a qwen35-stamped Heretic v2 GGUF
309
+ # from llmfan46 and build locally. Loads on every current
310
+ # llama.cpp / Ollama. This is the path that gets you actual
311
+ # Heretic behavior until the bundled blob is rebundled.
312
+ make build # Q4_K_M -> thanatos-heretic-27b
313
+ make build QUANT=Q3_K_M # 13 GB smaller quant
314
  make build QUANT=Q5_K_M # 20 GB higher quality
315
+ make build GGUF_PATH=~/models/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf # skip download
316
+ ollama run thanatos-heretic-27b
317
  ```
318
 
319
  Under the hood, `make build` calls `scripts/build.sh`, which downloads the
 
321
  runs `ollama create` with the matching `Modelfile`.
322
 
323
  If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
324
+ run `ollama create thanatos-heretic-27b -f Modelfile && ollama run thanatos-heretic-27b`.
325
 
326
  Confirm everything works:
327
 
 
336
 
337
  | App | How to load this model |
338
  |---|---|
339
+ | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-Heretic-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_M` downloads from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
340
+ | **LM Studio** | Search β†’ `FoolDev/Thanatos-Heretic-27B` β†’ pick `Thanatos-27B.Q4_K_M.gguf` (current bundled filename; will become `Thanatos-Heretic-27B.Q4_K_M.gguf` after the rebundle). Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
341
+ | **Jan** | Hub β†’ "Import from Hugging Face" β†’ `FoolDev/Thanatos-Heretic-27B`. Same template behavior as LM Studio. |
342
+ | **llama.cpp** | `hf download FoolDev/Thanatos-Heretic-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via `Qwen3.6-27B-mmproj-BF16.gguf` from the Heretic GGUF repo). |
343
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
344
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path β€” point at the GGUF, use the embedded chat template. |
345
 
 
357
  curl -s http://localhost:11434/v1/chat/completions \
358
  -H 'Content-Type: application/json' \
359
  -d '{
360
+ "model": "thanatos-heretic-27b",
361
  "messages": [
362
  {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
363
  {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
 
395
 
396
  ## Vision
397
 
398
+ The Qwen 3.6 base (and llmfan46's Heretic v2 finetune of it) supports
399
+ image (and video) input via a separate `mmproj` projector. The full
400
+ multimodal stack is:
401
 
402
  ```
403
+ Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf (~17 GB, the text decoder)
404
+ Qwen3.6-27B-mmproj-BF16.gguf (~931 MB, the vision projector)
405
  ```
406
 
407
  Both files are at
408
+ [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF).
409
+ For the vanilla pre-Heretic projector, see
410
+ [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
411
+ (`mmproj-F16.gguf`, ~927 MB). This repo intentionally does not
412
+ redistribute either.
413
 
414
  ### Loader compatibility: the honest table
415
 
 
427
  ```bash
428
  # A. HTTP via llama-server (always built; the easiest path).
429
  # Reconfirmed working 2026-05-19 against llama.cpp 389ff61 + Vulkan
430
+ # on a Ryzen AI Max+ 395 / Radeon 8060S iGPU (pre-Heretic Qwen 3.6
431
+ # bundle; Heretic v2 shares the architecture so the recipe carries).
432
  llama-server \
433
+ -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
434
+ --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
435
  --host 127.0.0.1 --port 8765 -c 8192 -ngl 99
436
  # then POST OpenAI-style chat completions with an image_url content
437
  # block, e.g. {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}}
 
444
  # produce it; a plain `cmake --build build` will. If yours didn't,
445
  # run `cmake --build build --target llama-mtmd-cli`.
446
  llama-mtmd-cli \
447
+ -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
448
+ --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
449
  --image photo.jpg \
450
  -p "Describe this image."
451
 
452
  # C. Python via llama-cpp-python:
453
  python examples/llama_cpp_vision.py \
454
+ --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
455
+ --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
456
  --image /path/to/photo.jpg \
457
  --prompt "What is in this image?"
458
  ```
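
For flavor A, the request body with an inline image can be assembled as in the sketch below. It assumes the server from the recipe above is listening on `127.0.0.1:8765`. The function names are hypothetical helpers, not part of this repo's `examples/`.

```python
import base64
import json


def build_vision_request(image_bytes, prompt, mime="image/jpeg"):
    """Build an OpenAI-style chat payload carrying one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Data URI form, as in the shell comment above.
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def post_json(url, payload, timeout=300):
    """POST a JSON payload and return the decoded JSON response."""
    import urllib.request

    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would read the image with `open("photo.jpg", "rb").read()`, pass it to `build_vision_request`, and send the result to `http://127.0.0.1:8765/v1/chat/completions` via `post_json`.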
 
470
  | RTX 3090 / 4090 24 GB | Works, full Q4 offload, ~25-40 tok/s |
471
  | RTX 5090 32 GB | Works, full offload at higher quant (Q5/Q6), ~30-50 tok/s |
472
  | Mac Studio M2/M3 32 GB+ unified | Works, ~15-25 tok/s |
473
+ | 32 GB unified-memory laptops (Mac M-series, Ryzen AI Max+, etc.) | Borderline at Q4. `make build QUANT=Q3_K_M` (~13 GB) and trim `num_ctx` for headroom. |
474
 
475
  Most numbers in this table are estimates from comparable models; the
476
  gradient is right but the absolute values will move ±20% with prompt
477
  shape, KV cache type, and parallel-request count. Measure your own
478
  machine with `make bench` (3-prompt mix, reports tok/s from Ollama's
479
  `eval_count` / `eval_duration` so it's not stopwatch-noisy). Reference
480
+ data points on a Ryzen AI Max+ 395 / Radeon 8060S iGPU under Vulkan
481
+ (measured against the pre-rename Qwen 3.6 bundle; Heretic v2 inherits
482
+ the architecture so per-step cost should match within bench noise):
483
  **~12.3 tok/s at Q3_K_S** and **~9.3 tok/s at Q4_K_M** (3-prompt mix,
484
  steady across short / medium / long prompts), sitting between CPU-only
485
  and a 24 GB discrete card as expected. An earlier ROCm snapshot of the
486
  same Q3_K_S bench gave ~10.1 tok/s; Vulkan was the clear winner on
487
+ this hardware. (Heretic v2 publishes Q3_K_M rather than Q3_K_S; the
488
+ ~13 GB Q3_K_M should sit within 5% of the ~12 GB Q3_K_S numbers.)
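
The tok/s figure is simple arithmetic over two fields that Ollama returns with each completed response: `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The sketch below shows that calculation. How `scripts/bench.sh` aggregates across its three prompts is not shown here, so the total-tokens-over-total-time mean is an assumption.

```python
def tokens_per_second(response):
    """Decode speed from one Ollama response dict.

    eval_count is the number of generated tokens; eval_duration is
    reported in nanoseconds.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / duration_ns * 1e9


def mean_tokens_per_second(responses):
    """Aggregate several prompts as total tokens over total decode time.

    This aggregation is an assumption, not necessarily what bench.sh does.
    """
    total_tokens = sum(r["eval_count"] for r in responses)
    total_ns = sum(r["eval_duration"] for r in responses)
    if total_ns <= 0:
        raise ValueError("total eval_duration must be positive")
    return total_tokens / total_ns * 1e9
```

Because both values come from the server's own timers, the result excludes model load and prompt processing time, which is why it is steadier than a wall-clock measurement.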
489
 
490
  ## Chat template
491
 
 
499
  `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
500
  Two paths fix this, depending on how you pull the model:
501
 
502
+ - **`ollama run hf.co/FoolDev/Thanatos-Heretic-27B`**: HF's Ollama bridge applies
503
  the root-level `template` / `system` / `params` files in this repo
504
  (the bridge does **not** read `Modelfile`).
505
+ - **`make build` / `ollama create thanatos-heretic-27b -f Modelfile`**: uses the
506
  `Modelfile`'s `TEMPLATE` block.
507
 
508
  Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
 
545
  **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
546
  prompts the model to emit JSON-in-XML, the form Ollama's tool-call
547
  extractor parses into a structured `tool_calls` array. After
548
+ `make build`, `ollama show thanatos-heretic-27b` lists `tools` and `thinking`
549
  under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
550
  accept a `tools` array.
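
As an illustration of the `tools` array on the native `/api/chat` route, the sketch below builds a request with one function tool and reads structured calls back out of a response. The `get_weather` tool and both helper names are hypothetical. The response shape follows Ollama's documented `message.tool_calls` layout, but treat the details as assumptions to verify against your Ollama version.

```python
import json


def build_tool_request(model, user_text):
    """Chat payload for Ollama's /api/chat with one example tool."""
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an /api/chat response dict."""
    calls = response.get("message", {}).get("tool_calls") or []
    pairs = []
    for call in calls:
        function = call.get("function", {})
        arguments = function.get("arguments", {})
        # Tolerate JSON-string arguments as well as already-parsed objects.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        pairs.append((function.get("name"), arguments))
    return pairs
```

With a local build, `build_tool_request("thanatos-heretic-27b", "Weather in Oslo?")` would be posted to `http://localhost:11434/api/chat`. An empty list from `extract_tool_calls` means the model answered in plain text instead of calling a tool.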
551
 
 
586
  - **No mmproj in this release**, and **vision via Ollama is broken upstream** (the qwen35/qwen35moe arch entries are present in Ollama's Go engine but missing from the C++ llama.cpp fallback Ollama uses when mmproj is attached; see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
587
  - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
588
  - **No formal evaluation in this card.** Numbers above are estimates.
589
+ - **Bundled blob is pre-Heretic.** The currently-bundled `Thanatos-27B.Q4_K_M.gguf` blob is the legacy Qwen 3.6 27B Q4_K_M quant from before the rename β€” it behaves like vanilla Qwen 3.6, not Heretic v2. Use `make build` (which pulls the Heretic GGUF from llmfan46) until the rebundle ships.
590
+ - **Uncensored base.** The Heretic v2 abliteration dials back the refusal-training of upstream Qwen 3.6. Outputs may be more compliant with sensitive requests than the vanilla base; the Thanatos system prompt still steers behavior, but the safety floor is lower. Apply your own filtering for user-facing deployments.
591
 
592
  ## Related models
593
 
594
  | Model | Notes |
595
  |---|---|
596
+ | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2) | **Immediate base**, safetensors |
597
+ | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF) | Recommended GGUF source (what `make build` pulls from) |
598
+ | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved) | Same Heretic v2 but keeps the MTP head for vLLM / SGLang speculative decoding |
599
+ | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream pre-Heretic base, safetensors |
600
+ | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Pre-Heretic GGUF mirror + reference `mmproj-F16.gguf` projector |
601
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
602
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
603
 
604
  ## Credits
605
 
606
+ - Immediate base: [llmfan46/Qwen3.6-27B-uncensored-heretic-v2](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2), a Heretic-style abliteration of Qwen 3.6 27B
607
+ - Upstream base: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) (Alibaba)
608
  - Reasoning teacher: Claude Opus 4.7 (Anthropic)
609
  - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
610
 
examples/README.md CHANGED
@@ -1,13 +1,13 @@
1
- # Thanatos-27B examples
2
 
3
  Four minimal entry points. Pick the one that matches how you run models.
4
 
5
  | File | Backend | When to use |
6
  |---|---|---|
7
- | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling**; vision via Ollama is broken upstream for this arch. |
8
- | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
- | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
11
 
12
  All four apply the same Thanatos system prompt and sampling defaults
13
  (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
@@ -24,9 +24,9 @@ root-level `template` / `system` / `params` files via HF's Ollama
24
  bridge):
25
 
26
  ```bash
27
- ollama pull hf.co/FoolDev/Thanatos-27B # 17 GB Q4_K_M (only bundled quant)
28
  pip install requests
29
- MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
30
  ```
31
 
32
  If you pulled before the latest qwen35 re-stamp (HF commit
@@ -36,13 +36,14 @@ in place (qwen36 → qwen35, metadata-only, ~5 s); the same
36
  tag then loads. Fresh pulls after the re-stamp go straight
37
  through.
38
 
39
- For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
40
- `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
41
- and creates a local `thanatos-27b` tag:
 
42
 
43
  ```bash
44
- cd .. && make build QUANT=Q3_K_S && cd examples
45
- MODEL=thanatos-27b python ollama_chat.py
46
  ```
47
 
48
  Or build a local tag from this repo's bundled GGUF without going
@@ -50,12 +51,12 @@ through the HF pull:
50
 
51
  ```bash
52
  cd .. && make load-bundle && cd examples
53
- MODEL=thanatos-27b python ollama_chat.py
54
  ```
55
 
56
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
57
- fetch it from `unsloth/Qwen3.6-27B-GGUF` and patch the `Modelfile`
58
- `FROM` line into a temp copy automatically:
59
 
60
  ```bash
61
  cd .. && make build QUANT=Q5_K_M && cd examples
@@ -74,7 +75,7 @@ python transformers_quickstart.py --no-4bit # bf16, ~54 GB VRAM
74
 
75
  ```bash
76
  pip install llama-cpp-python # CPU-only build
77
- python llama_cpp_quickstart.py /path/to/Qwen3.6-27B-Q4_K_M.gguf --gpu-layers 99
78
  ```
79
 
80
  For GPU offload, rebuild llama-cpp-python with the matching backend - see
@@ -83,13 +84,13 @@ the script header for `CMAKE_ARGS` recipes (CUDA, Metal, ROCm/HIP).
83
  ### Vision (image input)
84
 
85
  ```bash
86
- # Pull the projector once (~927 MB):
87
- hf download unsloth/Qwen3.6-27B-GGUF mmproj-F16.gguf --local-dir .
88
 
89
  pip install llama-cpp-python pillow
90
  python llama_cpp_vision.py \
91
- --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
92
- --mmproj /path/to/mmproj-F16.gguf \
93
  --image /path/to/photo.jpg \
94
  --prompt "Describe this image."
95
  ```
@@ -101,7 +102,7 @@ lacks them. `ollama create` accepts the dual-`FROM` and `ollama show`
101
  reports `vision` capability, but the first inference call fails with
102
  `error loading model architecture: unknown model architecture:
103
  'qwen35'` (verified empirically against the dense 27B +
104
- `mmproj-F16.gguf`). Tracked in
105
  [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898).
106
  Until that's fixed, llama.cpp / llama-cpp-python is the working path
107
  for vision.
 
1
+ # Thanatos-Heretic-27B examples
2
 
3
  Four minimal entry points. Pick the one that matches how you run models.
4
 
5
  | File | Backend | When to use |
6
  |---|---|---|
7
+ | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-heretic-27b` model created from the project `Modelfile`. **Text + tool calling** - vision via Ollama is broken upstream for this arch. |
8
+ | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the Heretic safetensors (`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
+ | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `Qwen3.6-27B-mmproj-BF16.gguf` and answers questions about an image. The only working vision path right now. |
11
 
12
  All four apply the same Thanatos system prompt and sampling defaults
13
  (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
 
24
  bridge):
25
 
26
  ```bash
27
+ ollama pull hf.co/FoolDev/Thanatos-Heretic-27B # 17 GB Q4_K_M (only bundled quant)
28
  pip install requests
29
+ MODEL=hf.co/FoolDev/Thanatos-Heretic-27B python ollama_chat.py
30
  ```
31
 
32
  If you pulled before the latest qwen35 re-stamp (HF commit
 
36
  tag then loads. Fresh pulls after the re-stamp go straight
37
  through.
38
 
39
+ For a non-bundled quant (e.g. Q3_K_M ~12 GB, Q5_K_M ~20 GB),
40
+ `make build QUANT=...` downloads from
41
+ `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and creates a
42
+ local `thanatos-heretic-27b` tag:
43
 
44
  ```bash
45
+ cd .. && make build QUANT=Q3_K_M && cd examples
46
+ MODEL=thanatos-heretic-27b python ollama_chat.py
47
  ```
48
 
49
  Or build a local tag from this repo's bundled GGUF without going
 
51
 
52
  ```bash
53
  cd .. && make load-bundle && cd examples
54
+ MODEL=thanatos-heretic-27b python ollama_chat.py
55
  ```
56
 
57
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
58
+ fetch it from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and
59
+ patch the `Modelfile` `FROM` line into a temp copy automatically:
60
 
61
  ```bash
62
  cd .. && make build QUANT=Q5_K_M && cd examples
 
75
 
76
  ```bash
77
  pip install llama-cpp-python # CPU-only build
78
+ python llama_cpp_quickstart.py /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf --gpu-layers 99
79
  ```
80
 
81
  For GPU offload, rebuild llama-cpp-python with the matching backend - see
 
84
  ### Vision (image input)
85
 
86
  ```bash
87
+ # Pull the projector once (~931 MB):
88
+ hf download llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF Qwen3.6-27B-mmproj-BF16.gguf --local-dir .
89
 
90
  pip install llama-cpp-python pillow
91
  python llama_cpp_vision.py \
92
+ --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
93
+ --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
94
  --image /path/to/photo.jpg \
95
  --prompt "Describe this image."
96
  ```
 
102
  reports `vision` capability, but the first inference call fails with
103
  `error loading model architecture: unknown model architecture:
104
  'qwen35'` (verified empirically against the dense 27B +
105
+ the F16 reference projector). Tracked in
106
  [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898).
107
  Until that's fixed, llama.cpp / llama-cpp-python is the working path
108
  for vision.
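
The shared sampling defaults quoted near the top of this README (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) map onto the `options` object of Ollama's `/api/chat` request. The following is a minimal sketch of how such a request body could be assembled; `build_chat_payload` and `SAMPLING_DEFAULTS` are hypothetical names, not taken from `ollama_chat.py`, and the real script may structure this differently.

```python
import json
from typing import Optional

# Sampling defaults quoted in the README, spelled with Ollama's option keys.
SAMPLING_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,
}


def build_chat_payload(model: str, user_text: str, system: Optional[str] = None) -> dict:
    """Build a non-streaming /api/chat request body (hypothetical helper)."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        # Copy so callers can tweak one request without mutating the defaults.
        "options": dict(SAMPLING_DEFAULTS),
    }


if __name__ == "__main__":
    payload = build_chat_payload("thanatos-heretic-27b", "Reply with the single word: OK")
    print(json.dumps(payload, indent=2))
    # To send it (needs a running daemon and the `requests` package):
    #   requests.post("http://localhost:11434/api/chat", json=payload, timeout=600)
```

Keeping the defaults in one dictionary makes it easier to check that every entry point sends the same values.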
examples/llama_cpp_quickstart.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - llama-cpp-python quickstart.
4
 
5
  Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
6
  Useful for batch jobs, CI, or environments where you don't want a daemon.
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - llama-cpp-python quickstart.
4
 
5
  Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
6
  Useful for batch jobs, CI, or environments where you don't want a daemon.
examples/llama_cpp_vision.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - vision (image-text-to-text) via llama-cpp-python.
4
 
5
  Why this script exists:
6
  Ollama's Go engine has the qwen35 / qwen35moe arch entries (text
@@ -23,21 +23,21 @@ Install:
23
  # CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-binary :all:
24
  # CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python --no-binary :all:
25
 
26
- Files you need (both from unsloth/Qwen3.6-27B-GGUF):
27
- 1. A text GGUF (any quant): e.g. Qwen3.6-27B-Q4_K_M.gguf (~17 GB)
28
- 2. A vision projector: mmproj-F16.gguf (~927 MB)
29
 
30
  Usage:
31
  python llama_cpp_vision.py \
32
- --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
33
- --mmproj /path/to/mmproj-F16.gguf \
34
  --image /path/to/photo.jpg \
35
  --prompt "What is in this image? Be specific."
36
 
37
  # CLI alternative without python binding (ships with llama.cpp):
38
  # llama-mtmd-cli \
39
- # -m Qwen3.6-27B-Q4_K_M.gguf \
40
- # --mmproj mmproj-F16.gguf \
41
  # --image photo.jpg \
42
  # -p "Describe this image."
43
  """
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - vision (image-text-to-text) via llama-cpp-python.
4
 
5
  Why this script exists:
6
  Ollama's Go engine has the qwen35 / qwen35moe arch entries (text
 
23
  # CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-binary :all:
24
  # CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python --no-binary :all:
25
 
26
+ Files you need (both from llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF):
27
+ 1. A text GGUF (any quant): e.g. Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf (~17 GB)
28
+ 2. A vision projector: Qwen3.6-27B-mmproj-BF16.gguf (~931 MB)
29
 
30
  Usage:
31
  python llama_cpp_vision.py \
32
+ --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
33
+ --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
34
  --image /path/to/photo.jpg \
35
  --prompt "What is in this image? Be specific."
36
 
37
  # CLI alternative without python binding (ships with llama.cpp):
38
  # llama-mtmd-cli \
39
+ # -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
40
+ # --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
41
  # --image photo.jpg \
42
  # -p "Describe this image."
43
  """
examples/ollama_chat.py CHANGED
@@ -1,17 +1,17 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - Ollama chat examples.
4
 
5
  Prerequisites (pick one):
6
 
7
  A. From the bundled GGUFs (default flow):
8
  $ make build # uses Thanatos-27B.Q4_K_M.gguf
9
  # or:
10
- $ ollama create thanatos-27b -f ../Modelfile
11
 
12
  B. Pull straight from HF (Q4_K_M is the only bundled quant):
13
- $ ollama run hf.co/FoolDev/Thanatos-27B
14
- # then set MODEL=hf.co/FoolDev/Thanatos-27B below
15
 
16
  Then:
17
  $ ollama serve # usually already running
@@ -39,7 +39,7 @@ from typing import Any, Iterator
39
 
40
  import requests
41
 
42
- MODEL = os.environ.get("MODEL", "thanatos-27b")
43
  HOST = os.environ.get("HOST", "http://localhost:11434")
44
 
45
  _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - Ollama chat examples.
4
 
5
  Prerequisites (pick one):
6
 
7
  A. From the bundled GGUFs (default flow):
8
  $ make build # uses Thanatos-27B.Q4_K_M.gguf
9
  # or:
10
+ $ ollama create thanatos-heretic-27b -f ../Modelfile
11
 
12
  B. Pull straight from HF (Q4_K_M is the only bundled quant):
13
+ $ ollama run hf.co/FoolDev/Thanatos-Heretic-27B
14
+ # then set MODEL=hf.co/FoolDev/Thanatos-Heretic-27B below
15
 
16
  Then:
17
  $ ollama serve # usually already running
 
39
 
40
  import requests
41
 
42
+ MODEL = os.environ.get("MODEL", "thanatos-heretic-27b")
43
  HOST = os.environ.get("HOST", "http://localhost:11434")
44
 
45
  _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
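
The `_THINK_RE` pattern above matches a `<think>...</think>` block together with any whitespace that follows it. The hunk does not show how `ollama_chat.py` uses it, so the sketch below is only an assumption about the likely intent (stripping reasoning blocks before display); `strip_think` is a hypothetical helper name.

```python
import re

# Same pattern as in the diff: non-greedy so each block is removed separately,
# DOTALL so the reasoning text may span several lines.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and the whitespace after them."""
    return _THINK_RE.sub("", text)


if __name__ == "__main__":
    raw = "<think>step 1\nstep 2</think>\n\nOK"
    print(strip_think(raw))
```

The non-greedy `.*?` matters when a reply contains more than one reasoning block: a greedy match would also swallow the visible text between them.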
examples/transformers_quickstart.py CHANGED
@@ -1,12 +1,15 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - Hugging Face Transformers quickstart.
4
 
5
- Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
6
- chat turn using its embedded chat template. Thanatos-27B is a *wrapper*
7
- around that base, so for the transformers route there is nothing to
8
- download from this repo - point at Qwen/Qwen3.6-27B and apply the same
9
- system prompt the Modelfile uses.
 
 
 
10
 
11
  Requirements:
12
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
@@ -36,7 +39,7 @@ except ImportError as e: # pragma: no cover
36
  )
37
 
38
 
39
- MODEL_ID = "Qwen/Qwen3.6-27B"
40
 
41
  THANATOS_SYSTEM = (
42
  "You are Thanatos, a precise and capable assistant for reasoning, writing, "
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - Hugging Face Transformers quickstart.
4
 
5
+ Loads the Heretic v2 Qwen 3.6 27B safetensors directly and runs a single
6
+ chat turn using its embedded chat template. Thanatos-Heretic-27B is a
7
+ *wrapper* around that base, so for the transformers route there is nothing
8
+ to download from this repo - point at llmfan46/Qwen3.6-27B-uncensored-heretic-v2
9
+ and apply the same system prompt the Modelfile uses.
10
+
11
+ Set MODEL_ID = "Qwen/Qwen3.6-27B" to bypass the Heretic abliteration and
12
+ load the vanilla upstream base instead.
13
 
14
  Requirements:
15
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
 
39
  )
40
 
41
 
42
+ MODEL_ID = "llmfan46/Qwen3.6-27B-uncensored-heretic-v2"
43
 
44
  THANATOS_SYSTEM = (
45
  "You are Thanatos, a precise and capable assistant for reasoning, writing, "
scripts/bench.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - tok/s benchmark via Ollama.
3
  #
4
  # Reads timing from Ollama's /api/chat response metadata (eval_count and
5
  # eval_duration are authoritative - no client-side stopwatch noise) and
@@ -7,14 +7,14 @@
7
  # number generalises a bit beyond a single shape.
8
  #
9
  # Usage:
10
- # ./scripts/bench.sh # uses MODEL=thanatos-27b
11
- # MODEL=thanatos-27b ./scripts/bench.sh
12
  # HOST=http://localhost:11434 ./scripts/bench.sh
13
  #
14
  # Requires: curl, jq, a running Ollama daemon with the model created.
15
  set -euo pipefail
16
 
17
- MODEL="${MODEL:-thanatos-27b}"
18
  HOST="${HOST:-http://localhost:11434}"
19
 
20
  red() { printf "\033[31m%s\033[0m\n" "$*" >&2; }
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - tok/s benchmark via Ollama.
3
  #
4
  # Reads timing from Ollama's /api/chat response metadata (eval_count and
5
  # eval_duration are authoritative - no client-side stopwatch noise) and
 
7
  # number generalises a bit beyond a single shape.
8
  #
9
  # Usage:
10
+ # ./scripts/bench.sh # uses MODEL=thanatos-heretic-27b
11
+ # MODEL=thanatos-heretic-27b ./scripts/bench.sh
12
  # HOST=http://localhost:11434 ./scripts/bench.sh
13
  #
14
  # Requires: curl, jq, a running Ollama daemon with the model created.
15
  set -euo pipefail
16
 
17
+ MODEL="${MODEL:-thanatos-heretic-27b}"
18
  HOST="${HOST:-http://localhost:11434}"
19
 
20
  red() { printf "\033[31m%s\033[0m\n" "$*" >&2; }
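
The header comment of `bench.sh` says throughput is derived from the `eval_count` and `eval_duration` fields of Ollama's `/api/chat` response. Ollama reports durations in nanoseconds, so the arithmetic can be sketched as below. `tokens_per_second` is a hypothetical helper for illustration; the script itself does this with `jq`, and its exact expression is not shown in this hunk.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation throughput from an Ollama chat response.

    eval_count    -- number of tokens generated
    eval_duration -- time spent generating them, in nanoseconds
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


if __name__ == "__main__":
    # Illustrative numbers only, not a measured result.
    sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tok/s")
```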
scripts/build.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - fetch a Qwen 3.6 27B GGUF and build the Ollama model.
3
  #
4
  # Usage:
5
  # ./scripts/build.sh # default: Q4_K_M
@@ -7,28 +7,27 @@
7
  # QUANT=Q6_K ./scripts/build.sh
8
  #
9
  # Skip the download by pointing at a GGUF you already have:
10
- # GGUF_PATH=/path/to/Qwen3.6-27B-Q4_K_M.gguf ./scripts/build.sh Q4_K_M
11
  #
12
  # Requires: huggingface-cli (or hf), ollama, awk.
13
  set -euo pipefail
14
 
15
  QUANT="${1:-${QUANT:-Q4_K_M}}"
16
 
17
- REPO_ID="${REPO_ID:-unsloth/Qwen3.6-27B-GGUF}"
18
- # Upstream uses dashes, e.g. Qwen3.6-27B-Q4_K_M.gguf. Quants known to exist
19
- # at unsloth/Qwen3.6-27B-GGUF (as of 2026-04):
20
- # Q3_K_S Q3_K_M Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_K_S Q5_K_M Q6_K Q8_0
21
- # IQ4_XS IQ4_NL
22
- # UD-IQ2_XXS UD-IQ2_M UD-Q2_K_XL UD-IQ3_XXS UD-Q3_K_XL UD-Q4_K_XL
23
- # UD-Q5_K_XL UD-Q6_K_XL UD-Q8_K_XL
24
- GGUF_NAME="Qwen3.6-27B-${QUANT}.gguf"
25
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
26
  # GGUF_PATH defaults to ${ROOT}/${GGUF_NAME}, but can be overridden so users
27
  # with cached weights elsewhere don't have to copy or symlink anything.
28
  GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
29
 
30
  MODELFILE="${ROOT}/Modelfile"
31
- TAG="${TAG:-thanatos-27b}"
32
 
33
  echo "[*] repo: ${REPO_ID}"
34
  echo "[*] quant: ${QUANT}"
@@ -96,4 +95,4 @@ ollama create "${TAG}" -f "${TMP_MODELFILE}"
96
  echo
97
  echo "[+] Done. Try it:"
98
  echo " ollama run ${TAG}"
99
- echo " python ${ROOT}/examples/ollama_chat.py # update MODEL constant if not 'thanatos-27b'"
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - fetch a Qwen 3.6 27B GGUF and build the Ollama model.
3
  #
4
  # Usage:
5
  # ./scripts/build.sh # default: Q4_K_M
 
7
  # QUANT=Q6_K ./scripts/build.sh
8
  #
9
  # Skip the download by pointing at a GGUF you already have:
10
+ # GGUF_PATH=/path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf ./scripts/build.sh Q4_K_M
11
  #
12
  # Requires: huggingface-cli (or hf), ollama, awk.
13
  set -euo pipefail
14
 
15
  QUANT="${1:-${QUANT:-Q4_K_M}}"
16
 
17
+ REPO_ID="${REPO_ID:-llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF}"
18
+ # Filenames at llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF follow
19
+ # Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf
20
+ # Quants known to exist (as of 2026-05):
21
+ # Q3_K_M Q3_K_L Q4_K_S Q4_K_M Q5_K_S Q5_K_M Q6_K Q8_0 BF16
22
+ # Note: no Q3_K_S in this repo - use Q3_K_M for the smallest practical quant.
23
+ GGUF_NAME="Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf"
 
24
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
25
  # GGUF_PATH defaults to ${ROOT}/${GGUF_NAME}, but can be overridden so users
26
  # with cached weights elsewhere don't have to copy or symlink anything.
27
  GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
28
 
29
  MODELFILE="${ROOT}/Modelfile"
30
+ TAG="${TAG:-thanatos-heretic-27b}"
31
 
32
  echo "[*] repo: ${REPO_ID}"
33
  echo "[*] quant: ${QUANT}"
 
95
  echo
96
  echo "[+] Done. Try it:"
97
  echo " ollama run ${TAG}"
98
+ echo " python ${ROOT}/examples/ollama_chat.py # update MODEL constant if not 'thanatos-heretic-27b'"
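
The new `build.sh` derives the download filename from the quant name using the pattern `Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf`. A minimal Python sketch of the same naming rule follows. The quant list is copied from the script comment ("known to exist as of 2026-05"); the validation step is an assumption added for illustration, since the hunk does not show `build.sh` rejecting unknown quants.

```python
# Quants listed in the build.sh comment for the llmfan46 Heretic GGUF repo.
HERETIC_QUANTS = (
    "Q3_K_M", "Q3_K_L", "Q4_K_S", "Q4_K_M",
    "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0", "BF16",
)


def gguf_name(quant: str = "Q4_K_M") -> str:
    """Return the dash-separated GGUF filename for a quant (hypothetical helper)."""
    if quant not in HERETIC_QUANTS:
        # Assumed behaviour: fail early instead of requesting a file that would 404.
        raise ValueError(f"unknown quant {quant!r}; expected one of {HERETIC_QUANTS}")
    return f"Qwen3.6-27B-uncensored-heretic-v2-{quant}.gguf"


if __name__ == "__main__":
    print(gguf_name())
    print(gguf_name("Q3_K_M"))
```

Failing early on `Q3_K_S` reflects the note in the script that this quant is not published in the Heretic repo.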
scripts/check.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - repo-local sanity checks.
3
  #
4
  # Runs everything that's cheap and catches a real-world bug we've already hit:
5
  #
@@ -104,9 +104,11 @@ fi
104
 
105
  # ---- 5. footgun: dot-vs-dash filename -------------------------------------
106
  #
107
- # Upstream unsloth/Qwen3.6-27B-GGUF uses dashes (Qwen3.6-27B-Q4_K_M.gguf).
108
- # Earlier commits used the wrong dot-separated pattern, which 404s.
109
- # Block re-introduction.
 
 
110
 
111
  blue "[*] grep: forbidden Qwen3.6-27B.Q* filename pattern"
112
  if grep -RnE 'Qwen3\.6-27B\.Q[0-9A-Z_]+\.gguf' \
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - repo-local sanity checks.
3
  #
4
  # Runs everything that's cheap and catches a real-world bug we've already hit:
5
  #
 
104
 
105
  # ---- 5. footgun: dot-vs-dash filename -------------------------------------
106
  #
107
+ # Upstream llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF (and the
108
+ # legacy unsloth/Qwen3.6-27B-GGUF) use dashes
109
+ # (Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf,
110
+ # Qwen3.6-27B-Q4_K_M.gguf). Earlier commits used the wrong
111
+ # dot-separated pattern, which 404s. Block re-introduction.
112
 
113
  blue "[*] grep: forbidden Qwen3.6-27B.Q* filename pattern"
114
  if grep -RnE 'Qwen3\.6-27B\.Q[0-9A-Z_]+\.gguf' \
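
The grep in check 5 targets the dot-separated filename form that 404s upstream. As a rough illustration, the same regular expression can be exercised in Python to see which names it flags. The pattern is copied from the `grep -RnE` call above; `is_forbidden` is a hypothetical helper, and the paths `check.sh` scans are not shown in this hunk.

```python
import re

# Same pattern check.sh passes to `grep -RnE`: a dot between "27B" and the quant.
FORBIDDEN = re.compile(r"Qwen3\.6-27B\.Q[0-9A-Z_]+\.gguf")


def is_forbidden(text: str) -> bool:
    """True if the text contains a dot-separated Qwen3.6-27B GGUF filename."""
    return FORBIDDEN.search(text) is not None


if __name__ == "__main__":
    for name in (
        "Qwen3.6-27B.Q4_K_M.gguf",                        # wrong: dot-separated
        "Qwen3.6-27B-Q4_K_M.gguf",                        # legacy unsloth name
        "Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf",  # Heretic name
    ):
        print(name, "->", "forbidden" if is_forbidden(name) else "ok")
```

Note that the bundled `Thanatos-27B.Q4_K_M.gguf` is not matched, because the pattern requires the `Qwen3.6-27B` prefix.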
scripts/check_bridge_sync.py CHANGED
@@ -1,13 +1,13 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - verify Modelfile and HF Ollama bridge files stay in sync.
4
 
5
  The repo ships two parallel Ollama configurations:
6
 
7
  - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
8
  It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
9
  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
10
- Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B`` directly. HF
11
  does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
12
 
13
  If the two configurations drift apart, ``hf.co/...`` users and ``make build``
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - verify Modelfile and HF Ollama bridge files stay in sync.
4
 
5
  The repo ships two parallel Ollama configurations:
6
 
7
  - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
8
  It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
9
  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
10
+ Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-Heretic-27B`` directly. HF
11
  does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
12
 
13
  If the two configurations drift apart, ``hf.co/...`` users and ``make build``
scripts/fetch_vision.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - fetch the vision projector (mmproj) for image input.
3
  #
4
  # Why this is separate from build.sh:
5
  # build.sh is for the Ollama text path. The mmproj is only useful for
@@ -8,16 +8,20 @@
8
  # it (see README Vision section, ollama/ollama#15898).
9
  #
10
  # Usage:
11
- # ./scripts/fetch_vision.sh # default: F16, ~927 MB
12
- # ./scripts/fetch_vision.sh BF16 # ~931 MB
13
- # ./scripts/fetch_vision.sh F32 # ~1.8 GB
 
 
 
 
14
  #
15
  # Requires: huggingface-cli (or hf).
16
  set -euo pipefail
17
 
18
- PRECISION="${1:-${PRECISION:-F16}}"
19
- REPO_ID="${REPO_ID:-unsloth/Qwen3.6-27B-GGUF}"
20
- FILE_NAME="mmproj-${PRECISION}.gguf"
21
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
22
  DEST="${MMPROJ_PATH:-${ROOT}/${FILE_NAME}}"
23
 
@@ -58,7 +62,7 @@ fi
58
  echo
59
  echo "[+] Done. Use it via:"
60
  echo " python ${ROOT}/examples/llama_cpp_vision.py \\"
61
- echo " --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \\"
62
  echo " --mmproj ${DEST} \\"
63
  echo " --image /path/to/photo.jpg \\"
64
  echo " --prompt 'Describe this image.'"
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - fetch the vision projector (mmproj) for image input.
3
  #
4
  # Why this is separate from build.sh:
5
  # build.sh is for the Ollama text path. The mmproj is only useful for
 
8
  # it (see README Vision section, ollama/ollama#15898).
9
  #
10
  # Usage:
11
+ # ./scripts/fetch_vision.sh # default: BF16 (~931 MB)
12
+ #
13
+ # llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF publishes BF16 only;
14
+ # for F16/F32 variants fall back to unsloth's reference projector:
15
+ # REPO_ID=unsloth/Qwen3.6-27B-GGUF FILE_NAME=mmproj-F16.gguf ./scripts/fetch_vision.sh
16
+ # (vision tokens are projected the same way across Qwen 3.6 27B
17
+ # finetunes, so the unsloth projector is functionally interchangeable.)
18
  #
19
  # Requires: huggingface-cli (or hf).
20
  set -euo pipefail
21
 
22
+ PRECISION="${1:-${PRECISION:-BF16}}"
23
+ REPO_ID="${REPO_ID:-llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF}"
24
+ FILE_NAME="${FILE_NAME:-Qwen3.6-27B-mmproj-${PRECISION}.gguf}"
25
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
26
  DEST="${MMPROJ_PATH:-${ROOT}/${FILE_NAME}}"
27
 
 
62
  echo
63
  echo "[+] Done. Use it via:"
64
  echo " python ${ROOT}/examples/llama_cpp_vision.py \\"
65
+ echo " --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \\"
66
  echo " --mmproj ${DEST} \\"
67
  echo " --image /path/to/photo.jpg \\"
68
  echo " --prompt 'Describe this image.'"
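
The new `fetch_vision.sh` resolves its inputs through nested shell defaults: the first positional argument wins over `PRECISION`, which wins over `BF16`, and `REPO_ID` / `FILE_NAME` can each be overridden from the environment. The sketch below mirrors that precedence in Python to make the unsloth fallback easier to follow; `resolve_mmproj` is a hypothetical name, and as with `${VAR:-default}` an empty value is treated the same as an unset one.

```python
from typing import Dict, List, Tuple

DEFAULT_REPO = "llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF"


def resolve_mmproj(args: List[str], env: Dict[str, str]) -> Tuple[str, str]:
    """Return (repo_id, file_name) using the same precedence as the script."""
    # PRECISION="${1:-${PRECISION:-BF16}}"
    precision = (args[0] if args else "") or env.get("PRECISION", "") or "BF16"
    # REPO_ID="${REPO_ID:-llmfan46/...}"
    repo_id = env.get("REPO_ID", "") or DEFAULT_REPO
    # FILE_NAME="${FILE_NAME:-Qwen3.6-27B-mmproj-${PRECISION}.gguf}"
    file_name = env.get("FILE_NAME", "") or f"Qwen3.6-27B-mmproj-{precision}.gguf"
    return repo_id, file_name


if __name__ == "__main__":
    print(resolve_mmproj([], {}))
    # The F16 fallback documented in the script header:
    print(resolve_mmproj([], {"REPO_ID": "unsloth/Qwen3.6-27B-GGUF",
                              "FILE_NAME": "mmproj-F16.gguf"}))
```

Because the unsloth projector uses a different filename scheme (`mmproj-F16.gguf`), overriding `REPO_ID` alone is not enough; `FILE_NAME` has to be set as well, which is why the documented fallback passes both.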
scripts/heal_hf_pull.sh CHANGED
@@ -1,10 +1,10 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - heal a previously pulled HF-bridge tag whose bundled
3
  # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
4
  # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
5
  # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
6
  #
7
- # Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-27B` now get the
8
  # qwen35-stamped bundle and load directly - this script is the
9
  # recovery path for users who pulled a qwen36-stamped blob into
10
  # their local Ollama store during one of the qwen36 windows
@@ -13,7 +13,7 @@
13
  # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
14
  # qwen35, metadata-only, byte-identical tensors) and rewrites the
15
  # manifest's model-layer digest to point at the new blob. After
16
- # running, the cached `hf.co/FoolDev/Thanatos-27B` tag loads.
17
  #
18
  # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
19
  # The current bundle is qwen35-stamped so this script is a no-op for
@@ -22,13 +22,13 @@
22
  #
23
  # Usage:
24
  # ./scripts/heal_hf_pull.sh # default tag
25
- # TAG=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/heal_hf_pull.sh
26
  #
27
  # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
28
  set -euo pipefail
29
 
30
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
31
- TAG="${TAG:-hf.co/FoolDev/Thanatos-27B:Q4_K_M}"
32
  OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
33
 
34
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
@@ -50,7 +50,7 @@ done
50
 
51
  # `ollama show --modelfile` writes a FROM line with the absolute blob path.
52
  # Reliable regardless of which case variant the user pulled with
53
- # (hf.co's 307 lets `Thanatos-27B` and `thanatos-27b` both resolve to the
54
  # canonical repo, and ollama stores the manifest under whichever case
55
  # was first registered).
56
  #
@@ -79,8 +79,8 @@ blue "[*] blob: ${MODEL_BLOB}"
79
  # referenced from exactly one tag in the heal scenario - fresh HF pull
80
  # of a single :Q4_K_M tag - but if someone has multiple tags pointing
81
  # at the same blob, we filter down to the one matching ${TAG}.
82
- TAG_PATH="${TAG#hf.co/}" # FoolDev/Thanatos-27B:Q4_K_M
83
- NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B
84
  TAG_FILE="${TAG_PATH##*:}" # Q4_K_M
85
 
86
  MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - heal a previously pulled HF-bridge tag whose bundled
3
  # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
4
  # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
5
  # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
6
  #
7
+ # Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-Heretic-27B` now get the
8
  # qwen35-stamped bundle and load directly - this script is the
9
  # recovery path for users who pulled a qwen36-stamped blob into
10
  # their local Ollama store during one of the qwen36 windows
 
13
  # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
14
  # qwen35, metadata-only, byte-identical tensors) and rewrites the
15
  # manifest's model-layer digest to point at the new blob. After
16
+ # running, the cached `hf.co/FoolDev/Thanatos-Heretic-27B` tag loads.
17
  #
18
  # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
19
  # The current bundle is qwen35-stamped so this script is a no-op for
 
22
  #
23
  # Usage:
24
  # ./scripts/heal_hf_pull.sh # default tag
25
+ # TAG=hf.co/FoolDev/Thanatos-Heretic-27B:Q4_K_M ./scripts/heal_hf_pull.sh
26
  #
27
  # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
28
  set -euo pipefail
29
 
30
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
31
+ TAG="${TAG:-hf.co/FoolDev/Thanatos-Heretic-27B:Q4_K_M}"
32
  OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
33
 
34
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
 
50
 
51
  # `ollama show --modelfile` writes a FROM line with the absolute blob path.
52
  # Reliable regardless of which case variant the user pulled with
53
+ # (hf.co's 307 lets `Thanatos-Heretic-27B` and `thanatos-heretic-27b` both resolve to the
54
  # canonical repo, and ollama stores the manifest under whichever case
55
  # was first registered).
56
  #
 
79
  # referenced from exactly one tag in the heal scenario - fresh HF pull
80
  # of a single :Q4_K_M tag - but if someone has multiple tags pointing
81
  # at the same blob, we filter down to the one matching ${TAG}.
82
+ TAG_PATH="${TAG#hf.co/}" # FoolDev/Thanatos-Heretic-27B:Q4_K_M
83
+ NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-Heretic-27B
84
  TAG_FILE="${TAG_PATH##*:}" # Q4_K_M
85
 
86
  MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
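
The three parameter expansions in `heal_hf_pull.sh` (`${TAG#hf.co/}`, `${TAG_PATH%:*}`, `${TAG_PATH##*:}`) split the Ollama tag into the pieces used to locate the manifest. For readers less familiar with shell expansion syntax, here is a Python sketch of the same decomposition; `split_hf_tag` is a hypothetical helper and not part of the repo.

```python
from typing import Tuple


def split_hf_tag(tag: str) -> Tuple[str, str, str]:
    """Return (tag_path, namespace_path, tag_file) like the shell expansions."""
    # ${TAG#hf.co/}: drop a leading "hf.co/" if present.
    tag_path = tag[len("hf.co/"):] if tag.startswith("hf.co/") else tag
    # ${TAG_PATH%:*} keeps what precedes the last colon,
    # ${TAG_PATH##*:} keeps what follows it.
    namespace_path, sep, tag_file = tag_path.rpartition(":")
    if not sep:
        # Without a colon, both shell expansions leave the string unchanged.
        namespace_path = tag_file = tag_path
    return tag_path, namespace_path, tag_file


if __name__ == "__main__":
    print(split_hf_tag("hf.co/FoolDev/Thanatos-Heretic-27B:Q4_K_M"))
```

The no-colon branch shows an edge case of the shell version: a `TAG` without an explicit `:quant` suffix yields a `TAG_FILE` equal to the whole path. The script's default `TAG` always carries `:Q4_K_M`, so this only matters for custom overrides.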
scripts/install-hooks.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - install scripts/check.sh as a git pre-commit hook.
3
  #
4
  # Idempotent. Re-runs are safe.
5
  set -euo pipefail
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - install scripts/check.sh as a git pre-commit hook.
3
  #
4
  # Idempotent. Re-runs are safe.
5
  set -euo pipefail
scripts/load_bundle.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - load this repo's bundle into Ollama as a local tag.
3
  #
4
  # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
5
  # loads directly on stock llama.cpp / Ollama. This script is the
@@ -15,13 +15,13 @@
15
  # 3. Run `ollama create <tag> -f <temp Modelfile pointing at the
16
  # resolved bundle>`.
17
  #
18
- # Useful if you want a bare local tag (`thanatos-27b`) rather than
19
- # the `hf.co/FoolDev/Thanatos-27B` path. The legacy qwen36 rebadge
20
  # branch is kept for anyone working from a pre-e03e10e checkout.
21
  #
22
  # Usage:
23
- # ./scripts/load_bundle.sh # default tag: thanatos-27b
24
- # TAG=thanatos-27b-bundle ./scripts/load_bundle.sh
25
  # BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
26
  #
27
  # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
@@ -30,8 +30,8 @@ set -euo pipefail
30
 
31
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
32
  BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
33
- TAG="${TAG:-thanatos-27b}"
34
- REPO_ID="${REPO_ID:-FoolDev/Thanatos-27B}"
35
  MODELFILE="${ROOT}/Modelfile"
36
 
37
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - load this repo's bundle into Ollama as a local tag.
3
  #
4
  # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
5
  # loads directly on stock llama.cpp / Ollama. This script is the
 
15
  # 3. Run `ollama create <tag> -f <temp Modelfile pointing at the
16
  # resolved bundle>`.
17
  #
18
+ # Useful if you want a bare local tag (`thanatos-heretic-27b`) rather than
19
+ # the `hf.co/FoolDev/Thanatos-Heretic-27B` path. The legacy qwen36 rebadge
20
  # branch is kept for anyone working from a pre-e03e10e checkout.
21
  #
22
  # Usage:
23
+ # ./scripts/load_bundle.sh # default tag: thanatos-heretic-27b
24
+ # TAG=thanatos-heretic-27b-bundle ./scripts/load_bundle.sh
25
  # BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
26
  #
27
  # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
 
30
 
31
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
32
  BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
33
+ TAG="${TAG:-thanatos-heretic-27b}"
34
+ REPO_ID="${REPO_ID:-FoolDev/Thanatos-Heretic-27B}"
35
  MODELFILE="${ROOT}/Modelfile"
36
 
37
  red() { printf "\033[31m%s\033[0m\n" "$*"; }
scripts/smoke_test.sh CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env bash
2
- # Thanatos-27B - smoke test against a running Ollama daemon.
3
  #
4
  # Verifies:
5
  # 1. The Ollama server is reachable.
@@ -14,11 +14,11 @@
14
  # Usage:
15
  # ./scripts/smoke_test.sh # fast checks only
16
  # TOOLS_TEST=1 ./scripts/smoke_test.sh # add tool-call round-trip
17
- # MODEL=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/smoke_test.sh
18
  # HOST=http://localhost:11434 ./scripts/smoke_test.sh
19
  set -euo pipefail
20
 
21
- MODEL="${MODEL:-thanatos-27b}"
22
  HOST="${HOST:-http://localhost:11434}"
23
  PROMPT="${PROMPT:-Reply with the single word: OK}"
24
 
@@ -46,9 +46,9 @@ green "[+] server reachable"
46
 
47
  # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
48
  # model names at lookup but preserves whatever case was first registered
49
- # on disk (e.g. `make load-bundle` may produce `Thanatos-27B:latest`
50
- # even when invoked with TAG=thanatos-27b, if an earlier session left a
51
- # Thanatos-27B manifest dir behind). The exact tag the user typed is
52
  # still valid for `ollama run` - the comparison just needs to be
53
  # case-folded to match.
54
  if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then
 
1
  #!/usr/bin/env bash
2
+ # Thanatos-Heretic-27B - smoke test against a running Ollama daemon.
3
  #
4
  # Verifies:
5
  # 1. The Ollama server is reachable.
 
14
  # Usage:
15
  # ./scripts/smoke_test.sh # fast checks only
16
  # TOOLS_TEST=1 ./scripts/smoke_test.sh # add tool-call round-trip
17
+ # MODEL=hf.co/FoolDev/Thanatos-Heretic-27B:Q4_K_M ./scripts/smoke_test.sh
18
  # HOST=http://localhost:11434 ./scripts/smoke_test.sh
19
  set -euo pipefail
20
 
21
+ MODEL="${MODEL:-thanatos-heretic-27b}"
22
  HOST="${HOST:-http://localhost:11434}"
23
  PROMPT="${PROMPT:-Reply with the single word: OK}"
24
 
 
46
 
47
  # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
48
  # model names at lookup but preserves whatever case was first registered
49
+ # on disk (e.g. `make load-bundle` may produce `Thanatos-Heretic-27B:latest`
50
+ # even when invoked with TAG=thanatos-heretic-27b, if an earlier session left a
51
+ # Thanatos-Heretic-27B manifest dir behind). The exact tag the user typed is
52
  # still valid for `ollama run` - the comparison just needs to be
53
  # case-folded to match.
54
  if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then
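
The `jq` filter in step 2 lowercases both the registered model name and the requested one, then checks for a prefix match, so `thanatos-heretic-27b` still matches a manifest stored as `Thanatos-Heretic-27B:latest`. An equivalent check in Python could look like the sketch below; `model_present` is a hypothetical helper, and `str.lower()` folds non-ASCII letters as well, which `ascii_downcase` does not.

```python
def model_present(tags_response: dict, model: str) -> bool:
    """True if any model name in an /api/tags response starts with `model`,
    compared case-insensitively."""
    wanted = model.lower()
    return any(
        entry.get("name", "").lower().startswith(wanted)
        for entry in tags_response.get("models", [])
    )


if __name__ == "__main__":
    # Hand-written sample shaped like Ollama's /api/tags output.
    sample = {"models": [{"name": "Thanatos-Heretic-27B:latest"}]}
    print(model_present(sample, "thanatos-heretic-27b"))
    print(model_present(sample, "some-other-model"))
```

A prefix match is deliberately loose: it lets a bare tag match `name:latest`, but it would also accept a longer name that merely begins with the same string.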
scripts/verify_arch.py CHANGED
@@ -1,6 +1,6 @@
1
  #!/usr/bin/env python3
2
  """
3
- Thanatos-27B - verify the README "Architecture" forward-pass bullets
4
  against the actual GGUF metadata.
5
 
6
  Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
@@ -69,8 +69,8 @@ def main() -> int:
69
  return 2
70
  root = Path(__file__).resolve().parent.parent
71
  default_paths = [
72
- root / "Thanatos-27B.Q4_K_M.qwen35.gguf",
73
- root / "Thanatos-27B.Q4_K_M.qwen36.gguf",
74
  root / "Thanatos-27B.Q4_K_M.gguf",
75
  ]
76
  if len(sys.argv) == 2:
@@ -78,7 +78,7 @@ def main() -> int:
78
  else:
79
  path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
80
  if path is None:
81
- print("[!] no Thanatos-27B GGUF found in repo root; pass a path explicitly", file=sys.stderr)
82
  return 2
83
 
84
  print(f"[*] reading: {path}")
 
1
  #!/usr/bin/env python3
2
  """
3
+ Thanatos-Heretic-27B - verify the README "Architecture" forward-pass bullets
4
  against the actual GGUF metadata.
5
 
6
  Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
 
69
  return 2
70
  root = Path(__file__).resolve().parent.parent
71
  default_paths = [
72
+ root / "Thanatos-Heretic-27B.Q4_K_M.qwen35.gguf",
73
+ root / "Thanatos-Heretic-27B.Q4_K_M.qwen36.gguf",
74
  root / "Thanatos-27B.Q4_K_M.gguf",
75
  ]
76
  if len(sys.argv) == 2:
 
78
  else:
79
  path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
80
  if path is None:
81
+ print("[!] no Thanatos-Heretic-27B GGUF found in repo root; pass a path explicitly", file=sys.stderr)
82
  return 2
83
 
84
  print(f"[*] reading: {path}")