FoolDev Claude Opus 4.7 commited on
Commit
73e905b
·
1 Parent(s): 7097156

Revert base swap back to Qwen/Qwen3.6-27B (keep -Heretic name)

Browse files

Undoes the Qwen → llmfan46/Qwen3.6-27B-uncensored-heretic-v2 base
swap from 16e1ddd. Project name string (Thanatos-27B-Heretic),
Ollama tag (thanatos-27b-heretic), HF repo URL, banner -HERETIC
wordmark, and git remote are all preserved per explicit choice
("undo base only, keep name").

- README: frontmatter base_model → Qwen/Qwen3.6-27B; drop
base_model_relation and heretic/uncensored tags (imatrix kept).
Tagline, badge, Architecture line, sibling paragraph, Quick-start
path C, Local-apps table, Vision section, Related-models table,
Credits, Known-limitations all back to vanilla framing. Added a
"Note on the name" callout explaining the name-vs-base mismatch.
- Tooling: scripts/build.sh + fetch_vision.sh REPO_ID back to
unsloth/Qwen3.6-27B-GGUF; filename pattern + Q3_K_S smallest
quant restored. Modelfile preamble flipped. transformers
example MODEL_ID back to Qwen/Qwen3.6-27B. examples/README.md
+ llama_cpp_vision.py recipes flipped. CITATION.cff
title/abstract/refs/keywords flipped. Makefile + .gitignore
comments flipped.
- banner.svg subtitle "Dense 27B · Opus 4.7 distilled · uncensored"
→ "Qwen 3.6 · Dense 27B · Opus 4.7 distilled"; PNG re-rasterized.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

.gitignore CHANGED
@@ -5,22 +5,16 @@ __pycache__/
5
  .venv/
6
  venv/
7
 
8
- # Local model weights. We don't redistribute the Heretic v2 GGUFs
9
- # here — `make build` fetches one from
10
- # llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF locally.
11
  # The single Thanatos-27B.*.gguf we DO ship backs the HF/Ollama
12
  # "Use this model" widget (ollama run hf.co/FoolDev/Thanatos-27B-Heretic).
13
- # The bundled file is still named Thanatos-27B.*.gguf from before the
14
- # rename; whitelist also covers Thanatos-27B-Heretic.*.gguf for the
15
- # pending Heretic rebundle.
16
  *.gguf
17
  !Thanatos-27B.*.gguf
18
- !Thanatos-27B-Heretic.*.gguf
19
  # Local-only rebadge experiments produced by scripts/rename_arch.py.
20
  # These re-stamp general.architecture and are not loadable by current
21
  # ollama / llama.cpp; don't track or push them.
22
  Thanatos-27B.*.qwen[0-9]*.gguf
23
- Thanatos-27B-Heretic.*.qwen[0-9]*.gguf
24
  *.safetensors
25
  *.bin
26
 
 
5
  .venv/
6
  venv/
7
 
8
+ # Local model weights. We don't redistribute the upstream Qwen GGUFs
9
+ # here — `make build` fetches one from unsloth/Qwen3.6-27B-GGUF locally.
 
10
  # The single Thanatos-27B.*.gguf we DO ship backs the HF/Ollama
11
  # "Use this model" widget (ollama run hf.co/FoolDev/Thanatos-27B-Heretic).
 
 
 
12
  *.gguf
13
  !Thanatos-27B.*.gguf
 
14
  # Local-only rebadge experiments produced by scripts/rename_arch.py.
15
  # These re-stamp general.architecture and are not loadable by current
16
  # ollama / llama.cpp; don't track or push them.
17
  Thanatos-27B.*.qwen[0-9]*.gguf
 
18
  *.safetensors
19
  *.bin
20
 
CHANGELOG.md CHANGED
@@ -7,6 +7,55 @@ and documentation**, not the underlying base model.
7
 
8
  ## [Unreleased]
9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  ### Changed (acknowledge HF's `imatrix` auto-tag in frontmatter)
11
  - **Added `imatrix` to the README `tags:` list.** HF's tag
12
  auto-detector was surfacing `imatrix` on the rendered model
 
7
 
8
  ## [Unreleased]
9
 
10
+ ### Reverted (base swap to Heretic v2 — name kept, base back to vanilla Qwen)
11
+ - **Undone the `Qwen/Qwen3.6-27B` → `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`
12
+ base swap** that shipped in `16e1ddd` and was polished in
13
+ subsequent commits. Current base is back to vanilla
14
+ `Qwen/Qwen3.6-27B`. README frontmatter `base_model:`, the
15
+ `Base-…` badge, the Architecture line, the sibling paragraph,
16
+ the Quick-start path C, the Local-apps table, the Vision
17
+ section, the Related-models table, the Credits, and the
18
+ Known-limitations section all flipped back to the pre-swap
19
+ Qwen-only framing. `heretic` / `uncensored` tags removed
20
+ (`imatrix` stays — the bundled blob is still iMatrix-quantized
21
+ regardless of which base is described). `base_model_relation:
22
+ finetune` removed; this is a packaging wrapper, not a finetune.
23
+ - **Tooling flipped back to unsloth's GGUF mirror.**
24
+ `scripts/build.sh` `REPO_ID` back to `unsloth/Qwen3.6-27B-GGUF`
25
+ with filename pattern `Qwen3.6-27B-${QUANT}.gguf`; quant list
26
+ back to the unsloth catalog (Q3_K_S restored as the smallest
27
+ practical quant). `scripts/fetch_vision.sh` defaults back to
28
+ `PRECISION=F16` and `mmproj-F16.gguf` from unsloth. Modelfile
29
+ preamble flipped. `examples/transformers_quickstart.py`
30
+ `MODEL_ID` back to `Qwen/Qwen3.6-27B`. `examples/README.md` and
31
+ `examples/llama_cpp_vision.py` recipes flipped. `CITATION.cff`
32
+ title, abstract, references, and keywords flipped. `Makefile`
33
+ help-text + `build` docstring flipped. `.gitignore` comments
34
+ + whitelist + rebadge-artifact glob flipped.
35
+ - **`banner.svg`** subtitle reverted `Dense 27B · Opus 4.7
36
+ distilled · uncensored` → `Qwen 3.6 · Dense 27B · Opus 4.7
37
+ distilled`. `THANATOS-27B-HERETIC` wordmark **kept** — the
38
+ project name string and HF repo URL are preserved per explicit
39
+ choice ("undo base only, keep name"). `banner.png`
40
+ re-rasterized at 2× via rsvg-convert.
41
+ - **Project name string `Thanatos-27B-Heretic` and Ollama tag
42
+ `thanatos-27b-heretic` retained** across all files. HF repo
43
+ URL stays at `FoolDev/Thanatos-27B-Heretic`; git remote
44
+ unchanged. A "Note on the name" callout added to the README
45
+ tagline explaining the name-vs-base mismatch so users aren't
46
+ surprised.
47
+ - **Bundled blob unchanged** (`Thanatos-27B.Q4_K_M.gguf` LFS
48
+ pointer SHA `5ed60d0a...`). It was always the legacy unsloth
49
+ Qwen Q4_K_M quant; with the base reverted, the blob and the
50
+ declared base are now consistent again. The "Bundled blob
51
+ status" callout in TL;DR removed since it no longer applies.
52
+ - **HF repo migration:** the HF repo at
53
+ `FoolDev/Thanatos-27B-Heretic` keeps its current name (the
54
+ user's earlier rename via HF UI stands). If you want to also
55
+ rename the HF repo back to `FoolDev/Thanatos-27B`, that's a
56
+ separate HF UI action — HF will serve a 307 redirect from the
57
+ new name to the old once renamed.
58
+
59
  ### Changed (acknowledge HF's `imatrix` auto-tag in frontmatter)
60
  - **Added `imatrix` to the README `tags:` list.** HF's tag
61
  auto-detector was surfacing `imatrix` on the rendered model
CITATION.cff CHANGED
@@ -1,5 +1,5 @@
1
  cff-version: 1.2.0
2
- title: "Thanatos-27B-Heretic: A Dense Distillation Wrapper for llmfan46's Qwen 3.6 27B Uncensored Heretic v2"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
@@ -8,15 +8,17 @@ authors:
8
  repository-code: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
9
  url: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
10
  abstract: >-
11
- Thanatos-27B-Heretic is a personal repackaging of llmfan46's uncensored
12
- Heretic v2 finetune of Qwen 3.6 27B (dense), with Claude Opus 4.7 in
13
- the reasoning teacher slot. The repository ships an Ollama Modelfile,
14
- sampling defaults, usage examples, and a single ready-to-run GGUF
15
- (Q4_K_M ~17 GB) so the HF "Use this model" widget surfaces a one-liner
16
- Ollama snippet. Other quants (Q3_K_M, Q5_K_M, Q6_K, etc.) and the
17
- Heretic safetensors are pulled from upstream
18
- (llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF and the matching
19
- non-GGUF repo) on demand rather than redistributed.
 
 
20
  keywords:
21
  - qwen
22
  - qwen3.6
@@ -24,17 +26,10 @@ keywords:
24
  - distillation
25
  - reasoning
26
  - llm
27
- - heretic
28
- - uncensored
29
  license: Apache-2.0
30
  references:
31
  - type: software
32
- title: "Qwen3.6-27B-uncensored-heretic-v2 (immediate base)"
33
- authors:
34
- - name: llmfan46
35
- url: "https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2"
36
- - type: software
37
- title: "Qwen3.6-27B (upstream base)"
38
  authors:
39
  - name: Alibaba Qwen Team
40
  url: "https://huggingface.co/Qwen/Qwen3.6-27B"
 
1
  cff-version: 1.2.0
2
+ title: "Thanatos-27B-Heretic: A Dense Distillation Wrapper for Qwen 3.6 27B"
3
  message: "If you use this model card or its accompanying files, please cite as below."
4
  type: software
5
  authors:
 
8
  repository-code: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
9
  url: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
10
  abstract: >-
11
+ Thanatos-27B-Heretic is a personal repackaging of the dense Qwen 3.6 27B base
12
+ model with Claude Opus 4.7 in the reasoning teacher slot. The
13
+ repository ships an Ollama Modelfile, sampling defaults, usage
14
+ examples, and a single ready-to-run GGUF (Q4_K_M ~17 GB) so the HF
15
+ "Use this model" widget surfaces a one-liner Ollama snippet. Other
16
+ quants (Q3_K_S, Q5_K_M, Q6_K, etc.) and the upstream safetensors
17
+ (Qwen/Qwen3.6-27B) are pulled from upstream
18
+ (unsloth/Qwen3.6-27B-GGUF) on demand rather than redistributed.
19
+ (The repo carries the `-Heretic` suffix from a prior swap to
20
+ llmfan46/Qwen3.6-27B-uncensored-heretic-v2 that was reverted;
21
+ current base is vanilla Qwen 3.6 27B.)
22
  keywords:
23
  - qwen
24
  - qwen3.6
 
26
  - distillation
27
  - reasoning
28
  - llm
 
 
29
  license: Apache-2.0
30
  references:
31
  - type: software
32
+ title: "Qwen3.6-27B"
 
 
 
 
 
33
  authors:
34
  - name: Alibaba Qwen Team
35
  url: "https://huggingface.co/Qwen/Qwen3.6-27B"
Makefile CHANGED
@@ -10,9 +10,9 @@
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
12
  # Examples:
13
- # make build # Q4_K_M from llmfan46 Heretic v2 GGUF (qwen35-stamped, loads today)
14
- # make build QUANT=Q3_K_M # smaller quant (Heretic repo has no Q3_K_S)
15
- # make build GGUF_PATH=~/models/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf
16
  # make load-bundle # this repo's bundled GGUF -> local Ollama tag (smudge LFS if needed)
17
  # make smoke
18
  # make check
@@ -37,7 +37,7 @@ ifdef GGUF_PATH
37
  @echo " GGUF_PATH=$(GGUF_PATH)"
38
  endif
39
 
40
- build: ## Download qwen35-stamped Heretic v2 GGUF from llmfan46 and run 'ollama create' (loads today).
41
  GGUF_PATH=$(GGUF_PATH) TAG=$(TAG) ./scripts/build.sh $(QUANT)
42
 
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
 
10
  # MODEL model tag for smoke (default: $(TAG))
11
  #
12
  # Examples:
13
+ # make build # Q4_K_M from unsloth (qwen35-stamped, loads today)
14
+ # make build QUANT=Q3_K_S # smaller quant
15
+ # make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf
16
  # make load-bundle # this repo's bundled GGUF -> local Ollama tag (smudge LFS if needed)
17
  # make smoke
18
  # make check
 
37
  @echo " GGUF_PATH=$(GGUF_PATH)"
38
  endif
39
 
40
+ build: ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (loads today).
41
  GGUF_PATH=$(GGUF_PATH) TAG=$(TAG) ./scripts/build.sh $(QUANT)
42
 
43
  load-bundle: ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
Modelfile CHANGED
@@ -16,16 +16,15 @@
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
17
  # re-tested in a fresh next-day session).
18
  #
19
- # For other quants (Q3_K_M, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_M`
20
- # downloads the chosen quant from llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF
21
- # (filename pattern Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf) and
22
- # patches FROM in a temp Modelfile copy. Note: no Q3_K_S in this repo;
23
- # use Q3_K_M for the smallest practical quant.
24
  #
25
  # Other GGUF sources (use with `make build GGUF_PATH=...`):
26
- # https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF # primary (this repo's default)
27
- # https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF # MTP head preserved
28
- # https://huggingface.co/unsloth/Qwen3.6-27B-GGUF # vanilla Qwen 3.6 (pre-Heretic)
29
 
30
  FROM ./Thanatos-27B.Q4_K_M.gguf
31
 
 
16
  # `e03e10e` after the 4th qwen36 round trip had its friction
17
  # re-tested in a fresh next-day session).
18
  #
19
+ # For other quants (Q3_K_S, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_S`
20
+ # downloads the chosen quant from unsloth/Qwen3.6-27B-GGUF and patches
21
+ # FROM in a temp Modelfile copy. The Q3_K_S used to ship in this repo;
22
+ # it was removed so HF's Ollama bridge picks Q4_K_M as the default
23
+ # `:latest` tag instead of Q3_K_S (alphabetically-first heuristic).
24
  #
25
  # Other GGUF sources (use with `make build GGUF_PATH=...`):
26
+ # https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
27
+ # https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
 
28
 
29
  FROM ./Thanatos-27B.Q4_K_M.gguf
30
 
README.md CHANGED
@@ -1,8 +1,7 @@
1
  ---
2
  license: apache-2.0
3
  base_model:
4
- - llmfan46/Qwen3.6-27B-uncensored-heretic-v2
5
- base_model_relation: finetune
6
  datasets:
7
  - crownelius/Creative_Writing_ShareGPT_Enhanced
8
  - microsoft/rStar-Coder
@@ -41,8 +40,6 @@ tags:
41
  - agent
42
  - gguf
43
  - ollama
44
- - heretic
45
- - uncensored
46
  - imatrix
47
  library_name: transformers
48
  pipeline_tag: image-text-to-text
@@ -51,19 +48,24 @@ pipeline_tag: image-text-to-text
51
  <img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/banner.svg" alt="Thanatos-27B-Heretic banner" width="100%" />
52
 
53
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
54
- [![Base Model](https://img.shields.io/badge/Base-Heretic_v2-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2)
55
  [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
56
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
57
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
58
 
59
  # Thanatos-27B-Heretic
60
 
61
- > **Dense Reasoning. Friendlier Footprint. Uncensored.**
62
- > *llmfan46's Heretic v2 abliteration of Qwen 3.6 27B (dense), repackaged with Claude Opus 4.7 in the teacher slot.*
63
 
64
- **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Base:`** `Heretic v2 (llmfan46)` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled + Abliterated LLM`
65
 
66
- A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2) — an uncensored Heretic-style abliteration of the dense [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises, and refusal-trained behavior is dialed back at the base layer.
 
 
 
 
 
67
 
68
  ## TL;DR
69
 
@@ -76,25 +78,14 @@ template — HF's Ollama bridge ingests those three files, not
76
  ollama run hf.co/FoolDev/Thanatos-27B-Heretic # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
77
  ```
78
 
79
- > **Bundled blob status:** the GGUF currently bundled in this repo
80
- > is the legacy pre-Heretic Qwen 3.6 27B Q4_K_M quant from before
81
- > the rename. Behaves identically to vanilla Qwen 3.6 27B for now;
82
- > the Heretic v2 rebundle (from
83
- > `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`) is pending —
84
- > see the top entry of [CHANGELOG](CHANGELOG.md). If you want the
85
- > Heretic behavior today, use the local-build path below
86
- > (`make build`), which pulls the Heretic GGUF directly.
87
-
88
  If you pulled the bundle during any of the qwen36 windows on the
89
  pre-rename `FoolDev/Thanatos-27B` repo (2026-05-19/20) and still
90
  have a qwen36-stamped blob in your local Ollama store, `make
91
- heal-hf` rebadges it in place. Fresh pulls of the new
92
- `Thanatos-27B-Heretic` repo go straight through.
93
 
94
- For other quants (Q3_K_M ~13 GB, Q5_K_M ~19 GB, etc.), `make build
95
  QUANT=...` is the simplest path. See [Quick start](#quick-start)
96
- below for the full matrix. Note: no Q3_K_S in the Heretic GGUF
97
- repo — use Q3_K_M for the smallest practical quant.
98
 
99
  For image input use llama.cpp directly — Ollama vision is broken for
100
  this architecture upstream (see [Vision](#vision)).
@@ -103,7 +94,7 @@ this architecture upstream (see [Vision](#vision)).
103
 
104
  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** — the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
105
 
106
- The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix, measured against the pre-rename Qwen 3.6 bundle; Heretic v2 inherits the same architecture so per-step cost should match) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
107
 
108
  | | Thanatos-27B-Heretic (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
109
  |---|---|---|
@@ -113,7 +104,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
113
  | Layers | 64 | 40 |
114
  | Hidden size | 5120 | 2048 |
115
  | Q4_K_M GGUF size | ~17 GB (bundled) | ~19 GB (bundled) |
116
- | Q3_K_M GGUF size | ~13 GB (build locally via `make build QUANT=Q3_K_M`) | n/a |
117
  | Min host memory @ Q4 / 8K ctx | ~22 GB | ~38 GB |
118
  | Multimodal (text path) | Yes | Yes |
119
  | Multimodal (vision via Ollama) | Broken upstream — see below | Broken upstream |
@@ -126,15 +117,15 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
126
  |---|---|
127
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
128
  | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
129
- | `Modelfile` | Ollama wrapper around the bundled GGUF (currently the legacy pre-Heretic Qwen 3.6 27B Q4_K_M; Heretic v2 rebundle pending) — used by `make build` / `ollama create` for **local** builds |
130
  | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
131
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
132
- | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`). This is the path that gets you actual Heretic behavior until the bundled blob is rebundled. |
133
  | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
134
- | `scripts/heal_hf_pull.sh` | Legacy recovery for users migrating from the pre-rename `FoolDev/Thanatos-27B` repo who still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls of `Thanatos-27B-Heretic` don't need it. |
135
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
136
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
137
- | `scripts/fetch_vision.sh` | Pulls the vision projector (`Qwen3.6-27B-mmproj-BF16.gguf` from the Heretic GGUF repo, or `mmproj-F16.gguf` from the unsloth reference projector) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
138
  | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep, plus `Modelfile`-vs-bridge-files sync check |
139
  | `scripts/check_bridge_sync.py` | Verifies the `Modelfile` `TEMPLATE` / `SYSTEM` / `PARAMETER` directives stay in sync with the root-level `template` / `system` / `params` files. Run as part of `make check`; called from the pre-commit hook. |
140
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
@@ -144,17 +135,16 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
144
  | `CHANGELOG.md` | Versioned tooling/docs changes |
145
  | `README.md` | This file |
146
 
147
- For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_M`
148
- downloads the smaller ~13 GB Q3_K_M quant from
149
- `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` (qwen35-stamped,
150
- loads directly) and creates a local `thanatos-27b-heretic` Ollama
151
- tag. Does not redistribute via this repo. For other quants use
152
- `make build QUANT=...`. The local-build path applies this repo's
153
- `Modelfile`; the `hf.co/...` path applies the root-level
154
- `template`, `system`, and `params` files (kept in sync with the
155
- `Modelfile`).
156
 
157
- If you want the Heretic safetensors for `transformers`, fetch them from [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2). For the vanilla pre-Heretic Qwen 3.6 27B base, use [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
158
 
159
  ## Architecture
160
 
@@ -170,30 +160,23 @@ If you want the Heretic safetensors for `transformers`, fetch them from [`llmfan
170
  - Vocab 248,320 (shared with 35B-A3B sibling)
171
  - 262 144 native context, extensible to ~1 M with YaRN
172
  - Vision + video supported by the **base architecture** via a separate
173
- `mmproj` projector (not redistributed here; pull
174
- `Qwen3.6-27B-mmproj-BF16.gguf` from
175
- `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`, or
176
- `mmproj-F16.gguf` from `unsloth/Qwen3.6-27B-GGUF` as a reference
177
- alternative). See [Vision](#vision) below for current loader
178
- compatibility.
179
  - Multi-token prediction (MTP) head trained for speculative decoding —
180
  present in the upstream `Qwen/Qwen3.6-27B` safetensors and usable via
181
  vLLM (`qwen3_next_mtp`) or SGLang (`--speculative-algo NEXTN`).
182
  **Not usable via llama.cpp / Ollama today**: the GGUF converter
183
  (`convert_hf_to_gguf.py`) explicitly skips MTP tensors for the
184
  `qwen35` / `qwen35moe` arch family ("MTP tensors are not used at
185
- inference yet"), so the standard GGUFs (this bundle, unsloth's,
186
- llmfan46's Heretic v2) ship with 851 tensors and no MTP head.
187
- llmfan46 also publishes a separate
188
- `Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF` repo
189
- that keeps the MTP tensors for vLLM/SGLang users who want both
190
- Heretic v2 + MTP. llama.cpp's MTP support (PR #22673, merged
191
- 2026-05-16) currently covers other architectures only; tracking
192
- that PR's follow-up work for when qwen35 / qwen35moe consumer
193
- support lands. (Earlier README versions claimed MTP was available
194
- via llama.cpp without this caveat — confirmed empirically via
195
- `gguf.GGUFReader` on both this bundle and
196
- `unsloth/Qwen3.6-27B-GGUF`, 2026-05-19.)
197
 
198
  **The bundled GGUF declares `general.architecture: 'qwen35'`** — not a
199
  workaround for an unimplemented `qwen36` arch, but the canonical
@@ -209,11 +192,9 @@ stack:
209
  exists in `transformers`; Qwen reuses the 3.5 class names.
210
  - **llama.cpp's converter.** `convert_hf_to_gguf.py` registers
211
  `Qwen3_5ForCausalLM` → `MODEL_ARCH.QWEN35` and
212
- `Qwen3_5MoeForCausalLM` → `MODEL_ARCH.QWEN35MOE`. The Heretic
213
- GGUFs this repo pulls from
214
- (`llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`) inherit those
215
- stamps, as do the upstream unsloth GGUFs (`unsloth/Qwen3.6-27B-GGUF`,
216
- `unsloth/Qwen3.6-35B-A3B-GGUF`).
217
  - **llama.cpp's model code.** `src/models/qwen35.cpp` has an
218
  explicit `case 64: type = LLM_TYPE_27B` branch for this model;
219
  `qwen35moe.cpp` has `case 40: type = LLM_TYPE_35B_A3B` for the
@@ -307,14 +288,12 @@ ollama run hf.co/FoolDev/Thanatos-27B-Heretic # 17 GB Q4_K_M, qwen35-s
307
  make load-bundle # creates local tag thanatos-27b-heretic
308
  ollama run thanatos-27b-heretic
309
 
310
- # C. Bypass the bundle: download a qwen35-stamped Heretic v2 GGUF
311
- # from llmfan46 and build locally. Loads on every current
312
- # llama.cpp / Ollama. This is the path that gets you actual
313
- # Heretic behavior until the bundled blob is rebundled.
314
  make build # Q4_K_M -> thanatos-27b-heretic
315
- make build QUANT=Q3_K_M # 13 GB smaller quant
316
- make build QUANT=Q5_K_M # 19 GB higher quality
317
- make build GGUF_PATH=~/models/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf # skip download
318
  ollama run thanatos-27b-heretic
319
  ```
320
 
@@ -338,10 +317,10 @@ python examples/ollama_chat.py # full demo: chat, streaming, tools, OpenAI-
338
 
339
  | App | How to load this model |
340
  |---|---|
341
- | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_M` downloads from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
342
- | **LM Studio** | Search → `FoolDev/Thanatos-27B-Heretic` → pick `Thanatos-27B.Q4_K_M.gguf` (current bundled filename; will become `Thanatos-27B-Heretic.Q4_K_M.gguf` after the rebundle). Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
343
  | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B-Heretic`. Same template behavior as LM Studio. |
344
- | **llama.cpp** | `hf download FoolDev/Thanatos-27B-Heretic Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via `Qwen3.6-27B-mmproj-BF16.gguf` from the Heretic GGUF repo). |
345
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
346
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
347
 
@@ -397,21 +376,17 @@ Behavior rules:
397
 
398
  ## Vision
399
 
400
- The Qwen 3.6 base (and llmfan46's Heretic v2 finetune of it) supports
401
- image (and video) input via a separate `mmproj` projector. The full
402
- multimodal stack is:
403
 
404
  ```
405
- Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf (~17 GB, the text decoder)
406
- Qwen3.6-27B-mmproj-BF16.gguf (~931 MB, the vision projector)
407
  ```
408
 
409
  Both files are at
410
- [`llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF`](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF).
411
- For the vanilla pre-Heretic projector, see
412
- [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
413
- (`mmproj-F16.gguf`, ~927 MB). This repo intentionally does not
414
- redistribute either.
415
 
416
  ### Loader compatibility — the honest table
417
 
@@ -429,11 +404,10 @@ Three flavors, in order of build-time effort:
429
  ```bash
430
  # A. HTTP via llama-server (always built — the easiest path).
431
  # Reconfirmed working 2026-05-19 against llama.cpp 389ff61 + Vulkan
432
- # on a Ryzen AI Max+ 395 / Radeon 8060S iGPU (pre-Heretic Qwen 3.6
433
- # bundle; Heretic v2 shares the architecture so the recipe carries).
434
  llama-server \
435
- -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
436
- --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
437
  --host 127.0.0.1 --port 8765 -c 8192 -ngl 99
438
  # then POST OpenAI-style chat completions with an image_url content
439
  # block — e.g. {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}}
@@ -446,15 +420,15 @@ llama-server \
446
  # produce it — a plain `cmake --build build` will. If yours didn't,
447
  # run `cmake --build build --target llama-mtmd-cli`.
448
  llama-mtmd-cli \
449
- -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
450
- --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
451
  --image photo.jpg \
452
  -p "Describe this image."
453
 
454
  # C. Python via llama-cpp-python:
455
  python examples/llama_cpp_vision.py \
456
- --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
457
- --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
458
  --image /path/to/photo.jpg \
459
  --prompt "What is in this image?"
460
  ```
@@ -472,22 +446,19 @@ The dense 27B is the lighter sibling to Janus-35B and the easier of the two to d
472
  | RTX 3090 / 4090 24 GB | Works, full Q4 offload, ~25-40 tok/s |
473
  | RTX 5090 32 GB | Works, full offload at higher quant (Q5/Q6), ~30-50 tok/s |
474
  | Mac Studio M2/M3 32 GB+ unified | Works, ~15-25 tok/s |
475
- | 32 GB unified-memory laptops (Mac M-series, Ryzen AI Max+, etc.) | Borderline at Q4. `make build QUANT=Q3_K_M` (~13 GB) and trim `num_ctx` for headroom. |
476
 
477
  Most numbers in this table are estimates from comparable models; the
478
  gradient is right but the absolute values will move ±20% with prompt
479
  shape, KV cache type, and parallel-request count. Measure your own
480
  machine with `make bench` (3-prompt mix, reports tok/s from Ollama's
481
  `eval_count` / `eval_duration` so it's not stopwatch-noisy). Reference
482
- data points on a Ryzen AI Max+ 395 / Radeon 8060S iGPU under Vulkan
483
- (measured against the pre-rename Qwen 3.6 bundle; Heretic v2 inherits
484
- the architecture so per-step cost should match within bench noise):
485
  **~12.3 tok/s at Q3_K_S** and **~9.3 tok/s at Q4_K_M** (3-prompt mix,
486
  steady across short / medium / long prompts), sitting between CPU-only
487
  and a 24 GB discrete card as expected. An earlier ROCm snapshot of the
488
  same Q3_K_S bench gave ~10.1 tok/s — Vulkan was the clear winner on
489
- this hardware. (Heretic v2 publishes Q3_K_M rather than Q3_K_S; the
490
- ~13 GB Q3_K_M should sit within 5% of the ~12 GB Q3_K_S numbers.)
491
 
492
  ## Chat template
493
 
@@ -588,25 +559,19 @@ python examples/ollama_chat.py # section 3 runs a real round-trip
588
  - **No mmproj in this release**, and **vision via Ollama is broken upstream** (the qwen35/qwen35moe arch entries are present in Ollama's Go engine but missing from the C++ llama.cpp fallback Ollama uses when mmproj is attached — see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
589
  - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
590
  - **No formal evaluation in this card.** Numbers above are estimates.
591
- - **Bundled blob is pre-Heretic.** The currently-bundled `Thanatos-27B.Q4_K_M.gguf` blob is the legacy Qwen 3.6 27B Q4_K_M quant from before the rename — it behaves like vanilla Qwen 3.6, not Heretic v2. Use `make build` (which pulls the Heretic GGUF from llmfan46) until the rebundle ships.
592
- - **Uncensored base.** The Heretic v2 abliteration dials back the refusal-training of upstream Qwen 3.6. Outputs may be more compliant with sensitive requests than the vanilla base; the Thanatos system prompt still steers behavior, but the safety floor is lower. Apply your own filtering for user-facing deployments.
593
 
594
  ## Related models
595
 
596
  | Model | Notes |
597
  |---|---|
598
- | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2) | **Immediate base**, safetensors |
599
- | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF) | Recommended GGUF source (what `make build` pulls from) |
600
- | [llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved) | Same Heretic v2 but keeps the MTP head for vLLM / SGLang speculative decoding |
601
- | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream pre-Heretic base, safetensors |
602
- | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Pre-Heretic GGUF mirror + reference `mmproj-F16.gguf` projector |
603
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
604
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
605
 
606
  ## Credits
607
 
608
- - Immediate base: [llmfan46/Qwen3.6-27B-uncensored-heretic-v2](https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2) — Heretic-style abliteration of Qwen 3.6 27B
609
- - Upstream base: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) (Alibaba)
610
  - Reasoning teacher: Claude Opus 4.7 (Anthropic)
611
  - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
612
 
 
1
  ---
2
  license: apache-2.0
3
  base_model:
4
+ - Qwen/Qwen3.6-27B
 
5
  datasets:
6
  - crownelius/Creative_Writing_ShareGPT_Enhanced
7
  - microsoft/rStar-Coder
 
40
  - agent
41
  - gguf
42
  - ollama
 
 
43
  - imatrix
44
  library_name: transformers
45
  pipeline_tag: image-text-to-text
 
48
  <img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/banner.svg" alt="Thanatos-27B-Heretic banner" width="100%" />
49
 
50
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
51
+ [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
52
  [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
53
  [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
54
  [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
55
 
56
  # Thanatos-27B-Heretic
57
 
58
+ > **Dense Reasoning. Friendlier Footprint.**
59
+ > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
60
 
61
+ **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled LLM`
62
 
63
+ A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
64
+
65
+ > **Note on the name.** The repo carries the `-Heretic` suffix from a
66
+ > prior swap to `llmfan46/Qwen3.6-27B-uncensored-heretic-v2` that was
67
+ > reverted. The current base is the vanilla `Qwen/Qwen3.6-27B`; the
68
+ > name string and HF repo URL are kept for continuity.
69
 
70
  ## TL;DR
71
 
 
78
  ollama run hf.co/FoolDev/Thanatos-27B-Heretic # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
79
  ```
80
 
 
 
 
 
 
 
 
 
 
81
  If you pulled the bundle during any of the qwen36 windows on the
82
  pre-rename `FoolDev/Thanatos-27B` repo (2026-05-19/20) and still
83
  have a qwen36-stamped blob in your local Ollama store, `make
84
+ heal-hf` rebadges it in place. Fresh pulls go straight through.
 
85
 
86
+ For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
87
  QUANT=...` is the simplest path. See [Quick start](#quick-start)
88
+ below for the full matrix.
 
89
 
90
  For image input use llama.cpp directly — Ollama vision is broken for
91
  this architecture upstream (see [Vision](#vision)).
 
94
 
95
  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** — the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
96
 
97
+ The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
98
 
99
  | | Thanatos-27B-Heretic (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
100
  |---|---|---|
 
104
  | Layers | 64 | 40 |
105
  | Hidden size | 5120 | 2048 |
106
  | Q4_K_M GGUF size | ~17 GB (bundled) | ~19 GB (bundled) |
107
+ | Q3_K_S GGUF size | ~12 GB (build locally via `make build QUANT=Q3_K_S`) | n/a |
108
  | Min host memory @ Q4 / 8K ctx | ~22 GB | ~38 GB |
109
  | Multimodal (text path) | Yes | Yes |
110
  | Multimodal (vision via Ollama) | Broken upstream — see below | Broken upstream |
 
117
  |---|---|
118
  | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
119
  | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
120
+ | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
121
  | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
122
  | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
123
+ | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
124
  | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
125
+ | `scripts/heal_hf_pull.sh` | Legacy recovery for users who pulled `hf.co/FoolDev/Thanatos-27B-Heretic` (or the pre-rename `FoolDev/Thanatos-27B`) *before* the latest qwen35 re-stamp and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls don't need it. |
126
  | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
127
  | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
128
+ | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
129
  | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep, plus `Modelfile`-vs-bridge-files sync check |
130
  | `scripts/check_bridge_sync.py` | Verifies the `Modelfile` `TEMPLATE` / `SYSTEM` / `PARAMETER` directives stay in sync with the root-level `template` / `system` / `params` files. Run as part of `make check`; called from the pre-commit hook. |
131
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
 
135
  | `CHANGELOG.md` | Versioned tooling/docs changes |
136
  | `README.md` | This file |
137
 
138
+ For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
139
+ downloads the smaller ~12 GB Q3_K_S quant from
140
+ `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
141
+ creates a local `thanatos-27b-heretic` Ollama tag. Does not redistribute
142
+ via this repo. For other quants use `make build QUANT=...`. The
143
+ local-build path applies this repo's `Modelfile`; the `hf.co/...`
144
+ path applies the root-level `template`, `system`, and `params`
145
+ files (kept in sync with the `Modelfile`).
 
146
 
147
+ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
148
 
149
  ## Architecture
150
 
 
160
  - Vocab 248,320 (shared with 35B-A3B sibling)
161
  - 262 144 native context, extensible to ~1 M with YaRN
162
  - Vision + video supported by the **base architecture** via a separate
163
+ `mmproj` projector (not redistributed here; pull `mmproj-F16.gguf`
164
+ from `unsloth/Qwen3.6-27B-GGUF`). See [Vision](#vision) below for
165
+ current loader compatibility.
 
 
 
166
  - Multi-token prediction (MTP) head trained for speculative decoding —
167
  present in the upstream `Qwen/Qwen3.6-27B` safetensors and usable via
168
  vLLM (`qwen3_next_mtp`) or SGLang (`--speculative-algo NEXTN`).
169
  **Not usable via llama.cpp / Ollama today**: the GGUF converter
170
  (`convert_hf_to_gguf.py`) explicitly skips MTP tensors for the
171
  `qwen35` / `qwen35moe` arch family ("MTP tensors are not used at
172
+ inference yet"), so the bundled GGUF and the unsloth GGUFs ship with
173
+ 851 tensors and no MTP head. llama.cpp's MTP support (PR #22673,
174
+ merged 2026-05-16) currently covers other architectures only;
175
+ tracking that PR's follow-up work for when qwen35 / qwen35moe
176
+ consumer support lands. (Earlier README versions claimed MTP was
177
+ available without this caveat confirmed empirically via
178
+ `gguf.GGUFReader` on both this bundle and `unsloth/Qwen3.6-27B-GGUF`,
179
+ 2026-05-19.)
 
 
 
 
180
 
181
  **The bundled GGUF declares `general.architecture: 'qwen35'`** — not a
182
  workaround for an unimplemented `qwen36` arch, but the canonical
 
192
  exists in `transformers`; Qwen reuses the 3.5 class names.
193
  - **llama.cpp's converter.** `convert_hf_to_gguf.py` registers
194
  `Qwen3_5ForCausalLM` → `MODEL_ARCH.QWEN35` and
195
+ `Qwen3_5MoeForCausalLM` → `MODEL_ARCH.QWEN35MOE`. The unsloth
196
+ GGUFs this repo pulls from (`unsloth/Qwen3.6-27B-GGUF`,
197
+ `unsloth/Qwen3.6-35B-A3B-GGUF`) inherit those stamps.
 
 
198
  - **llama.cpp's model code.** `src/models/qwen35.cpp` has an
199
  explicit `case 64: type = LLM_TYPE_27B` branch for this model;
200
  `qwen35moe.cpp` has `case 40: type = LLM_TYPE_35B_A3B` for the
 
288
  make load-bundle # creates local tag thanatos-27b-heretic
289
  ollama run thanatos-27b-heretic
290
 
291
+ # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
292
+ # and build locally. Loads on every current llama.cpp / Ollama.
 
 
293
  make build # Q4_K_M -> thanatos-27b-heretic
294
+ make build QUANT=Q3_K_S # 12 GB smaller quant
295
+ make build QUANT=Q5_K_M # 20 GB higher quality
296
+ make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf # skip download
297
  ollama run thanatos-27b-heretic
298
  ```
299
 
 
317
 
318
  | App | How to load this model |
319
  |---|---|
320
+ | **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
321
+ | **LM Studio** | Search → `FoolDev/Thanatos-27B-Heretic` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
322
  | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B-Heretic`. Same template behavior as LM Studio. |
323
+ | **llama.cpp** | `hf download FoolDev/Thanatos-27B-Heretic Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
324
  | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
325
  | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
326
 
 
376
 
377
  ## Vision
378
 
379
+ The Qwen 3.6 base supports image (and video) input via a separate
380
+ `mmproj` projector. The full multimodal stack is:
 
381
 
382
  ```
383
+ Qwen3.6-27B-Q4_K_M.gguf (~17 GB, the text decoder)
384
+ mmproj-F16.gguf (~927 MB, the vision projector)
385
  ```
386
 
387
  Both files are at
388
+ [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF).
389
+ This repo intentionally does not redistribute either.
 
 
 
390
 
391
  ### Loader compatibility — the honest table
392
 
 
404
  ```bash
405
  # A. HTTP via llama-server (always built — the easiest path).
406
  # Reconfirmed working 2026-05-19 against llama.cpp 389ff61 + Vulkan
407
+ # on a Ryzen AI Max+ 395 / Radeon 8060S iGPU.
 
408
  llama-server \
409
+ -m Qwen3.6-27B-Q4_K_M.gguf \
410
+ --mmproj mmproj-F16.gguf \
411
  --host 127.0.0.1 --port 8765 -c 8192 -ngl 99
412
  # then POST OpenAI-style chat completions with an image_url content
413
  # block — e.g. {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}}
 
420
  # produce it — a plain `cmake --build build` will. If yours didn't,
421
  # run `cmake --build build --target llama-mtmd-cli`.
422
  llama-mtmd-cli \
423
+ -m Qwen3.6-27B-Q4_K_M.gguf \
424
+ --mmproj mmproj-F16.gguf \
425
  --image photo.jpg \
426
  -p "Describe this image."
427
 
428
  # C. Python via llama-cpp-python:
429
  python examples/llama_cpp_vision.py \
430
+ --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
431
+ --mmproj /path/to/mmproj-F16.gguf \
432
  --image /path/to/photo.jpg \
433
  --prompt "What is in this image?"
434
  ```
 
446
  | RTX 3090 / 4090 24 GB | Works, full Q4 offload, ~25-40 tok/s |
447
  | RTX 5090 32 GB | Works, full offload at higher quant (Q5/Q6), ~30-50 tok/s |
448
  | Mac Studio M2/M3 32 GB+ unified | Works, ~15-25 tok/s |
449
+ | 32 GB unified-memory laptops (Mac M-series, Ryzen AI Max+, etc.) | Borderline at Q4. `make build QUANT=Q3_K_S` (~12 GB) and trim `num_ctx` for headroom. |
450
 
451
  Most numbers in this table are estimates from comparable models; the
452
  gradient is right but the absolute values will move ±20% with prompt
453
  shape, KV cache type, and parallel-request count. Measure your own
454
  machine with `make bench` (3-prompt mix, reports tok/s from Ollama's
455
  `eval_count` / `eval_duration` so it's not stopwatch-noisy). Reference
456
+ data points on a Ryzen AI Max+ 395 / Radeon 8060S iGPU under Vulkan:
 
 
457
  **~12.3 tok/s at Q3_K_S** and **~9.3 tok/s at Q4_K_M** (3-prompt mix,
458
  steady across short / medium / long prompts), sitting between CPU-only
459
  and a 24 GB discrete card as expected. An earlier ROCm snapshot of the
460
  same Q3_K_S bench gave ~10.1 tok/s — Vulkan was the clear winner on
461
+ this hardware.
 
462
 
463
  ## Chat template
464
 
 
559
  - **No mmproj in this release**, and **vision via Ollama is broken upstream** (the qwen35/qwen35moe arch entries are present in Ollama's Go engine but missing from the C++ llama.cpp fallback Ollama uses when mmproj is attached — see the [Vision](#vision) section). For image input use llama.cpp directly until that's fixed.
560
  - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
561
  - **No formal evaluation in this card.** Numbers above are estimates.
 
 
562
 
563
  ## Related models
564
 
565
  | Model | Notes |
566
  |---|---|
567
+ | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors |
568
+ | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
 
 
 
569
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
570
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
571
 
572
  ## Credits
573
 
574
+ - Base model: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) (Alibaba)
 
575
  - Reasoning teacher: Claude Opus 4.7 (Anthropic)
576
  - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
577
 
banner.png CHANGED
banner.svg CHANGED
examples/README.md CHANGED
@@ -5,9 +5,9 @@ Four minimal entry points. Pick the one that matches how you run models.
5
  | File | Backend | When to use |
6
  |---|---|---|
7
  | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b-heretic` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
8
- | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the Heretic safetensors (`llmfan46/Qwen3.6-27B-uncensored-heretic-v2`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
- | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `Qwen3.6-27B-mmproj-BF16.gguf` and answers questions about an image. The only working vision path right now. |
11
 
12
  All four apply the same Thanatos system prompt and sampling defaults
13
  (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
@@ -36,13 +36,12 @@ in place (qwen36 → qwen35, metadata-only, ~5 s) — the same
36
  tag then loads. Fresh pulls after the re-stamp go straight
37
  through.
38
 
39
- For a non-bundled quant (e.g. Q3_K_M ~13 GB, Q5_K_M ~19 GB),
40
- `make build QUANT=...` downloads from
41
- `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and creates a
42
- local `thanatos-27b-heretic` tag:
43
 
44
  ```bash
45
- cd .. && make build QUANT=Q3_K_M && cd examples
46
  MODEL=thanatos-27b-heretic python ollama_chat.py
47
  ```
48
 
@@ -55,8 +54,8 @@ MODEL=thanatos-27b-heretic python ollama_chat.py
55
  ```
56
 
57
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
58
- fetch it from `llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF` and
59
- patch the `Modelfile` `FROM` line into a temp copy automatically:
60
 
61
  ```bash
62
  cd .. && make build QUANT=Q5_K_M && cd examples
@@ -75,7 +74,7 @@ python transformers_quickstart.py --no-4bit # bf16, ~54 GB VRAM
75
 
76
  ```bash
77
  pip install llama-cpp-python # CPU-only build
78
- python llama_cpp_quickstart.py /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf --gpu-layers 99
79
  ```
80
 
81
  For GPU offload, rebuild llama-cpp-python with the matching backend — see
@@ -84,13 +83,13 @@ the script header for `CMAKE_ARGS` recipes (CUDA, Metal, ROCm/HIP).
84
  ### Vision (image input)
85
 
86
  ```bash
87
- # Pull the projector once (~931 MB):
88
- hf download llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF Qwen3.6-27B-mmproj-BF16.gguf --local-dir .
89
 
90
  pip install llama-cpp-python pillow
91
  python llama_cpp_vision.py \
92
- --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
93
- --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
94
  --image /path/to/photo.jpg \
95
  --prompt "Describe this image."
96
  ```
@@ -102,7 +101,7 @@ lacks them. `ollama create` accepts the dual-`FROM` and `ollama show`
102
  reports `vision` capability, but the first inference call fails with
103
  `error loading model architecture: unknown model architecture:
104
  'qwen35'` (verified empirically against the dense 27B +
105
- the F16 reference projector). Tracked in
106
  [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898).
107
  Until that's fixed, llama.cpp / llama-cpp-python is the working path
108
  for vision.
 
5
  | File | Backend | When to use |
6
  |---|---|---|
7
  | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b-heretic` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
8
+ | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
9
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
10
+ | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
11
 
12
  All four apply the same Thanatos system prompt and sampling defaults
13
  (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
 
36
  tag then loads. Fresh pulls after the re-stamp go straight
37
  through.
38
 
39
+ For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
40
+ `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
41
+ and creates a local `thanatos-27b-heretic` tag:
 
42
 
43
  ```bash
44
+ cd .. && make build QUANT=Q3_K_S && cd examples
45
  MODEL=thanatos-27b-heretic python ollama_chat.py
46
  ```
47
 
 
54
  ```
55
 
56
  For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will
57
+ fetch it from `unsloth/Qwen3.6-27B-GGUF` and patch the `Modelfile`
58
+ `FROM` line into a temp copy automatically:
59
 
60
  ```bash
61
  cd .. && make build QUANT=Q5_K_M && cd examples
 
74
 
75
  ```bash
76
  pip install llama-cpp-python # CPU-only build
77
+ python llama_cpp_quickstart.py /path/to/Qwen3.6-27B-Q4_K_M.gguf --gpu-layers 99
78
  ```
79
 
80
  For GPU offload, rebuild llama-cpp-python with the matching backend — see
 
83
  ### Vision (image input)
84
 
85
  ```bash
86
+ # Pull the projector once (~927 MB):
87
+ hf download unsloth/Qwen3.6-27B-GGUF mmproj-F16.gguf --local-dir .
88
 
89
  pip install llama-cpp-python pillow
90
  python llama_cpp_vision.py \
91
+ --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
92
+ --mmproj /path/to/mmproj-F16.gguf \
93
  --image /path/to/photo.jpg \
94
  --prompt "Describe this image."
95
  ```
 
101
  reports `vision` capability, but the first inference call fails with
102
  `error loading model architecture: unknown model architecture:
103
  'qwen35'` (verified empirically against the dense 27B +
104
+ `mmproj-F16.gguf`). Tracked in
105
  [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898).
106
  Until that's fixed, llama.cpp / llama-cpp-python is the working path
107
  for vision.
examples/llama_cpp_vision.py CHANGED
@@ -23,21 +23,21 @@ Install:
23
  # CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-binary :all:
24
  # CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python --no-binary :all:
25
 
26
- Files you need (both from llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF):
27
- 1. A text GGUF (any quant): e.g. Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf (~17 GB)
28
- 2. A vision projector: Qwen3.6-27B-mmproj-BF16.gguf (~931 MB)
29
 
30
  Usage:
31
  python llama_cpp_vision.py \
32
- --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
33
- --mmproj /path/to/Qwen3.6-27B-mmproj-BF16.gguf \
34
  --image /path/to/photo.jpg \
35
  --prompt "What is in this image? Be specific."
36
 
37
  # CLI alternative without python binding (ships with llama.cpp):
38
  # llama-mtmd-cli \
39
- # -m Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \
40
- # --mmproj Qwen3.6-27B-mmproj-BF16.gguf \
41
  # --image photo.jpg \
42
  # -p "Describe this image."
43
  """
 
23
  # CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-binary :all:
24
  # CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python --no-binary :all:
25
 
26
+ Files you need (both from unsloth/Qwen3.6-27B-GGUF):
27
+ 1. A text GGUF (any quant): e.g. Qwen3.6-27B-Q4_K_M.gguf (~17 GB)
28
+ 2. A vision projector: mmproj-F16.gguf (~927 MB)
29
 
30
  Usage:
31
  python llama_cpp_vision.py \
32
+ --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \
33
+ --mmproj /path/to/mmproj-F16.gguf \
34
  --image /path/to/photo.jpg \
35
  --prompt "What is in this image? Be specific."
36
 
37
  # CLI alternative without python binding (ships with llama.cpp):
38
  # llama-mtmd-cli \
39
+ # -m Qwen3.6-27B-Q4_K_M.gguf \
40
+ # --mmproj mmproj-F16.gguf \
41
  # --image photo.jpg \
42
  # -p "Describe this image."
43
  """
examples/transformers_quickstart.py CHANGED
@@ -2,14 +2,11 @@
2
  """
3
  Thanatos-27B-Heretic — Hugging Face Transformers quickstart.
4
 
5
- Loads the Heretic v2 Qwen 3.6 27B safetensors directly and runs a single
6
  chat turn using its embedded chat template. Thanatos-27B-Heretic is a
7
  *wrapper* around that base, so for the transformers route there is nothing
8
- to download from this repo — point at llmfan46/Qwen3.6-27B-uncensored-heretic-v2
9
- and apply the same system prompt the Modelfile uses.
10
-
11
- Set MODEL_ID = "Qwen/Qwen3.6-27B" to bypass the Heretic abliteration and
12
- load the vanilla upstream base instead.
13
 
14
  Requirements:
15
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
@@ -39,7 +36,7 @@ except ImportError as e: # pragma: no cover
39
  )
40
 
41
 
42
- MODEL_ID = "llmfan46/Qwen3.6-27B-uncensored-heretic-v2"
43
 
44
  THANATOS_SYSTEM = (
45
  "You are Thanatos, a precise and capable assistant for reasoning, writing, "
 
2
  """
3
  Thanatos-27B-Heretic — Hugging Face Transformers quickstart.
4
 
5
+ Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
6
  chat turn using its embedded chat template. Thanatos-27B-Heretic is a
7
  *wrapper* around that base, so for the transformers route there is nothing
8
+ to download from this repo — point at Qwen/Qwen3.6-27B and apply the same
9
+ system prompt the Modelfile uses.
 
 
 
10
 
11
  Requirements:
12
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
 
36
  )
37
 
38
 
39
+ MODEL_ID = "Qwen/Qwen3.6-27B"
40
 
41
  THANATOS_SYSTEM = (
42
  "You are Thanatos, a precise and capable assistant for reasoning, writing, "
scripts/build.sh CHANGED
@@ -7,20 +7,21 @@
7
  # QUANT=Q6_K ./scripts/build.sh
8
  #
9
  # Skip the download by pointing at a GGUF you already have:
10
- # GGUF_PATH=/path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf ./scripts/build.sh Q4_K_M
11
  #
12
  # Requires: huggingface-cli (or hf), ollama, awk.
13
  set -euo pipefail
14
 
15
  QUANT="${1:-${QUANT:-Q4_K_M}}"
16
 
17
- REPO_ID="${REPO_ID:-llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF}"
18
- # Filenames at llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF follow
19
- # Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf
20
- # Quants known to exist (as of 2026-05):
21
- # Q3_K_M Q3_K_L Q4_K_S Q4_K_M Q5_K_S Q5_K_M Q6_K Q8_0 BF16
22
- # Note: no Q3_K_S in this repo — use Q3_K_M for the smallest practical quant.
23
- GGUF_NAME="Qwen3.6-27B-uncensored-heretic-v2-${QUANT}.gguf"
 
24
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
25
  # GGUF_PATH defaults to ${ROOT}/${GGUF_NAME}, but can be overridden so users
26
  # with cached weights elsewhere don't have to copy or symlink anything.
 
7
  # QUANT=Q6_K ./scripts/build.sh
8
  #
9
  # Skip the download by pointing at a GGUF you already have:
10
+ # GGUF_PATH=/path/to/Qwen3.6-27B-Q4_K_M.gguf ./scripts/build.sh Q4_K_M
11
  #
12
  # Requires: huggingface-cli (or hf), ollama, awk.
13
  set -euo pipefail
14
 
15
  QUANT="${1:-${QUANT:-Q4_K_M}}"
16
 
17
+ REPO_ID="${REPO_ID:-unsloth/Qwen3.6-27B-GGUF}"
18
+ # Upstream uses dashes, e.g. Qwen3.6-27B-Q4_K_M.gguf. Quants known to exist
19
+ # at unsloth/Qwen3.6-27B-GGUF (as of 2026-04):
20
+ # Q3_K_S Q3_K_M Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_K_S Q5_K_M Q6_K Q8_0
21
+ # IQ4_XS IQ4_NL
22
+ # UD-IQ2_XXS UD-IQ2_M UD-Q2_K_XL UD-IQ3_XXS UD-Q3_K_XL UD-Q4_K_XL
23
+ # UD-Q5_K_XL UD-Q6_K_XL UD-Q8_K_XL
24
+ GGUF_NAME="Qwen3.6-27B-${QUANT}.gguf"
25
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
26
  # GGUF_PATH defaults to ${ROOT}/${GGUF_NAME}, but can be overridden so users
27
  # with cached weights elsewhere don't have to copy or symlink anything.
scripts/check.sh CHANGED
@@ -104,11 +104,9 @@ fi
104
 
105
  # ---- 5. footgun: dot-vs-dash filename -------------------------------------
106
  #
107
- # Upstream llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF (and the
108
- # legacy unsloth/Qwen3.6-27B-GGUF) use dashes
109
- # (Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf,
110
- # Qwen3.6-27B-Q4_K_M.gguf). Earlier commits used the wrong
111
- # dot-separated pattern, which 404s. Block re-introduction.
112
 
113
  blue "[*] grep: forbidden Qwen3.6-27B.Q* filename pattern"
114
  if grep -RnE 'Qwen3\.6-27B\.Q[0-9A-Z_]+\.gguf' \
 
104
 
105
  # ---- 5. footgun: dot-vs-dash filename -------------------------------------
106
  #
107
+ # Upstream unsloth/Qwen3.6-27B-GGUF uses dashes (Qwen3.6-27B-Q4_K_M.gguf).
108
+ # Earlier commits used the wrong dot-separated pattern, which 404s.
109
+ # Block re-introduction.
 
 
110
 
111
  blue "[*] grep: forbidden Qwen3.6-27B.Q* filename pattern"
112
  if grep -RnE 'Qwen3\.6-27B\.Q[0-9A-Z_]+\.gguf' \
scripts/fetch_vision.sh CHANGED
@@ -8,20 +8,16 @@
8
  # it (see README Vision section, ollama/ollama#15898).
9
  #
10
  # Usage:
11
- # ./scripts/fetch_vision.sh # default: BF16 (~931 MB)
12
- #
13
- # llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF publishes BF16 only;
14
- # for F16/F32 variants fall back to unsloth's reference projector:
15
- # REPO_ID=unsloth/Qwen3.6-27B-GGUF FILE_NAME=mmproj-F16.gguf ./scripts/fetch_vision.sh
16
- # (vision tokens are projected the same way across Qwen 3.6 27B
17
- # finetunes, so the unsloth projector is functionally interchangeable.)
18
  #
19
  # Requires: huggingface-cli (or hf).
20
  set -euo pipefail
21
 
22
- PRECISION="${1:-${PRECISION:-BF16}}"
23
- REPO_ID="${REPO_ID:-llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF}"
24
- FILE_NAME="${FILE_NAME:-Qwen3.6-27B-mmproj-${PRECISION}.gguf}"
25
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
26
  DEST="${MMPROJ_PATH:-${ROOT}/${FILE_NAME}}"
27
 
@@ -62,7 +58,7 @@ fi
62
  echo
63
  echo "[+] Done. Use it via:"
64
  echo " python ${ROOT}/examples/llama_cpp_vision.py \\"
65
- echo " --gguf /path/to/Qwen3.6-27B-uncensored-heretic-v2-Q4_K_M.gguf \\"
66
  echo " --mmproj ${DEST} \\"
67
  echo " --image /path/to/photo.jpg \\"
68
  echo " --prompt 'Describe this image.'"
 
8
  # it (see README Vision section, ollama/ollama#15898).
9
  #
10
  # Usage:
11
+ # ./scripts/fetch_vision.sh # default: F16, ~927 MB
12
+ # ./scripts/fetch_vision.sh BF16 # ~931 MB
13
+ # ./scripts/fetch_vision.sh F32 # ~1.8 GB
 
 
 
 
14
  #
15
  # Requires: huggingface-cli (or hf).
16
  set -euo pipefail
17
 
18
+ PRECISION="${1:-${PRECISION:-F16}}"
19
+ REPO_ID="${REPO_ID:-unsloth/Qwen3.6-27B-GGUF}"
20
+ FILE_NAME="mmproj-${PRECISION}.gguf"
21
  ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
22
  DEST="${MMPROJ_PATH:-${ROOT}/${FILE_NAME}}"
23
 
 
58
  echo
59
  echo "[+] Done. Use it via:"
60
  echo " python ${ROOT}/examples/llama_cpp_vision.py \\"
61
+ echo " --gguf /path/to/Qwen3.6-27B-Q4_K_M.gguf \\"
62
  echo " --mmproj ${DEST} \\"
63
  echo " --image /path/to/photo.jpg \\"
64
  echo " --prompt 'Describe this image.'"