FoolDev Claude Opus 4.7 committed on May 24

Commit 7197abd

1 Parent(s): 73e905b

Rename back: Thanatos-27B-Heretic → Thanatos-27B (HF repo also renamed)

The HF repo was renamed back to FoolDev/Thanatos-27B via the HF UI
(serves a 307 from the prior -Heretic name). With the base also
reverted to vanilla Qwen/Qwen3.6-27B in 73e905b, the -Heretic
suffix had no remaining justification.

- Bulk renamed Thanatos-27B-Heretic → Thanatos-27B and
thanatos-27b-heretic → thanatos-27b across README, Modelfile,
scripts, examples, CITATION.cff, Makefile, .gitignore.
- banner.svg: dropped the -HERETIC tspan, leaving THANATOS-27B
wordmark. banner.png re-rasterized at 2× via rsvg-convert.
- README "Note on the name" callout removed (name and base are
aligned again).
- CITATION.cff: dropped the trailing parenthetical about the
reverted Heretic swap.
- Makefile clean target deduped (the sed produced two identical
Thanatos-27B.*.qwen[0-9]*.gguf entries).
- Git remote re-pointed to git@hf.co:FoolDev/Thanatos-27B.
- CHANGELOG history retained as-is; entries below the new top
one still reference Thanatos-27B-Heretic as a record of work
done under that name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
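
The bulk rename listed in the message could be reproduced with a short script. What follows is a minimal sketch, not the script actually used for this commit: `RENAMES` mirrors the two spellings named in the message, while `rename_text`, `bulk_rename`, and the choice to skip `CHANGELOG.md` (so that historical entries keep the old name) are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Old -> new spellings, taken from the commit message.
RENAMES = {
    "Thanatos-27B-Heretic": "Thanatos-27B",
    "thanatos-27b-heretic": "thanatos-27b",
}


def rename_text(text: str) -> str:
    """Apply both case variants of the rename to a string."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


def bulk_rename(root: Path, skip_names: frozenset = frozenset({"CHANGELOG.md"})) -> list:
    """Rewrite text files under root in place and return the paths that changed.

    Skipping CHANGELOG.md is an assumption: it keeps old entries untouched.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.parts or path.name in skip_names:
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary files (banner.png, *.gguf) are left alone
        updated = rename_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Modelfile").write_text("# ollama run thanatos-27b-heretic\n", encoding="utf-8")
    (root / "CHANGELOG.md").write_text("Renamed to Thanatos-27B-Heretic\n", encoding="utf-8")
    print([p.name for p in bulk_rename(root)])
```

A plain string replacement like this is also what turns two distinct glob patterns into duplicates, which is the situation the Makefile bullet describes.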

Files changed (22)

CHANGELOG.md +23 -0
CITATION.cff +4 -7
Makefile +5 -5
Modelfile +2 -2
README.md +24 -29
banner.png +0 -0
banner.svg +1 -1
examples/README.md +7 -7
examples/llama_cpp_quickstart.py +1 -1
examples/llama_cpp_vision.py +1 -1
examples/ollama_chat.py +5 -5
examples/transformers_quickstart.py +2 -2
scripts/bench.sh +4 -4
scripts/build.sh +3 -3
scripts/check.sh +1 -1
scripts/check_bridge_sync.py +2 -2
scripts/fetch_vision.sh +1 -1
scripts/heal_hf_pull.sh +8 -8
scripts/install-hooks.sh +1 -1
scripts/load_bundle.sh +7 -7
scripts/smoke_test.sh +6 -6
scripts/verify_arch.py +4 -4

CHANGELOG.md CHANGED

@@ -7,6 +7,29 @@ and documentation**, not the underlying base model.
 ## [Unreleased]
 ### Reverted (base swap to Heretic v2 — name kept, base back to vanilla Qwen)
 - **Undone the `Qwen/Qwen3.6-27B` → `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`
   base swap** that shipped in `16e1ddd` and was polished in

 ## [Unreleased]
+### Changed (project name reverted: Thanatos-27B-Heretic → Thanatos-27B)
+- **HF repo renamed back to `FoolDev/Thanatos-27B`** via the HF UI
+  (HF serves a 307 redirect from `FoolDev/Thanatos-27B-Heretic` to
+  the canonical name). Now that the base is also reverted to
+  vanilla `Qwen/Qwen3.6-27B`, the `-Heretic` suffix had no
+  remaining justification.
+- **Bulk-renamed `Thanatos-27B-Heretic` → `Thanatos-27B`** and
+  `thanatos-27b-heretic` → `thanatos-27b` across README,
+  Modelfile, scripts, examples, CITATION.cff, Makefile,
+  .gitignore. CHANGELOG history entries below this one retain
+  their original wording (they document actions taken under the
+  Heretic name, before this revert).
+- **Banner wordmark** — dropped the `-HERETIC` tspan from
+  banner.svg, leaving `THANATOS-27B`. banner.png re-rasterized at
+  2× via rsvg-convert.
+- **README "Note on the name" callout removed** — no longer
+  applicable since name and base are aligned again.
+- **CITATION.cff abstract** — dropped the trailing parenthetical
+  about the reverted Heretic swap.
+- **Local git remote re-pointed** from
+  `git@hf.co:FoolDev/Thanatos-27B-Heretic` to
+  `git@hf.co:FoolDev/Thanatos-27B`.
 ### Reverted (base swap to Heretic v2 — name kept, base back to vanilla Qwen)
 - **Undone the `Qwen/Qwen3.6-27B` → `llmfan46/Qwen3.6-27B-uncensored-heretic-v2`
   base swap** that shipped in `16e1ddd` and was polished in
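
The 307 redirect mentioned in the new CHANGELOG entry can be checked without following it. This is a rough sketch using only the Python standard library. The helper names `probe` and `is_expected_redirect` are hypothetical, and the assumption that the `Location` header ends in the canonical repo path is illustrative rather than documented behaviour.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow, so the 3xx response surfaces."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def probe(url: str, timeout: float = 10.0):
    """Return (status, Location header) for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def is_expected_redirect(status, location, canonical: str) -> bool:
    """True when the response is a 307 whose target path ends in canonical."""
    if status != 307 or not location:
        return False
    path = urlsplit(location).path.rstrip("/")
    return path.endswith("/" + canonical)


# Usage (requires network access, so it is left commented out):
# status, location = probe("https://huggingface.co/FoolDev/Thanatos-27B-Heretic")
# print(is_expected_redirect(status, location, "FoolDev/Thanatos-27B"))
```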

CITATION.cff CHANGED

@@ -1,14 +1,14 @@
 cff-version: 1.2.0
-title: "Thanatos-27B-Heretic: A Dense Distillation Wrapper for Qwen 3.6 27B"
 message: "If you use this model card or its accompanying files, please cite as below."
 type: software
 authors:
   - name: FoolDev
     website: "https://huggingface.co/FoolDev"
-repository-code: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
-url: "https://huggingface.co/FoolDev/Thanatos-27B-Heretic"
 abstract: >-
-  Thanatos-27B-Heretic is a personal repackaging of the dense Qwen 3.6 27B base
   model with Claude Opus 4.7 in the reasoning teacher slot. The
   repository ships an Ollama Modelfile, sampling defaults, usage
   examples, and a single ready-to-run GGUF (Q4_K_M ~17 GB) so the HF
@@ -16,9 +16,6 @@ abstract: >-
   quants (Q3_K_S, Q5_K_M, Q6_K, etc.) and the upstream safetensors
   (Qwen/Qwen3.6-27B) are pulled from upstream
   (unsloth/Qwen3.6-27B-GGUF) on demand rather than redistributed.
-  (The repo carries the `-Heretic` suffix from a prior swap to
-  llmfan46/Qwen3.6-27B-uncensored-heretic-v2 that was reverted;
-  current base is vanilla Qwen 3.6 27B.)
 keywords:
   - qwen
   - qwen3.6

 cff-version: 1.2.0
+title: "Thanatos-27B: A Dense Distillation Wrapper for Qwen 3.6 27B"
 message: "If you use this model card or its accompanying files, please cite as below."
 type: software
 authors:
   - name: FoolDev
     website: "https://huggingface.co/FoolDev"
+repository-code: "https://huggingface.co/FoolDev/Thanatos-27B"
+url: "https://huggingface.co/FoolDev/Thanatos-27B"
 abstract: >-
+  Thanatos-27B is a personal repackaging of the dense Qwen 3.6 27B base
   model with Claude Opus 4.7 in the reasoning teacher slot. The
   repository ships an Ollama Modelfile, sampling defaults, usage
   examples, and a single ready-to-run GGUF (Q4_K_M ~17 GB) so the HF
   quants (Q3_K_S, Q5_K_M, Q6_K, etc.) and the upstream safetensors
   (Qwen/Qwen3.6-27B) are pulled from upstream
   (unsloth/Qwen3.6-27B-GGUF) on demand rather than redistributed.
 keywords:
   - qwen
   - qwen3.6

Makefile CHANGED

@@ -1,11 +1,11 @@
-# Thanatos-27B-Heretic convenience Makefile.
 #
 # All work is delegated to scripts/* — this file just gives common
 # operations short, discoverable names.
 #
 # Variables you can override on the command line:
 #   QUANT     GGUF quant suffix       (default: Q4_K_M)
-#   TAG       Ollama model tag        (default: thanatos-27b-heretic)
 #   GGUF_PATH path to existing GGUF   (skip the download)
 #   MODEL     model tag for smoke     (default: $(TAG))
 #
@@ -19,7 +19,7 @@
 #   make clean
 QUANT ?= Q4_K_M
-TAG   ?= thanatos-27b-heretic
 MODEL ?= $(TAG)
 .DEFAULT_GOAL := help
@@ -43,7 +43,7 @@ build:  ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (lo
 load-bundle:  ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
 	TAG=$(TAG) ./scripts/load_bundle.sh
-heal-hf:  ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B-Heretic tag in-store (rebadge blob + manifest digest).
 	./scripts/heal_hf_pull.sh
 smoke:  ## Verify the model is reachable and round-trips.
@@ -69,6 +69,6 @@ hooks:  ## Install scripts/check.sh as the git pre-commit hook.
 clean:  ## Remove local GGUF copies and ephemeral caches in this repo.
 	@echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
-	@rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-27B.*.qwen[0-9]*.gguf ./Thanatos-27B-Heretic.*.qwen[0-9]*.gguf
 	@rm -rf ./.cache __pycache__ examples/__pycache__
 	@echo "[+] clean"

+# Thanatos-27B convenience Makefile.
 #
 # All work is delegated to scripts/* — this file just gives common
 # operations short, discoverable names.
 #
 # Variables you can override on the command line:
 #   QUANT     GGUF quant suffix       (default: Q4_K_M)
+#   TAG       Ollama model tag        (default: thanatos-27b)
 #   GGUF_PATH path to existing GGUF   (skip the download)
 #   MODEL     model tag for smoke     (default: $(TAG))
 #
 #   make clean
 QUANT ?= Q4_K_M
+TAG   ?= thanatos-27b
 MODEL ?= $(TAG)
 .DEFAULT_GOAL := help
 load-bundle:  ## Load THIS repo's bundled GGUF into a local Ollama tag (smudge LFS + ollama create).
 	TAG=$(TAG) ./scripts/load_bundle.sh
+heal-hf:  ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B tag in-store (rebadge blob + manifest digest).
 	./scripts/heal_hf_pull.sh
 smoke:  ## Verify the model is reachable and round-trips.
 clean:  ## Remove local GGUF copies and ephemeral caches in this repo.
 	@echo "[*] removing local GGUFs and ephemeral caches in $$PWD"
+	@rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf ./Thanatos-27B.*.qwen[0-9]*.gguf
 	@rm -rf ./.cache __pycache__ examples/__pycache__
 	@echo "[+] clean"
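
The change to the `clean` recipe follows from the rename: once `Thanatos-27B-Heretic` is replaced with `Thanatos-27B`, the last two glob patterns on the `rm -f` line become identical. The sketch below reproduces that effect and removes the duplicate. `dedupe_args` is a hypothetical helper, and dropping repeated tokens is only safe for commands like this one, where every argument is an independent path pattern.

```python
def dedupe_args(recipe_line: str) -> str:
    """Drop repeated whitespace-separated tokens, keeping first occurrences in order."""
    indent = recipe_line[: len(recipe_line) - len(recipe_line.lstrip())]
    unique = list(dict.fromkeys(recipe_line.split()))  # dict preserves insertion order
    return indent + " ".join(unique)


# The old recipe line from the Makefile diff.
before = (
    "\t@rm -f ./Qwen3.6-27B-*.gguf ./mmproj-*.gguf "
    "./Thanatos-27B.*.qwen[0-9]*.gguf ./Thanatos-27B-Heretic.*.qwen[0-9]*.gguf"
)

# A plain rename leaves the same pattern on the line twice.
after_rename = before.replace("Thanatos-27B-Heretic", "Thanatos-27B")
deduped = dedupe_args(after_rename)

print(after_rename)
print(deduped)
```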

Modelfile CHANGED

@@ -1,4 +1,4 @@
-# Thanatos-27B-Heretic — Ollama wrapper around Qwen 3.6 27B (dense)
 #
 # Text + tool calling. Vision via Ollama is currently broken for this
 # architecture (ollama/ollama#15898 — the qwen35 arch entries are in
@@ -10,7 +10,7 @@
 # stamped `general.architecture: 'qwen35'` — the upstream-canonical
 # arch entry every released llama.cpp / Ollama loads under for the
 # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
-# thanatos-27b-heretic -f Modelfile && ollama run thanatos-27b-heretic` loads it
 # directly. See README "Architecture" for the full stamp history
 # (eight flips between qwen35 and qwen36, settled on qwen35 at
 # `e03e10e` after the 4th qwen36 round trip had its friction

+# Thanatos-27B — Ollama wrapper around Qwen 3.6 27B (dense)
 #
 # Text + tool calling. Vision via Ollama is currently broken for this
 # architecture (ollama/ollama#15898 — the qwen35 arch entries are in
 # stamped `general.architecture: 'qwen35'` — the upstream-canonical
 # arch entry every released llama.cpp / Ollama loads under for the
 # Qwen 3.5 / 3.6 hybrid SSM + attention family. `ollama create
+# thanatos-27b -f Modelfile && ollama run thanatos-27b` loads it
 # directly. See README "Architecture" for the full stamp history
 # (eight flips between qwen35 and qwen36, settled on qwen35 at
 # `e03e10e` after the 4th qwen36 round trip had its friction

README.md CHANGED

@@ -45,7 +45,7 @@ library_name: transformers
 pipeline_tag: image-text-to-text
 ---
-<img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/banner.svg" alt="Thanatos-27B-Heretic banner" width="100%" />
 [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
 [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
@@ -53,7 +53,7 @@ pipeline_tag: image-text-to-text
 [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
 [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
-# Thanatos-27B-Heretic
 > **Dense Reasoning. Friendlier Footprint.**
 > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
@@ -62,11 +62,6 @@ pipeline_tag: image-text-to-text
 A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
-> **Note on the name.** The repo carries the `-Heretic` suffix from a
-> prior swap to `llmfan46/Qwen3.6-27B-uncensored-heretic-v2` that was
-> reverted. The current base is the vanilla `Qwen/Qwen3.6-27B`; the
-> name string and HF repo URL are kept for continuity.
 ## TL;DR
 One-liner via Hugging Face (pulls a GGUF + this repo's root-level
@@ -75,7 +70,7 @@ template — HF's Ollama bridge ingests those three files, not
 `Modelfile`):
 ```bash
-ollama run hf.co/FoolDev/Thanatos-27B-Heretic           # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
 ```
 If you pulled the bundle during any of the qwen36 windows on the
@@ -96,7 +91,7 @@ The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only
 The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
-| | Thanatos-27B-Heretic (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
 |---|---|---|
 | Architecture | Dense transformer | MoE 256 experts, 8 active |
 | Total params | 27 B | 35 B |
@@ -118,11 +113,11 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
 | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
 | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
-| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
 | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
-| `scripts/heal_hf_pull.sh` | Legacy recovery for users who pulled `hf.co/FoolDev/Thanatos-27B-Heretic` (or the pre-rename `FoolDev/Thanatos-27B`) *before* the latest qwen35 re-stamp and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls don't need it. |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
 | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
 | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
@@ -138,7 +133,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
 downloads the smaller ~12 GB Q3_K_S quant from
 `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
-creates a local `thanatos-27b-heretic` Ollama tag. Does not redistribute
 via this repo. For other quants use `make build QUANT=...`. The
 local-build path applies this repo's `Modelfile`; the `hf.co/...`
 path applies the root-level `template`, `system`, and `params`
@@ -149,7 +144,7 @@ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-2
 ## Architecture
 <p align="left">
-  <img src="https://huggingface.co/FoolDev/Thanatos-27B-Heretic/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
 </p>
 - Qwen 3.6 dense, 27B parameters, 64 transformer layers
@@ -206,7 +201,7 @@ There is no PR or tracking issue for a `qwen36` arch entry in
 `qwen35` already loads the model the upstream code path was
 designed to load.
-`ollama run hf.co/FoolDev/Thanatos-27B-Heretic` and `llama-server -m
 Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
 loaders.
@@ -280,21 +275,21 @@ Three paths:
 ```bash
 # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
 #    root-level template / system / params files in one step):
-ollama run hf.co/FoolDev/Thanatos-27B-Heretic           # 17 GB Q4_K_M, qwen35-stamped
-# B. Build a local `thanatos-27b-heretic` tag from THIS repo's bundle
 #    (LFS smudge if needed, then `ollama create`). Useful if you
 #    want a bare local tag rather than the `hf.co/...` path:
-make load-bundle                                 # creates local tag thanatos-27b-heretic
-ollama run thanatos-27b-heretic
 # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
 #    and build locally. Loads on every current llama.cpp / Ollama.
-make build                                              # Q4_K_M  -> thanatos-27b-heretic
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
 make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf   # skip download
-ollama run thanatos-27b-heretic
 ```
 Under the hood, `make build` calls `scripts/build.sh`, which downloads the
@@ -302,7 +297,7 @@ GGUF if missing (set `GGUF_PATH` to point at one you already have) and
 runs `ollama create` with the matching `Modelfile`.
 If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
-run `ollama create thanatos-27b-heretic -f Modelfile && ollama run thanatos-27b-heretic`.
 Confirm everything works:
@@ -317,10 +312,10 @@ python examples/ollama_chat.py      # full demo: chat, streaming, tools, OpenAI-
 | App | How to load this model |
 |---|---|
-| **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
-| **LM Studio** | Search → `FoolDev/Thanatos-27B-Heretic` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
-| **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B-Heretic`. Same template behavior as LM Studio. |
-| **llama.cpp** | `hf download FoolDev/Thanatos-27B-Heretic Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
 | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
 | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
@@ -338,7 +333,7 @@ external schema.
 curl -s http://localhost:11434/v1/chat/completions \
   -H 'Content-Type: application/json' \
   -d '{
-    "model": "thanatos-27b-heretic",
     "messages": [
       {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
       {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
@@ -472,10 +467,10 @@ Ollama is the exception: its conversion of the embedded jinja loses the
 `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
 Two paths fix this, depending on how you pull the model:
-- **`ollama run hf.co/FoolDev/Thanatos-27B-Heretic`** — HF's Ollama bridge applies
   the root-level `template` / `system` / `params` files in this repo
   (the bridge does **not** read `Modelfile`).
-- **`make build` / `ollama create thanatos-27b-heretic -f Modelfile`** — uses the
   `Modelfile`'s `TEMPLATE` block.
 Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
@@ -518,7 +513,7 @@ the model adapts to whichever shape the system prompt prescribes.
 **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
 prompts the model to emit JSON-in-XML, the form Ollama's tool-call
 extractor parses into a structured `tool_calls` array. After
-`make build`, `ollama show thanatos-27b-heretic` lists `tools` and `thinking`
 under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
 accept a `tools` array.
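
As a rough illustration of the `tools` array the README text refers to, the sketch below builds an OpenAI-style request body for the local `/v1/chat/completions` endpoint. It assumes the post-rename tag `thanatos-27b` and Ollama's default port 11434 (as in the README's curl example). The `get_weather` tool and the helper names `build_tool_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint, matching the README's curl example.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_tool_request(model: str, question: str) -> dict:
    """Build a chat-completions payload that offers one function tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
    }


def send(payload: dict, url: str = OLLAMA_URL, timeout: float = 120.0) -> dict:
    """POST the payload as JSON and return the decoded response (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_tool_request("thanatos-27b", "What is the weather in Oslo?")
print(json.dumps(payload, indent=2))
# response = send(payload)  # uncomment once `ollama run thanatos-27b` works locally
```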

 pipeline_tag: image-text-to-text
 ---
+<img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/banner.svg" alt="Thanatos-27B banner" width="100%" />
 [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
 [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
 [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
 [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
+# Thanatos-27B
 > **Dense Reasoning. Friendlier Footprint.**
 > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
 A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
 ## TL;DR
 One-liner via Hugging Face (pulls a GGUF + this repo's root-level
 `Modelfile`):
 ```bash
+ollama run hf.co/FoolDev/Thanatos-27B           # ~17 GB Q4_K_M, qwen35-stamped, loads on stock Ollama
 ```
 If you pulled the bundle during any of the qwen36 windows on the
 The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
+| | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
 |---|---|---|
 | Architecture | Dense transformer | MoE 256 experts, 8 active |
 | Total params | 27 B | 35 B |
 | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
 | `dense-flow.svg` / `dense-flow.png` | Architecture diagram: 64-layer hybrid attention stack with animated forward-pass pulse (SVG); static frame fallback (PNG) |
 | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
+| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Thanatos-27B` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
 | `scripts/load_bundle.sh` | One-shot path from *this repo's* bundle → loadable local Ollama tag (smudges LFS pointer via `hf download` if needed, runs `ollama create`; see `make load-bundle`). Carries a qwen36 → qwen35 rebadge branch for legacy pre-rename checkouts — no-op on the current qwen35-stamped bundle. |
+| `scripts/heal_hf_pull.sh` | Legacy recovery for users who pulled `hf.co/FoolDev/Thanatos-27B` (or the pre-rename `FoolDev/Thanatos-27B`) *before* the latest qwen35 re-stamp and still have a qwen36-stamped blob in their local Ollama store: rebadges the blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. See `make heal-hf`. Idempotent and a no-op on tags already on qwen35 — fresh pulls don't need it. |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
 | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
 | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
 For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
 downloads the smaller ~12 GB Q3_K_S quant from
 `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads directly) and
+creates a local `thanatos-27b` Ollama tag. Does not redistribute
 via this repo. For other quants use `make build QUANT=...`. The
 local-build path applies this repo's `Modelfile`; the `hf.co/...`
 path applies the root-level `template`, `system`, and `params`
 ## Architecture
 <p align="left">
+  <img src="https://huggingface.co/FoolDev/Thanatos-27B/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
 </p>
 - Qwen 3.6 dense, 27B parameters, 64 transformer layers
 `qwen35` already loads the model the upstream code path was
 designed to load.
+`ollama run hf.co/FoolDev/Thanatos-27B` and `llama-server -m
 Thanatos-27B.Q4_K_M.gguf` both load directly on current stock
 loaders.
 ```bash
 # A. Pull straight from HF (gets the bundled Q4_K_M GGUF + the
 #    root-level template / system / params files in one step):
+ollama run hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M, qwen35-stamped
+# B. Build a local `thanatos-27b` tag from THIS repo's bundle
 #    (LFS smudge if needed, then `ollama create`). Useful if you
 #    want a bare local tag rather than the `hf.co/...` path:
+make load-bundle                                 # creates local tag thanatos-27b
+ollama run thanatos-27b
 # C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
 #    and build locally. Loads on every current llama.cpp / Ollama.
+make build                                              # Q4_K_M  -> thanatos-27b
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
 make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf   # skip download
+ollama run thanatos-27b
 ```
 Under the hood, `make build` calls `scripts/build.sh`, which downloads the
 runs `ollama create` with the matching `Modelfile`.
 If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
+run `ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b`.
 Confirm everything works:
 | App | How to load this model |
 |---|---|
+| **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
+| **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
+| **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
+| **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
 | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
 | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
 curl -s http://localhost:11434/v1/chat/completions \
   -H 'Content-Type: application/json' \
   -d '{
+    "model": "thanatos-27b",
     "messages": [
       {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
       {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
 `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
 Two paths fix this, depending on how you pull the model:
+- **`ollama run hf.co/FoolDev/Thanatos-27B`** — HF's Ollama bridge applies
   the root-level `template` / `system` / `params` files in this repo
   (the bridge does **not** read `Modelfile`).
+- **`make build` / `ollama create thanatos-27b -f Modelfile`** — uses the
   `Modelfile`'s `TEMPLATE` block.
 Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
 **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
 prompts the model to emit JSON-in-XML, the form Ollama's tool-call
 extractor parses into a structured `tool_calls` array. After
+`make build`, `ollama show thanatos-27b` lists `tools` and `thinking`
 under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
 accept a `tools` array.
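
As a rough illustration of the `tools` array mentioned above, the sketch below builds a non-streaming `/api/chat` request body and pulls structured calls out of a reply. The `get_weather` tool, its schema, and both helper names are hypothetical. The payload follows the general shape of Ollama's chat API, and the sample reply is hand-written rather than captured from the model.

```python
import json


def build_tool_request(model, user_text, tools):
    """Assemble a non-streaming /api/chat body that advertises `tools`."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "stream": False,
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a chat response dict."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"].get("arguments", {})) for c in calls]


# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_request("thanatos-27b", "Weather in Oslo?", [WEATHER_TOOL])
print(json.dumps(body, indent=2))

# Hand-written stand-in for a server reply; not real model output.
sample_reply = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": {"city": "Oslo"}}}
        ],
    }
}
print(extract_tool_calls(sample_reply))
```

To try it against a live daemon, POST `body` as JSON to `http://localhost:11434/api/chat` and pass the decoded reply to `extract_tool_calls`.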

banner.png CHANGED

banner.svg CHANGED

examples/README.md CHANGED

@@ -1,10 +1,10 @@
-# Thanatos-27B-Heretic examples
 Four minimal entry points. Pick the one that matches how you run models.
 | File | Backend | When to use |
 |---|---|---|
-| `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b-heretic` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
 | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
 | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
 | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
@@ -24,9 +24,9 @@ root-level `template` / `system` / `params` files via HF's Ollama
 bridge):
 ```bash
-ollama pull hf.co/FoolDev/Thanatos-27B-Heretic           # 17 GB Q4_K_M (only bundled quant)
 pip install requests
-MODEL=hf.co/FoolDev/Thanatos-27B-Heretic python ollama_chat.py
 ```
 If you pulled before the latest qwen35 re-stamp (HF commit
@@ -38,11 +38,11 @@ through.
 For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
 `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
-and creates a local `thanatos-27b-heretic` tag:
 ```bash
 cd ..  &&  make build QUANT=Q3_K_S  &&  cd examples
-MODEL=thanatos-27b-heretic python ollama_chat.py
 ```
 Or build a local tag from this repo's bundled GGUF without going
@@ -50,7 +50,7 @@ through the HF pull:
 ```bash
 cd ..  &&  make load-bundle  &&  cd examples
-MODEL=thanatos-27b-heretic python ollama_chat.py
 ```
 For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will

+# Thanatos-27B examples
 Four minimal entry points. Pick the one that matches how you run models.
 | File | Backend | When to use |
 |---|---|---|
+| `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
 | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
 | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
 | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
 bridge):
 ```bash
+ollama pull hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M (only bundled quant)
 pip install requests
+MODEL=hf.co/FoolDev/Thanatos-27B python ollama_chat.py
 ```
 If you pulled before the latest qwen35 re-stamp (HF commit
 For a non-bundled quant (e.g. Q3_K_S ~12 GB, Q5_K_M ~20 GB),
 `make build QUANT=...` downloads from `unsloth/Qwen3.6-27B-GGUF`
+and creates a local `thanatos-27b` tag:
 ```bash
 cd ..  &&  make build QUANT=Q3_K_S  &&  cd examples
+MODEL=thanatos-27b python ollama_chat.py
 ```
 Or build a local tag from this repo's bundled GGUF without going
 ```bash
 cd ..  &&  make load-bundle  &&  cd examples
+MODEL=thanatos-27b python ollama_chat.py
 ```
 For a quant the repo doesn't bundle (e.g. Q5_K_M), `make build` will

examples/llama_cpp_quickstart.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — llama-cpp-python quickstart.
 Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
 Useful for batch jobs, CI, or environments where you don't want a daemon.

 #!/usr/bin/env python3
 """
+Thanatos-27B — llama-cpp-python quickstart.
 Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
 Useful for batch jobs, CI, or environments where you don't want a daemon.

examples/llama_cpp_vision.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — vision (image-text-to-text) via llama-cpp-python.
 Why this script exists:
     Ollama's Go engine has the qwen35 / qwen35moe arch entries (text

 #!/usr/bin/env python3
 """
+Thanatos-27B — vision (image-text-to-text) via llama-cpp-python.
 Why this script exists:
     Ollama's Go engine has the qwen35 / qwen35moe arch entries (text

examples/ollama_chat.py CHANGED

@@ -1,17 +1,17 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — Ollama chat examples.
 Prerequisites (pick one):
     A. From the bundled GGUFs (default flow):
         $ make build                     # uses Thanatos-27B.Q4_K_M.gguf
         # or:
-        $ ollama create thanatos-27b-heretic -f ../Modelfile
     B. Pull straight from HF (Q4_K_M is the only bundled quant):
-        $ ollama run hf.co/FoolDev/Thanatos-27B-Heretic
-        # then set MODEL=hf.co/FoolDev/Thanatos-27B-Heretic below
 Then:
     $ ollama serve         # usually already running
@@ -39,7 +39,7 @@ from typing import Any, Iterator
 import requests
-MODEL = os.environ.get("MODEL", "thanatos-27b-heretic")
 HOST = os.environ.get("HOST", "http://localhost:11434")
 _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

 #!/usr/bin/env python3
 """
+Thanatos-27B — Ollama chat examples.
 Prerequisites (pick one):
     A. From the bundled GGUFs (default flow):
         $ make build                     # uses Thanatos-27B.Q4_K_M.gguf
         # or:
+        $ ollama create thanatos-27b -f ../Modelfile
     B. Pull straight from HF (Q4_K_M is the only bundled quant):
+        $ ollama run hf.co/FoolDev/Thanatos-27B
+        # then set MODEL=hf.co/FoolDev/Thanatos-27B below
 Then:
     $ ollama serve         # usually already running
 import requests
+MODEL = os.environ.get("MODEL", "thanatos-27b")
 HOST = os.environ.get("HOST", "http://localhost:11434")
 _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
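
A small sketch of what the `_THINK_RE` pattern above matches, assuming it is used to drop `<think>...</think>` reasoning blocks before showing a reply. The `strip_think` helper name and the sample string are hypothetical; only the regex is taken from the script.

```python
import re

# Same pattern as in ollama_chat.py: a non-greedy match across newlines,
# plus any whitespace trailing the closing tag.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text):
    """Remove every <think>...</think> block and the whitespace after it."""
    return _THINK_RE.sub("", text)


raw = "<think>\nThe user wants one word.\n</think>\n\nOK"
print(strip_think(raw))
```

Because the match is non-greedy, several separate reasoning blocks in one reply are removed individually instead of swallowing the visible text between them.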

examples/transformers_quickstart.py CHANGED

@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — Hugging Face Transformers quickstart.
 Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
-chat turn using its embedded chat template. Thanatos-27B-Heretic is a
 *wrapper* around that base, so for the transformers route there is nothing
 to download from this repo — point at Qwen/Qwen3.6-27B and apply the same
 system prompt the Modelfile uses.

 #!/usr/bin/env python3
 """
+Thanatos-27B — Hugging Face Transformers quickstart.
 Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
+chat turn using its embedded chat template. Thanatos-27B is a
 *wrapper* around that base, so for the transformers route there is nothing
 to download from this repo — point at Qwen/Qwen3.6-27B and apply the same
 system prompt the Modelfile uses.

scripts/bench.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — tok/s benchmark via Ollama.
 #
 # Reads timing from Ollama's /api/chat response metadata (eval_count and
 # eval_duration are authoritative — no client-side stopwatch noise) and
@@ -7,14 +7,14 @@
 # number generalises a bit beyond a single shape.
 #
 # Usage:
-#   ./scripts/bench.sh                       # uses MODEL=thanatos-27b-heretic
-#   MODEL=thanatos-27b-heretic ./scripts/bench.sh
 #   HOST=http://localhost:11434 ./scripts/bench.sh
 #
 # Requires: curl, jq, a running Ollama daemon with the model created.
 set -euo pipefail
-MODEL="${MODEL:-thanatos-27b-heretic}"
 HOST="${HOST:-http://localhost:11434}"
 red()   { printf "\033[31m%s\033[0m\n" "$*" >&2; }

 #!/usr/bin/env bash
+# Thanatos-27B — tok/s benchmark via Ollama.
 #
 # Reads timing from Ollama's /api/chat response metadata (eval_count and
 # eval_duration are authoritative — no client-side stopwatch noise) and
 # number generalises a bit beyond a single shape.
 #
 # Usage:
+#   ./scripts/bench.sh                       # uses MODEL=thanatos-27b
+#   MODEL=thanatos-27b ./scripts/bench.sh
 #   HOST=http://localhost:11434 ./scripts/bench.sh
 #
 # Requires: curl, jq, a running Ollama daemon with the model created.
 set -euo pipefail
+MODEL="${MODEL:-thanatos-27b}"
 HOST="${HOST:-http://localhost:11434}"
 red()   { printf "\033[31m%s\033[0m\n" "$*" >&2; }
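
The tok/s arithmetic described in the header comment can be sketched as follows. Ollama reports `eval_duration` in nanoseconds, so the rate is `eval_count / (eval_duration / 1e9)`. How `bench.sh` combines its three prompts is not shown here; pooling total tokens over total time is an assumption, and the sample numbers are made up.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation rate from Ollama's eval_count / eval_duration (nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def pooled_rate(runs):
    """Aggregate rate over several responses: total tokens over total time."""
    total_tokens = sum(r["eval_count"] for r in runs)
    total_ns = sum(r["eval_duration"] for r in runs)
    return tokens_per_second(total_tokens, total_ns)


# Made-up metadata from two hypothetical responses.
runs = [
    {"eval_count": 50, "eval_duration": 5_000_000_000},
    {"eval_count": 150, "eval_duration": 15_000_000_000},
]
print(f"{pooled_rate(runs):.1f} tok/s")
```

Pooling weights longer generations more heavily than averaging per-prompt rates would; either choice is defensible as long as it is stated alongside the number.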

scripts/build.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
 #
 # Usage:
 #   ./scripts/build.sh                       # default: Q4_K_M
@@ -28,7 +28,7 @@ ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
 MODELFILE="${ROOT}/Modelfile"
-TAG="${TAG:-thanatos-27b-heretic}"
 echo "[*] repo:     ${REPO_ID}"
 echo "[*] quant:    ${QUANT}"
@@ -96,4 +96,4 @@ ollama create "${TAG}" -f "${TMP_MODELFILE}"
 echo
 echo "[+] Done. Try it:"
 echo "    ollama run ${TAG}"
-echo "    python ${ROOT}/examples/ollama_chat.py   # update MODEL constant if not 'thanatos-27b-heretic'"

 #!/usr/bin/env bash
+# Thanatos-27B — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
 #
 # Usage:
 #   ./scripts/build.sh                       # default: Q4_K_M
 GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
 MODELFILE="${ROOT}/Modelfile"
+TAG="${TAG:-thanatos-27b}"
 echo "[*] repo:     ${REPO_ID}"
 echo "[*] quant:    ${QUANT}"
 echo
 echo "[+] Done. Try it:"
 echo "    ollama run ${TAG}"
+echo "    python ${ROOT}/examples/ollama_chat.py   # update MODEL constant if not 'thanatos-27b'"

scripts/check.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — repo-local sanity checks.
 #
 # Runs everything that's cheap and catches a real-world bug we've already hit:
 #

 #!/usr/bin/env bash
+# Thanatos-27B — repo-local sanity checks.
 #
 # Runs everything that's cheap and catches a real-world bug we've already hit:
 #

scripts/check_bridge_sync.py CHANGED

@@ -1,13 +1,13 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — verify Modelfile and HF Ollama bridge files stay in sync.
 The repo ships two parallel Ollama configurations:
   - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
     It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
   - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
-    Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B-Heretic`` directly. HF
     does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
 If the two configurations drift apart, ``hf.co/...`` users and ``make build``

 #!/usr/bin/env python3
 """
+Thanatos-27B — verify Modelfile and HF Ollama bridge files stay in sync.
 The repo ships two parallel Ollama configurations:
   - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
     It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
   - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
+    Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B`` directly. HF
     does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
 If the two configurations drift apart, ``hf.co/...`` users and ``make build``

scripts/fetch_vision.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — fetch the vision projector (mmproj) for image input.
 #
 # Why this is separate from build.sh:
 #   build.sh is for the Ollama text path. The mmproj is only useful for

 #!/usr/bin/env bash
+# Thanatos-27B — fetch the vision projector (mmproj) for image input.
 #
 # Why this is separate from build.sh:
 #   build.sh is for the Ollama text path. The mmproj is only useful for

scripts/heal_hf_pull.sh CHANGED

@@ -1,10 +1,10 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — heal a previously pulled HF-bridge tag whose bundled
 # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
 # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
 # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
 #
-# Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-27B-Heretic` now get the
 # qwen35-stamped bundle and load directly — this script is the
 # recovery path for users who pulled a qwen36-stamped blob into
 # their local Ollama store during one of the qwen36 windows
@@ -13,7 +13,7 @@
 # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
 # qwen35, metadata-only, byte-identical tensors) and rewrites the
 # manifest's model-layer digest to point at the new blob. After
-# running, the cached `hf.co/FoolDev/Thanatos-27B-Heretic` tag loads.
 #
 # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
 # The current bundle is qwen35-stamped so this script is a no-op for
@@ -22,13 +22,13 @@
 #
 # Usage:
 #   ./scripts/heal_hf_pull.sh                                # default tag
-#   TAG=hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M ./scripts/heal_hf_pull.sh
 #
 # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
 set -euo pipefail
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
-TAG="${TAG:-hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M}"
 OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
 red()   { printf "\033[31m%s\033[0m\n" "$*"; }
@@ -50,7 +50,7 @@ done
 # `ollama show --modelfile` writes a FROM line with the absolute blob path.
 # Reliable regardless of which case variant the user pulled with
-# (hf.co's 307 lets `Thanatos-27B-Heretic` and `thanatos-27b-heretic` both resolve to the
 # canonical repo, and ollama stores the manifest under whichever case
 # was first registered).
 #
@@ -79,8 +79,8 @@ blue "[*] blob:   ${MODEL_BLOB}"
 # referenced from exactly one tag in the heal scenario — fresh HF pull
 # of a single :Q4_K_M tag — but if someone has multiple tags pointing
 # at the same blob, we filter down to the one matching ${TAG}.
-TAG_PATH="${TAG#hf.co/}"      # FoolDev/Thanatos-27B-Heretic:Q4_K_M
-NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B-Heretic
 TAG_FILE="${TAG_PATH##*:}"    # Q4_K_M
 MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \

 #!/usr/bin/env bash
+# Thanatos-27B — heal a previously pulled HF-bridge tag whose bundled
 # GGUF is `qwen36`-stamped (legacy v0.6.0-era pulls before `964e418`,
 # 3rd-round-trip-era pulls between `973d7ef` and `978798f`, or
 # 5th-round-trip-era pulls between `ae67ed1` and `e03e10e`).
 #
+# Fresh pulls of `ollama run hf.co/FoolDev/Thanatos-27B` now get the
 # qwen35-stamped bundle and load directly — this script is the
 # recovery path for users who pulled a qwen36-stamped blob into
 # their local Ollama store during one of the qwen36 windows
 # It rebadges the HF-bridge tag's model blob in-place (qwen36 ->
 # qwen35, metadata-only, byte-identical tensors) and rewrites the
 # manifest's model-layer digest to point at the new blob. After
+# running, the cached `hf.co/FoolDev/Thanatos-27B` tag loads.
 #
 # Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
 # The current bundle is qwen35-stamped so this script is a no-op for
 #
 # Usage:
 #   ./scripts/heal_hf_pull.sh                                # default tag
+#   TAG=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/heal_hf_pull.sh
 #
 # Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
 set -euo pipefail
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+TAG="${TAG:-hf.co/FoolDev/Thanatos-27B:Q4_K_M}"
 OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
 red()   { printf "\033[31m%s\033[0m\n" "$*"; }
 # `ollama show --modelfile` writes a FROM line with the absolute blob path.
 # Reliable regardless of which case variant the user pulled with
+# (hf.co's 307 lets `Thanatos-27B` and `thanatos-27b` both resolve to the
 # canonical repo, and ollama stores the manifest under whichever case
 # was first registered).
 #
 # referenced from exactly one tag in the heal scenario — fresh HF pull
 # of a single :Q4_K_M tag — but if someone has multiple tags pointing
 # at the same blob, we filter down to the one matching ${TAG}.
+TAG_PATH="${TAG#hf.co/}"      # FoolDev/Thanatos-27B:Q4_K_M
+NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B
 TAG_FILE="${TAG_PATH##*:}"    # Q4_K_M
 MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
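
For readers less used to shell parameter expansion, here is a Python sketch of the three `TAG_PATH` / `NAMESPACE_PATH` / `TAG_FILE` expansions above. The function name `split_hf_tag` is hypothetical, and the no-colon branch mirrors how `%:*` and `##*:` leave a string unchanged when the pattern does not match.

```python
def split_hf_tag(tag):
    """Mirror ${TAG#hf.co/}, ${TAG_PATH%:*} and ${TAG_PATH##*:} from the script."""
    # ${TAG#hf.co/}: strip the leading "hf.co/" if present.
    tag_path = tag[len("hf.co/"):] if tag.startswith("hf.co/") else tag
    if ":" in tag_path:
        # %:* drops the shortest ":..." suffix; ##*: drops the longest "...:" prefix.
        # Both split at the last colon.
        namespace_path, _, tag_file = tag_path.rpartition(":")
    else:
        # With no colon, neither expansion matches and both return the input.
        namespace_path = tag_file = tag_path
    return tag_path, namespace_path, tag_file


print(split_hf_tag("hf.co/FoolDev/Thanatos-27B:Q4_K_M"))
```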

scripts/install-hooks.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — install scripts/check.sh as a git pre-commit hook.
 #
 # Idempotent. Re-runs are safe.
 set -euo pipefail

 #!/usr/bin/env bash
+# Thanatos-27B — install scripts/check.sh as a git pre-commit hook.
 #
 # Idempotent. Re-runs are safe.
 set -euo pipefail

scripts/load_bundle.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — load this repo's bundle into Ollama as a local tag.
 #
 # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
 # loads directly on stock llama.cpp / Ollama. This script is the
@@ -15,13 +15,13 @@
 #   3. Run `ollama create <tag> -f <temp Modelfile pointing at the
 #      resolved bundle>`.
 #
-# Useful if you want a bare local tag (`thanatos-27b-heretic`) rather than
-# the `hf.co/FoolDev/Thanatos-27B-Heretic` path. The legacy qwen36 rebadge
 # branch is kept for anyone working from a pre-e03e10e checkout.
 #
 # Usage:
-#   ./scripts/load_bundle.sh                 # default tag: thanatos-27b-heretic
-#   TAG=thanatos-27b-heretic-bundle ./scripts/load_bundle.sh
 #   BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
 #
 # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
@@ -30,8 +30,8 @@ set -euo pipefail
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
-TAG="${TAG:-thanatos-27b-heretic}"
-REPO_ID="${REPO_ID:-FoolDev/Thanatos-27B-Heretic}"
 MODELFILE="${ROOT}/Modelfile"
 red()    { printf "\033[31m%s\033[0m\n" "$*"; }

 #!/usr/bin/env bash
+# Thanatos-27B — load this repo's bundle into Ollama as a local tag.
 #
 # The bundled GGUF (Thanatos-27B.Q4_K_M.gguf) is qwen35-stamped and
 # loads directly on stock llama.cpp / Ollama. This script is the
 #   3. Run `ollama create <tag> -f <temp Modelfile pointing at the
 #      resolved bundle>`.
 #
+# Useful if you want a bare local tag (`thanatos-27b`) rather than
+# the `hf.co/FoolDev/Thanatos-27B` path. The legacy qwen36 rebadge
 # branch is kept for anyone working from a pre-e03e10e checkout.
 #
 # Usage:
+#   ./scripts/load_bundle.sh                 # default tag: thanatos-27b
+#   TAG=thanatos-27b-bundle ./scripts/load_bundle.sh
 #   BUNDLE=/path/to/Thanatos-27B.Q4_K_M.gguf ./scripts/load_bundle.sh
 #
 # Requires: ollama, python3 with the `gguf` package, hf (if the bundle
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 BUNDLE="${BUNDLE:-${ROOT}/Thanatos-27B.Q4_K_M.gguf}"
+TAG="${TAG:-thanatos-27b}"
+REPO_ID="${REPO_ID:-FoolDev/Thanatos-27B}"
 MODELFILE="${ROOT}/Modelfile"
 red()    { printf "\033[31m%s\033[0m\n" "$*"; }

scripts/smoke_test.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Thanatos-27B-Heretic — smoke test against a running Ollama daemon.
 #
 # Verifies:
 #   1. The Ollama server is reachable.
@@ -14,11 +14,11 @@
 # Usage:
 #   ./scripts/smoke_test.sh                       # fast checks only
 #   TOOLS_TEST=1 ./scripts/smoke_test.sh          # add tool-call round-trip
-#   MODEL=hf.co/FoolDev/Thanatos-27B-Heretic:Q4_K_M ./scripts/smoke_test.sh
 #   HOST=http://localhost:11434 ./scripts/smoke_test.sh
 set -euo pipefail
-MODEL="${MODEL:-thanatos-27b-heretic}"
 HOST="${HOST:-http://localhost:11434}"
 PROMPT="${PROMPT:-Reply with the single word: OK}"
@@ -46,9 +46,9 @@ green "[+] server reachable"
 # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
 # model names at lookup but preserves whatever case was first registered
-# on disk (e.g. `make load-bundle` may produce `Thanatos-27B-Heretic:latest`
-# even when invoked with TAG=thanatos-27b-heretic, if an earlier session left a
-# Thanatos-27B-Heretic manifest dir behind). The exact tag the user typed is
 # still valid for `ollama run` — the comparison just needs to be
 # case-folded to match.
 if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then

 #!/usr/bin/env bash
+# Thanatos-27B — smoke test against a running Ollama daemon.
 #
 # Verifies:
 #   1. The Ollama server is reachable.
 # Usage:
 #   ./scripts/smoke_test.sh                       # fast checks only
 #   TOOLS_TEST=1 ./scripts/smoke_test.sh          # add tool-call round-trip
+#   MODEL=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/smoke_test.sh
 #   HOST=http://localhost:11434 ./scripts/smoke_test.sh
 set -euo pipefail
+MODEL="${MODEL:-thanatos-27b}"
 HOST="${HOST:-http://localhost:11434}"
 PROMPT="${PROMPT:-Reply with the single word: OK}"
 # 2. Model present? Match case-insensitively: Ollama 0.24 normalizes
 # model names at lookup but preserves whatever case was first registered
+# on disk (e.g. `make load-bundle` may produce `Thanatos-27B:latest`
+# even when invoked with TAG=thanatos-27b, if an earlier session left a
+# Thanatos-27B manifest dir behind). The exact tag the user typed is
 # still valid for `ollama run` — the comparison just needs to be
 # case-folded to match.
 if ! curl -fsS "${HOST}/api/tags" | jq -e --arg m "${MODEL}" '.models[] | select((.name | ascii_downcase) | startswith($m | ascii_downcase))' >/dev/null; then
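
The case-folded prefix match in the `jq` filter above can be sketched in Python as follows. The helper name `model_present` and the sample name lists are hypothetical; `str.lower()` stands in for jq's `ascii_downcase`, which behaves the same for the ASCII names used here.

```python
def model_present(names, wanted):
    """True if any registered model name starts with `wanted`, ignoring case."""
    wanted = wanted.lower()
    return any(name.lower().startswith(wanted) for name in names)


# Hypothetical contents of the `.models[].name` field from /api/tags.
print(model_present(["Thanatos-27B:latest", "llama3:8b"], "thanatos-27b"))
print(model_present(["llama3:8b"], "thanatos-27b"))
```

A prefix match is deliberately loose: `thanatos-27b` also matches a tag such as `thanatos-27b-bundle:latest`, so a stricter check would compare the full `name:tag` string after case-folding.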

scripts/verify_arch.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Thanatos-27B-Heretic — verify the README "Architecture" forward-pass bullets
 against the actual GGUF metadata.
 Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
@@ -69,8 +69,8 @@ def main() -> int:
         return 2
     root = Path(__file__).resolve().parent.parent
     default_paths = [
-        root / "Thanatos-27B-Heretic.Q4_K_M.qwen35.gguf",
-        root / "Thanatos-27B-Heretic.Q4_K_M.qwen36.gguf",
         root / "Thanatos-27B.Q4_K_M.gguf",
     ]
     if len(sys.argv) == 2:
@@ -78,7 +78,7 @@ def main() -> int:
     else:
         path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
         if path is None:
-            print("[!] no Thanatos-27B-Heretic GGUF found in repo root; pass a path explicitly", file=sys.stderr)
             return 2
     print(f"[*] reading: {path}")

 #!/usr/bin/env python3
 """
+Thanatos-27B — verify the README "Architecture" forward-pass bullets
 against the actual GGUF metadata.
 Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
         return 2
     root = Path(__file__).resolve().parent.parent
     default_paths = [
+        root / "Thanatos-27B.Q4_K_M.qwen35.gguf",
+        root / "Thanatos-27B.Q4_K_M.qwen36.gguf",
         root / "Thanatos-27B.Q4_K_M.gguf",
     ]
     if len(sys.argv) == 2:
     else:
         path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
         if path is None:
+            print("[!] no Thanatos-27B GGUF found in repo root; pass a path explicitly", file=sys.stderr)
             return 2
     print(f"[*] reading: {path}")