Instructions to use FoolDev/Thanatos-27B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FoolDev/Thanatos-27B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="FoolDev/Thanatos-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("FoolDev/Thanatos-27B", dtype="auto")

llama-cpp-python

How to use FoolDev/Thanatos-27B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="FoolDev/Thanatos-27B",
	filename="Thanatos-27B.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use FoolDev/Thanatos-27B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf FoolDev/Thanatos-27B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf FoolDev/Thanatos-27B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf FoolDev/Thanatos-27B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf FoolDev/Thanatos-27B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf FoolDev/Thanatos-27B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf FoolDev/Thanatos-27B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M

Use Docker

docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M

vLLM

How to use FoolDev/Thanatos-27B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FoolDev/Thanatos-27B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Thanatos-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
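
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000/v1`. The helper names `build_chat_request` and `post_chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url=None):
    """Build an OpenAI-style chat payload with an optional image part."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/chat/completions and return the reply text.

    Assumes the server answers with the usual OpenAI-compatible response shape.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `post_chat("http://localhost:8000/v1", build_chat_request("FoolDev/Thanatos-27B", "Describe this image in one sentence.", image_url))` should return the model's reply as a string. For an SGLang server, the base URL would be `http://localhost:30000/v1` instead.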

Use Docker

docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M

SGLang

How to use FoolDev/Thanatos-27B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FoolDev/Thanatos-27B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Thanatos-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FoolDev/Thanatos-27B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Thanatos-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use FoolDev/Thanatos-27B with Ollama:
```
ollama run hf.co/FoolDev/Thanatos-27B:Q4_K_M
```

Unsloth Studio

How to use FoolDev/Thanatos-27B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Thanatos-27B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Thanatos-27B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for FoolDev/Thanatos-27B to start chatting

Pi

How to use FoolDev/Thanatos-27B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Thanatos-27B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "FoolDev/Thanatos-27B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use FoolDev/Thanatos-27B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Thanatos-27B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default FoolDev/Thanatos-27B:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use FoolDev/Thanatos-27B with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Thanatos-27B:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "FoolDev/Thanatos-27B:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use FoolDev/Thanatos-27B with Docker Model Runner:
```
docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M
```

Lemonade

How to use FoolDev/Thanatos-27B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull FoolDev/Thanatos-27B:Q4_K_M

Run and chat with the model

lemonade run user.Thanatos-27B-Q4_K_M

List all available models

lemonade list

FoolDev committed on May 19

Commit e1f78fa (verified) · 1 parent: a60eff5

Rebadge bundle qwen35 -> qwen36 + doc the workaround

Re-flip general.architecture in the bundled Q4_K_M GGUF to qwen36, the architecturally-honest label. Reverses f0d70ee (which itself reverted 2dbe526). No released llama.cpp / Ollama recognizes qwen36 yet (reconfirmed 2026-05-19 against llama.cpp 389ff61 + Ollama 0.24.0), so the bundle is unloadable on stock loaders until upstream adds the arch entry. README/Modelfile/CHANGELOG updated with the qwen36 -> qwen35 rebadge workaround (scripts/rename_arch.py is metadata-only, tensor data byte-identical).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
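
To see which architecture stamp a given GGUF carries before handing it to a loader, the metadata header can be read directly. The sketch below is not the repo's `scripts/rename_arch.py`. It is a read-only illustration that assumes the GGUF v2/v3 layout (magic, version, tensor count, key/value count, then typed key/value pairs), and the function name `read_architecture` is hypothetical.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
# Byte widths of fixed-size GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _skip_value(f, vtype):
    """Advance past one metadata value without decoding it."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], io.SEEK_CUR)
    elif vtype == _STRING:
        f.seek(_read(f, "<Q"), io.SEEK_CUR)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _FIXED:
            f.seek(_FIXED[elem_type] * count, io.SEEK_CUR)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f):
    """Return the `general.architecture` string, or None if the key is absent."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    if _read(f, "<I") < 2:
        raise ValueError("GGUF v1 uses a different header layout")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Opening the bundled file with `open("Thanatos-27B.Q4_K_M.gguf", "rb")` and passing the handle to `read_architecture` would be expected to report `qwen36` for this commit and `qwen35` for a locally rebadged copy.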

Files changed (4)

CHANGELOG.md +21 -0
Modelfile +15 -4
README.md +67 -42
Thanatos-27B.Q4_K_M.gguf +1 -1

CHANGELOG.md CHANGED

@@ -7,6 +7,27 @@ and documentation**, not the underlying base model.
 ## [Unreleased]
+### Changed
+- **BREAKING (loaders).** Bundle re-stamped from
+  `general.architecture: 'qwen35'` to `'qwen36'` — the
+  architecturally-honest label. No released llama.cpp / Ollama
+  recognizes `qwen36` yet (reconfirmed 2026-05-19 against llama.cpp
+  389ff61 and Ollama 0.24.0), so `ollama run hf.co/FoolDev/Thanatos-27B`
+  and direct `llama-server -m Thanatos-27B.Q4_K_M.gguf` both fail with
+  `unknown model architecture: 'qwen36'` until upstream adds the entry.
+  To load *today*, rebadge to qwen35 locally with
+  `scripts/rename_arch.py --from-arch qwen36 --to-arch qwen35` — see
+  the README "Architecture" section for the one-liner. Tensor data
+  stays byte-identical; only metadata flips. Reverses prior commit
+  `f0d70ee` which had restored qwen35 for compatibility; deliberate
+  re-flip with eyes open about the breakage.
+- README "Heads up" callout near the top of the model card, TL;DR,
+  loader matrix, Quick start Ollama block, and Modelfile preamble
+  all updated to flag the qwen36 stamp and point users at the
+  rebadge workaround. Quick-start option B (`make build` from
+  unsloth) called out as the loads-today path, since unsloth still
+  ships qwen35-stamped GGUFs.
 ### Added
 - README "Vision via llama.cpp" subsection now leads with the
   `llama-server --mmproj` HTTP path (always built into stock llama.cpp,

Modelfile CHANGED

@@ -6,11 +6,22 @@
 # Ollama uses when an mmproj is attached). Use llama.cpp directly for
 # image input, or wait for the fix. See the Vision section in README.md.
 #
-# This repo bundles a single GGUF: Thanatos-27B.Q4_K_M.gguf (~17 GB).
-# The FROM line below points at it, so a fresh clone (with LFS smudge
-# enabled) supports the no-script path:
+# This repo bundles a single GGUF: Thanatos-27B.Q4_K_M.gguf (~17 GB),
+# stamped `general.architecture: 'qwen36'`. The FROM line below points
+# at it, but no released llama.cpp / Ollama recognizes `qwen36` yet, so
+# `ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b`
+# fails today with `unknown model architecture: 'qwen36'`. To load now,
+# rebadge the bundle to `qwen35` first:
 #
-#     ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b
+#     python3 scripts/rename_arch.py --from-arch qwen36 --to-arch qwen35 \
+#         Thanatos-27B.Q4_K_M.gguf Thanatos-27B.Q4_K_M.qwen35.gguf
+#     # then temporarily point FROM at the qwen35 file, or use:
+#     #   echo "FROM $PWD/Thanatos-27B.Q4_K_M.qwen35.gguf" > /tmp/Modelfile.qwen35
+#     #   ollama create thanatos-27b -f /tmp/Modelfile.qwen35
+#
+# Once upstream adds the qwen36 arch entry the workaround disappears
+# and the FROM line below works as-is. See README "Architecture" for
+# the full story.
 #
 # For other quants (Q3_K_S, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_S`
 # downloads the chosen quant from unsloth/Qwen3.6-27B-GGUF and patches
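
The workaround in the Modelfile preamble has two steps: run the rebadge script, then write a throwaway Modelfile whose `FROM` is an absolute path. A minimal Python sketch of both steps follows. The helper names `rebadge_argv` and `write_modelfile` are hypothetical. The script path and the `--from-arch` / `--to-arch` flags are taken from the preamble above rather than from the script itself.

```python
from pathlib import Path


def rebadge_argv(src, dst, from_arch="qwen36", to_arch="qwen35",
                 script="scripts/rename_arch.py"):
    """Build the argv for the rebadge step as written in the Modelfile preamble."""
    return ["python3", script,
            "--from-arch", from_arch, "--to-arch", to_arch,
            str(src), str(dst)]


def write_modelfile(gguf_path, modelfile_path):
    """Write a one-line Modelfile that points FROM at an absolute GGUF path.

    An absolute path avoids the relative-FROM resolution issue that the
    README hunk in this commit describes for `ollama create`.
    """
    gguf_abs = Path(gguf_path).resolve()
    Path(modelfile_path).write_text(f"FROM {gguf_abs}\n", encoding="utf-8")
    return gguf_abs
```

The list returned by `rebadge_argv(...)` can be passed to `subprocess.run(..., check=True)` from the repository root. After `write_modelfile(...)`, the next step would be `ollama create thanatos-27b -f <modelfile_path>`.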

README.md CHANGED

@@ -61,6 +61,15 @@ pipeline_tag: image-text-to-text
 A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
 ## TL;DR
 One-liner via Hugging Face (pulls a GGUF + this repo's root-level
@@ -69,12 +78,16 @@ template — HF's Ollama bridge ingests those three files, not
 `Modelfile`):
 ```bash
-ollama run hf.co/FoolDev/Thanatos-27B           # ~17 GB Q4_K_M (the only bundled quant)
 ```
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
-QUANT=Q3_K_S` downloads from `unsloth/Qwen3.6-27B-GGUF` and creates the
-local Ollama tag. See [Quick start](#quick-start) below.
 Or build locally (uses this repo's `Modelfile`, kept in sync with the
 three bridge files) for any quant:
@@ -131,16 +144,18 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `README.md` | This file |
 This repo ships a single GGUF to back the HF/Ollama "Use this model"
-widget — `Thanatos-27B.Q4_K_M.gguf` (~17 GB):
 ```bash
-ollama run hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M (only bundled quant)
 ```
-For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
-downloads the smaller ~12 GB Q3_K_S quant from `unsloth/Qwen3.6-27B-GGUF`
-and creates a local `thanatos-27b` Ollama tag (does not redistribute via
-this repo).
 For other quants or local builds, pull from
 [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
@@ -169,39 +184,42 @@ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-2
   current loader compatibility.
 - Multi-token prediction (MTP) head trained for speculative decoding
-The bundled GGUF declares `general.architecture: 'qwen35'`, not
-`'qwen36'`, on purpose. Upstream `ggml-org/llama.cpp` and `ollama/ollama`
-both register Qwen 3.5 and Qwen 3.6 under the same `qwen35` / `qwen35moe`
-arch tags because the hybrid SSM + attention stack is shared between
-the two generations — there is no separate `qwen36` arch entry in
-either project (checked 2026-05-19, master / main respectively).
-Rebadging the GGUF to `qwen36` makes it unloadable (`error loading
-model architecture: unknown model architecture: 'qwen36'`).
-`scripts/rename_arch.py` is in the repo for the day that changes
-upstream — flipping `qwen35 → qwen36` (or back) is a metadata-only
-operation, tensor data stays byte-identical.
-Want to retest the qwen36 path on your machine? Smudge LFS, then:
 ```bash
 python3 scripts/rename_arch.py \
     Thanatos-27B.Q4_K_M.gguf \
-    Thanatos-27B.Q4_K_M.qwen36.gguf
-# Use an absolute path — ollama resolves a relative FROM against the
-# Modelfile's directory, not your CWD, so `FROM ./...` in /tmp/ fails
-# with a misleading 400 / "unable to load model" before the loader runs.
-echo "FROM $PWD/Thanatos-27B.Q4_K_M.qwen36.gguf" > /tmp/Modelfile.qwen36
-ollama create test-qwen36 -f /tmp/Modelfile.qwen36
-ollama run test-qwen36 hi
 ```
-Today that last line errors with `500 Internal Server Error: unable
-to load model: <blob path>`. The CLI no longer surfaces the arch
-name; check `journalctl --user -u ollama` (or wherever your server
-logs go) for the underlying `error loading model architecture:
-unknown model architecture: 'qwen36'`. The day that line succeeds is
-the day the bundle can be flipped (reconfirmed 2026-05-19 against
-Ollama 0.24.0).
 ## Quick start
@@ -211,10 +229,13 @@ Two paths:
 ```bash
 # A. Pull straight from HF (uses the bundled Q4_K_M + root-level
-#    template / system / params files):
-ollama run hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M (only bundled quant)
-# B. Build locally for a different quant (downloads from unsloth):
 make build                                              # Q4_K_M  -> thanatos-27b
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
@@ -240,12 +261,16 @@ python examples/ollama_chat.py      # full demo: chat, streaming, tools, OpenAI-
 ### Local apps
-The bundled `Thanatos-27B.Q4_K_M.gguf` works in any GGUF-compatible local
-app — point it at this repo and load.
 | App | How to load this model |
 |---|---|
-| **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
 | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
 | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
 | **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |

 A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
+> ⚠️ **Heads up — bundle is stamped `qwen36`.** As of 2026-05-19 the
+> bundled GGUF declares `general.architecture: 'qwen36'`, which no
+> released llama.cpp / Ollama recognizes yet. `ollama run
+> hf.co/FoolDev/Thanatos-27B` and `llama-server -m
+> Thanatos-27B.Q4_K_M.gguf` both fail today with `unknown model
+> architecture: 'qwen36'`. To load now, rebadge locally to `qwen35`:
+> see [Architecture](#architecture) for the one-liner. Once upstream
+> ships qwen36 the workaround disappears.
 ## TL;DR
 One-liner via Hugging Face (pulls a GGUF + this repo's root-level
 `Modelfile`):
 ```bash
+ollama run hf.co/FoolDev/Thanatos-27B           # ~17 GB Q4_K_M, qwen36-stamped (see Heads-up above)
 ```
+That command fails today with `unknown model architecture: 'qwen36'`
+until you rebadge locally — see [Architecture](#architecture).
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
+QUANT=Q3_K_S` downloads from `unsloth/Qwen3.6-27B-GGUF` (which still
+ships `qwen35`-stamped GGUFs) and creates the local Ollama tag.
+See [Quick start](#quick-start) below.
 Or build locally (uses this repo's `Modelfile`, kept in sync with the
 three bridge files) for any quant:
 | `README.md` | This file |
 This repo ships a single GGUF to back the HF/Ollama "Use this model"
+widget — `Thanatos-27B.Q4_K_M.gguf` (~17 GB, qwen36-stamped):
 ```bash
+ollama run hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M, qwen36 — fails today, see Heads-up above
 ```
+For 16 GB GPUs / unified-memory laptops — and as a working-today
+fallback while the bundle waits on upstream qwen36 support —
+`make build QUANT=Q3_K_S` downloads the smaller ~12 GB Q3_K_S quant
+from `unsloth/Qwen3.6-27B-GGUF` (qwen35-stamped, loads on every
+current llama.cpp / Ollama build) and creates a local `thanatos-27b`
+Ollama tag. Does not redistribute via this repo.
 For other quants or local builds, pull from
 [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
   current loader compatibility.
 - Multi-token prediction (MTP) head trained for speculative decoding
+**The bundled GGUF declares `general.architecture: 'qwen36'`** — the
+architecturally-honest stamp. Upstream `ggml-org/llama.cpp` and
+`ollama/ollama` currently only register the hybrid SSM + attention
+stack under `qwen35` / `qwen35moe`; no `qwen36` arch entry exists yet
+(reconfirmed 2026-05-19 against llama.cpp 389ff61 and Ollama 0.24.0).
+Consequence: **the bundle is unloadable on current stock loaders.**
+`ollama run hf.co/FoolDev/Thanatos-27B` and `llama-server -m ...` both
+fail with `error loading model architecture: unknown model
+architecture: 'qwen36'` (Ollama 0.24 surfaces it as a 500 wrapping a
+generic `unable to load model: <blob>` — check `journalctl --user -u
+ollama` for the underlying line).
+This is intentional. The bundle is the *correct* metadata; the
+loaders are the lagging side. The flip is reversible until upstream
+catches up — go the other direction locally with
+`scripts/rename_arch.py` (metadata-only, tensors stay byte-identical):
 ```bash
+# Get the bundle in a loadable state on today's llama.cpp / Ollama:
 python3 scripts/rename_arch.py \
+    --from-arch qwen36 --to-arch qwen35 \
     Thanatos-27B.Q4_K_M.gguf \
+    Thanatos-27B.Q4_K_M.qwen35.gguf
+# Then either build a local Ollama tag (note absolute path —
+# `ollama create` resolves a relative FROM against the Modelfile's
+# directory, not your CWD):
+echo "FROM $PWD/Thanatos-27B.Q4_K_M.qwen35.gguf" > /tmp/Modelfile.qwen35
+ollama create thanatos-27b -f /tmp/Modelfile.qwen35
+ollama run thanatos-27b hi
+# …or point llama-server at the qwen35 file directly:
+llama-server -m Thanatos-27B.Q4_K_M.qwen35.gguf -ngl 99 -c 8192
 ```
+Once upstream adds the `qwen36` arch entry — patch landed in
+`ggml-org/llama.cpp` and propagated into Ollama — the bundle works
+as-is and the workaround above can be deleted.
 ## Quick start
 ```bash
 # A. Pull straight from HF (uses the bundled Q4_K_M + root-level
+#    template / system / params files). Fails today with
+#    `unknown model architecture: 'qwen36'`; see Architecture for
+#    the qwen36 → qwen35 rebadge workaround.
+ollama run hf.co/FoolDev/Thanatos-27B           # 17 GB Q4_K_M, qwen36-stamped
+# B. Build locally for a different quant (downloads qwen35-stamped
+#    GGUFs from unsloth — these load on today's llama.cpp / Ollama):
 make build                                              # Q4_K_M  -> thanatos-27b
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
 ### Local apps
+The bundled `Thanatos-27B.Q4_K_M.gguf` is `qwen36`-stamped — every row
+below assumes you've rebadged it to `qwen35` per
+[Architecture](#architecture), or that you're pulling a `qwen35`-stamped
+GGUF from `unsloth/Qwen3.6-27B-GGUF` instead. The "fails today with
+`unknown model architecture: 'qwen36'`" caveat applies to every row
+until that's done.
 | App | How to load this model |
 |---|---|
+| **Ollama** | `ollama run hf.co/FoolDev/Thanatos-27B` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, or to bypass the qwen36 block today, `make build QUANT=Q3_K_S` downloads from unsloth (qwen35-stamped) and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
 | **LM Studio** | Search → `FoolDev/Thanatos-27B` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
 | **Jan** | Hub → "Import from Hugging Face" → `FoolDev/Thanatos-27B`. Same template behavior as LM Studio. |
 | **llama.cpp** | `hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
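
The README claims that the rebadge is metadata-only and that tensors stay byte-identical. One way to spot-check that locally is to compare the original and rebadged files and see where they differ. The sketch below is an illustration and not part of this repo, and `differing_span` is a hypothetical name. It assumes both files have the same size, which should hold when `qwen36` is replaced by the equally long `qwen35`.

```python
import os


def differing_span(path_a, path_b, chunk_size=1 << 20):
    """Return (first_offset, last_offset, differing_byte_count), or None if identical."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        raise ValueError("files differ in size, so this is not an in-place flip")
    first = last = None
    count = 0
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a:
                break
            if a != b:
                # Only walk byte by byte inside chunks that actually differ.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        if first is None:
                            first = offset + i
                        last = offset + i
                        count += 1
            offset += len(a)
    return None if first is None else (first, last, count)
```

If the change really is confined to metadata, every differing offset should fall inside the header region at the start of the file, well before the tensor data begins.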

Thanatos-27B.Q4_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0
+oid sha256:b2b63b941d714b0aff5e9afc9b337e7607b84e371ba991c182d92d0f7805c0e9
 size 16817244384
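
The pointer diff shows a new SHA-256 with an unchanged size, which is consistent with an in-place, same-length metadata edit. A small sketch that parses the two pointers from this commit and checks exactly that is shown below. `parse_lfs_pointer` is an illustrative helper, not a Git LFS API.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above (old side, then new side).
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0
size 16817244384
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b2b63b941d714b0aff5e9afc9b337e7607b84e371ba991c182d92d0f7805c0e9
size 16817244384
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
same_size = old["size"] == new["size"]
content_changed = old["digest"] != new["digest"]
```

Here `same_size` and `content_changed` are both true: the blob was rewritten, but its length did not move.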