FoolDev Claude Opus 4.7 committed on May 19

Commit d87bc64

1 Parent(s): ef3c5d9

feat: make heal-hf (rebadge a qwen36 hf.co/... pull in place)

`ollama run hf.co/FoolDev/Thanatos-27B` fails with the qwen36 500
(`unable to load model: <blob>`), and the recovery so far has been
`ollama rm <tag>` followed by `make load-bundle` to build a separate
`thanatos-27b` tag. That works but leaves the canonical
`hf.co/FoolDev/Thanatos-27B` name in a broken state and forces every
caller to use a different tag — easy to forget, easy to re-hit when
muscle memory types the HF form.

`scripts/heal_hf_pull.sh` rebadges the already-pulled blob in store
(qwen36 -> qwen35, metadata-only, byte-identical tensors via
`scripts/rename_arch.py`) and rewrites the manifest's model-layer
digest to point at the new blob. After the heal, the same
`hf.co/FoolDev/Thanatos-27B` tag loads via stock Ollama. Wired via
`make heal-hf`.
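
The manifest rewrite described above can be sketched in Python. This is a minimal illustration, not the script itself (which uses `jq`): the function name `rewrite_model_layer` and the sample manifest are hypothetical, and the only detail taken from the script is the model-layer media type it matches on.

```python
import copy

# Media type the heal script matches when selecting the model layer.
MODEL_MEDIA_TYPE = "application/vnd.ollama.image.model"


def rewrite_model_layer(manifest: dict, new_digest: str, new_size: int) -> dict:
    """Return a copy of `manifest` whose model layer points at the new blob."""
    healed = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    for layer in healed.get("layers", []):
        if layer.get("mediaType") == MODEL_MEDIA_TYPE:
            layer["digest"] = new_digest
            layer["size"] = new_size
    return healed


# Hypothetical manifest: one model layer plus one unrelated layer.
sample = {
    "layers": [
        {"mediaType": MODEL_MEDIA_TYPE, "digest": "sha256:old", "size": 1},
        {"mediaType": "example/other-layer", "digest": "sha256:tpl", "size": 2},
    ]
}
healed = rewrite_model_layer(sample, "sha256:new", 42)
```

Only the model layer changes; every other layer is carried over as-is, which is why the heal can leave the rest of the manifest alone.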

The script:
1. Resolves the model blob and manifest path. Uses
   `ollama show --modelfile <tag>` to read the FROM line, which is
   robust across the case variants ollama preserves (the lowercase
   `thanatos-27b` pull and the canonical `Thanatos-27B` pull register
   under different manifest dirs).
2. Inspects `general.architecture` via `gguf.GGUFReader`. Skips
   idempotently if already qwen35 / qwen35moe; refuses anything else.
3. Runs `scripts/rename_arch.py` qwen36 -> qwen35 into
   `${ROOT}/.cache/thanatos-heal.<rand>.gguf`. Uses `.cache/` rather
   than `/tmp` because the rebadged copy is ~17 GB, and a half-RAM
   tmpfs `/tmp` blows up partway through (errno 50 on Arch with 32 GB
   RAM). `.cache/` is on the same filesystem as `~/.ollama` on a
   normal Linux home layout, so the final `mv` into `blobs/` stays an
   atomic same-filesystem rename.
4. Computes the rebadged blob's sha256 and either moves it into
   `${OLLAMA_MODELS}/blobs/sha256-<new>` or, if a blob with that hash
   already exists (e.g. from a prior `make load-bundle` run against
   the same bundle), reuses it without double-allocating ~17 GB.
   Content-addressed dedup means the second qwen36 -> qwen35 rebadge
   in a session is free.
5. Rewrites the manifest's model-layer digest + size via `jq` into a
   temp JSON, sanity-checks the rewrite, then atomically moves it
   into place over the original manifest.
6. Removes the old qwen36 blob if no other manifest references it.
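
The content-addressed staging in step 4 can be sketched as follows. This is a hedged illustration rather than the script's code: `sha256_of` and `stage_blob` are hypothetical names, the demo uses a throwaway directory instead of a real Ollama store, and only the `sha256-<hex>` blob naming comes from the text above.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-GB blob never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_blob(staged: Path, blobs_dir: Path) -> tuple[Path, bool]:
    """Place `staged` under its sha256 name, or reuse an identical blob.

    Returns (blob_path, reused).
    """
    target = blobs_dir / f"sha256-{sha256_of(staged)}"
    if target.exists():
        staged.unlink()  # identical content already stored; drop the copy
        return target, True
    # A rename is atomic only when both paths share a filesystem.
    os.replace(staged, target)
    return target, False


# Demo in a throwaway directory (hypothetical stand-in for the blob store).
with tempfile.TemporaryDirectory() as root:
    blobs = Path(root) / "blobs"
    blobs.mkdir()
    first = Path(root) / "a.gguf"
    first.write_bytes(b"rebadged bytes")
    second = Path(root) / "b.gguf"
    second.write_bytes(b"rebadged bytes")
    path_a, reused_a = stage_blob(first, blobs)
    path_b, reused_b = stage_blob(second, blobs)
```

The second call finds the blob already present under the same digest and reuses it, which is the behavior that makes a repeat rebadge cost no extra disk space.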

Verified end-to-end on this box: ran
`ollama run hf.co/FoolDev/thanatos-27b:Q4_K_M` (fails with the qwen36
500), then ran `make heal-hf`, which dedup-reused an existing qwen35
blob from a prior load-bundle. The manifest rewrite landed, and
`MODEL=hf.co/FoolDev/thanatos-27b:Q4_K_M make smoke-tools` passes
(round-trip OK, no token leakage, tool-call round-trip emits
name=get_weather city=Tokyo). The old qwen36 blob was removed since no
other tag referenced it.

README TL;DR Ollama section now lists three paths instead of two
(heal-hf for the already-pulled case, load-bundle for the
fresh-from-this-repo's-bundle case, build for the unsloth qwen35
alternative). New `scripts/heal_hf_pull.sh` row added to "What's
here". CHANGELOG entry at top of [Unreleased].

Once upstream adds the qwen36 arch entry, this script (and the
whole rebadge dance) can be deleted; the bundle works as-is.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (4)

CHANGELOG.md +23 -0
Makefile +4 -1
README.md +19 -9
scripts/heal_hf_pull.sh +173 -0

CHANGELOG.md CHANGED

@@ -8,6 +8,29 @@ and documentation**, not the underlying base model.
 ## [Unreleased]
 ### Added
+- `scripts/heal_hf_pull.sh` + `make heal-hf`: heal an already-pulled
+  `hf.co/FoolDev/Thanatos-27B:...` tag in-store by rebadging its
+  model blob (qwen36 → qwen35, metadata-only, byte-identical
+  tensors) and rewriting the manifest's model-layer digest. Covers
+  the user pain when `ollama run hf.co/FoolDev/Thanatos-27B` is
+  typed from muscle memory, fails with the qwen36 500, and leaves
+  ~17 GB of unloadable blob sitting in the store; before this, the
+  only recovery was `ollama rm <tag>` + switching to the separate
+  `thanatos-27b` tag that `make load-bundle` builds. `make heal-hf`
+  makes the same `hf.co/...` tag loadable in place. Idempotent
+  (tags already on qwen35 / qwen35moe are skipped);
+  content-addressed dedup means if the rebadged blob already exists
+  in the store (e.g. from a prior `make load-bundle` run) the heal
+  reuses it instead of double-allocating ~17 GB. Removes the old
+  qwen36 blob if no other manifest references it. Stages the
+  rebadge in `.cache/` rather than `/tmp` so the ~17 GB write
+  doesn't blow past tmpfs (`mv` into `blobs/` stays an atomic
+  same-filesystem rename on a normal Linux home-dir layout).
+- README TL;DR Ollama section now lists **three** paths: heal an
+  already-pulled HF tag (`make heal-hf`), build from the bundle
+  (`make load-bundle`), or bypass the bundle entirely
+  (`make build`). New `scripts/heal_hf_pull.sh` entry added to the
+  "What's here" table.
 - `scripts/load_bundle.sh` + `make load-bundle`: one-shot path from
   the qwen36-stamped bundle → loadable Ollama tag. Handles the LFS
   smudge (`hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf`

Makefile CHANGED

@@ -26,7 +26,7 @@ MODEL ?= $(TAG)
 PRECISION ?= F16
-.PHONY: help build load-bundle smoke smoke-tools bench check hooks mmproj clean
+.PHONY: help build load-bundle heal-hf smoke smoke-tools bench check hooks mmproj clean
 help:  ## Show this help.
 	@awk 'BEGIN {FS = ":.*##"; printf "Targets:\n"} /^[a-zA-Z_-]+:.*?##/ { printf "  \033[36m%-12s\033[0m %s\n", $$1, $$2 }' $(MAKEFILE_LIST)
@@ -43,6 +43,9 @@ build:  ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (lo
 load-bundle:  ## Load THIS repo's qwen36-stamped bundle (smudge LFS + rebadge to qwen35 + ollama create).
 	TAG=$(TAG) ./scripts/load_bundle.sh
+heal-hf:  ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B tag in-store (rebadge blob + manifest digest).
+	./scripts/heal_hf_pull.sh
 smoke:  ## Verify the model is reachable and round-trips.
 	MODEL=$(MODEL) ./scripts/smoke_test.sh

README.md CHANGED

@@ -83,26 +83,35 @@ ollama run hf.co/FoolDev/Thanatos-27B           # ~17 GB Q4_K_M, qwen36-stamped
 ```
 That command fails today with `unknown model architecture: 'qwen36'`
-because the bundle is qwen36-stamped. Two paths around it (both
-clone the repo first):
+because the bundle is qwen36-stamped. Three paths around it (all
+require this repo cloned):
 ```bash
 git clone https://huggingface.co/FoolDev/Thanatos-27B && cd Thanatos-27B
-# A. Load *this repo's* qwen36-stamped bundle (smudges LFS if needed,
-#    rebadges to qwen35, runs `ollama create thanatos-27b`):
+# A. Already ran the broken pull? Heal it in place — rebadges the
+#    already-downloaded blob's arch metadata + rewrites the manifest
+#    digest so `ollama run hf.co/FoolDev/Thanatos-27B` loads:
+make heal-hf
+ollama run hf.co/FoolDev/Thanatos-27B
+# B. Haven't pulled yet — load *this repo's* qwen36-stamped bundle
+#    via the rebadge helper (smudges LFS if needed, rebadges
+#    qwen36 → qwen35, runs `ollama create thanatos-27b`):
 make load-bundle
 ollama run thanatos-27b
-# B. Bypass the bundle entirely: download a qwen35-stamped GGUF from
-#    unsloth (loads on every current llama.cpp / Ollama):
+# C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
+#    and build locally. Loads on every current llama.cpp / Ollama.
 make build                              # Q4_K_M from unsloth
-make build QUANT=Q5_K_M                  # higher quality
+make build QUANT=Q3_K_S                  # 12 GB smaller quant
+make build QUANT=Q5_K_M                  # 20 GB higher quality
 ollama run thanatos-27b
 ```
-Once upstream adds the qwen36 arch entry, both paths collapse to the
-direct `ollama run hf.co/FoolDev/Thanatos-27B` one-liner above.
+Once upstream adds the qwen36 arch entry, all three paths collapse
+to the direct `ollama run hf.co/FoolDev/Thanatos-27B` one-liner
+above.
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
 QUANT=Q3_K_S` is the simplest path. See [Quick start](#quick-start)
@@ -142,6 +151,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
 | `scripts/load_bundle.sh` | One-shot path from *this repo's* qwen36-stamped bundle → loadable Ollama tag (smudges LFS pointer via `hf download` if needed, rebadges qwen36 → qwen35, runs `ollama create`; see `make load-bundle`) |
+| `scripts/heal_hf_pull.sh` | Heal an already-pulled `hf.co/FoolDev/Thanatos-27B:...` tag in-store: rebadges its model blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. Use after `ollama run hf.co/FoolDev/Thanatos-27B` has failed once and left ~17 GB in the blob store; see `make heal-hf`. Idempotent — tags already on qwen35 are skipped. |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
 | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
 | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |

scripts/heal_hf_pull.sh ADDED

	@@ -0,0 +1,173 @@

+#!/usr/bin/env bash
+# Thanatos-27B — heal a freshly pulled HF-bridge tag whose bundled GGUF
+# is `qwen36`-stamped.
+#
+# Background. `ollama run hf.co/FoolDev/Thanatos-27B` (or any other
+# qwen36-stamped HF-bridge tag of this repo) pulls a fresh copy of the
+# bundled GGUF every time. Until upstream registers the `qwen36` arch,
+# every such pull fails with `unable to load model: <blob>` (see
+# README "Architecture"). `make load-bundle` works around this by
+# building a *separate* local `thanatos-27b` tag from a rebadged copy,
+# but the canonical HF-bridge tag stays broken.
+#
+# This script rebadges the HF-bridge tag's model blob in-place
+# (qwen36 -> qwen35, metadata-only, byte-identical tensors) and
+# rewrites the manifest's model-layer digest to point at the new
+# blob. After running it, `ollama run hf.co/FoolDev/Thanatos-27B`
+# loads.
+#
+# Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
+# Re-runnable after a fresh HF pull (the pull resets the manifest
+# digest back to the qwen36 blob).
+#
+# Once upstream adds the qwen36 arch entry this script (and the
+# whole rebadge dance) can be deleted; the bundle works as-is.
+#
+# Usage:
+#   ./scripts/heal_hf_pull.sh                                # default tag
+#   TAG=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/heal_hf_pull.sh
+#
+# Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
+set -euo pipefail
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+TAG="${TAG:-hf.co/FoolDev/Thanatos-27B:Q4_K_M}"
+OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
+red()   { printf "\033[31m%s\033[0m\n" "$*"; }
+green() { printf "\033[32m%s\033[0m\n" "$*"; }
+blue()  { printf "\033[34m%s\033[0m\n" "$*"; }
+blue "[*] tag:    ${TAG}"
+blue "[*] store:  ${OLLAMA_MODELS}"
+# ---- 1. Sanity ---------------------------------------------------------------
+for bin in ollama jq python3 sha256sum; do
+    if ! command -v "${bin}" >/dev/null 2>&1; then
+        red "[!] missing dependency: ${bin}"; exit 1
+    fi
+done
+# ---- 2. Locate the model blob and manifest ----------------------------------
+# `ollama show --modelfile` writes a FROM line with the absolute blob path.
+# Reliable regardless of which case variant the user pulled with
+# (hf.co's 307 lets `Thanatos-27B` and `thanatos-27b` both resolve to the
+# canonical repo, and ollama stores the manifest under whichever case
+# was first registered).
+MODEL_BLOB="$(ollama show --modelfile "${TAG}" 2>/dev/null | awk '/^FROM[[:space:]]/ {print $2; exit}')"
+if [[ -z "${MODEL_BLOB}" || ! -f "${MODEL_BLOB}" ]]; then
+    red "[!] could not resolve model blob for tag '${TAG}'."
+    red "    Is the tag pulled? Try: ollama pull ${TAG}"
+    exit 1
+fi
+MODEL_HASH="$(basename "${MODEL_BLOB}" | sed 's/^sha256-//')"
+blue "[*] blob:   ${MODEL_BLOB}"
+# Find the manifest by grepping for the model digest. The blob is
+# referenced from exactly one tag in the heal scenario — fresh HF pull
+# of a single :Q4_K_M tag — but if someone has multiple tags pointing
+# at the same blob, we filter down to the one matching ${TAG}.
+TAG_PATH="${TAG#hf.co/}"      # FoolDev/Thanatos-27B:Q4_K_M
+NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B
+TAG_FILE="${TAG_PATH##*:}"    # Q4_K_M
+MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
+              -type f \
+              -ipath "*/${NAMESPACE_PATH}/${TAG_FILE}" 2>/dev/null | head -1)"
+if [[ -z "${MANIFEST}" || ! -f "${MANIFEST}" ]]; then
+    red "[!] manifest not found under ${OLLAMA_MODELS}/manifests/hf.co for tag '${TAG}'."
+    exit 1
+fi
+blue "[*] manifest: ${MANIFEST}"
+# ---- 3. Inspect arch ---------------------------------------------------------
+ARCH="$(python3 - "${MODEL_BLOB}" <<'PY'
+import sys
+from gguf import GGUFReader, constants
+r = GGUFReader(sys.argv[1], "r")
+f = r.get_field(constants.Keys.General.ARCHITECTURE)
+print(bytes(f.parts[f.data[0]]).decode())
+PY
+)"
+blue "[*] arch:    ${ARCH}"
+if [[ "${ARCH}" == "qwen35" || "${ARCH}" == "qwen35moe" ]]; then
+    green "[=] already on a loadable arch (${ARCH}) — nothing to heal."
+    exit 0
+fi
+if [[ "${ARCH}" != "qwen36" ]]; then
+    red "[!] unexpected arch '${ARCH}' — refusing to heal. Edit this script if intentional."
+    exit 1
+fi
+# ---- 4. Rebadge to a temp blob and stage it in the store --------------------
+# Stage in the repo's .cache/ rather than /tmp: the rebadged copy is the same
+# size as the original (~17 GB), which blows past a typical tmpfs /tmp budget.
+# .cache/ is on the same filesystem as ~/.ollama on a normal Linux home dir
+# layout, so the final move into blobs/ is an atomic rename, not a copy.
+SCRATCH_DIR="${ROOT}/.cache"
+mkdir -p "${SCRATCH_DIR}"
+TMP_BLOB="$(mktemp -p "${SCRATCH_DIR}" thanatos-heal.XXXXXX.gguf)"
+trap 'rm -f "${TMP_BLOB}"' EXIT
+blue "[*] rebadging qwen36 -> qwen35 (metadata only, tensors byte-identical) ..."
+python3 "${ROOT}/scripts/rename_arch.py" \
+    --from-arch qwen36 --to-arch qwen35 \
+    "${MODEL_BLOB}" "${TMP_BLOB}"
+NEW_HASH="$(sha256sum "${TMP_BLOB}" | awk '{print $1}')"
+NEW_SIZE="$(stat -c '%s' "${TMP_BLOB}")"
+NEW_BLOB="${OLLAMA_MODELS}/blobs/sha256-${NEW_HASH}"
+blue "[*] new digest: sha256:${NEW_HASH}"
+blue "[*] new size:   ${NEW_SIZE}"
+if [[ -f "${NEW_BLOB}" ]]; then
+    blue "[=] target blob already in store — reusing."
+    rm -f "${TMP_BLOB}"
+else
+    mv "${TMP_BLOB}" "${NEW_BLOB}"
+fi
+trap - EXIT
+# ---- 5. Rewrite the manifest's model layer ----------------------------------
+TMP_MANIFEST="$(mktemp -t thanatos-heal-manifest.XXXXXX.json)"
+trap 'rm -f "${TMP_MANIFEST}"' EXIT
+jq --arg new "sha256:${NEW_HASH}" \
+   --argjson size "${NEW_SIZE}" '
+    .layers |= map(
+        if .mediaType == "application/vnd.ollama.image.model"
+        then .digest = $new | .size = $size
+        else .
+        end
+    )
+' "${MANIFEST}" > "${TMP_MANIFEST}"
+NEW_DIGEST_IN_MANIFEST="$(jq -r '
+    .layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest
+' "${TMP_MANIFEST}")"
+if [[ "${NEW_DIGEST_IN_MANIFEST}" != "sha256:${NEW_HASH}" ]]; then
+    red "[!] manifest rewrite failed (digest mismatch); not committing."
+    exit 1
+fi
+mv "${TMP_MANIFEST}" "${MANIFEST}"
+trap - EXIT
+# ---- 6. Remove the old qwen36 blob if no other manifest references it -------
+OLD_DIGEST="sha256:${MODEL_HASH}"
+if ! grep -rlF -- "${OLD_DIGEST}" "${OLLAMA_MODELS}/manifests/" >/dev/null 2>&1; then
+    blue "[*] no other manifest references the old qwen36 blob — removing ${MODEL_BLOB}"
+    rm -f "${MODEL_BLOB}"
+else
+    blue "[=] old qwen36 blob still referenced by another manifest — leaving in place."
+fi
+echo
+green "[+] healed. Try it:"
+echo "    ollama run ${TAG}"
+echo "    MODEL=${TAG} make smoke"