feat: make heal-hf (rebadge a qwen36 hf.co/... pull in place)
`ollama run hf.co/FoolDev/Thanatos-27B` fails with the qwen36 500
(`unable to load model: <blob>`), and the recovery so far has been
`ollama rm <tag>` followed by `make load-bundle` to build a separate
`thanatos-27b` tag. That works but leaves the canonical
`hf.co/FoolDev/Thanatos-27B` name in a broken state and forces every
caller to use a different tag — easy to forget, easy to re-hit when
muscle memory types the HF form.
`scripts/heal_hf_pull.sh` rebadges the already-pulled blob in store
(qwen36 -> qwen35, metadata-only, byte-identical tensors via
`scripts/rename_arch.py`) and rewrites the manifest's model-layer
digest to point at the new blob. After the heal, the same
`hf.co/FoolDev/Thanatos-27B` tag loads via stock Ollama. Wired via
`make heal-hf`.
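
The manifest half of the heal can be sketched in Python. This is an illustrative equivalent of the `jq` filter in the script, not code from this commit: `repoint_model_layer`, the sample manifest, and the `application/vnd.example.other` media type are hypothetical, while `application/vnd.ollama.image.model` is the media type the script matches on.

```python
import copy

# Media type the heal script selects on when it rewrites the manifest.
MODEL_MEDIA_TYPE = "application/vnd.ollama.image.model"


def repoint_model_layer(manifest: dict, new_hash: str, new_size: int) -> dict:
    """Return a copy of `manifest` whose model layer points at the new blob.

    Only the layer with the model media type is touched; every other
    layer is carried over unchanged.
    """
    healed = copy.deepcopy(manifest)
    for layer in healed.get("layers", []):
        if layer.get("mediaType") == MODEL_MEDIA_TYPE:
            layer["digest"] = f"sha256:{new_hash}"
            layer["size"] = new_size
    return healed


# Hypothetical manifest, trimmed to the fields the rewrite cares about.
manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": MODEL_MEDIA_TYPE, "digest": "sha256:" + "a" * 64, "size": 100},
        {"mediaType": "application/vnd.example.other", "digest": "sha256:" + "b" * 64, "size": 10},
    ],
}

healed = repoint_model_layer(manifest, new_hash="c" * 64, new_size=120)
```

Working on a copy mirrors the script's write-to-temp-then-move approach: the original manifest stays intact until the rewritten one has been checked.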
The script:
1. Resolves the model blob and manifest path. Uses `ollama show
--modelfile <tag>` to read the FROM line — robust across the
case variants ollama preserves (the lowercase `thanatos-27b`
pull and the canonical `Thanatos-27B` pull register under
different manifest dirs).
2. Inspects general.architecture via gguf.GGUFReader. Skips
idempotently if already qwen35 / qwen35moe; refuses anything
else.
3. Runs scripts/rename_arch.py qwen36 -> qwen35 into
${ROOT}/.cache/thanatos-heal.<rand>.gguf. .cache/ rather than
/tmp because the rebadged copy is ~17 GB — a half-RAM tmpfs
/tmp blows up partway through (errno 50 on Arch with 32 GB
RAM). .cache/ is on the same filesystem as ~/.ollama on a
normal Linux home layout, so the final `mv` into blobs/ stays
an atomic same-filesystem rename.
4. Computes the rebadged blob's sha256 and either moves it into
${OLLAMA_MODELS}/blobs/sha256-<new> or — if a blob with that
hash already exists (e.g. from a prior `make load-bundle` run
against the same bundle) — reuses it without double-allocating
~17 GB. Content-addressed dedup means the second qwen36 -> qwen35
rebadge in a session is free.
5. Rewrites the manifest's model-layer digest + size via jq into a
temp JSON, sanity-checks the rewrite, then atomically moves it
into place over the original manifest.
6. Removes the old qwen36 blob if no other manifest references it.
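
Steps 2 and 4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script itself: `plan_for_arch` and `stage_blob` are hypothetical names, and the demo uses a few bytes in a temporary directory rather than a real ~17 GB GGUF.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Tuple

LOADABLE_ARCHES = {"qwen35", "qwen35moe"}


def plan_for_arch(arch: str) -> str:
    """Step 2: skip loadable arches, heal qwen36, refuse anything else."""
    if arch in LOADABLE_ARCHES:
        return "skip"
    if arch == "qwen36":
        return "heal"
    return "refuse"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large blob never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_blob(tmp_blob: Path, blobs_dir: Path) -> Tuple[Path, bool]:
    """Step 4: move the rebadged file to blobs/sha256-<hash>, or reuse a twin.

    Returns (final_path, reused). os.replace is a rename, so it is only
    atomic when tmp_blob and blobs_dir live on the same filesystem.
    """
    target = blobs_dir / f"sha256-{sha256_of(tmp_blob)}"
    if target.exists():
        tmp_blob.unlink()  # identical content is already in the store
        return target, True
    os.replace(tmp_blob, target)
    return target, False


# Demo: staging the same bytes twice stores them once.
with tempfile.TemporaryDirectory() as root:
    blobs = Path(root) / "blobs"
    blobs.mkdir()
    first = Path(root) / "heal-1.gguf"
    first.write_bytes(b"rebadged bytes")
    path_1, reused_1 = stage_blob(first, blobs)
    second = Path(root) / "heal-2.gguf"
    second.write_bytes(b"rebadged bytes")
    path_2, reused_2 = stage_blob(second, blobs)
    blob_count = len(list(blobs.iterdir()))
    leftover = sorted(p.name for p in Path(root).iterdir())
```

Because the target name is derived from the content hash, the second call finds the blob already present and only deletes its scratch copy, which is the dedup behaviour step 4 describes.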
Verified end-to-end on this box: ran `ollama run
hf.co/FoolDev/thanatos-27b:Q4_K_M` (fails with the qwen36 500), then
`make heal-hf`, which dedup-reused an existing qwen35 blob from a prior
load-bundle and rewrote the manifest. `MODEL=hf.co/FoolDev/thanatos-27b:Q4_K_M
make smoke-tools` then passes (round-trip OK, no token leakage, tool-call
round-trip emits name=get_weather city=Tokyo). The old qwen36 blob was
removed since no other tag referenced it.
README TL;DR Ollama section now lists three paths instead of two
(heal-hf for the already-pulled case, load-bundle for the
fresh-from-this-repo's-bundle case, build for the unsloth qwen35
alternative). New `scripts/heal_hf_pull.sh` row added to "What's
here". CHANGELOG entry at top of [Unreleased].
Once upstream adds the qwen36 arch entry, this script (and the
whole rebadge dance) can be deleted; the bundle works as-is.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CHANGELOG.md +23 -0
- Makefile +4 -1
- README.md +19 -9
- scripts/heal_hf_pull.sh +173 -0
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,29 @@ and documentation**, not the underlying base model.
 ## [Unreleased]
 
 ### Added
+- `scripts/heal_hf_pull.sh` + `make heal-hf`: heal an already-pulled
+  `hf.co/FoolDev/Thanatos-27B:...` tag in-store by rebadging its
+  model blob (qwen36 → qwen35, metadata-only, byte-identical
+  tensors) and rewriting the manifest's model-layer digest. Covers
+  the user pain when `ollama run hf.co/FoolDev/Thanatos-27B` is
+  typed from muscle memory, fails with the qwen36 500, and leaves
+  ~17 GB of unloadable blob sitting in the store; before this, the
+  only recovery was `ollama rm <tag>` + switching to the separate
+  `thanatos-27b` tag that `make load-bundle` builds. `make heal-hf`
+  makes the same `hf.co/...` tag loadable in place. Idempotent
+  (tags already on qwen35 / qwen35moe are skipped);
+  content-addressed dedup means if the rebadged blob already exists
+  in the store (e.g. from a prior `make load-bundle` run) the heal
+  reuses it instead of double-allocating ~17 GB. Removes the old
+  qwen36 blob if no other manifest references it. Stages the
+  rebadge in `.cache/` rather than `/tmp` so the ~17 GB write
+  doesn't blow past tmpfs (`mv` into `blobs/` stays an atomic
+  same-filesystem rename on a normal Linux home-dir layout).
+- README TL;DR Ollama section now lists **three** paths: heal an
+  already-pulled HF tag (`make heal-hf`), build from the bundle
+  (`make load-bundle`), or bypass the bundle entirely
+  (`make build`). New `scripts/heal_hf_pull.sh` entry added to the
+  "What's here" table.
 - `scripts/load_bundle.sh` + `make load-bundle`: one-shot path from
   the qwen36-stamped bundle → loadable Ollama tag. Handles the LFS
   smudge (`hf download FoolDev/Thanatos-27B Thanatos-27B.Q4_K_M.gguf`
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ MODEL ?= $(TAG)
 
 PRECISION ?= F16
 
-.PHONY: help build load-bundle smoke smoke-tools bench check hooks mmproj clean
+.PHONY: help build load-bundle heal-hf smoke smoke-tools bench check hooks mmproj clean
 
 help: ## Show this help.
 	@awk 'BEGIN {FS = ":.*##"; printf "Targets:\n"} /^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-12s\033[0m %s\n", $$1, $$2 }' $(MAKEFILE_LIST)
@@ -43,6 +43,9 @@ build: ## Download qwen35-stamped GGUF from unsloth and run 'ollama create' (lo
 load-bundle: ## Load THIS repo's qwen36-stamped bundle (smudge LFS + rebadge to qwen35 + ollama create).
 	TAG=$(TAG) ./scripts/load_bundle.sh
 
+heal-hf: ## Heal an already-pulled hf.co/FoolDev/Thanatos-27B tag in-store (rebadge blob + manifest digest).
+	./scripts/heal_hf_pull.sh
+
 smoke: ## Verify the model is reachable and round-trips.
 	MODEL=$(MODEL) ./scripts/smoke_test.sh
 
--- a/README.md
+++ b/README.md
@@ -83,26 +83,35 @@ ollama run hf.co/FoolDev/Thanatos-27B # ~17 GB Q4_K_M, qwen36-stamped
 ```
 
 That command fails today with `unknown model architecture: 'qwen36'`
-because the bundle is qwen36-stamped.
-
+because the bundle is qwen36-stamped. Three paths around it (all
+require this repo cloned):
 
 ```bash
 git clone https://huggingface.co/FoolDev/Thanatos-27B && cd Thanatos-27B
 
-# A.
-#
+# A. Already ran the broken pull? Heal it in place — rebadges the
+# already-downloaded blob's arch metadata + rewrites the manifest
+# digest so `ollama run hf.co/FoolDev/Thanatos-27B` loads:
+make heal-hf
+ollama run hf.co/FoolDev/Thanatos-27B
+
+# B. Haven't pulled yet — load *this repo's* qwen36-stamped bundle
+# via the rebadge helper (smudges LFS if needed, rebadges
+# qwen36 → qwen35, runs `ollama create thanatos-27b`):
 make load-bundle
 ollama run thanatos-27b
 
-#
-#
+# C. Bypass the bundle: download a qwen35-stamped GGUF from unsloth
+# and build locally. Loads on every current llama.cpp / Ollama.
 make build # Q4_K_M from unsloth
-make build QUANT=
+make build QUANT=Q3_K_S # 12 GB smaller quant
+make build QUANT=Q5_K_M # 20 GB higher quality
 ollama run thanatos-27b
 ```
 
-Once upstream adds the qwen36 arch entry,
-direct `ollama run hf.co/FoolDev/Thanatos-27B` one-liner
+Once upstream adds the qwen36 arch entry, all three paths collapse
+to the direct `ollama run hf.co/FoolDev/Thanatos-27B` one-liner
+above.
 
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
 QUANT=Q3_K_S` is the simplest path. See [Quick start](#quick-start)
@@ -142,6 +151,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | Pulls a qwen35-stamped GGUF from `unsloth/Qwen3.6-27B-GGUF` and runs `ollama create` (loads on today's llama.cpp / Ollama; see `make build`) |
 | `scripts/load_bundle.sh` | One-shot path from *this repo's* qwen36-stamped bundle → loadable Ollama tag (smudges LFS pointer via `hf download` if needed, rebadges qwen36 → qwen35, runs `ollama create`; see `make load-bundle`) |
+| `scripts/heal_hf_pull.sh` | Heal an already-pulled `hf.co/FoolDev/Thanatos-27B:...` tag in-store: rebadges its model blob qwen36 → qwen35 and rewrites the manifest's model-layer digest so the same tag becomes loadable in place. Use after `ollama run hf.co/FoolDev/Thanatos-27B` has failed once and left ~17 GB in the blob store; see `make heal-hf`. Idempotent — tags already on qwen35 are skipped. |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
 | `scripts/bench.sh` | Measures real tok/s using Ollama's `eval_count` / `eval_duration` metadata over a 3-prompt mix (run `make bench`) |
 | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
--- /dev/null
+++ b/scripts/heal_hf_pull.sh
@@ -0,0 +1,173 @@
+#!/usr/bin/env bash
+# Thanatos-27B — heal a freshly pulled HF-bridge tag whose bundled GGUF
+# is `qwen36`-stamped.
+#
+# Background. `ollama run hf.co/FoolDev/Thanatos-27B` (or any other
+# qwen36-stamped HF-bridge tag of this repo) pulls a fresh copy of the
+# bundled GGUF every time. Until upstream registers the `qwen36` arch,
+# every such pull fails with `unable to load model: <blob>` (see
+# README "Architecture"). `make load-bundle` works around this by
+# building a *separate* local `thanatos-27b` tag from a rebadged copy,
+# but the canonical HF-bridge tag stays broken.
+#
+# This script rebadges the HF-bridge tag's model blob in-place
+# (qwen36 -> qwen35, metadata-only, byte-identical tensors) and
+# rewrites the manifest's model-layer digest to point at the new
+# blob. After running it, `ollama run hf.co/FoolDev/Thanatos-27B`
+# loads.
+#
+# Idempotent: a tag already on qwen35 / qwen35moe is left untouched.
+# Re-runnable after a fresh HF pull (the pull resets the manifest
+# digest back to the qwen36 blob).
+#
+# Once upstream adds the qwen36 arch entry this script (and the
+# whole rebadge dance) can be deleted; the bundle works as-is.
+#
+# Usage:
+#   ./scripts/heal_hf_pull.sh # default tag
+#   TAG=hf.co/FoolDev/Thanatos-27B:Q4_K_M ./scripts/heal_hf_pull.sh
+#
+# Requires: ollama, jq, python3 with the `gguf` package, sha256sum.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+TAG="${TAG:-hf.co/FoolDev/Thanatos-27B:Q4_K_M}"
+OLLAMA_MODELS="${OLLAMA_MODELS:-${HOME}/.ollama/models}"
+
+red() { printf "\033[31m%s\033[0m\n" "$*"; }
+green() { printf "\033[32m%s\033[0m\n" "$*"; }
+blue() { printf "\033[34m%s\033[0m\n" "$*"; }
+
+blue "[*] tag: ${TAG}"
+blue "[*] store: ${OLLAMA_MODELS}"
+
+# ---- 1. Sanity ---------------------------------------------------------------
+
+for bin in ollama jq python3 sha256sum; do
+  if ! command -v "${bin}" >/dev/null 2>&1; then
+    red "[!] missing dependency: ${bin}"; exit 1
+  fi
+done
+
+# ---- 2. Locate the model blob and manifest ----------------------------------
+
+# `ollama show --modelfile` writes a FROM line with the absolute blob path.
+# Reliable regardless of which case variant the user pulled with
+# (hf.co's 307 lets `Thanatos-27B` and `thanatos-27b` both resolve to the
+# canonical repo, and ollama stores the manifest under whichever case
+# was first registered).
+MODEL_BLOB="$(ollama show --modelfile "${TAG}" 2>/dev/null | awk '/^FROM[[:space:]]/ {print $2; exit}')"
+if [[ -z "${MODEL_BLOB}" || ! -f "${MODEL_BLOB}" ]]; then
+  red "[!] could not resolve model blob for tag '${TAG}'."
+  red " Is the tag pulled? Try: ollama pull ${TAG}"
+  exit 1
+fi
+MODEL_HASH="$(basename "${MODEL_BLOB}" | sed 's/^sha256-//')"
+blue "[*] blob: ${MODEL_BLOB}"
+
+# Find the manifest by grepping for the model digest. The blob is
+# referenced from exactly one tag in the heal scenario — fresh HF pull
+# of a single :Q4_K_M tag — but if someone has multiple tags pointing
+# at the same blob, we filter down to the one matching ${TAG}.
+TAG_PATH="${TAG#hf.co/}" # FoolDev/Thanatos-27B:Q4_K_M
+NAMESPACE_PATH="${TAG_PATH%:*}" # FoolDev/Thanatos-27B
+TAG_FILE="${TAG_PATH##*:}" # Q4_K_M
+
+MANIFEST="$(find "${OLLAMA_MODELS}/manifests/hf.co" \
+  -type f \
+  -ipath "*/${NAMESPACE_PATH}/${TAG_FILE}" 2>/dev/null | head -1)"
+
+if [[ -z "${MANIFEST}" || ! -f "${MANIFEST}" ]]; then
+  red "[!] manifest not found under ${OLLAMA_MODELS}/manifests/hf.co for tag '${TAG}'."
+  exit 1
+fi
+blue "[*] manifest: ${MANIFEST}"
+
+# ---- 3. Inspect arch ---------------------------------------------------------
+
+ARCH="$(python3 - "${MODEL_BLOB}" <<'PY'
+import sys
+from gguf import GGUFReader, constants
+r = GGUFReader(sys.argv[1], "r")
+f = r.get_field(constants.Keys.General.ARCHITECTURE)
+print(bytes(f.parts[f.data[0]]).decode())
+PY
+)"
+blue "[*] arch: ${ARCH}"
+
+if [[ "${ARCH}" == "qwen35" || "${ARCH}" == "qwen35moe" ]]; then
+  green "[=] already on a loadable arch (${ARCH}) — nothing to heal."
+  exit 0
+fi
+if [[ "${ARCH}" != "qwen36" ]]; then
+  red "[!] unexpected arch '${ARCH}' — refusing to heal. Edit this script if intentional."
+  exit 1
+fi
+
+# ---- 4. Rebadge to a temp blob and stage it in the store --------------------
+
+# Stage in the repo's .cache/ rather than /tmp: the rebadged copy is the same
+# size as the original (~17 GB), which blows past a typical tmpfs /tmp budget.
+# .cache/ is on the same filesystem as ~/.ollama on a normal Linux home dir
+# layout, so the final move into blobs/ is an atomic rename, not a copy.
+SCRATCH_DIR="${ROOT}/.cache"
+mkdir -p "${SCRATCH_DIR}"
+TMP_BLOB="$(mktemp -p "${SCRATCH_DIR}" thanatos-heal.XXXXXX.gguf)"
+trap 'rm -f "${TMP_BLOB}"' EXIT
+blue "[*] rebadging qwen36 -> qwen35 (metadata only, tensors byte-identical) ..."
+python3 "${ROOT}/scripts/rename_arch.py" \
+  --from-arch qwen36 --to-arch qwen35 \
+  "${MODEL_BLOB}" "${TMP_BLOB}"
+
+NEW_HASH="$(sha256sum "${TMP_BLOB}" | awk '{print $1}')"
+NEW_SIZE="$(stat -c '%s' "${TMP_BLOB}")"
+NEW_BLOB="${OLLAMA_MODELS}/blobs/sha256-${NEW_HASH}"
+blue "[*] new digest: sha256:${NEW_HASH}"
+blue "[*] new size: ${NEW_SIZE}"
+
+if [[ -f "${NEW_BLOB}" ]]; then
+  blue "[=] target blob already in store — reusing."
+  rm -f "${TMP_BLOB}"
+else
+  mv "${TMP_BLOB}" "${NEW_BLOB}"
+fi
+trap - EXIT
+
+# ---- 5. Rewrite the manifest's model layer ----------------------------------
+
+TMP_MANIFEST="$(mktemp -t thanatos-heal-manifest.XXXXXX.json)"
+trap 'rm -f "${TMP_MANIFEST}"' EXIT
+jq --arg new "sha256:${NEW_HASH}" \
+   --argjson size "${NEW_SIZE}" '
+  .layers |= map(
+    if .mediaType == "application/vnd.ollama.image.model"
+    then .digest = $new | .size = $size
+    else .
+    end
+  )
+' "${MANIFEST}" > "${TMP_MANIFEST}"
+
+NEW_DIGEST_IN_MANIFEST="$(jq -r '
+  .layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest
+' "${TMP_MANIFEST}")"
+if [[ "${NEW_DIGEST_IN_MANIFEST}" != "sha256:${NEW_HASH}" ]]; then
+  red "[!] manifest rewrite failed (digest mismatch); not committing."
+  exit 1
+fi
+mv "${TMP_MANIFEST}" "${MANIFEST}"
+trap - EXIT
+
+# ---- 6. Remove the old qwen36 blob if no other manifest references it -------
+
+OLD_DIGEST="sha256:${MODEL_HASH}"
+if ! grep -rlF -- "${OLD_DIGEST}" "${OLLAMA_MODELS}/manifests/" >/dev/null 2>&1; then
+  blue "[*] no other manifest references the old qwen36 blob — removing ${MODEL_BLOB}"
+  rm -f "${MODEL_BLOB}"
+else
+  blue "[=] old qwen36 blob still referenced by another manifest — leaving in place."
+fi
+
+echo
+green "[+] healed. Try it:"
+echo " ollama run ${TAG}"
+echo " MODEL=${TAG} make smoke"
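
The orphan check in step 6 of the script relies on `grep -rlF` over the manifests directory. A rough Python equivalent is sketched below for readers who want to reason about it in isolation. `blob_is_orphaned` is a hypothetical helper and the manifest contents are made up for the demo; only the `manifests/hf.co/<namespace>/<tag>` layout is taken from the script's `find` call.

```python
import json
import tempfile
from pathlib import Path


def blob_is_orphaned(manifests_dir: Path, digest: str) -> bool:
    """True when no file under manifests_dir mentions `digest`.

    This is a fixed-string search over every manifest file, like
    `grep -rlF`; it does not parse the JSON.
    """
    needle = digest.encode()
    for path in manifests_dir.rglob("*"):
        if path.is_file() and needle in path.read_bytes():
            return False
    return True


# Demo on a throwaway store: the old digest is referenced before the
# manifest rewrite and orphaned after it.
with tempfile.TemporaryDirectory() as root:
    manifests = Path(root) / "manifests"
    tag_dir = manifests / "hf.co" / "FoolDev" / "Thanatos-27B"
    tag_dir.mkdir(parents=True)
    old_digest = "sha256:" + "a" * 64
    new_digest = "sha256:" + "c" * 64
    manifest_file = tag_dir / "Q4_K_M"
    manifest_file.write_text(json.dumps({"layers": [{"digest": old_digest}]}))
    referenced_before = not blob_is_orphaned(manifests, old_digest)
    manifest_file.write_text(json.dumps({"layers": [{"digest": new_digest}]}))
    orphaned_after = blob_is_orphaned(manifests, old_digest)
```

The order matters: the check only reports the old blob as orphaned because it runs after the manifest has been repointed, which is why the script deletes the blob last.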