FoolDev committed on May 6

Commit bc0cbc6 (1 parent: c843f11)

Rename Janus-27B → Thanatos-27B + update artwork

Splits the dense 27B off from the Janus family naming. The 35B-A3B MoE
sibling (FoolDev/janus) keeps its name; this repo becomes its own
identity. In-repo rewrite: README, Modelfile, params/system/template,
Makefile, CITATION.cff, .gitignore, examples/, scripts/. Bundled GGUF
git-mv'd Janus-27B.Q4_K_M.gguf → Thanatos-27B.Q4_K_M.gguf (LFS pointer
preserved, no re-upload). Default Ollama tag flips
janus-27b → thanatos-27b. System prompt identity flips
"You are Janus" → "You are Thanatos" everywhere.

Sibling references back to FoolDev/janus and Janus-35B-A3B are
preserved on purpose.
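
The rename rule above (rewrite the 27B names, keep the sibling references) can be checked mechanically. What follows is a minimal sketch, assuming the only names to flag are `Janus-27B` / `janus-27b` in any casing; `find_stale_refs` is a hypothetical helper, not one of the repo's scripts.

```python
import re

# Matches the old 27B names in any casing (Janus-27B, janus-27b, JANUS-27B).
# Sibling names such as FoolDev/janus and Janus-35B-A3B never contain
# "janus-27b", so they are not flagged.
STALE = re.compile(r"janus-27b", re.IGNORECASE)


def find_stale_refs(text):
    """Return (line_number, line) pairs that still mention the old 27B name."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE.search(line)
    ]


sample = "\n".join([
    "TAG ?= janus-27b",
    "Sibling: https://huggingface.co/FoolDev/janus",
    "See also Janus-35B-A3B",
    "FROM ./Thanatos-27B.Q4_K_M.gguf",
])
stale = find_stale_refs(sample)
print(stale)
```

Files that record history on purpose, such as `CHANGELOG.md`, would need to be excluded from a scan like this, since they legitimately keep the old name.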

banner.svg wordmark JANUS-27B → THANATOS-27B; font-size dropped 26 → 22
to keep the longer 12-char wordmark inside the same 383×77 viewBox
without encroaching on the activation-grid dots. Tokyo Night palette
unchanged. banner.png regenerated via rsvg-convert. dense-flow.svg has
no wordmark and is left unchanged.
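
A rough back-of-the-envelope check of why the font size had to shrink. The per-glyph advance of 0.6 em is an assumption (a typical ratio for monospace faces); the commit does not name the banner font or its real text metrics, so the numbers are illustrative only.

```python
# Assumption: each glyph advances about 0.6 em. The actual banner font and
# its metrics are not stated in the commit.
ADVANCE_EM = 0.6


def wordmark_width(text, font_size):
    """Estimate the rendered width, in viewBox units, of a one-line wordmark."""
    return len(text) * ADVANCE_EM * font_size


old_width = wordmark_width("JANUS-27B", 26)        # 9 glyphs at the old size
naive_width = wordmark_width("THANATOS-27B", 26)   # 12 glyphs, size unchanged
new_width = wordmark_width("THANATOS-27B", 22)     # 12 glyphs at the new size

print(f"old: {old_width:.1f}  unchanged size: {naive_width:.1f}  new: {new_width:.1f}")
```

Under that assumption the wordmark would grow from roughly 140 to roughly 187 units at size 26; dropping to 22 brings it back to roughly 158 units.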

Also adds the top-row Buy Me a Coffee badge (Tokyo Night yellow,
e0af68 on 1a1b26 labelColor) linking to buymeacoffee.com/Thanatos-27B.
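
For reference, the badge image URLs in the README follow the shields.io static-badge path format, where a space is written as `_` and a literal dash as `--`. The sketch below rebuilds two of them; `badge_url` and `escape_badge_text` are hypothetical helper names used only for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def escape_badge_text(text):
    """Apply static-badge escaping: '-' -> '--', '_' -> '__', ' ' -> '_'."""
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")


def badge_url(message, color, label: Optional[str] = None, **params):
    """Build a static badge URL with the README's flat / 1a1b26 styling."""
    parts = [escape_badge_text(part) for part in (label, message) if part]
    parts.append(color)
    query = urlencode({"style": "flat", "labelColor": "1a1b26", **params})
    return "https://img.shields.io/badge/" + "-".join(parts) + "?" + query


coffee = badge_url("Buy me a coffee", "e0af68", logo="buymeacoffee", logoColor="1a1b26")
base = badge_url("Qwen3.6-27B", "bb9af7", label="Base")
print(coffee)
print(base)
```

Both results match the image URLs used for the Buy Me a Coffee and Base Model badges in the README diff.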

HF-side action still required (cannot be done from this repo): rename
the HF repository FoolDev/janus-27b → FoolDev/thanatos-27b in the HF
Settings UI. Existing ollama pull hf.co/FoolDev/janus-27b callers will
404 after that — one-time URL break, unavoidable.

Files changed (22)

.gitignore +3 -3
CHANGELOG.md +39 -0
CITATION.cff +4 -4
Makefile +3 -3
Modelfile +5 -5
README.md +30 -29
Janus-27B.Q4_K_M.gguf → Thanatos-27B.Q4_K_M.gguf +0 -0
banner.png +0 -0
banner.svg +3 -3
examples/README.md +6 -6
examples/llama_cpp_quickstart.py +4 -4
examples/llama_cpp_vision.py +4 -4
examples/ollama_chat.py +6 -6
examples/transformers_quickstart.py +5 -5
scripts/bench.sh +4 -4
scripts/build.sh +4 -4
scripts/check.sh +1 -1
scripts/check_bridge_sync.py +2 -2
scripts/fetch_vision.sh +1 -1
scripts/install-hooks.sh +1 -1
scripts/smoke_test.sh +3 -3
system +1 -1

.gitignore CHANGED

@@ -7,10 +7,10 @@ venv/
 # Local model weights. We don't redistribute the upstream Qwen GGUFs
 # here — `make build` fetches one from unsloth/Qwen3.6-27B-GGUF locally.
-# The single Janus-27B.*.gguf we DO ship backs the HF/Ollama
-# "Use this model" widget (ollama run hf.co/FoolDev/janus-27b).
 *.gguf
-!Janus-27B.*.gguf
 *.safetensors
 *.bin

 # Local model weights. We don't redistribute the upstream Qwen GGUFs
 # here — `make build` fetches one from unsloth/Qwen3.6-27B-GGUF locally.
+# The single Thanatos-27B.*.gguf we DO ship backs the HF/Ollama
+# "Use this model" widget (ollama run hf.co/FoolDev/thanatos-27b).
 *.gguf
+!Thanatos-27B.*.gguf
 *.safetensors
 *.bin
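
The effect of the renamed exception rule can be sketched in a few lines. This is a simplified emulation that handles only bare filenames and the two rules shown in the diff; real gitignore matching also deals with directories, anchoring, and `**`.

```python
from fnmatch import fnmatchcase

# The two relevant rules from the new .gitignore, in file order.
# A leading "!" re-includes a path that an earlier rule ignored.
RULES = ["*.gguf", "!Thanatos-27B.*.gguf"]


def is_ignored(filename, rules=RULES):
    """Last matching rule wins, as in gitignore; bare filenames only."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(filename, pattern):
            ignored = not negated
    return ignored


for name in ("Thanatos-27B.Q4_K_M.gguf", "Janus-27B.Q4_K_M.gguf", "Qwen3.6-27B-Q4_K_M.gguf"):
    print(name, "ignored" if is_ignored(name) else "tracked")
```

After the rename, a file under the old `Janus-27B.*.gguf` name falls back under the blanket `*.gguf` rule and is ignored, while the bundled `Thanatos-27B.Q4_K_M.gguf` stays tracked.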

CHANGELOG.md CHANGED

@@ -7,6 +7,45 @@ and documentation**, not the underlying base model.
 ## [Unreleased]
 ### Removed
 - `Janus-27B.Q3_K_S.gguf` no longer redistributed in this repo.
   Removing it leaves `Janus-27B.Q4_K_M.gguf` as the only GGUF, which

 ## [Unreleased]
+### Renamed
+- **Project renamed `Janus-27B` → `Thanatos-27B`.** The 35B-A3B MoE
+  sibling (`FoolDev/janus`) keeps the Janus name; this repo splits off
+  as its own identity. In-repo rewrite covered: `README.md`,
+  `Modelfile`, `params` / `system` / `template`, `Makefile`,
+  `CITATION.cff`, `.gitignore`, `examples/**`, `scripts/**`. The
+  bundled GGUF was renamed `Janus-27B.Q4_K_M.gguf` →
+  `Thanatos-27B.Q4_K_M.gguf` (`git mv`, LFS pointer preserved). Default
+  Ollama tag flipped `janus-27b` → `thanatos-27b`. System prompt
+  identity flipped "You are Janus" → "You are Thanatos" everywhere
+  it appears (`Modelfile`, `system`, `examples/*.py`). Sibling
+  references back to `FoolDev/janus` and `Janus-35B-A3B` are
+  preserved on purpose — those still point at the MoE.
+- **HF-side action still required (cannot be done from this repo):**
+  rename the HF repository `FoolDev/janus-27b` → `FoolDev/thanatos-27b`
+  in HF Settings → "Rename or transfer this repository". One-time URL
+  break is unavoidable; existing
+  `ollama pull hf.co/FoolDev/janus-27b` callers will 404 after.
+### Changed
+- `banner.svg` wordmark `JANUS-27B` → `THANATOS-27B`. Font-size dropped
+  26 → 22 to keep the longer 12-char wordmark inside the same 383×77
+  viewBox without encroaching on the activation-grid dots in the upper
+  right. Tokyo Night palette unchanged: `#c0caf5` mark, `#bb9af7`
+  highlight on the suffix, gradient backdrop, animated activation grid
+  + token-stream beam preserved.
+- `banner.png` regenerated from the updated `banner.svg` via
+  `rsvg-convert` at the same 383×77 dimensions.
+- `dense-flow.svg` left unchanged — it has no wordmark, only the
+  64-layer hybrid-attention pulse visualization.
+### Added
+- README top-row badge linking to
+  [`buymeacoffee.com/Thanatos-27B`](https://buymeacoffee.com/Thanatos-27B).
+  Tokyo Night yellow (`e0af68` on `1a1b26` labelColor) so it sits
+  alongside the existing License/Base/Arch/Sibling badges without
+  visual fight. Tip-jar only — no functional change to the model or
+  tooling.
 ### Removed
 - `Janus-27B.Q3_K_S.gguf` no longer redistributed in this repo.
   Removing it leaves `Janus-27B.Q4_K_M.gguf` as the only GGUF, which

CITATION.cff CHANGED

@@ -1,14 +1,14 @@
 cff-version: 1.2.0
-title: "Janus-27B: A Dense Distillation Wrapper for Qwen 3.6 27B"
 message: "If you use this model card or its accompanying files, please cite as below."
 type: software
 authors:
   - name: FoolDev
     website: "https://huggingface.co/FoolDev"
-repository-code: "https://huggingface.co/FoolDev/janus-27b"
-url: "https://huggingface.co/FoolDev/janus-27b"
 abstract: >-
-  Janus-27B is a personal repackaging of the dense Qwen 3.6 27B base model
   with Claude Opus 4.7 in the reasoning teacher slot. The repository ships
   an Ollama Modelfile, sampling defaults, usage examples, and a single
   ready-to-run GGUF (Q4_K_M ~17 GB) so the HF "Use this model" widget

 cff-version: 1.2.0
+title: "Thanatos-27B: A Dense Distillation Wrapper for Qwen 3.6 27B"
 message: "If you use this model card or its accompanying files, please cite as below."
 type: software
 authors:
   - name: FoolDev
     website: "https://huggingface.co/FoolDev"
+repository-code: "https://huggingface.co/FoolDev/thanatos-27b"
+url: "https://huggingface.co/FoolDev/thanatos-27b"
 abstract: >-
+  Thanatos-27B is a personal repackaging of the dense Qwen 3.6 27B base model
   with Claude Opus 4.7 in the reasoning teacher slot. The repository ships
   an Ollama Modelfile, sampling defaults, usage examples, and a single
   ready-to-run GGUF (Q4_K_M ~17 GB) so the HF "Use this model" widget

Makefile CHANGED

@@ -1,11 +1,11 @@
-# Janus-27B convenience Makefile.
 #
 # All work is delegated to scripts/* — this file just gives common
 # operations short, discoverable names.
 #
 # Variables you can override on the command line:
 #   QUANT     GGUF quant suffix       (default: Q4_K_M)
-#   TAG       Ollama model tag        (default: janus-27b)
 #   GGUF_PATH path to existing GGUF   (skip the download)
 #   MODEL     model tag for smoke     (default: $(TAG))
 #
@@ -18,7 +18,7 @@
 #   make clean
 QUANT ?= Q4_K_M
-TAG   ?= janus-27b
 MODEL ?= $(TAG)
 .DEFAULT_GOAL := help

+# Thanatos-27B convenience Makefile.
 #
 # All work is delegated to scripts/* — this file just gives common
 # operations short, discoverable names.
 #
 # Variables you can override on the command line:
 #   QUANT     GGUF quant suffix       (default: Q4_K_M)
+#   TAG       Ollama model tag        (default: thanatos-27b)
 #   GGUF_PATH path to existing GGUF   (skip the download)
 #   MODEL     model tag for smoke     (default: $(TAG))
 #
 #   make clean
 QUANT ?= Q4_K_M
+TAG   ?= thanatos-27b
 MODEL ?= $(TAG)
 .DEFAULT_GOAL := help
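
How the three `?=` defaults interact with command-line overrides can be modelled as below. This is a sketch of the variable resolution only, not of what `scripts/build.sh` does; the `thanatos-dev` tag is a made-up example value.

```python
# Defaults as declared in the Makefile with `?=`. MODEL is defined in terms of
# TAG, and `?=` creates a recursively expanded variable, so MODEL follows TAG
# unless MODEL itself is overridden.
DEFAULTS = {"QUANT": "Q4_K_M", "TAG": "thanatos-27b", "MODEL": "$(TAG)"}


def resolve(overrides):
    """Merge command-line overrides over the defaults, then expand $(TAG)."""
    merged = {**DEFAULTS, **overrides}
    merged["MODEL"] = merged["MODEL"].replace("$(TAG)", merged["TAG"])
    return merged


default_build = resolve({})                      # make build
q3_build = resolve({"QUANT": "Q3_K_S"})          # make build QUANT=Q3_K_S
custom_tag = resolve({"TAG": "thanatos-dev"})    # hypothetical tag override
print(default_build, q3_build, custom_tag, sep="\n")
```

Overriding `TAG` alone is enough to retarget the smoke test, because `MODEL` expands to whatever `TAG` holds at use time.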

Modelfile CHANGED

@@ -1,15 +1,15 @@
-# Janus-27B — Ollama wrapper around Qwen 3.6 27B (dense)
 #
 # Text + tool calling. Vision via Ollama is currently broken for this
 # architecture (ollama/ollama#15898 — the vendored llama.cpp fork is
 # missing the qwen35 arch entries). Use llama.cpp directly for image
 # input, or wait for the fix. See the Vision section in README.md.
 #
-# This repo bundles a single GGUF: Janus-27B.Q4_K_M.gguf (~17 GB).
 # The FROM line below points at it, so a fresh clone (with LFS smudge
 # enabled) supports the no-script path:
 #
-#     ollama create janus-27b -f Modelfile && ollama run janus-27b
 #
 # For other quants (Q3_K_S, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_S`
 # downloads the chosen quant from unsloth/Qwen3.6-27B-GGUF and patches
@@ -21,7 +21,7 @@
 #     https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
 #     https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
-FROM ./Janus-27B.Q4_K_M.gguf
 # Chat template — Qwen 3.6 ChatML in Ollama Go-template form, with the
 # tool-calling blocks Ollama's capability detector looks for. Without a
@@ -98,7 +98,7 @@ PARAMETER stop "<|im_end|>"
 PARAMETER stop "<|endoftext|>"
 PARAMETER stop "<|im_start|>"
-SYSTEM """You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.

+# Thanatos-27B — Ollama wrapper around Qwen 3.6 27B (dense)
 #
 # Text + tool calling. Vision via Ollama is currently broken for this
 # architecture (ollama/ollama#15898 — the vendored llama.cpp fork is
 # missing the qwen35 arch entries). Use llama.cpp directly for image
 # input, or wait for the fix. See the Vision section in README.md.
 #
+# This repo bundles a single GGUF: Thanatos-27B.Q4_K_M.gguf (~17 GB).
 # The FROM line below points at it, so a fresh clone (with LFS smudge
 # enabled) supports the no-script path:
 #
+#     ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b
 #
 # For other quants (Q3_K_S, Q5_K_M, Q6_K, etc.), `make build QUANT=Q3_K_S`
 # downloads the chosen quant from unsloth/Qwen3.6-27B-GGUF and patches
 #     https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
 #     https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
+FROM ./Thanatos-27B.Q4_K_M.gguf
 # Chat template — Qwen 3.6 ChatML in Ollama Go-template form, with the
 # tool-calling blocks Ollama's capability detector looks for. Without a
 PARAMETER stop "<|endoftext|>"
 PARAMETER stop "<|im_start|>"
+SYSTEM """You are Thanatos, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.
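
To make the role of the three `PARAMETER stop` tokens concrete, here is a minimal sketch of the plain ChatML turn layout and of cutting generated text at the first stop token. It is a simplification: the real `TEMPLATE` block also carries tool-calling sections that this diff does not show, and `render_chatml` / `trim_at_stop` are hypothetical names.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"
# The three stop strings set via PARAMETER stop in the Modelfile.
STOP_TOKENS = ("<|im_end|>", "<|endoftext|>", "<|im_start|>")


def render_chatml(system, user):
    """Lay out a system + user exchange and open the assistant turn."""
    turns = [("system", system), ("user", user)]
    body = "".join(f"{IM_START}{role}\n{content}{IM_END}\n" for role, content in turns)
    return body + f"{IM_START}assistant\n"


def trim_at_stop(generated):
    """Cut the generated text at the earliest stop token, if any is present."""
    positions = [generated.find(token) for token in STOP_TOKENS]
    cut = min((p for p in positions if p != -1), default=len(generated))
    return generated[:cut]


prompt = render_chatml(
    "You are Thanatos, a precise and capable assistant.",
    "What is the time complexity of mergesort?",
)
answer = trim_at_stop("O(n log n)<|im_end|><|im_start|>user")
print(prompt)
print(answer)
```

Without the stop tokens, a raw completion could run on into a fabricated next turn, which is the kind of template-token leak the repo's smoke test is described as checking for.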

README.md CHANGED

@@ -44,14 +44,15 @@ library_name: transformers
 pipeline_tag: image-text-to-text
 ---
-<img src="https://huggingface.co/FoolDev/janus-27b/resolve/main/banner.svg" alt="Janus-27B banner" width="100%" />
 [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
 [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
 [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
 [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/janus)
-# Janus-27B
 > **Dense Reasoning. Friendlier Footprint.**
 > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
@@ -68,7 +69,7 @@ template — HF's Ollama bridge ingests those three files, not
 `Modelfile`):
 ```bash
-ollama run hf.co/FoolDev/janus-27b           # ~17 GB Q4_K_M (the only bundled quant)
 ```
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
@@ -79,10 +80,10 @@ Or build locally (uses this repo's `Modelfile`, kept in sync with the
 three bridge files) for any quant:
 ```bash
-git clone https://huggingface.co/FoolDev/janus-27b && cd janus-27b
-make build                            # uses the bundled Janus-27B.Q4_K_M.gguf
 make build QUANT=Q5_K_M                # downloads from unsloth/Qwen3.6-27B-GGUF
-ollama run janus-27b
 ```
 For image input use llama.cpp directly — Ollama vision is broken for
@@ -94,7 +95,7 @@ The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only
 The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
-| | Janus-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/janus) |
 |---|---|---|
 | Architecture | Dense transformer | MoE 256 experts, 8 active |
 | Total params | 27 B | 35 B |
@@ -115,7 +116,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 |---|---|
 | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
 | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
-| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/janus-27b` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | One-shot helper: pulls a GGUF and runs `ollama create` for you |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
@@ -130,15 +131,15 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `README.md` | This file |
 This repo ships two GGUFs to back the HF/Ollama "Use this model"
-widget — `Janus-27B.Q4_K_M.gguf` (~17 GB):
 ```bash
-ollama run hf.co/FoolDev/janus-27b           # 17 GB Q4_K_M (only bundled quant)
 ```
 For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
 downloads the smaller ~12 GB Q3_K_S quant from `unsloth/Qwen3.6-27B-GGUF`
-and creates a local `janus-27b` Ollama tag (does not redistribute via
 this repo).
 For other quants or local builds, pull from
@@ -152,7 +153,7 @@ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-2
 ## Architecture
 <p align="left">
-  <img src="https://huggingface.co/FoolDev/janus-27b/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
 </p>
 - Qwen 3.6 dense, 27B parameters, 64 transformer layers
@@ -177,14 +178,14 @@ Two paths:
 ```bash
 # A. Pull straight from HF (uses the bundled Q4_K_M + root-level
 #    template / system / params files):
-ollama run hf.co/FoolDev/janus-27b           # 17 GB Q4_K_M (only bundled quant)
 # B. Build locally for a different quant (downloads from unsloth):
-make build                                              # Q4_K_M  -> janus-27b
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
 make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf   # skip download
-ollama run janus-27b
 ```
 Under the hood, `make build` calls `scripts/build.sh`, which downloads the
@@ -192,7 +193,7 @@ GGUF if missing (set `GGUF_PATH` to point at one you already have) and
 runs `ollama create` with the matching `Modelfile`.
 If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
-run `ollama create janus-27b -f Modelfile && ollama run janus-27b`.
 Confirm everything works:
@@ -205,15 +206,15 @@ python examples/ollama_chat.py      # full demo: chat, streaming, tools, OpenAI-
 ### Local apps
-The bundled `Janus-27B.Q4_K_M.gguf` works in any GGUF-compatible local
 app — point it at this repo and load.
 | App | How to load this model |
 |---|---|
-| **Ollama** | `ollama run hf.co/FoolDev/janus-27b` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
-| **LM Studio** | Search → `FoolDev/janus-27b` → pick `Janus-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
-| **Jan** | Hub → "Import from Hugging Face" → `FoolDev/janus-27b`. Same template behavior as LM Studio. |
-| **llama.cpp** | `hf download FoolDev/janus-27b Janus-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Janus-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
 | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
 | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
@@ -231,9 +232,9 @@ external schema.
 curl -s http://localhost:11434/v1/chat/completions \
   -H 'Content-Type: application/json' \
   -d '{
-    "model": "janus-27b",
     "messages": [
-      {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
       {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
     ],
     "temperature": 0.6
@@ -255,7 +256,7 @@ The Modelfile bakes this in. Override per-request via the `system` role
 in your client:
 ```text
-You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.
@@ -313,7 +314,7 @@ for this model.
 ## Hardware requirements
-The dense 27B is the easier of the two Janus models to deploy.
 | Hardware | Status |
 |---|---|
@@ -344,10 +345,10 @@ Ollama is the exception: its conversion of the embedded jinja loses the
 `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
 Two paths fix this, depending on how you pull the model:
-- **`ollama run hf.co/FoolDev/janus-27b`** — HF's Ollama bridge applies
   the root-level `template` / `system` / `params` files in this repo
   (the bridge does **not** read `Modelfile`).
-- **`make build` / `ollama create janus-27b -f Modelfile`** — uses the
   `Modelfile`'s `TEMPLATE` block.
 Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
@@ -358,7 +359,7 @@ kept in sync: edit them together if you change one.
 ```text
 <|im_start|>system
-You are Janus, a precise and capable assistant…<|im_end|>
 <|im_start|>user
 What is the time complexity of mergesort?<|im_end|>
 <|im_start|>assistant
@@ -390,7 +391,7 @@ the model adapts to whichever shape the system prompt prescribes.
 **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
 prompts the model to emit JSON-in-XML, the form Ollama's tool-call
 extractor parses into a structured `tool_calls` array. After
-`make build`, `ollama show janus-27b` lists `tools` and `thinking`
 under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
 accept a `tools` array.

 pipeline_tag: image-text-to-text
 ---
+<img src="https://huggingface.co/FoolDev/thanatos-27b/resolve/main/banner.svg" alt="Thanatos-27B banner" width="100%" />
 [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
 [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
 [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
 [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/janus)
+[![Buy me a coffee](https://img.shields.io/badge/Buy_me_a_coffee-e0af68?style=flat&labelColor=1a1b26&logo=buymeacoffee&logoColor=1a1b26)](https://buymeacoffee.com/Thanatos-27B)
+# Thanatos-27B
 > **Dense Reasoning. Friendlier Footprint.**
 > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
 `Modelfile`):
 ```bash
+ollama run hf.co/FoolDev/thanatos-27b           # ~17 GB Q4_K_M (the only bundled quant)
 ```
 For other quants (Q3_K_S ~12 GB, Q5_K_M ~20 GB, etc.), `make build
 three bridge files) for any quant:
 ```bash
+git clone https://huggingface.co/FoolDev/thanatos-27b && cd thanatos-27b
+make build                            # uses the bundled Thanatos-27B.Q4_K_M.gguf
 make build QUANT=Q5_K_M                # downloads from unsloth/Qwen3.6-27B-GGUF
+ollama run thanatos-27b
 ```
 For image input use llama.cpp directly — Ollama vision is broken for
 The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
+| | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/janus) |
 |---|---|---|
 | Architecture | Dense transformer | MoE 256 experts, 8 active |
 | Total params | 27 B | 35 B |
 |---|---|
 | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
 | `Modelfile` | Ollama wrapper around the bundled Qwen 3.6 27B GGUF — used by `make build` / `ollama create` for **local** builds |
+| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/thanatos-27b` directly (the bridge does **not** read `Modelfile` — see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)). Mirrors the `Modelfile`'s template / system prompt / sampling params. |
 | `examples/` | Ready-to-run Python clients for Ollama, Transformers, and llama-cpp-python |
 | `scripts/build.sh` | One-shot helper: pulls a GGUF and runs `ollama create` for you |
 | `scripts/smoke_test.sh` | Verifies an Ollama daemon + model, runs a round-trip, asserts no chat-template tokens leak into the response. With `TOOLS_TEST=1`, also exercises an end-to-end tool-call round-trip and checks the response shape |
 | `README.md` | This file |
 This repo ships two GGUFs to back the HF/Ollama "Use this model"
+widget — `Thanatos-27B.Q4_K_M.gguf` (~17 GB):
 ```bash
+ollama run hf.co/FoolDev/thanatos-27b           # 17 GB Q4_K_M (only bundled quant)
 ```
 For 16 GB GPUs / unified-memory laptops, `make build QUANT=Q3_K_S`
 downloads the smaller ~12 GB Q3_K_S quant from `unsloth/Qwen3.6-27B-GGUF`
+and creates a local `thanatos-27b` Ollama tag (does not redistribute via
 this repo).
 For other quants or local builds, pull from
 ## Architecture
 <p align="left">
+  <img src="https://huggingface.co/FoolDev/thanatos-27b/resolve/main/dense-flow.svg" alt="animated dense forward-pass visualization: 64-layer hybrid attention stack with a pulse traversing left-to-right, illuminating Gated DeltaNet (purple) and Gated Attention (cyan) layers in turn" width="800" />
 </p>
 - Qwen 3.6 dense, 27B parameters, 64 transformer layers
 ```bash
 # A. Pull straight from HF (uses the bundled Q4_K_M + root-level
 #    template / system / params files):
+ollama run hf.co/FoolDev/thanatos-27b           # 17 GB Q4_K_M (only bundled quant)
 # B. Build locally for a different quant (downloads from unsloth):
+make build                                              # Q4_K_M  -> thanatos-27b
 make build QUANT=Q3_K_S                                 # 12 GB smaller quant
 make build QUANT=Q5_K_M                                 # 20 GB higher quality
 make build GGUF_PATH=~/models/Qwen3.6-27B-Q4_K_M.gguf   # skip download
+ollama run thanatos-27b
 ```
 Under the hood, `make build` calls `scripts/build.sh`, which downloads the
 runs `ollama create` with the matching `Modelfile`.
 If you'd rather do it by hand: edit the `FROM` line in `Modelfile` and
+run `ollama create thanatos-27b -f Modelfile && ollama run thanatos-27b`.
 Confirm everything works:
 ### Local apps
+The bundled `Thanatos-27B.Q4_K_M.gguf` works in any GGUF-compatible local
 app — point it at this repo and load.
 | App | How to load this model |
 |---|---|
+| **Ollama** | `ollama run hf.co/FoolDev/thanatos-27b` (default Q4_K_M). Pulls the GGUF + the root-level `template` / `system` / `params` files in one step (HF's Ollama bridge ingests these three files; it does **not** read `Modelfile`). For other quants, `make build QUANT=Q3_K_S` downloads from unsloth and creates a local Ollama tag using the `Modelfile`, which is kept in sync with the bridge files. |
+| **LM Studio** | Search → `FoolDev/thanatos-27b` → pick `Thanatos-27B.Q4_K_M.gguf`. Uses the GGUF's embedded jinja chat template (Qwen 3.6 ChatML); set the system prompt manually from the `SYSTEM` block in this repo's `Modelfile`. |
+| **Jan** | Hub → "Import from Hugging Face" → `FoolDev/thanatos-27b`. Same template behavior as LM Studio. |
+| **llama.cpp** | `hf download FoolDev/thanatos-27b Thanatos-27B.Q4_K_M.gguf --local-dir .` then `llama-server -m Thanatos-27B.Q4_K_M.gguf` (or `llama-cli`, `llama-mtmd-cli` for vision via the upstream `mmproj-F16.gguf`). |
 | **llama-cpp-python** | See `examples/llama_cpp_quickstart.py` (text) and `examples/llama_cpp_vision.py` (image input). |
 | **Open WebUI / KoboldCpp / text-generation-webui** | Standard llama.cpp loader path — point at the GGUF, use the embedded chat template. |
 curl -s http://localhost:11434/v1/chat/completions \
   -H 'Content-Type: application/json' \
   -d '{
+    "model": "thanatos-27b",
     "messages": [
+      {"role": "system", "content": "You are Thanatos, a precise reasoning assistant."},
       {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
     ],
     "temperature": 0.6
 in your client:
 ```text
+You are Thanatos, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.
 ## Hardware requirements
+The dense 27B is the lighter sibling to Janus-35B and the easier of the two to deploy.
 | Hardware | Status |
 |---|---|
 `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires.
 Two paths fix this, depending on how you pull the model:
+- **`ollama run hf.co/FoolDev/thanatos-27b`** — HF's Ollama bridge applies
   the root-level `template` / `system` / `params` files in this repo
   (the bridge does **not** read `Modelfile`).
+- **`make build` / `ollama create thanatos-27b -f Modelfile`** — uses the
   `Modelfile`'s `TEMPLATE` block.
 Both routes wire `.Tools` / `.ToolCalls` and tools work end-to-end on
 ```text
 <|im_start|>system
+You are Thanatos, a precise and capable assistant…<|im_end|>
 <|im_start|>user
 What is the time complexity of mergesort?<|im_end|>
 <|im_start|>assistant
 **Ollama path** (this repo's `Modelfile`). The `TEMPLATE` directive
 prompts the model to emit JSON-in-XML, the form Ollama's tool-call
 extractor parses into a structured `tool_calls` array. After
+`make build`, `ollama show thanatos-27b` lists `tools` and `thinking`
 under **Capabilities**, and both `/api/chat` and `/v1/chat/completions`
 accept a `tools` array.
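As a rough illustration of the `tools` array mentioned above, the sketch below builds a non-streaming `/api/chat` request body and reads structured tool calls back out of a response. The `get_weather` tool, the helper names, and the sample response are hypothetical and not part of this repo; the tool schema follows the OpenAI-style function format that Ollama accepts.

```python
import json
import urllib.request

# Hypothetical tool definition, for illustration only (not from this repo).
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_payload(model, user_text, tools):
    """Build a non-streaming /api/chat request body that carries a tools array."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "stream": False,
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a chat response, or [] if none."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]


def chat(host, payload):
    """POST the payload to a running Ollama daemon. Not called at import time."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a daemon running, `chat("http://localhost:11434", build_chat_payload("thanatos-27b", "Weather in Oslo?", [GET_WEATHER_TOOL]))` would send the request; whether the model chooses to call the tool depends on the prompt.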

Janus-27B.Q4_K_M.gguf → Thanatos-27B.Q4_K_M.gguf RENAMED

File without changes

banner.png CHANGED

banner.svg CHANGED

examples/README.md CHANGED

@@ -1,15 +1,15 @@
-# Janus-27B examples
 Four minimal entry points. Pick the one that matches how you run models.
 | File | Backend | When to use |
 |---|---|---|
-| `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `janus-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
 | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
 | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
 | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
-All three apply the same Janus system prompt and sampling defaults
 (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
 be consistent across backends modulo quantization noise.
@@ -21,15 +21,15 @@ Easiest path — pull straight from HF (gets the bundled Q4_K_M GGUF +
 this repo's root-level `template` / `system` / `params` files in one step):
 ```bash
-ollama pull hf.co/FoolDev/janus-27b           # 17 GB Q4_K_M (only bundled quant)
 pip install requests
-MODEL=hf.co/FoolDev/janus-27b python ollama_chat.py
 ```
 For the smaller-footprint Q3_K_S (~12 GB) or other quants, build
 locally instead — see the parent repo's `make build QUANT=...` flow.
-Or build locally from this repo (uses the bundled `Janus-27B.Q4_K_M.gguf`,
 no edits required):
 ```bash

+# Thanatos-27B examples
 Four minimal entry points. Pick the one that matches how you run models.
 | File | Backend | When to use |
 |---|---|---|
+| `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
 | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
 | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
 | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
+The three text entry points apply the same Thanatos system prompt and sampling defaults
 (`temp=0.6, top_p=0.95, top_k=20, repeat_penalty=1.05`) so behavior should
 be consistent across backends modulo quantization noise.
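One way to keep the sampling defaults above identical across backends is to define them once and map them to each backend's parameter names. This is a sketch under assumptions: the helper names are hypothetical, and the example scripts in this repo may set the values inline instead. Ollama's `options` object and llama-cpp-python's `create_chat_completion` both take `repeat_penalty`, while Transformers' `generate` calls it `repetition_penalty`.

```python
# Shared sampling defaults quoted in the README text above.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,
}


def ollama_options():
    """Value for the "options" field of an Ollama /api/chat request."""
    return dict(SAMPLING)


def llama_cpp_kwargs():
    """Keyword arguments for llama_cpp.Llama.create_chat_completion."""
    return dict(SAMPLING)


def transformers_kwargs():
    """Keyword arguments for model.generate (penalty key is named differently)."""
    kwargs = dict(SAMPLING)
    kwargs["repetition_penalty"] = kwargs.pop("repeat_penalty")
    kwargs["do_sample"] = True  # sampling parameters are ignored without this
    return kwargs
```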
 this repo's root-level `template` / `system` / `params` files in one step):
 ```bash
+ollama pull hf.co/FoolDev/thanatos-27b           # 17 GB Q4_K_M (only bundled quant)
 pip install requests
+MODEL=hf.co/FoolDev/thanatos-27b python ollama_chat.py
 ```
 For the smaller-footprint Q3_K_S (~12 GB) or other quants, build
 locally instead — see the parent repo's `make build QUANT=...` flow.
+Or build locally from this repo (uses the bundled `Thanatos-27B.Q4_K_M.gguf`,
 no edits required):
 ```bash

examples/llama_cpp_quickstart.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Janus-27B — llama-cpp-python quickstart.
 Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
 Useful for batch jobs, CI, or environments where you don't want a daemon.
@@ -29,8 +29,8 @@ except ImportError:  # pragma: no cover
     sys.exit("Missing llama-cpp-python. Install with: pip install llama-cpp-python")
-JANUS_SYSTEM = (
-    "You are Janus, a precise and capable assistant for reasoning, writing, "
     "coding, and long-form dialogue.\n\n"
     "Behavior rules:\n"
     "- Answer the user's actual request directly.\n"
@@ -68,7 +68,7 @@ def main() -> None:
     out = llm.create_chat_completion(
         messages=[
-            {"role": "system", "content": JANUS_SYSTEM},
             {"role": "user", "content": args.prompt},
         ],
         temperature=0.6,

 #!/usr/bin/env python3
 """
+Thanatos-27B — llama-cpp-python quickstart.
 Skip Ollama entirely and call the GGUF directly through llama-cpp-python.
 Useful for batch jobs, CI, or environments where you don't want a daemon.
     sys.exit("Missing llama-cpp-python. Install with: pip install llama-cpp-python")
+THANATOS_SYSTEM = (
+    "You are Thanatos, a precise and capable assistant for reasoning, writing, "
     "coding, and long-form dialogue.\n\n"
     "Behavior rules:\n"
     "- Answer the user's actual request directly.\n"
     out = llm.create_chat_completion(
         messages=[
+            {"role": "system", "content": THANATOS_SYSTEM},
             {"role": "user", "content": args.prompt},
         ],
         temperature=0.6,

examples/llama_cpp_vision.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Janus-27B — vision (image-text-to-text) via llama-cpp-python.
 Why this script exists:
     Ollama 0.22's vendored llama.cpp fork is missing the qwen35/qwen35moe
@@ -56,8 +56,8 @@ except ImportError:  # pragma: no cover
     )
-JANUS_SYSTEM = (
-    "You are Janus, a precise vision-language assistant. Describe images "
     "accurately, do not invent details, and ground every claim in the "
     "pixels you can actually see."
 )
@@ -104,7 +104,7 @@ def main() -> None:
     out = llm.create_chat_completion(
         messages=[
-            {"role": "system", "content": JANUS_SYSTEM},
             {
                 "role": "user",
                 "content": [

 #!/usr/bin/env python3
 """
+Thanatos-27B — vision (image-text-to-text) via llama-cpp-python.
 Why this script exists:
     Ollama 0.22's vendored llama.cpp fork is missing the qwen35/qwen35moe
     )
+THANATOS_SYSTEM = (
+    "You are Thanatos, a precise vision-language assistant. Describe images "
     "accurately, do not invent details, and ground every claim in the "
     "pixels you can actually see."
 )
     out = llm.create_chat_completion(
         messages=[
+            {"role": "system", "content": THANATOS_SYSTEM},
             {
                 "role": "user",
                 "content": [

examples/ollama_chat.py CHANGED

@@ -1,17 +1,17 @@
 #!/usr/bin/env python3
 """
-Janus-27B — Ollama chat examples.
 Prerequisites (pick one):
     A. From the bundled GGUFs (default flow):
-        $ make build                     # uses Janus-27B.Q4_K_M.gguf
         # or:
-        $ ollama create janus-27b -f ../Modelfile
     B. Pull straight from HF (Q4_K_M is the only bundled quant):
-        $ ollama run hf.co/FoolDev/janus-27b
-        # then set MODEL=hf.co/FoolDev/janus-27b below
 Then:
     $ ollama serve         # usually already running
@@ -36,7 +36,7 @@ from typing import Any, Iterator
 import requests
-MODEL = os.environ.get("MODEL", "janus-27b")
 HOST = os.environ.get("HOST", "http://localhost:11434")
 _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

 #!/usr/bin/env python3
 """
+Thanatos-27B — Ollama chat examples.
 Prerequisites (pick one):
     A. From the bundled GGUFs (default flow):
+        $ make build                     # uses Thanatos-27B.Q4_K_M.gguf
         # or:
+        $ ollama create thanatos-27b -f ../Modelfile
     B. Pull straight from HF (Q4_K_M is the only bundled quant):
+        $ ollama run hf.co/FoolDev/thanatos-27b
+        # then set MODEL=hf.co/FoolDev/thanatos-27b below
 Then:
     $ ollama serve         # usually already running
 import requests
+MODEL = os.environ.get("MODEL", "thanatos-27b")
 HOST = os.environ.get("HOST", "http://localhost:11434")
 _THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
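The diff does not show how `_THINK_RE` is used, so the following is only a guess at its purpose: removing `<think>...</think>` reasoning blocks from a reply before displaying it. The `strip_thinking` helper is a hypothetical name.

```python
import re

# Same pattern as in ollama_chat.py: a non-greedy match across newlines,
# plus any whitespace that trails the closing tag.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove every <think>...</think> block from a model reply."""
    return _THINK_RE.sub("", text)
```

The non-greedy `.*?` keeps two separate reasoning blocks from being merged into one match, and `re.DOTALL` lets a block span multiple lines.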

examples/transformers_quickstart.py CHANGED

@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 """
-Janus-27B — Hugging Face Transformers quickstart.
 Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
-chat turn using its embedded chat template. Janus-27B is a *wrapper*
 around that base, so for the transformers route there is nothing to
 download from this repo — point at Qwen/Qwen3.6-27B and apply the same
 system prompt the Modelfile uses.
@@ -38,8 +38,8 @@ except ImportError as e:  # pragma: no cover
 MODEL_ID = "Qwen/Qwen3.6-27B"
-JANUS_SYSTEM = (
-    "You are Janus, a precise and capable assistant for reasoning, writing, "
     "coding, and long-form dialogue.\n\n"
     "Behavior rules:\n"
     "- Answer the user's actual request directly.\n"
@@ -75,7 +75,7 @@ def load(use_4bit: bool):
 def generate(tok, model, prompt: str, max_new_tokens: int = 512) -> str:
     messages = [
-        {"role": "system", "content": JANUS_SYSTEM},
         {"role": "user", "content": prompt},
     ]
     inputs = tok.apply_chat_template(

 #!/usr/bin/env python3
 """
+Thanatos-27B — Hugging Face Transformers quickstart.
 Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
+chat turn using its embedded chat template. Thanatos-27B is a *wrapper*
 around that base, so for the transformers route there is nothing to
 download from this repo — point at Qwen/Qwen3.6-27B and apply the same
 system prompt the Modelfile uses.
 MODEL_ID = "Qwen/Qwen3.6-27B"
+THANATOS_SYSTEM = (
+    "You are Thanatos, a precise and capable assistant for reasoning, writing, "
     "coding, and long-form dialogue.\n\n"
     "Behavior rules:\n"
     "- Answer the user's actual request directly.\n"
 def generate(tok, model, prompt: str, max_new_tokens: int = 512) -> str:
     messages = [
+        {"role": "system", "content": THANATOS_SYSTEM},
         {"role": "user", "content": prompt},
     ]
     inputs = tok.apply_chat_template(

scripts/bench.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — tok/s benchmark via Ollama.
 #
 # Reads timing from Ollama's /api/chat response metadata (eval_count and
 # eval_duration are authoritative — no client-side stopwatch noise) and
@@ -7,14 +7,14 @@
 # number generalises a bit beyond a single shape.
 #
 # Usage:
-#   ./scripts/bench.sh                       # uses MODEL=janus-27b
-#   MODEL=janus-27b ./scripts/bench.sh
 #   HOST=http://localhost:11434 ./scripts/bench.sh
 #
 # Requires: curl, jq, a running Ollama daemon with the model created.
 set -euo pipefail
-MODEL="${MODEL:-janus-27b}"
 HOST="${HOST:-http://localhost:11434}"
 red()   { printf "\033[31m%s\033[0m\n" "$*" >&2; }

 #!/usr/bin/env bash
+# Thanatos-27B — tok/s benchmark via Ollama.
 #
 # Reads timing from Ollama's /api/chat response metadata (eval_count and
 # eval_duration are authoritative — no client-side stopwatch noise) and
 # number generalises a bit beyond a single shape.
 #
 # Usage:
+#   ./scripts/bench.sh                       # uses MODEL=thanatos-27b
+#   MODEL=thanatos-27b ./scripts/bench.sh
 #   HOST=http://localhost:11434 ./scripts/bench.sh
 #
 # Requires: curl, jq, a running Ollama daemon with the model created.
 set -euo pipefail
+MODEL="${MODEL:-thanatos-27b}"
 HOST="${HOST:-http://localhost:11434}"
 red()   { printf "\033[31m%s\033[0m\n" "$*" >&2; }
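The header comment says the benchmark derives tok/s from `eval_count` and `eval_duration` in the `/api/chat` response. A minimal Python sketch of that arithmetic follows, assuming the duration is reported in nanoseconds as in Ollama's API; the function name is hypothetical and the script itself uses `curl` and `jq`.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from Ollama response metadata."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)
```

For example, 120 tokens generated in 4 seconds (`eval_duration` of 4,000,000,000) works out to 30 tok/s.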

scripts/build.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
 #
 # Usage:
 #   ./scripts/build.sh                       # default: Q4_K_M
@@ -28,7 +28,7 @@ ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
 MODELFILE="${ROOT}/Modelfile"
-TAG="${TAG:-janus-27b}"
 echo "[*] repo:     ${REPO_ID}"
 echo "[*] quant:    ${QUANT}"
@@ -81,7 +81,7 @@ fi
 # ---- 3. Patch the Modelfile FROM line in a temp copy -------------------------
-TMP_MODELFILE="$(mktemp -t janus27b-modelfile.XXXXXX)"
 trap 'rm -f "${TMP_MODELFILE}"' EXIT
 awk -v p="${GGUF_PATH}" '
     /^FROM[[:space:]]/ && !done { print "FROM " p; done=1; next }
@@ -96,4 +96,4 @@ ollama create "${TAG}" -f "${TMP_MODELFILE}"
 echo
 echo "[+] Done. Try it:"
 echo "    ollama run ${TAG}"
-echo "    python ${ROOT}/examples/ollama_chat.py   # update MODEL constant if not 'janus-27b'"

 #!/usr/bin/env bash
+# Thanatos-27B — fetch a Qwen 3.6 27B GGUF and build the Ollama model.
 #
 # Usage:
 #   ./scripts/build.sh                       # default: Q4_K_M
 GGUF_PATH="${GGUF_PATH:-${ROOT}/${GGUF_NAME}}"
 MODELFILE="${ROOT}/Modelfile"
+TAG="${TAG:-thanatos-27b}"
 echo "[*] repo:     ${REPO_ID}"
 echo "[*] quant:    ${QUANT}"
 # ---- 3. Patch the Modelfile FROM line in a temp copy -------------------------
+TMP_MODELFILE="$(mktemp -t thanatos27b-modelfile.XXXXXX)"
 trap 'rm -f "${TMP_MODELFILE}"' EXIT
 awk -v p="${GGUF_PATH}" '
     /^FROM[[:space:]]/ && !done { print "FROM " p; done=1; next }
 echo
 echo "[+] Done. Try it:"
 echo "    ollama run ${TAG}"
+echo "    python ${ROOT}/examples/ollama_chat.py   # update MODEL constant if not 'thanatos-27b'"
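For readers less familiar with awk, the `FROM`-patching step in section 3 can be approximated in Python. This sketch rewrites only the first line that starts with `FROM` followed by whitespace and passes every other line through unchanged. The pass-through is an assumption, since the rest of the awk program is not shown in this hunk, and `patch_from_line` is a hypothetical name.

```python
import re


def patch_from_line(modelfile_text: str, gguf_path: str) -> str:
    """Point the first FROM directive at gguf_path, leaving other lines as-is."""
    out = []
    done = False
    for line in modelfile_text.splitlines():
        if not done and re.match(r"FROM\s", line):
            out.append(f"FROM {gguf_path}")
            done = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Like the script, a caller would write the result to a temporary file and pass that to `ollama create`, so the tracked `Modelfile` stays untouched.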

scripts/check.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — repo-local sanity checks.
 #
 # Runs everything that's cheap and catches a real-world bug we've already hit:
 #

 #!/usr/bin/env bash
+# Thanatos-27B — repo-local sanity checks.
 #
 # Runs everything that's cheap and catches a real-world bug we've already hit:
 #

scripts/check_bridge_sync.py CHANGED

@@ -1,13 +1,13 @@
 #!/usr/bin/env python3
 """
-Janus-27B — verify Modelfile and HF Ollama bridge files stay in sync.
 The repo ships two parallel Ollama configurations:
   - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
     It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
   - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
-    Ollama bridge when users ``ollama run hf.co/FoolDev/janus-27b`` directly. HF
     does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
 If the two configurations drift apart, ``hf.co/...`` users and ``make build``

 #!/usr/bin/env python3
 """
+Thanatos-27B — verify Modelfile and HF Ollama bridge files stay in sync.
 The repo ships two parallel Ollama configurations:
   - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
     It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
   - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
+    Ollama bridge when users ``ollama run hf.co/FoolDev/thanatos-27b`` directly. HF
     does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
 If the two configurations drift apart, ``hf.co/...`` users and ``make build``

scripts/fetch_vision.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — fetch the vision projector (mmproj) for image input.
 #
 # Why this is separate from build.sh:
 #   build.sh is for the Ollama text path. The mmproj is only useful for

 #!/usr/bin/env bash
+# Thanatos-27B — fetch the vision projector (mmproj) for image input.
 #
 # Why this is separate from build.sh:
 #   build.sh is for the Ollama text path. The mmproj is only useful for

scripts/install-hooks.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — install scripts/check.sh as a git pre-commit hook.
 #
 # Idempotent. Re-runs are safe.
 set -euo pipefail

 #!/usr/bin/env bash
+# Thanatos-27B — install scripts/check.sh as a git pre-commit hook.
 #
 # Idempotent. Re-runs are safe.
 set -euo pipefail

scripts/smoke_test.sh CHANGED

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# Janus-27B — smoke test against a running Ollama daemon.
 #
 # Verifies:
 #   1. The Ollama server is reachable.
@@ -14,11 +14,11 @@
 # Usage:
 #   ./scripts/smoke_test.sh                       # fast checks only
 #   TOOLS_TEST=1 ./scripts/smoke_test.sh          # add tool-call round-trip
-#   MODEL=hf.co/FoolDev/janus-27b:Q4_K_M ./scripts/smoke_test.sh
 #   HOST=http://localhost:11434 ./scripts/smoke_test.sh
 set -euo pipefail
-MODEL="${MODEL:-janus-27b}"
 HOST="${HOST:-http://localhost:11434}"
 PROMPT="${PROMPT:-Reply with the single word: OK}"

 #!/usr/bin/env bash
+# Thanatos-27B — smoke test against a running Ollama daemon.
 #
 # Verifies:
 #   1. The Ollama server is reachable.
 # Usage:
 #   ./scripts/smoke_test.sh                       # fast checks only
 #   TOOLS_TEST=1 ./scripts/smoke_test.sh          # add tool-call round-trip
+#   MODEL=hf.co/FoolDev/thanatos-27b:Q4_K_M ./scripts/smoke_test.sh
 #   HOST=http://localhost:11434 ./scripts/smoke_test.sh
 set -euo pipefail
+MODEL="${MODEL:-thanatos-27b}"
 HOST="${HOST:-http://localhost:11434}"
 PROMPT="${PROMPT:-Reply with the single word: OK}"

system CHANGED

@@ -1,4 +1,4 @@
-You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.

+You are Thanatos, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
 Behavior rules:
 - Answer the user's actual request directly.