Instructions for using FoolDev/Thanatos-27B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use FoolDev/Thanatos-27B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="FoolDev/Thanatos-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("FoolDev/Thanatos-27B", dtype="auto")
```
- llama-cpp-python
How to use FoolDev/Thanatos-27B with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="FoolDev/Thanatos-27B",
    filename="Thanatos-27B.Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use FoolDev/Thanatos-27B with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FoolDev/Thanatos-27B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M
```
Install from WinGet (Windows)
```powershell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FoolDev/Thanatos-27B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf FoolDev/Thanatos-27B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf FoolDev/Thanatos-27B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf FoolDev/Thanatos-27B:Q4_K_M
```
Use Docker
docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M
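Once `llama-server` is running, any OpenAI-style client can talk to it. As a minimal sketch, assuming the default port 8080 and the standard `/v1/chat/completions` route with `"stream": true` (which returns server-sent events terminated by `data: [DONE]`), the reply can be consumed with only the Python standard library. `BASE_URL`, `MODEL_ID`, `iter_sse_deltas` and `stream_chat` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: llama-server is listening on its default port, 8080.
BASE_URL = "http://localhost:8080/v1"
# Assumption: the same repo:quant tag passed to -hf; a single-model server may not check it.
MODEL_ID = "FoolDev/Thanatos-27B:Q4_K_M"


def iter_sse_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ": keep-alive" comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        choices = event.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text


def stream_chat(prompt: str) -> Iterator[str]:
    """POST a streaming chat completion and yield the reply as it arrives."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        # The HTTP response iterates line by line, which matches SSE framing.
        yield from iter_sse_deltas(resp)
```

For example, `for piece in stream_chat("Say hello in five words."): print(piece, end="", flush=True)`. Keeping the SSE parsing in its own function makes it testable without a running server.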
- LM Studio
- Jan
- vLLM
How to use FoolDev/Thanatos-27B with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "FoolDev/Thanatos-27B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FoolDev/Thanatos-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
Use Docker
docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M
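The curl request above points at a remote image URL. To send a local file instead, the OpenAI-compatible API accepts a base64 `data:` URL in the same `image_url` field. The sketch below assumes vLLM's default port 8000 and that this checkpoint is actually served as an image-capable model, which the auto-generated snippet implies but does not guarantee. `image_to_data_url`, `build_image_request` and `send` are hypothetical helpers.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

# Assumption: `vllm serve` is running on its default port, 8000.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data: URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_request(prompt: str, image_path: str,
                        model: str = "FoolDev/Thanatos-27B") -> dict:
    """Same message shape as the curl example, with the image inlined."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": image_to_data_url(image_path)}},
            ],
        }],
    }


def send(payload: dict, url: str = VLLM_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For example, `send(build_image_request("Describe this image in one sentence.", "photo.jpg"))`, where `photo.jpg` is a placeholder path. Passing `url="http://localhost:30000/v1/chat/completions"` should target the SGLang server below in the same way, assuming it also accepts data URLs.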
- SGLang
How to use FoolDev/Thanatos-27B with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "FoolDev/Thanatos-27B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FoolDev/Thanatos-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "FoolDev/Thanatos-27B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "FoolDev/Thanatos-27B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
- Ollama
How to use FoolDev/Thanatos-27B with Ollama:
ollama run hf.co/FoolDev/Thanatos-27B:Q4_K_M
- Unsloth Studio
How to use FoolDev/Thanatos-27B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for FoolDev/Thanatos-27B to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for FoolDev/Thanatos-27B to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for FoolDev/Thanatos-27B to start chatting.
- Pi
How to use FoolDev/Thanatos-27B with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf FoolDev/Thanatos-27B:Q4_K_M
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "FoolDev/Thanatos-27B:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```sh
# Start Pi in your project directory:
pi
```
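If `~/.pi/agent/models.json` already exists with other providers, the `llama-cpp` entry has to be merged into its `providers` object rather than replacing the file. A minimal sketch, assuming the file is a plain JSON object whose `providers` map can hold several entries side by side (an assumption about Pi's format, not something stated here); `add_provider` and `LLAMA_CPP_PROVIDER` are hypothetical names.

```python
import json
from pathlib import Path

# Provider block copied from the models.json instructions.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "FoolDev/Thanatos-27B:Q4_K_M"}],
}


def add_provider(config_path: Path, name: str, provider: dict) -> dict:
    """Insert or replace one provider in models.json, keeping all other keys."""
    config: dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the "providers" map if missing, then set (or overwrite) this entry.
    config.setdefault("providers", {})[name] = provider
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_provider(Path.home() / ".pi" / "agent" / "models.json", "llama-cpp", LLAMA_CPP_PROVIDER)`. Running it twice is harmless because the entry is overwritten in place.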
- Hermes Agent
How to use FoolDev/Thanatos-27B with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf FoolDev/Thanatos-27B:Q4_K_M
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default FoolDev/Thanatos-27B:Q4_K_M
```
Run Hermes
hermes
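With `-hf`, llama-server may first have to download a multi-gigabyte GGUF and then load it, so starting `hermes` immediately can fail with connection errors. A small polling loop can gate the launch. This sketch assumes llama-server's `/health` endpoint answers HTTP 200 once the model is loaded and a non-200 status (such as 503) while it is still loading; `http_ok` and `wait_until_ready` are hypothetical helpers, and the 900-second default deadline is an arbitrary choice.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if a GET on `url` returns HTTP 200; False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 503 while loading), refused connections
        # and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(url: str = "http://127.0.0.1:8080/health",
                     deadline_s: float = 900.0,
                     interval_s: float = 5.0,
                     probe: Callable[[str], bool] = http_ok,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until `probe` succeeds or `deadline_s` seconds have passed."""
    start = time.monotonic()
    while True:
        if probe(url):
            return True
        if time.monotonic() - start >= deadline_s:
            return False
        sleep(interval_s)
```

A launcher could call `wait_until_ready()` and start `hermes` only if it returns `True`. Injecting `probe` and `sleep` keeps the loop testable without a live server.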
- Docker Model Runner
How to use FoolDev/Thanatos-27B with Docker Model Runner:
docker model run hf.co/FoolDev/Thanatos-27B:Q4_K_M
- Lemonade
How to use FoolDev/Thanatos-27B with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull FoolDev/Thanatos-27B:Q4_K_M
```
Run and chat with the model
lemonade run user.Thanatos-27B-Q4_K_M
List all available models
lemonade list
```python
#!/usr/bin/env python3
"""
Thanatos-27B: Ollama chat examples.

Prerequisites (pick one):

  A. From the bundled GGUFs (default flow):
       $ make build            # uses Thanatos-27B.Q4_K_M.gguf
       # or:
       $ ollama create thanatos-27b -f ../Modelfile

  B. Pull straight from HF (Q4_K_M is the only bundled quant):
       $ ollama run hf.co/FoolDev/Thanatos-27B
       # then run this script with MODEL=hf.co/FoolDev/Thanatos-27B set
       # in the environment (read into MODEL below)

Then:
    $ ollama serve             # usually already running
    $ python ollama_chat.py

The model emits <think>...</think> reasoning blocks before its answer.
Current Ollama (0.24, especially with `OLLAMA_NEW_ENGINE=1`) returns the
reasoning in a separate `message.thinking` field and keeps `content`
clean. Older builds put the whole `<think>...</think>` block inside
`content`. The demo below reads `message.thinking` first and falls
back to parsing `<think>` tags out of `content`, so it works with
either behavior.

Endpoints used:
  - Native Ollama:  http://localhost:11434/api/chat
  - OpenAI-compat:  http://localhost:11434/v1/chat/completions
"""
from __future__ import annotations

import json
import os
import re
import sys
from typing import Any, Iterator

import requests

MODEL = os.environ.get("MODEL", "thanatos-27b")
HOST = os.environ.get("HOST", "http://localhost:11434")

# A complete <think>...</think> block plus any whitespace after it.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Return (thinking, final_answer) from a content string."""
    parts = re.findall(r"<think>(.*?)</think>", content, re.DOTALL)
    thinking = "\n".join(p.strip() for p in parts).strip()
    answer = _THINK_RE.sub("", content).strip()
    return thinking, answer


# ---------- 1. Simple chat ----------
def chat(prompt: str, system: str | None = None) -> dict[str, Any]:
    msgs: list[dict[str, Any]] = []
    if system:
        msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": prompt})
    r = requests.post(
        f"{HOST}/api/chat",
        json={"model": MODEL, "messages": msgs, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    return r.json()


# ---------- 2. Streaming ----------
def chat_stream(prompt: str) -> Iterator[str]:
    """Yield content tokens as they arrive."""
    with requests.post(
        f"{HOST}/api/chat",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
        timeout=600,
    ) as r:
        r.raise_for_status()
        # Ollama streams one JSON object per line.
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if "message" in chunk and "content" in chunk["message"]:
                yield chunk["message"]["content"]
            if chunk.get("done"):
                break


# ---------- 3. Tool calling ----------
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
        },
    },
}


def fake_weather(city: str, unit: str) -> str:
    """Stand-in tool implementation."""
    return json.dumps(
        {"city": city, "temperature": 14, "unit": unit, "conditions": "light rain"}
    )


def tool_round_trip(prompt: str) -> str:
    """Single-shot tool call: model -> tool -> model -> final answer."""
    history: list[dict[str, Any]] = [{"role": "user", "content": prompt}]
    r = requests.post(
        f"{HOST}/api/chat",
        json={
            "model": MODEL,
            "messages": history,
            "tools": [WEATHER_TOOL],
            "stream": False,
        },
        timeout=600,
    )
    r.raise_for_status()
    msg = r.json()["message"]
    if not msg.get("tool_calls"):
        return msg["content"]

    # Echo the assistant's tool calls, then append one tool result per call.
    history.append({"role": "assistant", "tool_calls": msg["tool_calls"]})
    for tc in msg["tool_calls"]:
        fn = tc["function"]
        if fn["name"] == "get_current_weather":
            result = fake_weather(**fn["arguments"])
        else:
            result = json.dumps({"error": f"unknown tool {fn['name']}"})
        history.append({"role": "tool", "tool_name": fn["name"], "content": result})

    # Second request: the model turns the tool results into a final answer.
    r = requests.post(
        f"{HOST}/api/chat",
        json={
            "model": MODEL,
            "messages": history,
            "tools": [WEATHER_TOOL],
            "stream": False,
        },
        timeout=600,
    )
    r.raise_for_status()
    return r.json()["message"]["content"]


# ---------- 4. OpenAI-compatible endpoint ----------
def openai_chat(prompt: str) -> str:
    r = requests.post(
        f"{HOST}/v1/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.6,
        },
        timeout=600,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]


# ---------- demo ----------
def _demo() -> None:
    print("=== 1. simple chat ===")
    resp = chat("What is 84 * 3 / 2?")
    msg = resp["message"]
    # Prefer the dedicated `thinking` field (Ollama 0.24+ / new engine);
    # fall back to extracting <think>...</think> from `content` for
    # older builds that inline the reasoning.
    thinking = (msg.get("thinking") or "").strip()
    answer = msg.get("content", "")
    if not thinking:
        thinking, answer = split_thinking(answer)
    if thinking:
        print(f"[thinking] {thinking[:200]}...")
    print(f"[answer] {answer}")

    print("\n=== 2. streaming ===")
    for tok in chat_stream("Count from 1 to 5 in one line."):
        sys.stdout.write(tok)
        sys.stdout.flush()
    print()

    print("\n=== 3. tool round-trip ===")
    print(tool_round_trip("What is the weather in Paris in celsius?"))

    print("\n=== 4. OpenAI-compat ===")
    print(openai_chat("Say 'OpenAI endpoint OK' and nothing else."))


if __name__ == "__main__":
    _demo()
```