#!/usr/bin/env python3
"""
Thanatos-27B: verify Modelfile and HF Ollama bridge files stay in sync.

The repo ships two parallel Ollama configurations:

  - ``Modelfile`` is consumed by the local-build path (``ollama create -f Modelfile``).
    It contains ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
    Ollama bridge when users ``ollama run hf.co/FoolDev/Thanatos-27B`` directly. HF
    does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).

If the two configurations drift apart, ``hf.co/...`` users and ``make build``
users get different behaviour, which is exactly the bug we shipped before commits
33458f7 / 70ccef1 fixed it. This script is the regression guard: it parses the
Modelfile, loads the three bridge files, and fails on any mismatch.

Usage:
    python3 scripts/check_bridge_sync.py
    # exit 0 if in sync, 1 (with diff details) if not.

Called from scripts/check.sh as part of the standard lint pass, so the
pre-commit hook catches drift before it lands.
"""
from __future__ import annotations

import json
import re
import sys
from pathlib import Path
from typing import NoReturn

ROOT = Path(__file__).resolve().parent.parent

# Ollama Modelfile reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
TEMPLATE_RE = re.compile(r'^TEMPLATE\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
SYSTEM_RE = re.compile(r'^SYSTEM\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
PARAMETER_RE = re.compile(r'^PARAMETER\s+(\S+)\s+(.*?)\s*$', re.MULTILINE)


def parse_modelfile(text: str) -> tuple[str, str, dict[str, object]]:
    """Extract TEMPLATE, SYSTEM, and PARAMETER blocks from a Modelfile."""
    tpl_match = TEMPLATE_RE.search(text)
    if not tpl_match:
        die("Modelfile has no TEMPLATE block")
    template = tpl_match.group(1)

    sys_match = SYSTEM_RE.search(text)
    if not sys_match:
        die("Modelfile has no SYSTEM block")
    system = sys_match.group(1)

    params: dict[str, object] = {}
    stops: list[str] = []
    for key, raw in PARAMETER_RE.findall(text):
        # Strip outer quotes if present.
        value: object = raw.strip()
        if isinstance(value, str) and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        # Stop tokens accumulate; everything else is scalar.
        if key == "stop":
            stops.append(value)  # type: ignore[arg-type]
            continue
        # Cast known numeric params.
        if key in {"temperature", "top_p", "top_k", "repeat_penalty",
                   "num_ctx", "num_predict", "num_gpu", "num_batch", "seed"}:
            try:
                value = float(value) if "." in str(value) else int(value)  # type: ignore[arg-type]
            except (TypeError, ValueError):
                pass
        params[key] = value
    if stops:
        params["stop"] = stops
    return template, system, params


def die(msg: str) -> NoReturn:
    """Print a failure message to stderr and exit with status 1."""
    print(f"[FAIL] {msg}", file=sys.stderr)
    sys.exit(1)


def diff_strings(label: str, expected: str, actual: str) -> bool:
    """Return True if the strings match; otherwise report the first difference."""
    if expected == actual:
        return True
    print(f"[FAIL] {label} drift detected", file=sys.stderr)
    print(f" Modelfile len={len(expected)} bridge file len={len(actual)}", file=sys.stderr)
    # Show the first diverging line for quick orientation.
    e_lines = expected.splitlines()
    a_lines = actual.splitlines()
    for i, (e, a) in enumerate(zip(e_lines, a_lines)):
        if e != a:
            print(f" first diff at line {i + 1}:", file=sys.stderr)
            print(f" modelfile : {e!r}", file=sys.stderr)
            print(f" bridge : {a!r}", file=sys.stderr)
            return False
    if len(e_lines) != len(a_lines):
        print(f" line count differs: modelfile={len(e_lines)} bridge={len(a_lines)}",
              file=sys.stderr)
    return False


def main() -> int:
    modelfile = (ROOT / "Modelfile").read_text()
    bridge_template = (ROOT / "template").read_text()
    bridge_system = (ROOT / "system").read_text()
    bridge_params = json.loads((ROOT / "params").read_text())

    mf_template, mf_system, mf_params = parse_modelfile(modelfile)

    ok = True

    # 1. TEMPLATE: byte-for-byte.
    ok &= diff_strings("TEMPLATE", mf_template, bridge_template)

    # 2. SYSTEM: trim leading and trailing whitespace on both sides. The bridge
    #    file typically has a trailing newline; the Modelfile block doesn't.
    ok &= diff_strings("SYSTEM", mf_system.strip(), bridge_system.strip())

    # 3. PARAMETER vs params JSON: compare normalized dicts.
    if mf_params != bridge_params:
        print("[FAIL] params drift detected", file=sys.stderr)
        for k in sorted(set(mf_params) | set(bridge_params)):
            mv = mf_params.get(k, "<missing>")
            bv = bridge_params.get(k, "<missing>")
            if mv != bv:
                print(f" {k}: modelfile={mv!r} bridge={bv!r}", file=sys.stderr)
        ok = False

    if not ok:
        print("\n[!] Modelfile and bridge files are out of sync.", file=sys.stderr)
        print(" Edit them together: any change to TEMPLATE / SYSTEM /",
              file=sys.stderr)
        print(" PARAMETER must be reflected in template / system / params.",
              file=sys.stderr)
        return 1

    print("[ ok ] Modelfile <-> bridge files in sync")
    return 0


if __name__ == "__main__":
    sys.exit(main())
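
To make the PARAMETER check concrete, here is a minimal standalone sketch of how a Modelfile's `PARAMETER` lines would need to look as a `params` JSON object for the comparison in `main()` to pass. The Modelfile fragment, its `FROM` path, and the stop tokens are hypothetical values chosen for illustration, not the repo's actual configuration. The helper `modelfile_params` is a simplified stand-in for `parse_modelfile`, not the script's own function: it handles only two numeric keys and uses a regex that stays on a single line.

```python
import json
import re

# Hypothetical Modelfile fragment, for illustration only.
SAMPLE_MODELFILE = '''\
FROM ./Thanatos-27B.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
'''

# Simplified single-line variant of the script's PARAMETER pattern.
PARAMETER_RE = re.compile(r'^PARAMETER[ \t]+(\S+)[ \t]+(.*?)[ \t]*$', re.MULTILINE)
NUMERIC_KEYS = {"temperature", "num_ctx"}  # subset used by this sketch


def modelfile_params(text):
    """Collect PARAMETER lines into a dict shaped like the params JSON."""
    params = {}
    stops = []
    for key, raw in PARAMETER_RE.findall(text):
        value = raw.strip()
        # Strip one pair of outer double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key == "stop":
            # Repeated stop lines accumulate into a list.
            stops.append(value)
        elif key in NUMERIC_KEYS:
            params[key] = float(value) if "." in value else int(value)
        else:
            params[key] = value
    if stops:
        params["stop"] = stops
    return params


expected_bridge_params = modelfile_params(SAMPLE_MODELFILE)
# This JSON is what the params file would have to contain to match the fragment.
print(json.dumps(expected_bridge_params, indent=2))
```

Under these assumptions, repeated `stop` lines collapse into one JSON list and numeric values become JSON numbers rather than strings, so a `params` file that stores `"8192"` as a string would be reported as drift.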