---
license: mit
---
GHOSTAI is a horror-themed 7B GGUF release for the llama.cpp ecosystem.
This repo contains quantized GGUFs only (no FP16).
## 🩸 What’s inside
Quantized GGUF files (7B) ready for llama.cpp-compatible runtimes.
## 🎃 Files in this release
| File | Quant | Approx size | Rough RAM needed (4k ctx) |
|---|---|---|---|
| ghostai-horror-7b.Q8_0.gguf | Q8_0 | ~7.2 GB | ~10–11 GB |
| ghostai-horror-7b.Q6_K.gguf | Q6_K | ~5.5 GB | ~8–9 GB |
| ghostai-horror-7b.Q5_K_M.gguf | Q5_K_M | ~4.8 GB | ~7–8 GB |
| ghostai-horror-7b.Q5_K_S.gguf | Q5_K_S | ~4.7 GB | ~7–8 GB |
| ghostai-horror-7b.Q4_K_M.gguf | Q4_K_M | ~4.1 GB | ~6–7 GB |
| ghostai-horror-7b.Q4_K_S.gguf | Q4_K_S | ~3.9 GB | ~6–7 GB |
| ghostai-horror-7b.Q3_K_M.gguf | Q3_K_M | ~3.3 GB | ~5–6 GB |
| ghostai-horror-7b.Q3_K_S.gguf | Q3_K_S | ~3.0 GB | ~5–6 GB |
| ghostai-horror-7b.Q2_K.gguf | Q2_K | ~2.5 GB | ~4–5 GB |
| ghostai-horror-7b.TQ1_0.gguf | TQ1_0 | ~1.6 GB | ~3–4 GB |
RAM notes (rough):
- “Rough RAM needed” assumes ~4k context and typical llama.cpp overhead.
- If you run 8k context, add roughly +1–2 GB.
- GPU offload doesn’t remove the need for RAM; it shifts some weight/KV usage to VRAM depending on settings.
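Those context numbers can be sanity-checked with a back-of-the-envelope KV-cache estimate. The sketch below assumes a Llama-style 7B architecture (32 layers, 4096 hidden size) and an unquantized fp16 K/V cache; actual llama.cpp usage varies with KV-cache quantization, GQA, and runtime overhead.

```bash
# Rough fp16 KV-cache size, assuming a Llama-style 7B (32 layers, 4096 hidden).
# Bytes per token = 2 (K and V) * n_layers * hidden_size * 2 (fp16 bytes).
awk 'BEGIN {
  per_tok = 2 * 32 * 4096 * 2;                       # 524288 bytes/token
  printf "4k ctx: ~%.1f GiB\n", per_tok * 4096 / 2^30;
  printf "8k ctx: ~%.1f GiB\n", per_tok * 8192 / 2^30;
}'
```

Doubling the context from 4k to 8k roughly doubles the KV cache, which is where the “+1–2 GB” figure above comes from (less if the runtime quantizes the KV cache).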
## 🧟 Which quant should I use?

- Best default: `Q4_K_M`
- Higher quality: `Q5_K_M` or `Q6_K`
- If you have plenty of RAM: `Q8_0`
- Low RAM: `Q3_K_S` / `Q2_K`
- Tiny / experimental: `TQ1_0` (expect quality loss)

These formats are not “CPU vs GPU”: you can run any quant on CPU-only or with GPU offload.
## ⚰️ Quickstart (llama.cpp)

### GPU offload (CUDA build)

```bash
./llama-cli \
  -m ghostai-horror-7b.Q4_K_M.gguf \
  -ngl 99 \
  -c 4096 \
  -p "You are GHOSTAI. Speak like a calm horror narrator. Keep it tight and vivid."
```

`-ngl 99` offloads as many layers as fit onto the GPU; lower it if you run out of VRAM.
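As noted above, every quant also runs CPU-only. A minimal sketch, assuming a stock llama.cpp build without CUDA; `-t` sets the thread count, which you would tune to your number of physical cores:

```bash
./llama-cli \
  -m ghostai-horror-7b.Q4_K_M.gguf \
  -t 8 \
  -c 4096 \
  -p "You are GHOSTAI. Speak like a calm horror narrator. Keep it tight and vivid."
```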