---
license: mit
---

GGUF 7B Theme Quant

GHOSTAI is a horror-themed 7B GGUF release for the llama.cpp ecosystem.
This repo contains quantized GGUFs only (no FP16).


🩸 What’s inside

Quantized GGUF files (7B) ready for llama.cpp-compatible runtimes.

🎃 Files in this release

| File | Quant | Approx. size | Rough RAM needed (4k ctx) |
|------|-------|--------------|---------------------------|
| ghostai-horror-7b.Q8_0.gguf | Q8_0 | ~7.2 GB | ~10–11 GB |
| ghostai-horror-7b.Q6_K.gguf | Q6_K | ~5.5 GB | ~8–9 GB |
| ghostai-horror-7b.Q5_K_M.gguf | Q5_K_M | ~4.8 GB | ~7–8 GB |
| ghostai-horror-7b.Q5_K_S.gguf | Q5_K_S | ~4.7 GB | ~7–8 GB |
| ghostai-horror-7b.Q4_K_M.gguf | Q4_K_M | ~4.1 GB | ~6–7 GB |
| ghostai-horror-7b.Q4_K_S.gguf | Q4_K_S | ~3.9 GB | ~6–7 GB |
| ghostai-horror-7b.Q3_K_M.gguf | Q3_K_M | ~3.3 GB | ~5–6 GB |
| ghostai-horror-7b.Q3_K_S.gguf | Q3_K_S | ~3.0 GB | ~5–6 GB |
| ghostai-horror-7b.Q2_K.gguf | Q2_K | ~2.5 GB | ~4–5 GB |
| ghostai-horror-7b.TQ1_0.gguf | TQ1_0 | ~1.6 GB | ~3–4 GB |

RAM notes (rough):

  • “Rough RAM needed” assumes ~4k context and typical llama.cpp overhead.
  • If you run 8k context, add roughly +1–2 GB.
  • GPU offload doesn’t remove the need for RAM; it shifts some weight/KV usage to VRAM depending on settings.
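To grab a single quant from the table above without cloning the whole repo, the Hugging Face CLI is one option. A minimal sketch, assuming the repo id is ghostai1/GHOSTAI-Spooky (adjust the filename to the quant you want):

pip install -U "huggingface_hub[cli]"
# Assumed repo id; replace it if this repo lives at a different path.
huggingface-cli download ghostai1/GHOSTAI-Spooky ghostai-horror-7b.Q4_K_M.gguf --local-dir .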

🧟 Which quant should I use?

  • Best default: Q4_K_M
  • Higher quality: Q5_K_M or Q6_K
  • If you have plenty of RAM: Q8_0
  • Low RAM: Q3_K_S / Q2_K
  • Tiny / experimental: TQ1_0 (expect quality loss)

These quants are not “CPU” or “GPU” variants:
you can run any of them CPU-only or with GPU offload.
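For example, the same file can stay entirely in system RAM or be partially offloaded; a rough sketch (the -ngl value here is only an illustration, tune it to your VRAM):

# CPU-only: keep all layers in system RAM
./llama-cli -m ghostai-horror-7b.Q4_K_M.gguf -ngl 0 -c 4096 -p "Tell me a short ghost story."

# Partial offload: push ~20 layers to the GPU, the rest stays on CPU
./llama-cli -m ghostai-horror-7b.Q4_K_M.gguf -ngl 20 -c 4096 -p "Tell me a short ghost story."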


⚰️ Quickstart (llama.cpp)

GPU offload (CUDA build)

# -ngl 99 offloads as many layers as fit to the GPU; -c 4096 sets a 4k context window.
./llama-cli \
  -m ghostai-horror-7b.Q4_K_M.gguf \
  -ngl 99 \
  -c 4096 \
  -p "You are GHOSTAI. Speak like a calm horror narrator. Keep it tight and vivid."