---
license: mit
---

GGUF 7B Theme Quant

GHOSTAI is a horror-themed 7B GGUF release for the llama.cpp ecosystem.
This repo contains quantized GGUFs only (no FP16).


🩸 What’s inside

Quantized GGUF files (7B) ready for llama.cpp-compatible runtimes.

🎃 Files in this release

| File | Quant | Approx. size | Rough RAM needed (4k ctx) |
|------|-------|--------------|---------------------------|
| ghostai-horror-7b.Q8_0.gguf | Q8_0 | ~7.2 GB | ~10–11 GB |
| ghostai-horror-7b.Q6_K.gguf | Q6_K | ~5.5 GB | ~8–9 GB |
| ghostai-horror-7b.Q5_K_M.gguf | Q5_K_M | ~4.8 GB | ~7–8 GB |
| ghostai-horror-7b.Q5_K_S.gguf | Q5_K_S | ~4.7 GB | ~7–8 GB |
| ghostai-horror-7b.Q4_K_M.gguf | Q4_K_M | ~4.1 GB | ~6–7 GB |
| ghostai-horror-7b.Q4_K_S.gguf | Q4_K_S | ~3.9 GB | ~6–7 GB |
| ghostai-horror-7b.Q3_K_M.gguf | Q3_K_M | ~3.3 GB | ~5–6 GB |
| ghostai-horror-7b.Q3_K_S.gguf | Q3_K_S | ~3.0 GB | ~5–6 GB |
| ghostai-horror-7b.Q2_K.gguf | Q2_K | ~2.5 GB | ~4–5 GB |
| ghostai-horror-7b.TQ1_0.gguf | TQ1_0 | ~1.6 GB | ~3–4 GB |

RAM notes (rough):

  • “Rough RAM needed” assumes ~4k context and typical llama.cpp overhead.
  • If you run 8k context, add roughly +1–2 GB.
  • GPU offload doesn’t remove the need for RAM; it shifts some weight/KV usage to VRAM depending on settings.
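To grab a single quant from the table above without cloning the whole repo, the Hugging Face CLI is one option. A minimal sketch, assuming the repo id is ghostai1/GHOSTAI-Spooky (adjust the filename to the quant you want):

pip install -U "huggingface_hub[cli]"
# Assumed repo id; replace it if this repo lives at a different path.
huggingface-cli download ghostai1/GHOSTAI-Spooky ghostai-horror-7b.Q4_K_M.gguf --local-dir .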

🧟 Which quant should I use?

  • Best default: Q4_K_M
  • Higher quality: Q5_K_M or Q6_K
  • If you have plenty of RAM: Q8_0
  • Low RAM: Q3_K_S / Q2_K
  • Tiny / experimental: TQ1_0 (expect quality loss)

These quants are not “CPU” or “GPU” variants:
you can run any of them CPU-only or with GPU offload.
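For example, the same file can stay entirely in system RAM or be partially offloaded; a rough sketch (the -ngl value here is only an illustration, tune it to your VRAM):

# CPU-only: keep all layers in system RAM
./llama-cli -m ghostai-horror-7b.Q4_K_M.gguf -ngl 0 -c 4096 -p "Tell me a short ghost story."

# Partial offload: push ~20 layers to the GPU, the rest stays on CPU
./llama-cli -m ghostai-horror-7b.Q4_K_M.gguf -ngl 20 -c 4096 -p "Tell me a short ghost story."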


⚰️ Quickstart (llama.cpp)

GPU offload (CUDA build)

# -ngl 99 offloads as many layers as fit to the GPU; -c 4096 sets a 4k context window.
./llama-cli \
  -m ghostai-horror-7b.Q4_K_M.gguf \
  -ngl 99 \
  -c 4096 \
  -p "You are GHOSTAI. Speak like a calm horror narrator. Keep it tight and vivid."