
Run settings adjusted for 16 GB of VRAM

  • KV cache quantized to q8_0, context 8192
```shell
./main -m ./gemma-3-12b-it-q8_0.gguf \
  -c 8192 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 999 \
  --batch-size 96
```

Key notes:

  • ngl = number of layers offloaded to VRAM. Raise it as far as it goes without OOM. For a 12B model at Q8_0 on 16 GB, it usually lands around 28–34 or more when the context is large.

If you hit OOM at 8192, reduce settings in this order:

  • -ngl (e.g., 24).
  • --batch-size (e.g., 64).
  • -c (e.g., 6144 or 4096).
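The fallback order above can be sketched as a small shell helper. `next_fallback` is a hypothetical name, and the thresholds simply mirror the example values in the list (24 layers, batch 64, context 4096):

```shell
# Hypothetical helper: given the current "ngl batch ctx" settings, print
# the next fallback step in the suggested order: lower ngl first, then
# batch size, then context.
next_fallback() {
  ngl=$1; batch=$2; ctx=$3
  if [ "$ngl" -gt 24 ]; then
    echo "24 $batch $ctx"          # step 1: reduce offloaded layers
  elif [ "$batch" -gt 64 ]; then
    echo "$ngl 64 $ctx"            # step 2: reduce batch size
  elif [ "$ctx" -gt 4096 ]; then
    echo "$ngl $batch 4096"        # step 3: reduce context
  else
    echo "$ngl $batch $ctx"        # nothing left to lower
  fi
}

next_fallback 999 96 8192   # first OOM -> drop ngl: "24 96 8192"
```

Re-run the model after each step and stop at the first configuration that fits.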

Quantized KV cache (--cache-type-k/--cache-type-v q8_0) saves a lot of VRAM at long contexts, at the cost of some speed; that trade-off is expected. Note that recent llama.cpp builds require flash attention (-fa) when the V cache is quantized.
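To see why quantizing the KV cache helps, here is a back-of-envelope size calculation in shell. The layer/head/dimension numbers are illustrative assumptions for a 12B-class model, not confirmed Gemma 3 figures:

```shell
# Rough KV-cache size estimate (assumed illustrative dimensions, NOT
# official Gemma 3 12B parameters):
LAYERS=48; KV_HEADS=8; HEAD_DIM=256; CTX=8192
# f16 cache: 2 bytes per element, times 2 for the K and V tensors.
F16_MIB=$(( 2 * LAYERS * KV_HEADS * HEAD_DIM * CTX / 1024 * 2 / 1024 ))
# q8_0 stores 34 bytes per block of 32 elements -> 34/64 = 17/32 of f16.
Q8_MIB=$(( F16_MIB * 17 / 32 ))
echo "f16 KV cache:  ${F16_MIB} MiB"
echo "q8_0 KV cache: ${Q8_MIB} MiB"
```

With these assumed dimensions the q8_0 cache takes roughly half the VRAM of the f16 cache, which is the headroom that makes -c 8192 viable on 16 GB.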

Model details

  • Format: GGUF
  • Size: 12B params
  • Architecture: gemma3
  • Quantization: 8-bit (Q8_0)
