Settings tuned for 16 GB of VRAM
- Quantized KV cache (q8_0) and a context of 8192 tokens
```shell
./main -m ./gemma-3-12b-it-q8_0.gguf \
  -c 8192 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 999 \
  --batch 96
```
Key notes:
- -ngl = number of layers offloaded to VRAM. Raise it as high as it goes without hitting OOM. For a 12B model at Q8_0 on 16 GB, it usually lands around 28–34 when the context is large.
If you hit OOM at 8192, reduce settings in this order:
- -ngl (e.g., 24).
- --batch (e.g., 64).
- -c (e.g., 6144 or 4096).
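Putting those fallbacks together, a reduced invocation might look like the following. This is a sketch using the example values from the list above, not tuned numbers; adjust each flag to your own card.

```shell
# Hypothetical fallback for 16 GB cards that OOM at the settings above:
# fewer GPU layers, smaller batch, shorter context.
./main -m ./gemma-3-12b-it-q8_0.gguf \
  -c 6144 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 24 \
  --batch 64
```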
The quantized KV cache (--cache-type-k/--cache-type-v q8_0) saves a lot of VRAM at long contexts, at the cost of some output quality. That trade-off is expected.
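To see why quantizing the KV cache matters, a rough back-of-the-envelope estimate helps. The sketch below assumes the usual layout (one K and one V tensor per layer, sized by context length, KV-head count, and head dimension) and that q8_0 packs 32 elements into 34 bytes; the Gemma 3 12B dimensions in the example (48 layers, 8 KV heads, head dim 256) are assumptions, not values taken from this card.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elt: float) -> int:
    """Approximate KV cache size: a K and a V tensor per layer,
    each with n_ctx * n_kv_heads * head_dim elements."""
    return int(2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elt)

# Assumed Gemma 3 12B dimensions (not from this card):
# 48 layers, 8 KV heads, head_dim 256, context 8192.
f16  = kv_cache_bytes(48, 8192, 8, 256, 2.0)      # f16: 2 bytes/element
q8_0 = kv_cache_bytes(48, 8192, 8, 256, 34 / 32)  # q8_0: ~1.06 bytes/element

print(f"f16 KV cache:  {f16 / 2**30:.2f} GiB")   # ~3.00 GiB
print(f"q8_0 KV cache: {q8_0 / 2**30:.2f} GiB")  # ~1.59 GiB
```

Under these assumptions, q8_0 roughly halves the KV cache at 8192 context, which is why it frees enough VRAM to keep more layers on the GPU.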