Settings tuned for 16 GB of VRAM
- Quantized KV cache (q8_0) and a context of 8192 tokens
```shell
./main -m ./gemma-3-12b-it-q8_0.gguf \
  -c 8192 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 999 \
  --batch 96
```
Key notes:
- -ngl = number of layers offloaded to VRAM. Raise it as high as it goes without hitting OOM. For a 12B model at Q8_0 on 16 GB, it usually lands around 28–34 when the context is large.
If you hit OOM at 8192, reduce settings in this order:
- -ngl (e.g., 24).
- --batch (e.g., 64).
- -c (e.g., 6144 or 4096).
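Putting those fallbacks together, a reduced invocation might look like the following. This is a sketch using the example values from the list above, not tuned numbers; adjust each flag to your own card.

```shell
# Hypothetical fallback for 16 GB cards that OOM at the settings above:
# fewer GPU layers, smaller batch, shorter context.
./main -m ./gemma-3-12b-it-q8_0.gguf \
  -c 6144 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 24 \
  --batch 64
```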
The quantized KV cache (--cache-type-k/--cache-type-v q8_0) saves a lot of VRAM at long contexts, at the cost of some output quality. That trade-off is expected.
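To see why quantizing the KV cache matters, a rough back-of-the-envelope estimate helps. The sketch below assumes the usual layout (one K and one V tensor per layer, sized by context length, KV-head count, and head dimension) and that q8_0 packs 32 elements into 34 bytes; the Gemma 3 12B dimensions in the example (48 layers, 8 KV heads, head dim 256) are assumptions, not values taken from this card.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elt: float) -> int:
    """Approximate KV cache size: a K and a V tensor per layer,
    each with n_ctx * n_kv_heads * head_dim elements."""
    return int(2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elt)

# Assumed Gemma 3 12B dimensions (not from this card):
# 48 layers, 8 KV heads, head_dim 256, context 8192.
f16  = kv_cache_bytes(48, 8192, 8, 256, 2.0)      # f16: 2 bytes/element
q8_0 = kv_cache_bytes(48, 8192, 8, 256, 34 / 32)  # q8_0: ~1.06 bytes/element

print(f"f16 KV cache:  {f16 / 2**30:.2f} GiB")   # ~3.00 GiB
print(f"q8_0 KV cache: {q8_0 / 2**30:.2f} GiB")  # ~1.59 GiB
```

Under these assumptions, q8_0 roughly halves the KV cache at 8192 context, which is why it frees enough VRAM to keep more layers on the GPU.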