gemma-4-12B-it-nvfp4

NVFP4-quantized version of google/gemma-4-12B-it (12B parameters, unified multimodal model), produced and maintained by vrfai.

Quantization Details

This model was quantized with NVIDIA ModelOpt using the following configuration:

| Property | Value |
|---|---|
| Base model | google/gemma-4-12B-it |
| Quant method | NVIDIA ModelOpt (NVFP4) |
| Weight scheme | 4-bit float, block size 16 |
| Input activation | 4-bit float, block size 16 |
| Calibration dataset | CNN DailyMail (512 samples, max_seq_len 1024) |
| Size | ~11 GB (vs ~23 GB BF16) |
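To make "4-bit float, block size 16" concrete, here is a minimal pure-Python sketch of block-scaled FP4 (E2M1) fake quantization. It is an illustration only, not the ModelOpt implementation: the function names are hypothetical, block scales are kept as ordinary Python floats rather than being stored in FP8 (E4M3) with a per-tensor scale, and ties are broken toward the smaller magnitude.

```python
# Sketch of block-scaled 4-bit float (E2M1) fake quantization, block size 16.
# Assumption: scales stay as Python floats; the real format stores one FP8
# (E4M3) scale per block plus a per-tensor scale. Names are hypothetical.

# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes) for one block of up to BLOCK_SIZE floats."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Nearest representable magnitude; ties go to the smaller one here.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def fake_quantize(values):
    """Quantize then dequantize a flat list of floats, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * c for c in codes)
    return out


def bits_per_weight(block_size=BLOCK_SIZE, scale_bits=8, value_bits=4):
    """Storage cost per element, ignoring the per-tensor scale."""
    return value_bits + scale_bits / block_size
```

Under these assumptions, a quantized tensor costs `bits_per_weight()` = 4.5 bits per element. The modules kept in BF16 still cost 16 bits per element, so the overall checkpoint is larger than 4.5 bits per parameter would suggest.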

Excluded from Quantization

The following modules are kept in full precision (BF16) to preserve accuracy:

  • lm_head
  • model.embed_vision*
  • model.embed_audio*
  • All self_attn layers (layers 0–47)
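The exclusion list above can be read as a set of name patterns applied to module paths. The sketch below shows one way to check a module name against such patterns using only the standard library. The `*.self_attn` wildcards and the example module paths are assumptions made for illustration; they are not taken from the quantization scripts, and the actual module names should be checked against the loaded model.

```python
from fnmatch import fnmatchcase

# Patterns for modules kept in BF16. The first three come from the list
# above; the two self_attn wildcards are an assumed way of expressing
# "all self_attn layers".
EXCLUDE_PATTERNS = (
    "lm_head",
    "model.embed_vision*",
    "model.embed_audio*",
    "*.self_attn",
    "*.self_attn.*",
)


def is_excluded(module_name, patterns=EXCLUDE_PATTERNS):
    """Return True if the module name matches any exclusion pattern."""
    return any(fnmatchcase(module_name, p) for p in patterns)


# Hypothetical module paths, for illustration only.
for name in (
    "lm_head",
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.down_proj",
):
    print(name, "->", "BF16" if is_excluded(name) else "NVFP4")
```

With these patterns, only the feed-forward (MLP) projections in the hypothetical paths would be quantized; attention projections, embeddings for vision and audio, and the output head stay in BF16.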

Quantization Script

The recipes and scripts used to quantize this model can be found in the following repository:
