This is a preliminary version (subject to change) of the FP8-quantized google/gemma-4-26B-A4B-it model, distributed by BC Card.
The work was conducted with reference to the Red Hat AI quantization methodology and deployment approach.

The model has both weights and activations quantized to FP8 using vllm-project/llm-compressor.
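For intuition, here is a minimal pure-Python sketch of dynamic FP8 (E4M3) quantization: scale a tensor so its largest magnitude maps onto E4M3's finite range (±448), then clamp. The helper names are illustrative only; in practice llm-compressor/vLLM do the per-token scaling in fused GPU kernels, and rounding onto the actual 8-bit value grid is omitted here.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_dynamic(values):
    """Dynamically scale a list of floats so the max magnitude maps to
    the E4M3 range, then clamp into that range. Returns (quantized, scale)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / FP8_E4M3_MAX
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate original values from the quantized tensor."""
    return [v * scale for v in q]
```

"Dynamic" in the model name refers to this activation-side behavior: scales are computed from the live tensor at inference time rather than calibrated offline.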

Run it with:

```shell
vllm serve BCcard/gemma-4-26B-A4B-it-FP8-Dynamic --max-model-len 96000
```

on vLLM main (a nightly build is recommended, per the Red Hat AI reference setup).
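Once serving, the model is reachable on vLLM's OpenAI-compatible chat-completions endpoint. A minimal client sketch (the URL, port, and sampling values mirror the commands on this page; the actual HTTP call is left commented out so the snippet stands on its own):

```python
import json

def build_chat_request(prompt, model="BCcard/gemma-4-26B-A4B-it-FP8-Dynamic"):
    """Build an OpenAI-style chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 1024,
    }

payload = json.dumps(build_chat_request("What is 12 * 7?"))
# To send it against a running server:
# import urllib.request
# req = urllib.request.Request(
#     "http://0.0.0.0:8000/v1/chat/completions",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```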

Evaluation was run on a single B200:

```shell
lm_eval \
  --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=BCcard/gemma-4-26B-A4B-it-FP8-Dynamic,max_length=96000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=2400" \
  --num_fewshot 5 \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --output_path results_gsm8k_platinum.json \
  --seed 1234 \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
```

Original (unquantized):

|         Tasks          |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum_cot_llama|      3|flexible-extract|     5|exact_match|↑  |0.9702|±  |0.0049|
|                        |       |strict-match    |     5|exact_match|↑  |0.9702|±  |0.0049|

FP8 (BC Card distribution, validated with reference to the Red Hat AI approach):

|         Tasks          |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum_cot_llama|      3|flexible-extract|     5|exact_match|↑  |0.9669|±  |0.0051|
|                        |       |strict-match    |     5|exact_match|↑  |0.9669|±  |0.0051|
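The FP8 scores track the baseline closely; a quick arithmetic check of the accuracy recovery implied by the two tables:

```python
# Accuracy recovery of the FP8 checkpoint vs. the original, taken from
# the gsm8k_platinum_cot_llama exact_match values above.
baseline_acc = 0.9702
fp8_acc = 0.9669
recovery = fp8_acc / baseline_acc * 100
print(f"FP8 recovers {recovery:.2f}% of baseline accuracy")  # ~99.66%
```

Note that the 0.0033 gap is well within the reported ±0.005 standard error of either run.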