This is a preliminary version (subject to change) of the FP8-quantized google/gemma-4-31B-it model, distributed by BC Card.
This work was carried out with reference to the Red Hat AI approach and validation methodology.
The model has both weights and activations quantized to FP8 with vllm-project/llm-compressor.
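As a rough illustration of what "FP8 dynamic" quantization means, the NumPy sketch below fake-quantizes a tensor to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, max finite value 448) with an on-the-fly per-tensor scale. This is a simplified model for intuition only — it ignores subnormals, and llm-compressor/vLLM apply this per channel or per token inside fused CUDA kernels, not via this NumPy path.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    # Round each value to the nearest E4M3-representable number:
    # keep 3 explicit mantissa bits (plus the implicit leading bit).
    # Simplified: subnormal handling near zero is omitted.
    m, e = np.frexp(x)            # x = m * 2**e, with 0.5 <= |m| < 1
    m = np.round(m * 16) / 16     # quantize the mantissa to 1/16 steps
    return np.ldexp(m, e)


def fp8_dynamic_fake_quant(x: np.ndarray) -> np.ndarray:
    # "Dynamic" = the scale is computed from the data itself at runtime,
    # mapping the tensor's max magnitude onto the E4M3 range.
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = np.clip(round_to_e4m3(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scale              # dequantize with the same scale


rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
xq = fp8_dynamic_fake_quant(x)
rel_err = float(np.abs(x - xq).max() / np.abs(x).max())
print(f"max error relative to max magnitude: {rel_err:.4f}")
```

With 3 mantissa bits the worst-case rounding error is bounded by about 1/16 of each value's magnitude, which is why FP8 weight/activation quantization tends to track the original model closely on downstream tasks.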
This model requires a nightly vllm wheel. For the reference installation and execution flow, see the Red Hat AI / vLLM-based guidance:
https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#installing-vllm
On a single B200:
```shell
lm_eval \
  --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=BCcard/gemma-4-31B-it-FP8-Dynamic,max_length=96000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=2400" \
  --num_fewshot 5 \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --output_path results_gsm8k_platinum.json \
  --seed 1234 \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
```
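The `--output_path` flag writes a JSON report, and the headline metric can be pulled out of it as below. The nesting shown (`"results"` → task name → `"metric,filter"` keys) matches recent lm-evaluation-harness releases but can vary between versions, so treat the exact key names as an assumption; the `sample` dict stands in for loading the real file.

```python
import json

# Stand-in for: results = json.load(open("results_gsm8k_platinum.json"))
# The structure below is assumed from recent lm-evaluation-harness output.
sample = {
    "results": {
        "gsm8k_platinum_cot_llama": {
            "exact_match,strict-match": 0.976,
            "exact_match_stderr,strict-match": 0.0044,
        }
    }
}

task = sample["results"]["gsm8k_platinum_cot_llama"]
score = task["exact_match,strict-match"]
err = task["exact_match_stderr,strict-match"]
print(f"exact_match (strict): {score:.3f} ± {err:.4f}")
```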
Original:
| Tasks |Version| Filter |n-shot| Metric | |Value| |Stderr|
|------------------------|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k_platinum_cot_llama| 3|flexible-extract| 5|exact_match|↑ |0.976|± |0.0044|
| | |strict-match | 5|exact_match|↑ |0.976|± |0.0044|
FP8:
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum_cot_llama| 3|flexible-extract| 5|exact_match|↑ |0.9768|± |0.0043|
| | |strict-match | 5|exact_match|↑ |0.9777|± |0.0043|
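As a quick sanity check on the two tables above, the FP8 model's strict-match score can be compared against the baseline (values taken directly from the tables):

```python
# Strict-match exact_match values from the tables above.
baseline = 0.976   # original model
fp8 = 0.9777       # FP8 model
stderr = 0.0044    # baseline stderr

recovery = fp8 / baseline * 100
gap = abs(fp8 - baseline)
print(f"recovery: {recovery:.2f}%  (gap {gap:.4f} vs stderr {stderr})")
```

The gap is well within one standard error, i.e. the FP8 quantization is statistically indistinguishable from the original model on this benchmark.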
Model tree for BCCard/gemma-4-31B-it-FP8-Dynamic

Base model: google/gemma-4-31B-it