Update README.md

README.md (changed)
On a single B200:

```
lm_eval \
  --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=RedHatAI/gemma-4-31B-it-FP8-Dynamic,max_length=96000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=2400" \
  --num_fewshot 5 \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --output_path results_gsm8k_platinum.json \
  --seed 1234 \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
```
This is a preliminary version (and subject to change) of the FP8-quantized google/gemma-4-31B-it model. Both its weights and activations are quantized to FP8 with vllm-project/llm-compressor.

This model requires a nightly vLLM wheel; see the install instructions at https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#installing-vllm
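llm-compressor's FP8-dynamic scheme computes scales from each tensor on the fly rather than calibrating them offline. Below is a toy, pure-stdlib sketch of that idea only; it is not the llm-compressor implementation. It assumes the e4m3 format (max normal magnitude 448, three explicit mantissa bits) and emulates the rounding with `frexp`/`ldexp`:

```python
import math

E4M3_MAX = 448.0  # largest normal magnitude representable in float8 e4m3


def fp8_dynamic_quant(xs):
    """Toy quantize-dequantize simulating dynamic FP8 (e4m3).

    The scale is derived from the input itself ("dynamic"): its absolute
    max is mapped onto the e4m3 range, then each value is rounded to
    1 implicit + 3 explicit significand bits via frexp/ldexp.
    """
    scale = max(abs(v) for v in xs) / E4M3_MAX      # per-tensor dynamic scale
    out = []
    for v in xs:
        y = max(-E4M3_MAX, min(E4M3_MAX, v / scale))
        m, e = math.frexp(y)                        # y = m * 2**e, 0.5 <= |m| < 1
        m = round(m * 16.0) / 16.0                  # keep 4 significand bits
        out.append(math.ldexp(m, e) * scale)        # dequantize back to original scale
    return out
```

Because the scale maps the absolute max onto 448 (itself representable), the largest element round-trips exactly, and the worst-case relative rounding error for normal values is about 2**-4.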
On a single B200:

```
lm_eval --model local-chat-completions --tasks gsm8k_platinum_cot_llama --model_args "model=RedHatAI/gemma-4-31B-it-FP8-Dynamic,max_length=96000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=2400" --num_fewshot 5 --apply_chat_template --fewshot_as_multiturn --output_path results_gsm8k_platinum.json --seed 1234 --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
```
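The lm_eval commands above point `base_url` at a local OpenAI-compatible endpoint, so a vLLM server must already be listening on port 8000. A minimal launch sketch; the flags here are assumptions (the `--max-model-len` value mirrors the `max_length=96000` used above), so consult the recipe page linked earlier for the authoritative command:

```shell
# Serve the FP8 checkpoint with an OpenAI-compatible API on port 8000.
# Flag names/values are assumptions; see the vLLM Gemma recipe page.
vllm serve RedHatAI/gemma-4-31B-it-FP8-Dynamic \
  --max-model-len 96000 \
  --port 8000
```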
FP8:

```
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum_cot_llama| 3|flexible-extract| 5|exact_match|↑ |0.9768|± |0.0043|
```
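The table above is what lm_eval prints; the same numbers also land in the `--output_path` file (results_gsm8k_platinum.json). A sketch of pulling a metric back out of that file, assuming lm_eval's usual `"metric,filter"` key layout under `"results"` (the inline sample below stands in for the real file):

```python
import json


def get_metric(results, task, metric, filt):
    """Look up a '<metric>,<filter>' entry for one task in an lm_eval results dict."""
    return results["results"][task][f"{metric},{filt}"]


# Inline sample mirroring the assumed schema; in practice:
#   with open("results_gsm8k_platinum.json") as f: results = json.load(f)
results = json.loads("""
{
  "results": {
    "gsm8k_platinum_cot_llama": {
      "exact_match,flexible-extract": 0.9768,
      "exact_match_stderr,flexible-extract": 0.0043
    }
  }
}
""")

score = get_metric(results, "gsm8k_platinum_cot_llama", "exact_match", "flexible-extract")
```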