sh2orc committed · verified
Commit f759e68 · 1 Parent(s): 0b56a8f

Update README.md

Files changed (1): README.md (+5 -3)

README.md CHANGED
@@ -23,7 +23,7 @@ https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#installing-vllm
 
 On a single B200:
 
-```bash
+```
 lm_eval \
   --model local-chat-completions \
   --tasks gsm8k_platinum_cot_llama \
@@ -34,12 +34,14 @@ lm_eval \
   --output_path results_gsm8k_platinum.json \
   --seed 1234 \
   --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
-This is a preliminary version (and subject to change) of the FP8 quantized google/gemma-4-31B-it model. The model has both weights and activations quantized to FP8 with vllm-project/llm-compressor.
+```
 
+This is a preliminary version (and subject to change) of the FP8 quantized google/gemma-4-31B-it model. The model has both weights and activations quantized to FP8 with vllm-project/llm-compressor.
 This model requires a nightly vllm wheel, see install instructions at https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html#installing-vllm
 
 On a single B200:
 
+
 ```
 lm_eval --model local-chat-completions --tasks gsm8k_platinum_cot_llama --model_args "model=RedHatAI/gemma-4-31B-it-FP8-Dynamic,max_length=96000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=2400" --num_fewshot 5 --apply_chat_template --fewshot_as_multiturn --output_path results_gsm8k_platinum.json --seed 1234 --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
 ```
@@ -56,7 +58,7 @@ Original:
 
 FP8:
 
-```bash
+```
 | Tasks                  |Version| Filter         |n-shot| Metric    |   |Value |   |Stderr|
 |------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k_platinum_cot_llama|      3|flexible-extract|     5|exact_match|↑  |0.9768|±  |0.0043|
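The `--model_args` and `--gen_kwargs` flags in the commands above pack their settings into a single comma-separated `key=value` string. As an illustration only (this is a minimal sketch, not lm_eval's actual parser, which also handles typed and nested values), such a string decomposes like this:

```python
# Minimal illustrative parser for a comma-separated key=value string,
# as passed to lm_eval via --gen_kwargs or --model_args.
# NOT lm_eval's real parser: real parsing also coerces types (bools, floats).

def parse_kv_string(s: str) -> dict:
    """Split 'k1=v1,k2=v2,...' into {'k1': 'v1', 'k2': 'v2', ...}."""
    # split("=", 1) keeps values containing '=' (e.g. URLs) intact
    return dict(pair.split("=", 1) for pair in s.split(",") if pair)

gen_kwargs = "do_sample=True,temperature=1.0,top_p=0.95,top_k=64,max_gen_toks=64000,seed=1234"
parsed = parse_kv_string(gen_kwargs)
print(parsed["temperature"])  # values arrive as strings: 1.0
```

Note that commas act as the top-level separator, so this simple scheme only works when no value itself contains a comma, which holds for the strings in the commands above.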
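As a sanity check on the results table, the ±0.0043 Stderr is consistent with the usual binomial standard error for an exact-match accuracy of 0.9768 — assuming a test set of roughly 1,209 questions (an assumed GSM8K-Platinum size, not stated in this README):

```python
import math

# Binomial standard error of an accuracy estimate: sqrt(p * (1 - p) / n).
# n = 1209 is an ASSUMPTION (approximate GSM8K-Platinum test set size);
# it does not appear in the README itself.
p = 0.9768  # exact_match from the FP8 results table
n = 1209
stderr = math.sqrt(p * (1 - p) / n)
print(round(stderr, 4))  # -> 0.0043, matching the table's Stderr column
```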