legolasyiu committed
Commit 5cf4fca · verified · 1 parent: 358ce9f

Update README.md

Files changed (1): README.md (+11 −0)
README.md CHANGED
@@ -95,6 +95,17 @@ Both gpt-oss models can be fine-tuned for a variety of specialized use cases.
  - Do not use this model for creating nuclear, biological, and chemical weapons.
  - Do not allow harmful or malicious outputs.
 
+ Code to reproduce the benchmark (using +std for the final result):
+ ```py
+ # GPQA Diamond
+ !lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks gpqa_diamond_cot_zeroshot --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=2048 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3
+ # GSM8K CoT
+ !lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks gsm8k_cot_llama --apply_chat_template --fewshot_as_multiturn --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=1024 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3
+ # MMLU-Pro Plus computer science
+ !lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks mmlu_pro_plus_computer_science --apply_chat_template --fewshot_as_multiturn --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=1024 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3
+ ```
 
  ## Benchmark
  hf (pretrained=EpistemeAI/metatune-gpt20b-R1.1,parallelize=True,dtype=bfloat16), gen_kwargs: (temperature=0.9,top_p=0.9,max_new_tokens=2048), limit: 10.0, num_fewshot: 0, batch_size: auto:4
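Outside a notebook, the three `!lm_eval` invocations added in this commit can be generated from a small Python helper instead of being copy-pasted. This is a sketch only: `build_cmd` and `TASKS` are hypothetical names, and the flags simply mirror the commands shown in the diff.

```python
# Sketch: rebuild the three lm_eval commands from this commit's diff.
# (task name, needs chat template, max_new_tokens) per the README.
TASKS = [
    ("gpqa_diamond_cot_zeroshot", False, 2048),
    ("gsm8k_cot_llama", True, 1024),
    ("mmlu_pro_plus_computer_science", True, 1024),
]

def build_cmd(task, chat_template, max_new_tokens,
              model="EpistemeAI/metatune-gpt20b-R1.2"):
    """Return the lm_eval argv for one task, mirroring the README flags."""
    cmd = [
        "lm_eval", "--model", "hf",
        "--model_args", f"pretrained={model},parallelize=True,dtype=bfloat16",
        "--tasks", task,
        "--num_fewshot", "0",
        "--gen_kwargs", f"temperature=0.9,top_p=0.9,max_new_tokens={max_new_tokens}",
        "--batch_size", "auto:4",
        "--limit", "10",
        "--device", "cuda:0",
        "--output_path", "./eval_harness/gpt-oss-20b3",
    ]
    if chat_template:
        # GSM8K and MMLU-Pro Plus runs add these two flags in the diff.
        cmd += ["--apply_chat_template", "--fewshot_as_multiturn"]
    return cmd

if __name__ == "__main__":
    for task, chat, mnt in TASKS:
        print(" ".join(build_cmd(task, chat, mnt)))
```

Each printed line is a shell command; the argv lists could also be passed directly to `subprocess.run`.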