linzhao-amd committed on
Commit 23d2de6 · verified · 1 Parent(s): c95e5b4

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -52,7 +52,7 @@ This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai
 
 ## Evaluation
 
-The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) frameworks.
+The model was evaluated on reasoning tasks including AIME24, MMLU_COT, and GSM8K via the [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
 
 ### Accuracy
 
@@ -102,7 +102,7 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 
 ### Reproduction
 
-The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), with custom evaluation tasks AIME24 and MMLU_CoT and native task GSM8K.
+The results for AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) via the [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
 
 ### AIME24
 ```
@@ -128,6 +128,8 @@ lm_eval --model local-completions \
   --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
 ```
 
+The results for GSM8K were obtained using [vLLM](https://docs.vllm.ai/en/latest/) via the [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
+
 ### GSM8K
 ```
 lm_eval --model local-completions \
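
For context, the truncated `lm_eval` snippet above targets an OpenAI-compatible local server (vLLM for GSM8K, per the added README text). A full invocation might look like the sketch below; the model id placeholder, server port, concurrency, and few-shot count are assumptions for illustration, not values from this commit.

```shell
# Sketch only: assumes a vLLM OpenAI-compatible server is already running,
# e.g. started with `vllm serve <model-id> --port 8000`.
# <model-id> is a placeholder, not a real identifier from the commit.
lm_eval --model local-completions \
  --model_args model=<model-id>,base_url=http://localhost:8000/v1/completions,num_concurrent=8 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --output_path output_data/gsm8k 2>&1 | tee logs/gsm8k.log
```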