Update README.md
README.md
CHANGED
@@ -102,7 +102,7 @@ The model was evaluated on reasoning tasks including AIME24, MMLU_COT, and GSM8K

 ### Reproduction

-The results of AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) via forked [lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
+The results of AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) while the result of GSM8K were obtained using [vLLM](https://docs.vllm.ai/en/latest/). All the evaluations were conducted via forked [lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).

 ### AIME24
 ```
@@ -146,7 +146,6 @@ lm_eval --model local-completions \
 --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
 ```

-The result of GSM8K were obtained using [vLLM](https://docs.vllm.ai/en/latest/) via forked [lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).

 ### GSM8K
 ```