Update README.md
README.md
CHANGED
@@ -52,7 +52,7 @@ This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai
 
 ## Evaluation
 
-The model was evaluated
+The model was evaluated on reasoning tasks including AIME24, MMLU_COT, and GSM8K via [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
 
 ### Accuracy
 
@@ -102,7 +102,7 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 
 ### Reproduction
 
-The results were obtained using [lm-evaluation-harness](https://github.com/
+The results of AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) via [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
 
 ### AIME24
 ```
@@ -128,6 +128,8 @@ lm_eval --model local-completions \
 --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
 ```
 
+The results of GSM8K were obtained using [vLLM](https://docs.vllm.ai/en/latest/) via [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot).
+
 ### GSM8K
 ```
 lm_eval --model local-completions \