All models are evaluated in chat mode (i.e. with the respective conversation template applied). All zero-shot benchmarks follow the same settings as the AGIEval and Orca papers. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in [our repository](https://github.com/imoneoi/openchat/#benchmarks).
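To illustrate what "chat mode" means here, the sketch below wraps a raw benchmark question in a conversation template before generation. This is a minimal illustration only: the role labels and turn separator are assumptions for demonstration, not the exact template shipped with any particular model; real evaluation should use the template bundled with the model under test.

```python
def apply_conversation_template(question: str) -> str:
    """Wrap a raw benchmark question in a chat-style prompt.

    The template below is illustrative only; the role labels and the
    turn separator are placeholders, not a specific model's template.
    """
    user_role = "User:"              # assumed role label
    assistant_role = "Assistant:"    # assumed role label
    end_of_turn = "<|end_of_turn|>"  # assumed turn separator

    # The model then generates a completion after the assistant label.
    return f"{user_role} {question}{end_of_turn}{assistant_role}"


prompt = apply_conversation_template("What is 2 + 2?")
print(prompt)
```

Evaluating in chat mode means every benchmark question passes through a wrapper like this, so the model sees the same prompt format it was fine-tuned on rather than a bare completion prompt.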
</details>
<div>
<h3>HumanEval+</h3>