Update README.md
README.md
CHANGED
@@ -54,9 +54,9 @@ To achieve optimal performance, we recommend the following settings:

3. Standardize output format: We recommend using hints to standardize model outputs when benchmarking.

- a. Math questions: Add a statement ```Please reason step by step, and put your final answer within \\boxed{}.``` to the prompt.
+ a. Math questions: Add a statement "```Please reason step by step, and put your final answer within \\boxed{}.```" to the prompt.

- b. Code problems: Add
+ b. Code problems: Add "### Format: Read the inputs from stdin solve the problem and write the answer to stdout. Enclose your code within delimiters as follows.\n \```python\n# YOUR CODE HERE\n\```\n### Answer: (use the provided format with backticks)" to the prompt.

4. In particular, we use ```latex2sympy2``` and ```sympy``` to assist in judging complex Latex formats for the Math500 evaluation script. For all datasets, we generate 64 responses per query to estimate pass@1.
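
As a rough illustration of item 3, the two format hints could be attached to benchmark prompts as sketched below. The helper name `add_format_hint`, the task labels `"math"` and `"code"`, and the choice to append the hint after the question are assumptions made for this example; the README only says to add the text to the prompt. The hint strings follow the wording above, read with the README escaping removed (a single backslash in `\boxed{}`, plain backticks around the code template), which is also an assumption.

```python
# Sketch only: names and hint placement are assumptions, not part of the README.
FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Hint for math questions, with a single backslash before "boxed" (assumed).
MATH_HINT = r"Please reason step by step, and put your final answer within \boxed{}."

# Hint for code problems, joined with newlines where the README shows "\n".
CODE_HINT = (
    "### Format: Read the inputs from stdin solve the problem and write the "
    "answer to stdout. Enclose your code within delimiters as follows.\n"
    f"{FENCE}python\n# YOUR CODE HERE\n{FENCE}\n"
    "### Answer: (use the provided format with backticks)"
)


def add_format_hint(question: str, task: str) -> str:
    """Append the format hint for `task` ("math" or "code") to `question`."""
    hints = {"math": MATH_HINT, "code": CODE_HINT}
    if task not in hints:
        raise ValueError(f"unknown task type: {task!r}")
    # Appending after a blank line is one possible placement (assumed).
    return f"{question.rstrip()}\n\n{hints[task]}"


if __name__ == "__main__":
    print(add_format_hint("What is the sum of the first 10 positive integers?", "math"))
```

Keeping the hints in one place makes it easier to apply the same wording to every query in a benchmark run.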
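
For item 4, one common way to turn 64 sampled responses per query into a pass@1 estimate is to take the fraction of correct responses for each query and average those fractions over the dataset. The sketch below assumes that reading; the README does not spell out the formula, and the function name `pass_at_1` and the boolean input layout are hypothetical. Judging correctness (for example with ```latex2sympy2``` and ```sympy```) is assumed to happen beforehand.

```python
from statistics import mean
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Estimate pass@1 from per-query correctness flags.

    results[i][j] is True if the j-th sampled response to query i was
    judged correct. Each query contributes its fraction of correct samples,
    and the estimate is the mean of those fractions (assumed definition).
    """
    if not results or any(len(r) == 0 for r in results):
        raise ValueError("every query needs at least one sampled response")
    per_query = [sum(r) / len(r) for r in results]
    return mean(per_query)


if __name__ == "__main__":
    # Two queries with 64 samples each: 48 correct and 0 correct.
    flags = [[True] * 48 + [False] * 16, [False] * 64]
    print(pass_at_1(flags))  # 0.375
```

Averaging over many samples per query reduces the variance that a single sampled response would introduce.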