Fix

README.md
````diff
@@ -60,9 +60,9 @@ metrics, results = metric.compute(
 
 The `bc_eval` metric outputs two things:
 
-
+`metrics`: a dictionary with the pass rates for each k value defined in the arguments and the mean percent of tests passed per question. The keys are formatted as `{LANGUAGE NAME}/{METRIC NAME}`
 
-
+`results`: a list of dictionaries with the results from each individual prediction.
 
 #### Values from Popular Papers
 [PaLM-2](https://arxiv.org/pdf/2305.10403.pdf) Performance on BC-HumanEval (`pass@1` with greedy decoding):
@@ -87,7 +87,7 @@ The `bc_eval` metric outputs two things:
 Full example with inputs that fail tests, time out, have an error, and pass.
 
 #### Passing Example
-```
+```Python
 import evaluate
 from datasets import load_dataset
 import os
````
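The updated README describes `metrics` as pass rates for each k value, keyed as `{LANGUAGE NAME}/{METRIC NAME}`. The sketch below is a hedged illustration only. It shows one way a dictionary of that shape could be built and read back. It assumes two things that are not confirmed for `bc_eval`: the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021), and metric names of the form `pass@k`. The helpers `pass_at_k`, `summarize`, and `by_language` are hypothetical and are not part of the `evaluate` API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed estimator, Chen et al., 2021).

    n: number of generated samples, c: number that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 < k <= n:
        raise ValueError("pass@k requires 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def summarize(per_language: dict[str, list[tuple[int, int]]], ks: list[int]) -> dict[str, float]:
    """Hypothetical aggregation: mean pass@k per language.

    per_language maps a language name to one (n, c) pair per question.
    Keys follow the README's `{LANGUAGE NAME}/{METRIC NAME}` format.
    """
    metrics: dict[str, float] = {}
    for language, counts in per_language.items():
        for k in ks:
            # pass@k is only defined for questions with at least k samples.
            scores = [pass_at_k(n, c, k) for n, c in counts if n >= k]
            if scores:
                metrics[f"{language}/pass@{k}"] = sum(scores) / len(scores)
    return metrics


def by_language(metrics: dict[str, float]) -> dict[str, dict[str, float]]:
    """Split `{LANGUAGE NAME}/{METRIC NAME}` keys into a nested dict per language."""
    grouped: dict[str, dict[str, float]] = {}
    for key, value in metrics.items():
        # Split on the first "/" only; assumes language names contain no slash.
        language, metric_name = key.split("/", 1)
        grouped.setdefault(language, {})[metric_name] = value
    return grouped


if __name__ == "__main__":
    # Made-up sample counts for illustration only, not results from any paper.
    metrics = summarize({"Python": [(5, 2), (5, 5)], "C++": [(5, 0)]}, ks=[1, 5])
    print(by_language(metrics))
```

Keeping the language as the key prefix lets results for different languages sit in one flat dictionary. Regrouping them as in `by_language` is only needed when comparing languages side by side.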