Update README.md
README.md
````diff
@@ -71,7 +71,34 @@ The model was evaluated using EleutherAI's [lm-evaluation-harness](https://githu
 ```
 anli_r1,anli_r2,anli_r3,arc_challenge,arc_easy,boolq,cb,hellaswag,openbookqa,piqa,rte,truthfulqa_mc,wic,winogrande,wsc
 ```
-
+```
+|    Task     |Version| Metric |Value |   |Stderr|
+|-------------|------:|--------|-----:|---|-----:|
+|anli_r1      |      0|acc     |0.3310|±  |0.0149|
+|anli_r2      |      0|acc     |0.3360|±  |0.0149|
+|anli_r3      |      0|acc     |0.3333|±  |0.0136|
+|arc_challenge|      0|acc     |0.2765|±  |0.0131|
+|             |       |acc_norm|0.3131|±  |0.0136|
+|arc_easy     |      0|acc     |0.6221|±  |0.0099|
+|             |       |acc_norm|0.5652|±  |0.0102|
+|boolq        |      1|acc     |0.6208|±  |0.0085|
+|cb           |      1|acc     |0.2143|±  |0.0553|
+|             |       |f1      |0.1687|   |      |
+|hellaswag    |      0|acc     |0.4298|±  |0.0049|
+|             |       |acc_norm|0.5505|±  |0.0050|
+|openbookqa   |      0|acc     |0.2300|±  |0.0188|
+|             |       |acc_norm|0.3420|±  |0.0212|
+|piqa         |      0|acc     |0.7231|±  |0.0104|
+|             |       |acc_norm|0.7334|±  |0.0103|
+|rte          |      0|acc     |0.5235|±  |0.0301|
+|truthfulqa_mc|      1|mc1     |0.2448|±  |0.0151|
+|             |       |mc2     |0.3800|±  |0.0142|
+|wic          |      0|acc     |0.5000|±  |0.0198|
+|winogrande   |      0|acc     |0.5675|±  |0.0139|
+|wsc          |      0|acc     |0.3654|±  |0.0474|
+
+```
+Illustrated comparison of Metharme-1.3B's performance on benchmarks to Pygmalion-6B, Metharme-7B, and [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1):
 
 
 ## Limitations and biases
````
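The `±`/`Stderr` column in the added table can be reproduced from the `Value` column if one assumes the harness reports the standard error of the mean of per-example 0/1 scores, computed with the sample (n - 1) standard deviation, i.e. SE = sqrt(p(1 - p) / (n - 1)). The sketch below checks that assumption against the `anli_r1` row and turns a value/stderr pair into an approximate 95% normal interval. The example count `n = 1000` for `anli_r1` and the helper names `accuracy_stderr` and `normal_ci95` are assumptions for illustration; neither comes from the README or from the harness itself.

```python
import math
from typing import Tuple


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores whose mean is p.

    Assumes the sample (n - 1) standard deviation is used; this is an
    assumption about how the Stderr column was produced.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def normal_ci95(value: float, stderr: float) -> Tuple[float, float]:
    """Approximate 95% interval value ± 1.96 * stderr (normal approximation)."""
    half_width = 1.96 * stderr
    return (value - half_width, value + half_width)


# anli_r1 row from the table: acc = 0.3310, reported stderr = 0.0149.
# n = 1000 evaluation examples is an assumption, not stated in the README.
se = accuracy_stderr(0.3310, 1000)
print(f"recomputed stderr: {se:.4f}")

low, high = normal_ci95(0.3310, 0.0149)
print(f"approx. 95% interval: [{low:.4f}, {high:.4f}]")
```

The same helpers apply to any `acc` or `acc_norm` row once its example count is known. The `cb` `f1` row reports no stderr, so no interval can be formed from the table alone, and the 1.96 multiplier is a large-sample approximation that is loose for small tasks such as `cb`.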