alpindale committed on
Commit 89cccbb · 1 Parent(s): 19ff5d9

Update README.md

  1. README.md +28 -1
README.md CHANGED
@@ -71,7 +71,34 @@ The model was evaluated using EleutherAI's [lm-evaluation-harness](https://githu
  ```
  anli_r1,anli_r2,anli_r3,arc_challenge,arc_easy,boolq,cb,hellaswag,openbookqa,piqa,rte,truthfulqa_mc,wic,winogrande,wsc
  ```
- Comparison of Metharme-1.3B's performance on benchmarks to Pygmalion-6B, Metharme-7B, and [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1):
+ ```
+ |    Task     |Version| Metric |Value |   |Stderr|
+ |-------------|------:|--------|-----:|---|-----:|
+ |anli_r1      |      0|acc     |0.3310|±  |0.0149|
+ |anli_r2      |      0|acc     |0.3360|±  |0.0149|
+ |anli_r3      |      0|acc     |0.3333|±  |0.0136|
+ |arc_challenge|      0|acc     |0.2765|±  |0.0131|
+ |             |       |acc_norm|0.3131|±  |0.0136|
+ |arc_easy     |      0|acc     |0.6221|±  |0.0099|
+ |             |       |acc_norm|0.5652|±  |0.0102|
+ |boolq        |      1|acc     |0.6208|±  |0.0085|
+ |cb           |      1|acc     |0.2143|±  |0.0553|
+ |             |       |f1      |0.1687|   |      |
+ |hellaswag    |      0|acc     |0.4298|±  |0.0049|
+ |             |       |acc_norm|0.5505|±  |0.0050|
+ |openbookqa   |      0|acc     |0.2300|±  |0.0188|
+ |             |       |acc_norm|0.3420|±  |0.0212|
+ |piqa         |      0|acc     |0.7231|±  |0.0104|
+ |             |       |acc_norm|0.7334|±  |0.0103|
+ |rte          |      0|acc     |0.5235|±  |0.0301|
+ |truthfulqa_mc|      1|mc1     |0.2448|±  |0.0151|
+ |             |       |mc2     |0.3800|±  |0.0142|
+ |wic          |      0|acc     |0.5000|±  |0.0198|
+ |winogrande   |      0|acc     |0.5675|±  |0.0139|
+ |wsc          |      0|acc     |0.3654|±  |0.0474|
+
+ ```
+ Illustrated comparison of Metharme-1.3B's performance on benchmarks to Pygmalion-6B, Metharme-7B, and [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1):
  ![Metharme 1.3B Evaluation Results](https://files.catbox.moe/mqemcg.png)

  ## Limitations and biases
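
As a reading aid for the results table added in this commit, the sketch below parses a few of its rows and turns each `Value ± Stderr` pair into an approximate 95% interval. It is an illustration only, not part of the README or of lm-evaluation-harness: the helper names `parse_rows` and `interval_95` are hypothetical, and the interval assumes a normal approximation (value ± 1.96 × stderr), which the commit itself does not state.

```python
# Hypothetical helpers for reading the table above; not part of lm-evaluation-harness.

# A few rows copied from the table added in this commit.
TABLE = """\
|anli_r1      |      0|acc     |0.3310|±  |0.0149|
|arc_challenge|      0|acc     |0.2765|±  |0.0131|
|             |       |acc_norm|0.3131|±  |0.0136|
|cb           |      1|acc     |0.2143|±  |0.0553|
|             |       |f1      |0.1687|   |      |
|wic          |      0|acc     |0.5000|±  |0.0198|
"""


def parse_rows(table):
    """Return (task, metric, value, stderr) tuples; stderr is None when absent."""
    rows = []
    task = None
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split into the six columns.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name, _version, metric, value, _pm, stderr = cells
        if name:
            # Continuation rows leave the task cell empty; reuse the last task name.
            task = name
        rows.append((task, metric, float(value), float(stderr) if stderr else None))
    return rows


def interval_95(value, stderr):
    """Assumed normal approximation: value +/- 1.96 * stderr."""
    half_width = 1.96 * stderr
    return (value - half_width, value + half_width)


for task, metric, value, stderr in parse_rows(TABLE):
    if stderr is None:
        print(f"{task:14s}{metric:9s}{value:.4f}  (no stderr reported)")
    else:
        low, high = interval_95(value, stderr)
        print(f"{task:14s}{metric:9s}{value:.4f}  [{low:.4f}, {high:.4f}]")
```

Rows with a small standard error (for example `hellaswag`) give tight intervals, while small test sets such as `cb` and `wsc` give wide ones, so differences between models on those tasks should be read with more caution.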