Update README.md
GLORT2 (GLORT2 Low Rank Transformer Transformer) is a transformer model where ev…
Also, sorry: I just realized there's some residue from where I copied the model code out of my own projects, including some "expanded lm head size" stuff. Just ignore that if you're looking at the config and code; this isn't a serious project, so I don't care too much that it's there.

| model | 512-token strided perplexity on a Pile test set | tokens |
| --- | --- | --- |
| cerebras 111m | 21.551 | 2.2B |
| cerebras 256m | 15.203 | 5.1B |
| pythia 70m | 22.393 | 300B |
| pythia 160m | 13.934 | 300B |
| pythia 410m | 9.618 | 300B |
| GLORT2 (205m) | 13.052 | 2.2B |
| custom llama w/ same settings as cerebras 111m | 13.882 | 2.2B |
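For reference, "512-token strided perplexity" usually means scoring a long token stream in sliding windows rather than all at once. Here is a minimal sketch of that bookkeeping only; `nll_fn` is a hypothetical stand-in for a real model call that returns the summed negative log-likelihood of the newly scored tokens in a window, and the window/stride defaults are assumptions, not this repo's exact evaluation code.

```python
import math

def strided_perplexity(nll_fn, n_tokens, window=512, stride=512):
    """Perplexity over a stream of n_tokens tokens, scored in windows
    of `window` tokens advanced by `stride`.

    nll_fn(start, end, n_scored) must return the summed negative
    log-likelihood of the last `n_scored` tokens in [start, end) --
    the tokens not yet scored by an earlier window. With stride ==
    window the windows tile the stream; a smaller stride gives each
    scored token more left context at the cost of more compute.
    """
    total_nll = 0.0
    total_scored = 0
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + window, n_tokens)
        n_scored = end - prev_end          # only score new tokens once
        total_nll += nll_fn(start, end, n_scored)
        total_scored += n_scored
        prev_end = end
        if end == n_tokens:
            break
    return math.exp(total_nll / total_scored)

# toy "model": uniform over a 100-token vocabulary, so every token
# costs log(100) nats and the perplexity should come out to ~100
uniform = lambda start, end, n_scored: n_scored * math.log(100)
print(strided_perplexity(uniform, n_tokens=2000))
```

Overlapping windows (stride < window) generally report a slightly lower, more favorable perplexity than disjoint ones, so the stride matters when comparing numbers across papers.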

| Tasks          | Version | Filter | n-shot | Metric   |  Value |   | Stderr |
|----------------|--------:|--------|-------:|----------|-------:|---|-------:|
| arc_challenge  |       1 | none   |     25 | acc      | 0.1706 | ± | 0.0110 |
|                |         | none   |     25 | acc_norm | 0.2099 | ± | 0.0119 |
| truthfulqa_mc2 |       2 | none   |      0 | acc      | 0.4599 | ± | 0.0154 |
| winogrande     |       1 | none   |      5 | acc      | 0.5083 | ± | 0.0141 |