Commit b3ce9ad (parent: 9b99d9d): Update README.md

README.md (changed):

@@ -16,6 +16,21 @@ GodziLLa-30B is an experimental combination of various proprietary Maya LoRAs wi
## Open LLM Leaderboard Metrics

| Metric               | Value |
|----------------------|-------|
| MMLU (5-shot)        | 55.1  |
| ARC (25-shot)        | 54.2  |
| HellaSwag (10-shot)  | 79.7  |
| TruthfulQA (0-shot)  | 53.3  |
| Average              | 60.6  |
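The "Average" row is simply the unweighted mean of the four benchmark scores. A minimal sketch to verify the reported value:

```python
# Verify the leaderboard "Average" as the unweighted mean of the
# four per-benchmark scores reported for GodziLLa-30B.
scores = {
    "MMLU (5-shot)": 55.1,
    "ARC (25-shot)": 54.2,
    "HellaSwag (10-shot)": 79.7,
    "TruthfulQA (0-shot)": 53.3,
}

average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # agrees with the 60.6 in the table after rounding
```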
According to the leaderboard description, here are the benchmarks used for the evaluation:

- [MMLU](https://arxiv.org/abs/2009.03300) (5-shot) - a test to measure a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
- [AI2 Reasoning Challenge (ARC)](https://arxiv.org/abs/1803.05457) (25-shot) - a set of grade-school science questions.
- [HellaSwag](https://arxiv.org/abs/1905.07830) (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
- [TruthfulQA](https://arxiv.org/abs/2109.07958) (0-shot) - a test to measure a model’s propensity to reproduce falsehoods commonly found online.
## Recommended Prompt Format

Alpaca's instruction format is the recommended prompt format, but Vicuna's instruction format may also work.
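As a sketch of what this looks like in practice, here is the widely used Alpaca instruction template. Note the exact template is an assumption on our part: this README names the format but does not spell it out, so the standard Stanford Alpaca wording is shown.

```python
# Assumed standard Alpaca template; the README recommends the Alpaca
# format but does not print the template itself.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca prompt template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("List three uses of a paperclip.")
print(prompt)
```

The model's completion is then generated after the trailing `### Response:` marker.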