aisquared/dlite-v2-774m

Adding Evaluation Results

#4

by leaderboard-pr-bot - opened Nov 17, 2023

base: refs/heads/main ← from: refs/pr/4

README.md +14 -1

@@ -115,4 +115,17 @@ state of the art, but rather further show that chat-like behaviors in LLMs can b
 *DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration.
 Furthermore, the model can sometimes exhibit undesired behaviors. Some of these behaviors include, but are not limited to: factual
 inaccuracies, biases, offensive responses, toxicity, and hallucinations. Just as with any other LLM, we advise users of this technology
-to exercise good judgment when applying this technology.*
+to exercise good judgment when applying this technology.*
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_aisquared__dlite-v2-774m)
+| Metric                | Value |
+|-----------------------|-------|
+| Avg.                  | 29.01 |
+| ARC (25-shot)         | 30.12 |
+| HellaSwag (10-shot)   | 47.68 |
+| MMLU (5-shot)         | 25.37 |
+| TruthfulQA (0-shot)   | 40.0  |
+| Winogrande (5-shot)   | 53.99 |
+| GSM8K (5-shot)        | 0.0   |
+| DROP (3-shot)         | 5.93  |
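
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores listed in the table. The sketch below checks that arithmetic, assuming a plain arithmetic mean rounded to two decimals. That assumption is not stated in the PR, and the `scores` dictionary and `average` variable are illustrative names, not part of any leaderboard tooling.

```python
# Benchmark scores copied from the table added in this PR.
scores = {
    "ARC (25-shot)": 30.12,
    "HellaSwag (10-shot)": 47.68,
    "MMLU (5-shot)": 25.37,
    "TruthfulQA (0-shot)": 40.0,
    "Winogrande (5-shot)": 53.99,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 5.93,
}

# Assumption: "Avg." is the unweighted mean of all seven scores,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. = {average}")
```

Under that assumption the computed value agrees with the 29.01 reported in the table.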