victormiller committed (verified)
Commit 1de5c8a · 1 Parent(s): fd74aa8

Update README.md

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -193,13 +193,13 @@ As always, the training data, training code, and metrics are publicly available.
 
 # CrystalChat Performance
 
-| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande(5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
-|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|:----------:|:------------------:|:-------------:|
-| CrystalChat 7B | 1.275T | 44.96 | 53.29 | 36.62 | 51.71 | 76.12 | 53.22 | 28.05 | 70.64 | 47.29 | 34.12 | 39.11 |
-| Mistral-7B-Instruct-v0.1 | - | 44.34 | 54.86 | 30.62 | 58.05 | 75.71 | 55.56 | 32.00 | 74.27 | 55.90 | 29.27 | 31.96 |
-| CodeLlama-7b-Instruct | 2.5T | 40.91 | 45.29 | 36.52 | 43.35 | 66.14 | 42.75 | 15.92 | 64.33 | 39.23 | 34.12 | 38.91 |
-| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 | 53.07 | 78.39 | 48.42 | 18.88 | 73.09 | 45.30 | 13.26 | 17.43 |
-| AmberChat 7B | 1.25T | - | 44.76 | - | 42.83 | 74.03 | 38.88 | 5.31 | 66.77 | 40.72 | - | - |
+| Model | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. |
+|:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|
+| CrystalChat 7B | 1.275T | 44.96 | 53.29 | 36.62 |
+| Mistral-7B-Instruct-v0.1 | - | 44.34 | 54.86 | 30.62 |
+| CodeLlama-7b-Instruct | 2.5T | 40.91 | 45.29 | 36.52 |
+| Llama-2-7b-Chat | 2T | 34.11 | 52.86 | 15.35 |
+| AmberChat 7B | 1.25T | - | 44.76 | - |
 
 | Model | Trained Tokens | ARC | HellaSwag | MMLU (5-shot) | GSM8K | Winogrande(5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
 |:------------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:-----:|:------------------:|
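
This commit moves the derived averages into their own table. As a hedged sanity check (an assumption, not part of the commit itself), the "Coding Avg." column appears to be the arithmetic mean of the HumanEval (pass@1) and MBPP (pass@1) scores; the sketch below recomputes it from the table's values:

```python
# Sanity-check sketch (assumption, not from the commit): "Coding Avg."
# looks like the mean of HumanEval (pass@1) and MBPP (pass@1).
# AmberChat 7B is omitted because its coding scores are reported as "-".
rows = {
    # model: (HumanEval pass@1, MBPP pass@1, Coding Avg. from the table)
    "CrystalChat 7B": (34.12, 39.11, 36.62),
    "Mistral-7B-Instruct-v0.1": (29.27, 31.96, 30.62),
    "CodeLlama-7b-Instruct": (34.12, 38.91, 36.52),
    "Llama-2-7b-Chat": (13.26, 17.43, 15.35),
}

def coding_avg(humaneval: float, mbpp: float) -> float:
    """Arithmetic mean of the two coding benchmarks."""
    return (humaneval + mbpp) / 2

for model, (he, mbpp, reported) in rows.items():
    computed = coding_avg(he, mbpp)
    # Table values look rounded to two decimals, so allow that much slack.
    assert abs(computed - reported) < 0.006, model
```

By the same reading, "Avg. of Avg." tracks the mean of Language Avg. and Coding Avg. for most rows, though not exactly for every model, so that relationship is less certain.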