Update README.md

README.md (CHANGED):

@@ -22,7 +22,7 @@ base_model: stabilityai/stablelm-3b-4e1t
 
 
 ## Performance
-Despite its compact dimensions, the model achieves outstanding scores in both
+Despite its compact dimensions, the model achieves outstanding scores in both [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks, surpassing the performance of considerably larger models.
 
 | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
 |-------------|-----|----|---------------|--------------|
@@ -63,18 +63,17 @@ In AlpacaEval, Rocket 🦝 achieves a near 80% win rate, coupled with an average
 | **Rocket** 🦝 | **79.75** | **1.42** | **1242** |
 
 
-##
+## Open LLM leaderboard
 
 | Metric                | Value                     |
 |-----------------------|---------------------------|
-| Average |
-| ARC
-| HellaSwag
-| MMLU
-| TruthfulQA
-| Winogrande
-| GSM8K
-| DROP (3-shot) | 24.49 |
+| Average               | 55.77                     |
+| ARC                   | 50.6                      |
+| HellaSwag             | 76.69                     |
+| MMLU                  | 47.1                      |
+| TruthfulQA            | 55.82                     |
+| Winogrande            | 67.96                     |
+| GSM8K                 | 36.47                     |
 
 
 ## Intended uses & limitations
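The new Open LLM leaderboard table's Average row is consistent with the arithmetic mean of the six per-task scores it lists; a minimal sketch of that check (score values taken from the table above):

```python
# Per-task Open LLM Leaderboard scores from the updated table.
scores = {
    "ARC": 50.6,
    "HellaSwag": 76.69,
    "MMLU": 47.1,
    "TruthfulQA": 55.82,
    "Winogrande": 67.96,
    "GSM8K": 36.47,
}

# The "Average" row is the plain mean of the six task scores,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # → 55.77
```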