Update README.md
README.md
CHANGED
@@ -36,7 +36,15 @@ There's a few reasons on why I called this model v0:
 
 # Evaluation
 
-
+I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a more fair comparison.
+
+| Metric | SultanR/SmolTulu-1.7b-it-v0 | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
+|:----------------------------|:---------------------:|:---------------------:|:---------------------:|:---------------------:|:---------------------:|
+| IFEval (Average prompt/inst) | **67.7** | 56.7 | 53.5 | 47.4 | 23.1 |
+| GSM8K (5-shot) | **51.6** | 48.2 | 26.8 | 42.8 | 4.6 |
+| ARC (Average) | 51.5 | **51.7** | 41.6 | 46.2 | 43.7 |
+| HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 |
+| MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 |
 
 # Usage
 