## Instruction model Vs. Humanized model

### Note

We observe an unexpectedly worse TriviaQA score compared to the base instruct model. A bit of training on a dataset such as squad-v2 quickly resolves this issue, and just one epoch results in a TriviaQA score far above the base instruct model (>21). We did not release this model due to worse scores on other metrics after this one-epoch training. If your specific use case requires a better grasp of trivia, feel free to train on squad-v2.

| Metric | SmolLM2-1.7B-Instruct | SmolLM2-1.7B-Humanized | Difference |
|:-----------------------------|:---------------------:|:----------------------:|:----------:|
| MMLU                         | **49.5**              | 48.8                   | -0.7       |
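If you do fine-tune on squad-v2 as suggested in the note, the records first need to be flattened into supervised (prompt, response) pairs. Below is a minimal sketch of that preprocessing step. The record schema mirrors the `squad_v2` dataset on the Hugging Face Hub (`context`, `question`, and `answers` with parallel `text`/`answer_start` lists); the prompt template and the refusal string for unanswerable questions are illustrative assumptions, not the exact setup used for SmolLM2.

```python
# Sketch: flatten SQuAD-v2-style records into (prompt, response) pairs
# for supervised fine-tuning. Schema follows the squad_v2 dataset card;
# the prompt template below is an assumption for illustration only.

def to_qa_pair(record):
    """Turn one SQuAD-v2-style record into a (prompt, response) pair.

    Unanswerable questions (empty ``answers["text"]``) map to an explicit
    refusal, so the model also sees examples of when not to answer.
    """
    prompt = f"Context: {record['context']}\nQuestion: {record['question']}"
    answers = record["answers"]["text"]
    response = answers[0] if answers else "The context does not contain the answer."
    return prompt, response


if __name__ == "__main__":
    record = {
        "context": "TriviaQA is a reading-comprehension benchmark.",
        "question": "What kind of benchmark is TriviaQA?",
        "answers": {"text": ["a reading-comprehension benchmark"], "answer_start": [13]},
    }
    print(to_qa_pair(record))
```

Mapping this function over the dataset (e.g. with `datasets.Dataset.map`) and training for a single epoch is all the note describes; keep an eye on the other benchmark scores, since that is why this variant was not released.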