## Model Summary
**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized with Direct Preference Optimization (DPO) on the [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To avoid losing too much general capability during this post-training, we also ran additional training on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.
Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
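For readers unfamiliar with DPO, the objective behind this fine-tune can be sketched in a few lines. This is an illustrative reimplementation of the standard DPO loss, not our actual training code; the function name and its scalar inputs are our own simplification (in practice this is computed batched over token-level log-probabilities):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence
    log-probabilities under the trained policy and the frozen
    reference model (here, SmolLM2-135M-Instruct)."""
    # Log-ratios of policy vs. reference for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * margin)): rewards the policy for preferring
    # the chosen response more strongly than the reference does
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At the start of training the policy equals the reference, so the loss is `log 2`; it decreases as the policy widens the gap in favor of the human-like (chosen) responses.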

In this section, we report the evaluation results of SmolLM2.
| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized | Difference |
|:-----------------------------|:---------------------:|:----------------------:|:----------:|
| MMLU | **23.1** | 23.0 | -0.1 |
| ARC (Easy) | 54.3 | **55.0** | +0.7 |
| ARC (Challenge) | **26.1** | 25.5 | -0.6 |
| HellaSwag | **43.0** | 42.4 | -0.6 |
| PIQA | **67.2** | 67.0 | -0.2 |
| WinoGrande | **52.5** | 52.1 | -0.4 |
| TriviaQA | **0.3** | 0.2 | -0.1 |
| GSM8K | 0.2 | **0.8** | +0.6 |
| OpenBookQA | 32.6 | **33.0** | +0.4 |
| QuAC (F1) | **14.1** | 13.2 | -0.9 |
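The Difference column is simply the Humanized score minus the Instruct score, rounded to one decimal. A quick sanity check, with the values copied from the table:

```python
# Scores transcribed from the evaluation table above
instruct = {"MMLU": 23.1, "ARC (Easy)": 54.3, "ARC (Challenge)": 26.1,
            "HellaSwag": 43.0, "PIQA": 67.2, "WinoGrande": 52.5,
            "TriviaQA": 0.3, "GSM8K": 0.2, "OpenBookQA": 32.6, "QuAC (F1)": 14.1}
humanized = {"MMLU": 23.0, "ARC (Easy)": 55.0, "ARC (Challenge)": 25.5,
             "HellaSwag": 42.4, "PIQA": 67.0, "WinoGrande": 52.1,
             "TriviaQA": 0.2, "GSM8K": 0.8, "OpenBookQA": 33.0, "QuAC (F1)": 13.2}

# Humanized minus Instruct, matching the Difference column
diff = {k: round(humanized[k] - instruct[k], 1) for k in instruct}
```

The differences are small in both directions, which is the intended trade-off: more human-like responses at little cost in benchmark performance.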
## Limitations