Update README.md
README.md
CHANGED
@@ -27,7 +27,7 @@ tags:

## Model Summary

-**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method.
+**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs).

Unlike traditional fine-tuning approaches that aim to improve specific benchmarks or metrics, DPO fine-tuning focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
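
For context, below is a minimal sketch of how a DPO fine-tune like the one described above might be run with TRL's `DPOTrainer`. The commit does not include the training script, so the hyperparameters, output directory name, and library version details here are illustrative assumptions rather than the authors' actual configuration.

```python
# Minimal DPO fine-tuning sketch (hyperparameters are illustrative assumptions).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# The dataset provides prompt/chosen/rejected columns, the format DPOTrainer expects.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

args = DPOConfig(
    output_dir="SmolLM2-135M-Humanized",   # assumed name, matching the model card
    beta=0.1,                              # assumed; controls deviation from the reference policy
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # a frozen copy is used as the reference model when none is passed
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```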