Update README.md
README.md
CHANGED
@@ -27,7 +27,7 @@ tags:

## Model Summary

-**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method.
+**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. To do this we used the "[Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)" from [Human-Like LLMs](https://huggingface.co/HumanLLMs).

Unlike traditional fine-tuning approaches that aim to improve specific benchmarks or metrics, DPO fine-tuning focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
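
For context, below is a minimal sketch of how a DPO fine-tune like the one described above might be run with TRL's `DPOTrainer`. The commit does not include the training script, so the hyperparameters, output directory name, and library version details here are illustrative assumptions rather than the authors' actual configuration.

```python
# Minimal DPO fine-tuning sketch (hyperparameters are illustrative assumptions).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# The dataset provides prompt/chosen/rejected columns, the format DPOTrainer expects.
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

args = DPOConfig(
    output_dir="SmolLM2-135M-Humanized",   # assumed name, matching the model card
    beta=0.1,                              # assumed; controls deviation from the reference policy
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # a frozen copy is used as the reference model when none is passed
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```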