## Model Summary

**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. For this we used the [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To limit the quality loss from this post-training, we also applied additional training on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.

Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
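For reference, the per-pair DPO objective mentioned above can be sketched in plain Python. This is an illustrative sketch, not the actual training code used for this model; the function name and the numbers are made up for the example:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference model
    (ref_*). beta scales the implicit KL penalty.
    """
    # Log-ratio "implicit rewards" for each response.
    chosen_reward = beta * (pi_chosen - ref_chosen)
    rejected_reward = beta * (pi_rejected - ref_rejected)
    # -log sigmoid(margin): shrinks as the policy prefers the chosen
    # (more human-like) response over the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Zero margin gives -log(0.5) = log 2; a positive margin gives a lower loss.
unaligned = dpo_loss(-40.0, -40.0, -40.0, -40.0)
aligned = dpo_loss(-35.0, -45.0, -40.0, -40.0)
assert aligned < unaligned
```

Training drives the loss down, which pushes the policy to assign relatively more probability mass to the human-preferred responses than the reference model does.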
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
```

UltraFeedback dataset:

```bash
@misc{cui2023ultrafeedback,
      title={UltraFeedback: Boosting Language Models with High-quality Feedback},
      author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
      year={2023},
      eprint={2310.01377},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```