## Model Summary

**SmolLM2-135M-Humanized** is a fine-tuned version of the [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) model, optimized using the Direct Preference Optimization (DPO) method. For this we used the [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) from [Human-Like LLMs](https://huggingface.co/HumanLLMs). To limit the quality loss from this post-training, we also applied additional training on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.

Unlike traditional fine-tuning datasets that aim to improve specific benchmarks or metrics, the Human-Like-DPO-Dataset focuses on aligning the model's behavior with human preferences. This process enhances the model's ability to generate more natural, human-like responses, making it particularly well-suited for conversational applications.
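For reference, the per-pair DPO objective mentioned above can be sketched in plain Python. This is an illustrative sketch, not the actual training code used for this model; the function name and the numbers are made up for the example:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference model
    (ref_*). beta scales the implicit KL penalty.
    """
    # Log-ratio "implicit rewards" for each response.
    chosen_reward = beta * (pi_chosen - ref_chosen)
    rejected_reward = beta * (pi_rejected - ref_rejected)
    # -log sigmoid(margin): shrinks as the policy prefers the chosen
    # (more human-like) response over the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Zero margin gives -log(0.5) = log 2; a positive margin gives a lower loss.
unaligned = dpo_loss(-40.0, -40.0, -40.0, -40.0)
aligned = dpo_loss(-35.0, -45.0, -40.0, -40.0)
assert aligned < unaligned
```

Training drives the loss down, which pushes the policy to assign relatively more probability mass to the human-preferred responses than the reference model does.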
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
```

UltraFeedback dataset:

```bash
@misc{cui2023ultrafeedback,
      title={UltraFeedback: Boosting Language Models with High-quality Feedback},
      author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
      year={2023},
      eprint={2310.01377},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```