| --- |
| datasets: |
| - togethercomputer/RedPajama-Data-1T |
| language: |
| - en |
| library_name: transformers |
| license: apache-2.0 |
| pipeline_tag: text-generation |
| --- |
| |
| ## BSL-1.7B |
| |
| [paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection) |
| |
**BSL-1.7B** is a 1.7B-parameter model with the [Mistral](https://arxiv.org/abs/2310.06825) architecture, pre-trained from scratch on the CC split of [RedPajama](https://github.com/togethercomputer/RedPajama-Data).
| |
| **It is used as the baseline for [PDS-1.7B](https://huggingface.co/Data-Selection/PDS-1.7B).** |
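
### Usage

Since the checkpoint uses the Mistral architecture and the `transformers` library (see the metadata above), it should load with the standard causal-LM API. The snippet below is a minimal sketch, not an official example; the repository ID `Data-Selection/BSL-1.7B` is assumed by analogy with the PDS-1.7B link above, and the generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID, inferred from the PDS-1.7B link above.
model_id = "Data-Selection/BSL-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; use a dtype your hardware supports
    device_map="auto",
)

# BSL-1.7B is a base (pre-trained) model, so plain text continuation is the natural use.
inputs = tokenizer("Data selection for language model pre-training", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```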
| |
| ### Evaluation |
| |
Pre-training on PDS-selected data improves the performance of language models trained from scratch and reduces pre-training computation. The improvement persists as model size scales up.
| |
| <p align='left'> |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/624ac662102fcdff87be51b9/6undIr37d10qD73TDiPDK.png" width="600"> |
| </p> |
| |
| ### Citation |
| |
| ```bibtex |
| @article{gu2024data, |
| title={Data Selection via Optimal Control for Language Models}, |
| author={Gu, Yuxian and Dong, Li and Wang, Hongning and Hao, Yaru and Dong, Qingxiu and Wei, Furu and Huang, Minlie}, |
| journal={arXiv preprint arXiv:2410.07064}, |
| year={2024} |
| } |
| ``` |
| ``` |