---
license: mit
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
---
# Raw 1B Shared

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "l2t-project/raw-1b-shared"
)
tokenizer = AutoTokenizer.from_pretrained(
    "l2t-project/raw-1b-shared"
)
```
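Once the model and tokenizer are loaded, text can be generated with the standard `transformers` generation API. This is a minimal sketch; the prompt and the `max_new_tokens` value are illustrative choices, not part of the model's documented usage.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("l2t-project/raw-1b-shared")
tokenizer = AutoTokenizer.from_pretrained("l2t-project/raw-1b-shared")

# Tokenize an example prompt and generate a greedy continuation.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Decode the generated token ids back to text.
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```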
## Citation

```bibtex
@article{yamaguchi2026enhancinglinguisticcompetencelanguage,
      title={Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks},
      author={Atsuki Yamaguchi and Maggie Mi and Nikolaos Aletras},
      year={2026},
      eprint={2601.03448},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.03448},
      journal={arXiv},
      volume={abs/2601.03448}
}
```