Update model card with metadata and links #1
opened by nielsr

README.md CHANGED
@@ -1,14 +1,21 @@
 ---
-license: mit
 datasets:
 - HuggingFaceTB/smollm-corpus
 language:
 - en
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 # Raw 1B Shared
 
+This model is a 1B parameter language model pre-trained as a baseline for the research presented in the paper [Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks](https://huggingface.co/papers/2601.03448).
 
+L2T (Language Learning Tasks) is a pre-training framework that integrates structured linguistic tasks alongside standard next-token prediction to explicitly optimize for linguistic competence in Large Language Models (LLMs). This specific checkpoint is the baseline model trained on raw text.
+
+- **Paper:** [Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks](https://huggingface.co/papers/2601.03448)
+- **Repository:** [gucci-j/l2t](https://github.com/gucci-j/l2t)
 
 ## How to Get Started with the Model
 Use the code below to get started with the model.
@@ -25,7 +32,7 @@ tokenizer = AutoTokenizer.from_pretrained(
 
 
 ## Citation
-```
+```bibtex
 @article{yamaguchi2026enhancinglinguisticcompetencelanguage,
 title={Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks},
 author={Atsuki Yamaguchi and Maggie Mi and Nikolaos Aletras},
@@ -37,6 +44,4 @@ tokenizer = AutoTokenizer.from_pretrained(
 journal={arXiv},
 volume={abs/2601.03448}
 }
-```
-
-
+```
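The card's own usage snippet is truncated in this diff (only the `tokenizer = AutoTokenizer.from_pretrained(` context line is visible), so the following is a minimal getting-started sketch consistent with the `library_name: transformers` and `pipeline_tag: text-generation` metadata added above; the repo id is a placeholder, not the model's actual Hub id:

```python
# Minimal sketch of loading a causal LM with transformers (assumed usage,
# not the card's actual snippet).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/raw-1b-shared"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `pipeline_tag: text-generation` set, the same checkpoint can also be loaded via `transformers.pipeline("text-generation", model=model_id)`.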