## Training Data
To train language-specific base LLMs, we followed the methodology proposed by [Etxaniz et al. (2024)](https://aclanthology.org/2024.acl-long.799/), originally developed for Basque, and extended it to other low-resource languages. To enable fair comparisons across languages, we limited the corpus size for each language to roughly the same number of tokens. We also included a small English subset to mitigate catastrophic forgetting.
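The equal-budget idea above can be sketched in a few lines. This is an illustrative sketch, not the actual data pipeline: the token budget, English fraction, and whitespace "tokenizer" below are all placeholder assumptions.

```python
# Sketch (assumptions, not the authors' pipeline): cap each language's
# corpus at a shared token budget, then add a small English subset to
# mitigate catastrophic forgetting. Whitespace splitting stands in for
# a real tokenizer.

TOKEN_BUDGET = 1_000       # shared per-language cap (placeholder)
ENGLISH_FRACTION = 0.05    # small English share (placeholder)

def count_tokens(doc: str) -> int:
    return len(doc.split())

def cap_corpus(docs, budget):
    """Keep documents in order until the token budget is exhausted."""
    kept, total = [], 0
    for doc in docs:
        n = count_tokens(doc)
        if total + n > budget:
            break
        kept.append(doc)
        total += n
    return kept, total

def build_mixture(corpora_by_lang, english_docs):
    """Cap every language to the same budget; append an English subset."""
    mixture = {}
    for lang, docs in corpora_by_lang.items():
        mixture[lang], _ = cap_corpus(docs, TOKEN_BUDGET)
    english_budget = int(TOKEN_BUDGET * ENGLISH_FRACTION)
    mixture["en"], _ = cap_corpus(english_docs, english_budget)
    return mixture
```

Capping every language to the same budget keeps downstream comparisons fair: differences in model quality then reflect the language and its data quality rather than raw corpus size.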
### Corpus composition