enekovalero committed on
Commit 7f4dede · verified
1 Parent(s): 9622098

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ This model is released as a **base model**, intended for further fine-tuning or
 
 ## Training Data
 
-To train language-specific base LLMs, we followed the methodology proposed by Etxaniz et al. (2024), originally developed for Basque, and extended it to other low-resource languages. To enable fair comparisons across languages, we limited the corpus size for each language to roughly the same number of tokens. We also included a small English subset to mitigate catastrophic forgetting.
+To train language-specific base LLMs, we followed the methodology proposed by [Etxaniz et al. (2024)](https://aclanthology.org/2024.acl-long.799/), originally developed for Basque, and extended it to other low-resource languages. To enable fair comparisons across languages, we limited the corpus size for each language to roughly the same number of tokens. We also included a small English subset to mitigate catastrophic forgetting.
 
 ### Corpus composition
 