A collection of training corpora and models for "Multilingual Language Model Pretraining using Machine-translated Data".
BritLLM
community
AI & ML interests
contact@llm.org.uk
Datasets: 18
britllm/TransWebEdu • 843 downloads • 3 likes
britllm/TransWeb-Edu-English • 36M rows • 178 downloads
britllm/TransWeb-Edu-Spanish • 35.2M rows • 323 downloads • 3 likes
britllm/TransWeb-Edu-French • 36M rows • 570 downloads
britllm/TransWeb-Edu-German • 36M rows • 334 downloads • 1 like
britllm/xnli_brit • 9.69k rows • 26 downloads
britllm/piqa_scottish_gaelic • 4 downloads
britllm/piqa_welsh • 4 downloads
britllm/piqa_irish • 4 downloads
britllm/arc_scottish_gaelic • 7.56k rows • 14 downloads