Baseline models for the BabyLM 2026 challenge.
AI & ML interests
Cognitively relevant baby LLMs pretrained under data constraints
A collection containing the baseline models for the BabyLM 2025 edition.
A collection of datasets with multilingual data resources. Used as part of the BabyBabelLM initiatives.
All materials for the 2026 edition of BabyLM.
A multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language.
A collection of subtitles, part of the multilingual BabyBabelLM datasets.