mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 53
view article Article Train and Fine-Tune Sentence Transformers Models espejelomar • Aug 10, 2022 • 17
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 214k • 1.59k
view article Article Releasing the largest multilingual open pretraining dataset Pclanglais • Nov 13, 2024 • 107