mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9, 2025 • 52
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9, 2025 • 785
view article Article Training and Finetuning Reranker Models with Sentence Transformers v4 Mar 26, 2025 • 185
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 5 days ago • 155
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 164
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 41 items • Updated 9 days ago • 148