ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated Apr 7 • 22
view article Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family lightonai • Jan 19 • 94
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5, 2025 • 308
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 49 items • Updated Mar 2 • 141
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 2 days ago • 156
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 672
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 308
xLAM models Collection xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 19 items • Updated Mar 2 • 59
GLiNER bi-encoders Collection Bi-encoder and poly-encoder architectures of GLiNER • 5 items • Updated Jan 29 • 13
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5, 2025 • 251
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 7 items • Updated Mar 2 • 16
view article Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval +1 aamirshakir, tomaarsen, SeanLee97 • Mar 22, 2024 • 134
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 93
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11, 2024 • 80
Vector-io compatible Datasets Collection These datasets can be loaded into your vector database with a single line bash command • 15 items • Updated Sep 19, 2024 • 3
Pokemons dataset captioned with different models Collection The Pokemons dataset from Lambda Labs is quite popular in the diffusion community because it lets us quickly validate ideas. • 3 items • Updated Nov 28, 2023 • 3