Luciole LLM Collection Open-source LLMs in French, English, German, Spanish, Italian, Portuguese, Dutch and Arabic • 8 items • Updated 22 days ago • 9
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published May 3 • 126
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 10 items • Updated 30 days ago • 100
CLaRa Collection This is the Hugging Face repository for the paper CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning. • 3 items • Updated Nov 25, 2025 • 2
DR Tulu Collection Models and data associated with DR Tulu, http://allenai-web/papers/drtulu • 6 items • Updated Feb 24 • 37
Gaperon Collection Our French-English LLM suite (including Base and SFT models). All checkpoints are also included. • 16 items • Updated May 18 • 17
GLiNER-decoder Collection A joint encoder-decoder GLiNER model for scalable open-ontology entity recognition • 3 items • Updated Jan 29 • 18
SauerkrautLM-Multilingual-(Reason)-ColBERT Collection SauerkrautLM ColBERT is a suite of Late-Interaction retrieval models built with PyLate’s ColBERT architecture and tuned for seven European languages. • 7 items • Updated Aug 3, 2025 • 20
GLiCLass-V3 Collection Models for zero-shot text classification that are up to 50 times faster than Cross-Encoders and show the same or higher accuracy. • 8 items • Updated Jan 29 • 21
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders • thomwolf, matthieu-lapeyre • Jul 9, 2025 • 803
Legal neural retrievers Collection Supervised models trained for statutory article retrieval • 6 items • Updated Jun 4, 2024 • 5
Pleias-RAG Collection A new generation of small reasoning models for RAG, search, and source summarization. • 4 items • Updated Apr 24, 2025 • 30
Article How to generate text: using different decoding methods for language generation with Transformers • patrickvonplaten • Mar 1, 2020 • 301
Article Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo • ariG23498, aerdem4 • Dec 23, 2024 • 51
Article Releasing the largest multilingual open pretraining dataset • Pclanglais • Nov 13, 2024 • 108