Article EMO: Pretraining mixture of experts for emergent modularity allenai • 5 days ago • 30
Article Building a Fast Multilingual OCR Model with Synthetic Data nvidia • 26 days ago • 33
DFlash Collection Block Diffusion for Flash Speculative Decoding • 21 items • Updated 4 days ago • 110
Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 159
Article Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents ibm-research • 28 days ago • 28
TIPSv2 Collection TIPSv2 foundational vision-language models. Webpage: https://gdm-tipsv2.github.io/ • 9 items • Updated 29 days ago • 30
Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 891
Article Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs +3 lapp0, LouisCastricato, ScottieFox, shahbuland, xAesthetics • Apr 9 • 29
Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 59
Article TRL v1.0: Post-Training Library Built to Move with the Field +2 qgallouedec, stevhliu, pcuenq, sergiopaniego • Mar 31 • 51
Mistral Small 4 Collection A state-of-the-art open-weight model with a granular Mixture-of-Experts architecture that fuses instruct, reasoning, and agentic skills. • 3 items • Updated Mar 16 • 74
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts kashif, stas • Mar 9 • 28
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 151
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 18 items • Updated 5 days ago • 288