Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 167
view article Article Nemotron 3 Nano 4B: A Compact Hybrid Model for Efficient Local AI nvidia • Mar 17 • 67
view article Article Aligning to What? Rethinking Agent Generalization in MiniMax M2 MiniMax-AI • Oct 30, 2025 • 43
view article Article Building the Open Agent Ecosystem Together: Introducing OpenEnv +8 spisakjo, darktex, zkwentz, mortimerp9, Sanyam, Hamid-Nazeri, Pankit01, emre0, lewtun, reach-vb • Oct 23, 2025 • 164
view article Article There is no such thing as a tokenizer-free lunch catherinearnett • Sep 25, 2025 • 100
view article Article Evaluate Your Own RAG: Why Best Practices Failed Us charles-azam • Nov 5, 2025 • 14
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 36
view article Article Why Did MiniMax M2 End Up as a Full Attention Model? MiniMax-AI • Oct 30, 2025 • 80
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 14 days ago • 170
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization Paper • 2510.04961 • Published Oct 6, 2025 • 5
NVIDIA Nemotron V2 Collection Open, Production-ready Enterprise Models. Nvidia Open Model license. • 9 items • Updated 14 days ago • 106
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 487
🧠 SmolLM3 Collection Smol, multilingual, long-context reasoner • 14 items • Updated Oct 9, 2025 • 104
DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 11 days ago • 269
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15, 2024 • 18