Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention Paper • 2606.20945 • Published 10 days ago • 75
FrontiersMind/Nandi-Mini-V1.1-600M-Intermediate-Checkpoint-400GT Text Generation • 0.6B • Updated 28 days ago • 365 • 8
FrontiersMind/Nandi-Mini-V1.1-600M-Early-Checkpoint-250GT Text Generation • 0.6B • Updated May 27 • 280 • 11
Article Alyah ⭐️: Toward Robust Evaluation of Emirati Dialect Capabilities in Arabic LLMs tiiuae • Jan 27 • 26
Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale • Running • 46 • Generate text using extremely small yet powerful language models
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers Paper • 2601.04890 • Published Jan 8 • 44
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19, 2025 • 89
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 90
Article Introducing smolagents: simple agents that write actions in code. +1 m-ric, merve, thomwolf • Dec 31, 2024 • 1.2k
Variance Control via Weight Rescaling in LLM Pre-training Paper • 2503.17500 • Published Mar 21, 2025 • 5