Article Mixture of Experts (MoEs) in Transformers ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 169
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 266
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published Feb 18 • 17
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published Dec 18, 2025 • 17
Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware RakshitAralimatti • Aug 8, 2025 • 36
Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance tiiuae • May 21, 2025 • 39
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 81
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published Jun 13, 2025 • 20
Article CodeAgents + Structure: A Better Way to Execute Actions akseljoonas, m-ric • May 28, 2025 • 82
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10, 2025 • 32
Article Open R1: How to use OlympicCoder locally for coding burtenshaw, reach-vb, lewtun, edbeeching, yagilb • Mar 20, 2025 • 63
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 95
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 14
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54
Article Llama can now see and run on your device - welcome Llama 3.2 merve, philschmid, osanseviero, reach-vb, lewtun, ariG23498, pcuenq • Sep 25, 2024 • 191
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 281
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 14