- Custom Kernels for All from Codex and Claude — burtenshaw, sayakpaul, ariG23498, evalstate (+2) • Feb 13 • 75
- Mixture of Experts (MoE) Explained — osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq (+4) • Dec 11, 2023 • 82
- Mixture of Experts (MoEs) in Transformers — ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap (+5) • Feb 26 • 159
- Scaling Mixture of Experts: Architecture Search for Billion-Parameter Language Models — kshitijthakkar • Feb 9 • 1
- Community Evals: Because We're Done Trusting Black-Box Leaderboards Over the Community — burtenshaw, SaylorTwift, kramp, merve, davanstrien, nielsr, julien-c (+5) • Feb 4 • 89
- We Got Claude to Build CUDA Kernels and Teach Open Models! — burtenshaw, evalstate, merve, pcuenq (+2) • Jan 28 • 156
- Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective — LinkedIn • Jan 27 • 74
- Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek — huggingface • Jan 27 • 45
- Tricks from OpenAI gpt-oss That You 🫵 Can Also Use in transformers — ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez (+5) • Sep 11, 2025 • 14
- Supercharge Your OCR Workflows with Open Models — merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq (+5) • Oct 21, 2025 • 14