Weight-Space Geometry of Offline Reasoning Training: interactive weight-space geometry of six reasoning losses • Running • 27
HuggingFaceTB/SmolVLM2-2.2B-Instruct Image-Text-to-Text • 2B • Updated Apr 8, 2025 • 344k • 324
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83
CRAFT: Continuous Reasoning and Agentic Feedback Tuning Article • flymy-ai • Feb 5 • 66
TFLOPS Gap: Why FP4 MoE Kernel Engineering Matters on Blackwell Article • apsys • Jan 5 • 14
facebook/dinov3-convnext-base-pretrain-lvd1689m Image Feature Extraction • 87.6M • Updated Aug 19, 2025 • 5.56k • 18
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms Paper • 2511.17592 • Published Nov 17, 2025 • 122
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story Paper • 2511.15210 • Published Nov 19, 2025 • 91
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance Paper • 2511.13254 • Published Nov 17, 2025 • 140
SmolLM3: smol, multilingual, long-context reasoner Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 79