Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 514
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8, 2025 • 15
view article Article Learn the Hugging Face Kernel Hub in 5 Minutes +5 drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164
view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 187
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30, 2024 • 12
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published Sep 2, 2025 • 14
Deep Ignorance Collection This collection contains the model and data artifacts from O'Brien et al. (2025). https://deepignorance.ai • 40 items • Updated Mar 2 • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8, 2025 • 211
AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models Paper • 2508.02269 • Published Aug 4, 2025 • 1
Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace Complexity Paper • 2507.13423 • Published Jul 17, 2025 • 1
view article Article Welcome GPT OSS, the new open-source model family from OpenAI! +10 reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 478
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773
view article Article Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL danielhanchen • Jan 10, 2024 • 76
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published Nov 5, 2024 • 70