view article Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware RakshitAralimatti • Aug 8, 2025 • 34
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 310
view article Article Announcing New Hugging Face and KerasHub integration ariG23498 • Jul 10, 2024 • 5
MathArena Outputs Collection Outputs of models on the MathArena Benchmark. • 30 items • Updated 8 days ago • 1
view article Article Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation exploding-gradients • Sep 16, 2025 • 20
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 182
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 80
view article Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
view article Article Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL danielhanchen • Jan 10, 2024 • 76
view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 187
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 32
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22, 2025 • 64
view article Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models nvidia • Jul 18, 2025 • 50
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique Paper • 2507.09075 • Published Jul 11, 2025 • 19
view article Article Ettin Suite: SoTA Paired Encoders and Decoders +4 orionweller, kdricci, mmarone, NohTow, dlawrie, vandurme • Jul 16, 2025 • 80