Article: Tricks from OpenAI gpt-oss you can use with transformers • Sep 11, 2025
Article: Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training • Aug 8, 2025
Paper: Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • arXiv:2510.25992 • Published Oct 29, 2025
Article: Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation • Sep 16, 2025
Paper: Towards General Agentic Intelligence via Environment Scaling • arXiv:2509.13311 • Published Sep 16, 2025
Collection: SuperBPE • SuperBPE tokenizers and models trained with them • 8 items • Updated 12 days ago
Collection: LFM2 • LFM2 is a new generation of hybrid models designed for on-device deployment • 28 items • Updated 12 days ago
Collection: Hybrid Linear Attention Research • All 1.3B & 340M hybrid linear-attention experiments • 62 items • Updated Sep 11, 2025
Collection: Avey 1 Research Preview • 1.5B preview models trained on 100B tokens of FineWeb, plus an instruct-tuned version on smoltalk • 3 items • Updated Jun 16, 2025
Collection: V-JEPA 2 • A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025
Collection: Falcon-H1 • Falcon-H1 family of hybrid-head (Transformer-SSM) language models, including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B variants (pretrained & instruction-tuned) • 33 items • Updated 12 days ago
Collection: Kimina Prover Preview • State-of-the-art models for formal mathematical reasoning • 5 items • Updated Apr 28, 2025