Article: Transformers v5: Simple model definitions powering the AI ecosystem • Dec 1, 2025 • 310
deepseek-ai/DeepSeek-V3-0324 • Text Generation • 685B • Updated Mar 27, 2025 • 619k • 3.11k
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters • Running • 3.83k
Article: Finally, a Replacement for BERT: Introducing ModernBERT • Dec 19, 2024 • 740