view article Article Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP +3 ariG23498, ror, sergiopaniego, pcuenq, sayakpaul • 24 days ago • 50
view article Article Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler +3 ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • May 29 • 132
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 361
view article Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand qgallouedec • Dec 4, 2025 • 72
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published Feb 18, 2025 • 19
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated Mar 12 • 510