AI & ML interests
LLMs (Inference Primarily)
2. Attention Optimizations: From Standard Attention to FlashAttention
2.2c: FlashAttention — IO Analysis and Evolution
1.7: Optimization Landscape — Phase-Specific Strategies
1.6: The Utilization Paradox — Quantitative Analysis
1.3: The Two Phases Defined — Prefill and Decode
1.2: The KV Cache — How It Eliminates Redundancy
1.1: The Autoregressive Loop and the Redundancy Problem - LLM Inference