AI & ML interests
LLMs (primarily inference)
Recent Activity
- Published "2. Attention Optimizations: From Standard Attention to FlashAttention" (about 1 month ago)
- Published "2.2c: FlashAttention — IO Analysis and Evolution" (about 1 month ago)
- Published "1.7: Optimization Landscape — Phase-Specific Strategies" (about 1 month ago)
- Published "1.6: The Utilization Paradox — Quantitative Analysis" (about 2 months ago)
- Published "1.3: The Two Phases Defined — Prefill and Decode" (about 2 months ago)
- Published "1.2: The KV Cache — How It Eliminates Redundancy" (about 2 months ago)
- Published "1.1: The Autoregressive Loop and the Redundancy Problem — LLM Inference" (about 2 months ago)