---
title: LLM Inference Profiler
emoji: 
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive calculator for LLM inference performance
---

# LLM Inference Profiler

An interactive educational tool for understanding LLM inference performance. Explore how model size, GPU specs, and workload characteristics affect the prefill and decode phases.

## Features

- **Time Analysis**: See how long prefill and decode take, and why decode dominates total latency
- **GPU Utilization**: Understand why prefill achieves 50-70% utilization while decode is often below 5%
- **Arithmetic Intensity**: Visualize the compute-bound vs. memory-bound nature of each phase
- **KV Cache Growth**: Watch how memory usage grows during generation
- **Waste Factor**: See how much recomputation the KV cache saves
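The timing intuition behind the Time Analysis view can be sketched with a back-of-envelope model. This is illustrative only, not the app's actual code; the hardware numbers (A100: roughly 312 TFLOP/s dense FP16, ~2 TB/s HBM bandwidth) and the LLaMA-7B parameter count are assumptions for the example:

```python
# Back-of-envelope latency model (sketch; assumed figures, batch size 1).
PARAMS = 7e9           # LLaMA-7B parameter count (assumed)
BYTES_PER_PARAM = 2    # FP16
PEAK_FLOPS = 312e12    # A100 dense FP16 throughput (assumed)
MEM_BW = 2.0e12        # A100 HBM bandwidth in bytes/s (assumed)

def prefill_time(prompt_tokens, efficiency=0.5):
    """Prefill is compute-bound: ~2 FLOPs per parameter per token,
    processed in parallel at some fraction of peak throughput."""
    flops = 2 * PARAMS * prompt_tokens
    return flops / (PEAK_FLOPS * efficiency)

def decode_time(gen_tokens):
    """Decode is memory-bound: every step must re-read all weights
    from HBM, so bandwidth (not FLOPs) sets the step time."""
    bytes_per_step = PARAMS * BYTES_PER_PARAM
    return gen_tokens * bytes_per_step / MEM_BW

t_prefill = prefill_time(512)   # 512-token prompt
t_decode = decode_time(256)     # 256 generated tokens
print(f"prefill: {t_prefill * 1e3:.0f} ms, decode: {t_decode * 1e3:.0f} ms")
```

Even though prefill touches twice as many tokens here, decode takes dozens of times longer, which is exactly the asymmetry the tool visualizes.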

## Key Concepts Demonstrated

- **Prefill Phase**: Processes all prompt tokens in parallel (compute-bound)
- **Decode Phase**: Generates tokens one at a time (memory-bound)
- **KV Cache**: Trades memory for compute by storing each token's Key/Value vectors
- **Arithmetic Intensity**: The FLOPs-per-byte ratio that determines whether a phase is compute-limited or memory-limited
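The compute-bound vs. memory-bound distinction follows from comparing arithmetic intensity to the GPU's "ridge point" (peak FLOP/s divided by memory bandwidth). A minimal sketch, using assumed A100 and LLaMA-7B figures and counting only weight traffic:

```python
# Arithmetic-intensity sketch (illustrative assumptions, not the app's code):
# each forward pass does ~2 FLOPs per parameter per token, and must read
# every FP16 weight (2 bytes/param) from HBM at least once.
PARAMS = 7e9                    # LLaMA-7B (assumed)
A100_RIDGE = 312e12 / 2.0e12    # peak FLOP/s over bytes/s ~= 156 FLOP/byte

def arithmetic_intensity(tokens_per_pass):
    flops = 2 * PARAMS * tokens_per_pass
    bytes_read = BYTES_PER_PARAM = 2 * PARAMS  # weights dominate; KV traffic ignored
    return flops / bytes_read

for tokens, phase in [(512, "prefill"), (1, "decode")]:
    ai = arithmetic_intensity(tokens)
    bound = "compute-bound" if ai > A100_RIDGE else "memory-bound"
    print(f"{phase}: {ai:.0f} FLOP/byte -> {bound}")
```

Prefill over a 512-token prompt amortizes the weight reads across all tokens and lands well above the ridge point; decode, at one token per pass, sits near 1 FLOP/byte, far below it.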

## Based On

This tool accompanies the "Foundations of LLM Inference" article series, which covers:

  1. The Autoregressive Loop and Redundancy Problem
  2. The KV Cache
  3. Prefill and Decode Phases
  4. Why Prefill is Compute-Bound
  5. Why Decode is Memory-Bound
  6. The Utilization Paradox
  7. Optimization Strategies

## Usage

  1. Select a model (LLaMA-7B, 13B, 70B, etc.)
  2. Choose a GPU (A100, H100, T4, etc.)
  3. Set precision (FP16, INT8, INT4)
  4. Adjust prompt and generation lengths
  5. Experiment with batch size to see its effect on decode

The tool will show you timing breakdowns, utilization metrics, and interactive visualizations.
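As a worked example of the KV cache growth you will see, the cache size for the settings above can be estimated by hand. This sketch assumes standard LLaMA-7B shapes (32 layers, hidden size 4096, full multi-head attention) and FP16 storage; it is not the app's internal formula:

```python
# KV cache size estimate (assumed LLaMA-7B shapes, FP16 = 2 bytes/value).
def kv_cache_bytes(seq_len, batch=1, layers=32, hidden=4096, dtype_bytes=2):
    # Factor of 2 covers the Key and the Value vector stored per token per layer.
    return 2 * layers * seq_len * hidden * dtype_bytes * batch

per_token_mib = kv_cache_bytes(1) / 2**20
total_gib = kv_cache_bytes(2048) / 2**30
print(f"{per_token_mib:.2f} MiB per token, {total_gib:.2f} GiB at 2048 tokens")
```

Note the linear growth in both sequence length and batch size: doubling the batch doubles the cache, which is why batch size has such a visible effect on decode in the tool.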