metadata
title: LLM Cost, Capacity, Latency & Batch Sizer
emoji: 🧮
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: app.py
pinned: false
LLM Cost, Capacity, Latency & Batch Sizer
Tabs:
- Cost & Capacity – compares managed API cost with GPU cost, billed either by busy time or by scheduled uptime (set 24 h/day for always-on).
- Latency Estimator – sums prefill, decode, and overhead, then scales the total by a Queue/Burst factor to approximate p95.
- Batch Size Calculator – computes the theoretical and the recommended safe batch size from VRAM and KV-cache math.
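The first two tabs can be sketched in a few lines of Python. This is an illustrative sketch, not the code in app.py: the function names, parameters, and default values are hypothetical, and the busy-time formula (tokens divided by sustained throughput) is an assumption about how busy hours would be derived.

```python
def estimate_latency_ms(prompt_tokens, output_tokens,
                        prefill_tps, decode_tps,
                        overhead_ms=50.0, queue_factor=1.0):
    """Prefill + decode + overhead, scaled by a queue/burst factor.

    prefill_tps and decode_tps are tokens per second (assumed inputs).
    queue_factor=1.0 gives the unloaded estimate; a larger value
    approximates p95 under queueing.
    """
    prefill_ms = prompt_tokens / prefill_tps * 1000.0
    decode_ms = output_tokens / decode_tps * 1000.0
    return (prefill_ms + decode_ms + overhead_ms) * queue_factor


def gpu_cost_scheduled(hourly_rate, hours_per_day, days=30):
    """GPU cost when paying for scheduled uptime (24 h/day = always-on)."""
    return hourly_rate * hours_per_day * days


def gpu_cost_busy(hourly_rate, total_tokens, tokens_per_second):
    """GPU cost when paying only for busy time (assumed: tokens / throughput)."""
    busy_hours = total_tokens / tokens_per_second / 3600.0
    return hourly_rate * busy_hours


if __name__ == "__main__":
    # 1000-token prompt, 200-token answer, 1.5x queue factor for p95
    print(estimate_latency_ms(1000, 200, 5000, 50, 50.0, 1.5))
    # Always-on GPU at $2/hour for a 30-day month
    print(gpu_cost_scheduled(2.0, 24))
    # Same GPU billed only while serving 36M tokens at 1000 tokens/s
    print(gpu_cost_busy(2.0, 36_000_000, 1000))
```

The gap between the scheduled and busy-time figures shows how much of an always-on GPU bill pays for idle capacity.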
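KV cache rule: KV bytes ≈ 2 × hidden_size × bytes/elem × layers × seq_len × batch_size
The factor 2 covers keys and values. The rule assumes standard multi-head attention; for grouped-query attention, replace hidden_size with num_kv_heads × head_dim.
Set KV precision to 4, 8, or 16 bits (0.5, 1, or 2 bytes/elem), and reserve VRAM headroom to avoid OOMs.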
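As a worked example of the rule, the sketch below computes KV-cache size and derives a batch size from the VRAM that remains after model weights. The function names, the `weights_gib` parameter, and the 10% headroom default are assumptions for illustration, not values taken from app.py.

```python
GIB = 1024 ** 3


def kv_cache_bytes(hidden_size, num_layers, seq_len, batch_size, kv_bits=16):
    """KV ≈ 2 × hidden_size × bytes/elem × layers × seq_len × batch_size."""
    bytes_per_elem = kv_bits / 8  # 4-bit -> 0.5, 8-bit -> 1, 16-bit -> 2
    return 2 * hidden_size * bytes_per_elem * num_layers * seq_len * batch_size


def max_batch_size(vram_gib, weights_gib, hidden_size, num_layers, seq_len,
                   kv_bits=16, headroom=0.10):
    """Largest batch whose KV cache fits in VRAM after weights and headroom.

    headroom=0.0 gives the theoretical maximum; a positive fraction of
    total VRAM is held back to give a safer recommendation.
    """
    usable_bytes = (vram_gib * (1.0 - headroom) - weights_gib) * GIB
    per_sequence = kv_cache_bytes(hidden_size, num_layers, seq_len, 1, kv_bits)
    return max(0, int(usable_bytes // per_sequence))


if __name__ == "__main__":
    # hidden_size=4096, 32 layers, 4096-token context, 16-bit KV
    print(kv_cache_bytes(4096, 32, 4096, 1) / GIB)           # GiB per sequence
    print(max_batch_size(80, 13, 4096, 32, 4096, headroom=0.0))  # theoretical
    print(max_batch_size(80, 13, 4096, 32, 4096))                # with headroom
```

With these inputs each sequence needs 2 GiB of KV cache at 16 bits, so halving the KV precision roughly doubles the batch that fits.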