metadata
title: LLM Cost, Capacity, Latency & Batch Sizer
emoji: 🧮
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: app.py
pinned: false

LLM Cost, Capacity, Latency & Batch Sizer

Tabs:

  1. Cost & Capacity – Managed API vs GPU costs (busy-time vs scheduled uptime; set 24 h/day for always-on).
  2. Latency Estimator – prefill + decode + overhead, scaled by Queue/Burst factor for p95.
  3. Batch Size Calculator – computes theoretical & recommended safe batch from VRAM and KV-cache math.
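The arithmetic behind the first two tabs can be sketched as follows. This is a minimal illustration, not the code in app.py: all function and parameter names here are hypothetical, the cost model assumes simple per-token API pricing and a flat hourly GPU rate, and the p95 estimate assumes the Queue/Burst factor is a plain multiplier on the base latency.

```python
def managed_api_cost(input_tokens, output_tokens,
                     usd_per_1m_input, usd_per_1m_output):
    """Managed API cost in USD, assuming per-million-token pricing."""
    return (input_tokens / 1e6 * usd_per_1m_input
            + output_tokens / 1e6 * usd_per_1m_output)


def gpu_cost_scheduled(usd_per_gpu_hour, num_gpus, hours_per_day, days=30):
    """GPU cost in USD for scheduled uptime (use 24 h/day for always-on)."""
    return usd_per_gpu_hour * num_gpus * hours_per_day * days


def estimate_latency_s(prompt_tokens, output_tokens,
                       prefill_tok_per_s, decode_tok_per_s,
                       overhead_s=0.0, queue_burst_factor=1.0):
    """Return (base, p95) latency in seconds.

    base = prefill + decode + overhead
    p95  = base scaled by the queue/burst factor (assumed multiplicative)
    """
    prefill_s = prompt_tokens / prefill_tok_per_s
    decode_s = output_tokens / decode_tok_per_s
    base_s = prefill_s + decode_s + overhead_s
    return base_s, base_s * queue_burst_factor


if __name__ == "__main__":
    # Illustrative inputs only; none of these numbers come from the app.
    print(managed_api_cost(1_000_000, 500_000, 1.0, 2.0))
    print(gpu_cost_scheduled(2.0, num_gpus=1, hours_per_day=24))
    print(estimate_latency_s(1000, 200, 5000.0, 50.0, 0.1, 1.5))
```

Keeping the queue/burst factor separate from the base latency makes it easy to compare an unloaded request against a loaded one with the same model throughput figures.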

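KV cache rule: KV (bytes) ≈ 2 × hidden_size × bytes/elem × layers × seq_len × batch_size, where the factor 2 covers keys and values. This assumes standard multi-head attention; for grouped-query or multi-query attention, replace hidden_size with num_kv_heads × head_dim.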

Set KV precision to 4, 8, or 16 bits (0.5, 1, or 2 bytes per element), and reserve VRAM headroom to avoid out-of-memory (OOM) errors.
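As a rough sketch of how the KV cache rule turns into a batch size, the snippet below computes the per-sequence KV footprint and divides the VRAM left after model weights by it. The function names, the `weights_gb` input, the 20% default headroom, and the treatment of "GB" as GiB are all assumptions for illustration; the calculator in app.py may account for activations and other overheads differently.

```python
GIB = 1024 ** 3  # assumption: VRAM figures are treated as GiB


def kv_cache_bytes(hidden_size, num_layers, seq_len, batch_size, kv_bits=16):
    """KV ≈ 2 × hidden_size × bytes/elem × layers × seq_len × batch_size."""
    bytes_per_elem = kv_bits / 8  # 4/8/16 bits -> 0.5/1/2 bytes
    return 2 * hidden_size * bytes_per_elem * num_layers * seq_len * batch_size


def batch_sizes(vram_gb, weights_gb, hidden_size, num_layers, seq_len,
                kv_bits=16, headroom=0.2):
    """Return (theoretical, safe) batch sizes.

    theoretical: every free byte after weights goes to the KV cache
    safe:        same, but with a `headroom` fraction of free VRAM reserved
    """
    free_bytes = (vram_gb - weights_gb) * GIB
    per_seq = kv_cache_bytes(hidden_size, num_layers, seq_len, 1, kv_bits)
    theoretical = max(int(free_bytes // per_seq), 0)
    safe = max(int(free_bytes * (1 - headroom) // per_seq), 0)
    return theoretical, safe


if __name__ == "__main__":
    # Illustrative shape: hidden_size=4096, 32 layers, 4096-token context,
    # 16-bit KV cache, 80 GiB card with 14 GiB of weights.
    print(kv_cache_bytes(4096, 32, 4096, 1) / GIB)  # GiB per sequence
    print(batch_sizes(80, 14, 4096, 32, 4096))
```

Lowering `kv_bits` from 16 to 8 or 4 halves or quarters the per-sequence footprint, which is why KV precision has such a direct effect on the achievable batch size.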