
DocTrek-LLM-cost-estimation / README.md

xiaosuhu1986

Add batch size calculator

9052fea verified 8 months ago

metadata

title: LLM Cost, Capacity, Latency & Batch Sizer
emoji: 🧮
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.31.0
app_file: app.py
pinned: false

LLM Cost, Capacity, Latency & Batch Sizer

Tabs:

Cost & Capacity – compares managed API cost with GPU cost, where GPU time is billed either by busy time or by scheduled uptime (set 24 h/day for an always-on deployment).
Latency Estimator – sums prefill + decode + overhead, then scales the total by a Queue/Burst factor to approximate p95 latency.
Batch Size Calculator – computes the theoretical maximum batch size and a recommended safe batch size from VRAM and KV-cache math.
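The latency estimate described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name, the parameter names, and the choice to model prefill and decode as token counts divided by throughput are all assumptions.

```python
def estimate_latency_s(prompt_tokens, output_tokens,
                       prefill_tok_per_s, decode_tok_per_s,
                       overhead_s=0.0, queue_burst_factor=1.0):
    """Return (base_latency_s, p95_latency_s).

    Assumption: prefill and decode times are token counts divided by
    throughput. The Space may model these phases differently.
    """
    prefill_s = prompt_tokens / prefill_tok_per_s   # time to process the prompt
    decode_s = output_tokens / decode_tok_per_s     # time to generate the output
    base_s = prefill_s + decode_s + overhead_s      # prefill + decode + overhead
    p95_s = base_s * queue_burst_factor             # scaled by Queue/Burst factor
    return base_s, p95_s


# Illustrative inputs only; none of these numbers come from the Space.
base_s, p95_s = estimate_latency_s(
    prompt_tokens=1000, output_tokens=200,
    prefill_tok_per_s=5000, decode_tok_per_s=50,
    overhead_s=0.1, queue_burst_factor=1.5,
)
print(f"base = {base_s:.2f} s, p95 = {p95_s:.2f} s")
```

A Queue/Burst factor of 1.0 leaves the base latency unchanged; larger values model queueing delay under bursty traffic.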

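KV cache rule (result in bytes): KV ≈ 2 × hidden_size × bytes/elem × layers × seq_len × batch_size. The factor 2 counts keys and values. For models that use grouped-query attention, treat this as an upper bound.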
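As a worked example of the rule above, the sketch below evaluates it directly. The function and parameter names are hypothetical, and the model dimensions are illustrative values, not taken from the Space.

```python
def kv_cache_bytes(hidden_size, num_layers, seq_len, batch_size, kv_bits=16):
    """KV-cache size in bytes per the README rule:
    2 x hidden_size x bytes/elem x layers x seq_len x batch_size."""
    bytes_per_elem = kv_bits / 8  # 4/8/16 bits -> 0.5/1/2 bytes per element
    return 2 * hidden_size * bytes_per_elem * num_layers * seq_len * batch_size


# Illustrative dimensions: hidden_size=4096, 32 layers, 4096-token context.
size = kv_cache_bytes(hidden_size=4096, num_layers=32,
                      seq_len=4096, batch_size=1, kv_bits=16)
print(f"{size / 1024**3:.2f} GiB per sequence")
```

Because the rule is linear in every term, halving the KV precision or the sequence length halves the cache size.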

Set KV precision to 4, 8, or 16 bits (0.5, 1, or 2 bytes per element), and reserve VRAM headroom to avoid out-of-memory (OOM) errors.
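One way the batch-size calculation might work is sketched below. This is a guess at the approach rather than the Space's actual logic. It assumes the VRAM left after model weights is available for the KV cache, that sizes are given in GiB, and that headroom is a fraction of that free memory; all names are hypothetical.

```python
GIB = 1024 ** 3


def batch_sizes(vram_gib, weights_gib, hidden_size, num_layers, seq_len,
                kv_bits=16, headroom_frac=0.1):
    """Return (theoretical_batch, safe_batch).

    Assumptions: the KV cache is the only consumer of VRAM besides the
    weights, and headroom is a fraction of the free memory.
    """
    # Per-sequence KV cache from the README rule with batch_size = 1.
    kv_per_seq = 2 * hidden_size * (kv_bits / 8) * num_layers * seq_len
    free_bytes = (vram_gib - weights_gib) * GIB
    if free_bytes <= 0:
        return 0, 0  # the weights alone do not fit
    theoretical = int(free_bytes // kv_per_seq)
    safe = int((free_bytes * (1 - headroom_frac)) // kv_per_seq)
    return theoretical, safe


# Illustrative inputs: 24 GiB GPU, 14 GiB of weights, 16-bit KV cache.
theoretical, safe = batch_sizes(vram_gib=24, weights_gib=14,
                                hidden_size=4096, num_layers=32,
                                seq_len=4096, kv_bits=16, headroom_frac=0.1)
print(theoretical, safe)
```

In practice, activations and framework overhead also consume VRAM, which is one reason to keep the headroom fraction above zero.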