---
title: FitCheck
emoji:
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
- nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---
<!--
ZeroGPU is selected in the Space's Settings (the README can't set it). The
model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
local `python app.py` stays instant.
-->
# FitCheck
**What AI can your computer actually run?** And the other way round: **what
computer do you need for the AI you want to run?**
Tell FitCheck about your machine in plain words. It answers honestly — real
models, real memory figures, real licenses, real copy-paste commands — from
chatbots to object detection, image generation, speech, and robotics.
## Why it's trustworthy
- **A deterministic engine does the math, not an AI.** Verdicts come from a
transparent rules engine over `catalogue.json` — 83 real models verified
against the Hugging Face API. Nothing in the verdict can be hallucinated.
- **Model sizes are exact.** For GGUF models the weights figure is the actual
file size in bytes from the Hub — not a params-times-bits estimate. Chat
memory uses each model's real architecture (GQA-aware), and every estimate
includes a 0.58 GB safety buffer (the 95% load-success margin fitted from
~19,500 community measurements).
- **Provenance on every number.** The UI says whether a figure is an exact
file size, a vendor-published number, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled
on every card — before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
shows where your machine lands among **real community benchmark runs**
([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
A learned predictor — following IBM's
[LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
over hardware features, validated leave-one-accelerator-out) — replaces the
analytical estimate **only if it beats it on hardware it never saw**;
otherwise the labelled baseline ships. Vision and diffusion models are
compute-bound, not bandwidth-bound, so FitCheck gives them memory verdicts
only instead of made-up speed numbers.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works
/ Won't fit) that would rather under-promise than over-promise.
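
The memory and speed arithmetic described in this list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FitCheck's engine code: the `ModelSpec` fields, the fp16 KV cache, decimal gigabytes, the `comfort` threshold between bands, and the simplification that decode reads only the weights once per token are all hypothetical. Only the 0.58 GB buffer and the three band names come from this README.

```python
from dataclasses import dataclass

SAFETY_BUFFER_GB = 0.58  # buffer figure quoted in this README
GB = 1e9                 # assumption: decimal gigabytes


@dataclass
class ModelSpec:
    """Hypothetical record; these field names are not catalogue.json's schema."""
    weights_bytes: int          # exact GGUF file size in bytes
    n_layers: int
    n_kv_heads: int             # under GQA this is smaller than the attention head count
    head_dim: int
    kv_bytes_per_elem: int = 2  # assumption: fp16 KV cache


def kv_cache_bytes(m: ModelSpec, context_tokens: int) -> int:
    # Keys and values (factor 2) for every layer, KV head, and token.
    return (2 * m.n_layers * m.n_kv_heads * m.head_dim
            * m.kv_bytes_per_elem * context_tokens)


def required_gb(m: ModelSpec, context_tokens: int) -> float:
    # Weights + KV cache, plus the fixed safety buffer.
    return (m.weights_bytes + kv_cache_bytes(m, context_tokens)) / GB + SAFETY_BUFFER_GB


def band(required: float, available: float, comfort: float = 0.8) -> str:
    # `comfort` is a made-up threshold; the real banding rules may differ.
    if required > available:
        return "Won't fit"
    if required > comfort * available:
        return "Tight, but works"
    return "Runs great"


def decode_tokens_per_sec(m: ModelSpec, bandwidth_gb_s: float,
                          efficiency: float = 1.0) -> float:
    # Bandwidth roofline: each generated token streams the weights once,
    # so bandwidth / weight bytes is a ceiling on decode speed.
    return efficiency * bandwidth_gb_s * GB / m.weights_bytes
```

For a 4 GB GGUF with 32 layers, 8 KV heads, and head dimension 128 at a 4096-token context, this sketch asks for roughly 5.12 GB and puts the decode ceiling at 25 tokens/sec on 100 GB/s of memory bandwidth. Real runs land below the ceiling, which is why an `efficiency` factor is left as a parameter.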
## What's inside
1. **The catalogue**: `scripts/curation.json` (hand-picked models across
LLM, vision-language, vision, image/video generation, speech, music,
embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from
public Hub endpoints into `catalogue.json`. Refreshed nightly; baked in at
build time so the running app is fully offline.
2. **The engine** (`engine/`) — pure Python memory math and honest banding.
Also answers the reverse question: minimum vs comfortable hardware tiers
for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running
in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
explaining the engine's numbers in plain words. It never does the math; if
it states a figure that isn't in the engine's facts, the gate logs it.
4. **The frontend** (`static/`) — hand-built HTML/CSS/JS, no framework, served
by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
model id and FitCheck walks its finetune/quantized lineage to a known base
("if the base runs, your finetune runs") — the one clearly-labelled online
feature.
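
The gate on the model brick (item 3) can be pictured as a check that every number in the explainer's text also appears in the engine's facts. The sketch below is a guess at the idea, not the code in `model_brick.py`: the function names, the logger name, the flat `facts` dictionary, and the matching tolerance are all hypothetical.

```python
import logging
import re

logger = logging.getLogger("fitcheck.gate")  # hypothetical logger name

# Plain integers and decimals; units and thousands separators are ignored.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_figures(explanation: str, facts: dict[str, float],
                        tol: float = 0.01) -> list[float]:
    """Return the numbers in `explanation` that match no value in `facts`."""
    known = list(facts.values())
    missing = []
    for token in NUMBER.findall(explanation):
        value = float(token)
        if not any(abs(value - k) <= tol for k in known):
            missing.append(value)
    return missing


def gate(explanation: str, facts: dict[str, float]) -> bool:
    """Log every unsupported figure; return True when the text is clean."""
    missing = unsupported_figures(explanation, facts)
    for value in missing:
        logger.warning("explainer stated a figure not in engine facts: %s", value)
    return not missing
```

With facts `{"required_gb": 5.12, "available_gb": 8.0}`, a sentence that mentions 5.12 GB and 8 GB passes, while one that also promises 30 tokens per second gets logged, because 30 is not among the engine's facts.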
## Run it locally
```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```
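The activation command above is the Windows form; on macOS or Linux, use
`source .venv/bin/activate` instead.
Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). Locally
the explainer reports that the model isn't loaded (it only loads on the
Space); everything else works fully offline.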
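
The local behaviour follows from an environment check: the LLM loads only when `SPACES_ZERO_GPU` is set. A minimal sketch of such a guard is below. The function name is hypothetical, and treating an empty value as "not set" is an assumption, not something the README specifies.

```python
import os
from typing import Mapping, Optional


def should_load_model(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when SPACES_ZERO_GPU is present and non-empty."""
    env = os.environ if env is None else env
    # Assumption: an empty string counts as unset.
    return bool(env.get("SPACES_ZERO_GPU"))
```

Passing the environment in as a parameter keeps the check easy to test without touching the real process environment.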
Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
(Backyard AI track).