title: FitCheck
emoji: 
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: '3.12'
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

FitCheck

What AI can your computer actually run? And the other way round: what computer do you need for the AI you want to run?

Tell FitCheck about your machine in plain words. It answers honestly — real models, real memory figures, real licenses, real copy-paste commands — from chatbots to object detection, image generation, speech, and robotics.

Why it's trustworthy

  • A deterministic engine does the math, not an AI. Verdicts come from a transparent rules engine over catalogue.json — 83 real models verified against the Hugging Face API. Nothing in the verdict can be hallucinated.
  • Model sizes are exact. For GGUF models the weights figure is the actual file size in bytes from the Hub — not a params-times-bits estimate. Chat memory uses each model's real architecture (GQA-aware), and every estimate includes a 0.58 GB safety buffer (the 95% load-success margin fitted from ~19,500 community measurements).
  • Provenance on every number. The UI says whether a figure is an exact file size, a vendor-published number, community-reported, or estimated.
  • Licenses up front. AGPL, non-commercial, and gated models are labelled on every card — before you build your project on one.
  • Speed estimates with receipts, not vibes. For LLMs, FitCheck predicts decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and shows where your machine lands among real community benchmark runs (LocalScore) on an interactive roofline chart. A learned predictor, following IBM's LLM-Pilot methodology (gradient boosting over hardware features, validated leave-one-accelerator-out), replaces the analytical estimate only if it beats it on hardware it never saw; otherwise the labelled analytical baseline ships. Vision and diffusion models are compute-bound, not bandwidth-bound, so FitCheck gives them memory verdicts only rather than fake speed numbers.
  • Conservative by design. Three plain bands (Runs great / Tight, but works / Won't fit) that would rather under-promise than over-promise.
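
To make the arithmetic behind these bullets concrete, here is a minimal sketch of a memory verdict and a bandwidth-bound speed estimate. It is not FitCheck's engine code: ModelSpec, the function names, the 0.8 comfort_ratio threshold, and the choice of 1 GB = 10^9 bytes are all assumptions made for illustration. Only the 0.58 GB buffer and the three band names come from this README. The KV-cache formula is the usual GQA one (keys and values stored per KV head, not per attention head).

```python
from dataclasses import dataclass

GB = 10 ** 9             # assumption: decimal gigabytes
SAFETY_BUFFER_GB = 0.58  # safety buffer quoted in this README


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the facts a catalogue entry would hold."""
    weights_bytes: int  # exact weights file size in bytes
    n_layers: int
    n_kv_heads: int     # fewer than attention heads when the model uses GQA
    head_dim: int


def kv_cache_bytes(spec: ModelSpec, context_len: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values; fp16 cache (2 bytes per element) by default.
    return (2 * spec.n_layers * spec.n_kv_heads * spec.head_dim
            * context_len * bytes_per_elem)


def estimate_memory_gb(spec: ModelSpec, context_len: int) -> float:
    total_bytes = spec.weights_bytes + kv_cache_bytes(spec, context_len)
    return total_bytes / GB + SAFETY_BUFFER_GB


def band(required_gb: float, available_gb: float, comfort_ratio: float = 0.8) -> str:
    # comfort_ratio is an illustrative threshold, not FitCheck's real rule.
    if required_gb > available_gb:
        return "Won't fit"
    if required_gb <= comfort_ratio * available_gb:
        return "Runs great"
    return "Tight, but works"


def decode_tokens_per_sec(bandwidth_gb_s: float, weights_bytes: int,
                          efficiency: float = 1.0) -> float:
    # Roofline bound: each decoded token streams all weights from memory once.
    return efficiency * bandwidth_gb_s * GB / weights_bytes


spec = ModelSpec(weights_bytes=4_000_000_000, n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_memory_gb(spec, context_len=4096)
print(f"{need:.2f} GB needed:", band(need, available_gb=8.0))
print(f"{decode_tokens_per_sec(100.0, spec.weights_bytes):.1f} tokens/s upper bound")
```

With efficiency left at 1.0 the speed figure is a ceiling; real runs land below it, which is what the community benchmark points on the roofline chart show.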

What's inside

  1. The catalogue: scripts/curation.json (hand-picked models across LLM, vision-language, vision, image/video generation, speech, music, embeddings, forecasting), enriched by scripts/refresh_catalogue.py from public Hub endpoints into catalogue.json. Refreshed nightly; baked in at build time so the running app is fully offline.
  2. The engine (engine/) — pure Python memory math and honest banding. Also answers the reverse question: minimum vs comfortable hardware tiers for a goal ("Help me pick one" mode).
  3. The model brick (model_brick.py) — NVIDIA Nemotron 3 Nano 4B running in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels), explaining the engine's numbers in plain words. It never does the math; if it states a figure that isn't in the engine's facts, the gate logs it.
  4. The frontend (static/) — hand-built HTML/CSS/JS, no framework, served by Gradio server mode (gr.Server). Optional extra: paste any Hugging Face model id and FitCheck walks its finetune/quantized lineage to a known base ("if the base runs, your finetune runs") — the one clearly-labelled online feature.
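
Two of the pieces above are small enough to sketch: the gate that checks the explainer's figures against the engine's facts, and the lineage walk from a finetune to a known base. The code below is an illustration under assumptions, not the contents of model_brick.py or the frontend. The function names, the regex-based number extraction, the tolerance, and the parent_of mapping (a stand-in for whatever base-model metadata the Hub returns) are all hypothetical.

```python
import logging
import re
from typing import Dict, List, Optional, Set

logger = logging.getLogger("fitcheck.gate")  # hypothetical logger name
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unverified_figures(explanation: str, engine_facts: Set[float],
                       tol: float = 0.01) -> List[float]:
    """Return (and log) numbers in the text that match no engine fact."""
    stated = [float(m) for m in NUMBER.findall(explanation)]
    unknown = [x for x in stated
               if not any(abs(x - fact) <= tol for fact in engine_facts)]
    for x in unknown:
        logger.warning("explainer stated %s, which is not an engine fact", x)
    return unknown


def resolve_base(model_id: str, parent_of: Dict[str, str],
                 known_bases: Set[str], max_hops: int = 10) -> Optional[str]:
    """Follow parent links until a catalogued base is found, else None."""
    seen: Set[str] = set()
    current: Optional[str] = model_id
    for _ in range(max_hops + 1):
        if current is None or current in seen:  # dead end or cycle
            return None
        if current in known_bases:
            return current
        seen.add(current)
        current = parent_of.get(current)
    return None


text = "This model needs 5.1 GB and your machine has 8 GB, about 25 tokens/s."
print(unverified_figures(text, engine_facts={5.1, 8.0}))

lineage = {"user/finetune-gguf": "user/finetune", "user/finetune": "org/base"}
print(resolve_base("user/finetune-gguf", lineage, known_bases={"org/base"}))
```

A plain regex also picks up digits inside model names (the 3 in "Nemotron 3", say), so a real gate would need to whitelist or strip those before comparing. The cycle and hop guards matter because lineage metadata on the Hub is user-supplied.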

Run it locally

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py

The activate line above is for Windows; on macOS or Linux, use source .venv/bin/activate instead.

Open http://127.0.0.1:7860/ (add ?go for an instant sample result). Locally the explainer reports the model isn't loaded (it only loads on the Space) — everything else works fully offline.

Built for the Build Small hackathon (Backyard AI track).