---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
- nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---
<!--
ZeroGPU is selected in the Space's Settings (the README can't set it). The
model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
local `python app.py` stays instant.
-->
# FitCheck

**What AI can your computer actually run?** And the other way round: **what
computer do you need for the AI you want to run?**

Tell FitCheck about your machine in plain words. It answers honestly, with real
models, real memory figures, real licenses, and real copy-paste commands, for
everything from chatbots to object detection, image generation, speech, and
robotics.

## Why it's trustworthy

- **A deterministic engine does the math, not an AI.** Verdicts come from a
  transparent rules engine over `catalogue.json`, which holds 83 real models
  verified against the Hugging Face API. No language model is involved in
  producing a verdict, so nothing in it can be hallucinated.
- **Model sizes are exact.** For GGUF models the weights figure is the actual
  file size in bytes from the Hub, not a params-times-bits estimate. Chat
  memory uses each model's real architecture (GQA-aware), and every estimate
  includes a 0.58 GB safety buffer (the 95% load-success margin fitted from
  ~19,500 community measurements).
- **Provenance on every number.** The UI says whether a figure is an exact
  file size, a vendor-published number, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled
  on every card, before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
  decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
  shows where your machine lands among **real community benchmark runs**
  ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
  A learned predictor, following IBM's
  [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
  over hardware features, validated leave-one-accelerator-out), replaces the
  analytical estimate **only if it beats it on hardware it never saw**;
  otherwise the labelled baseline ships. Vision and diffusion models are
  compute-bound, not bandwidth-bound, so FitCheck gives them memory verdicts
  only rather than invented speed numbers.
- **Conservative by design.** Verdicts fall into three plain bands (Runs
  great / Tight, but works / Won't fit), chosen to under-promise rather than
  over-promise.

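
The memory and speed bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `engine/`: the `Arch` fields, the function names, the 80% headroom threshold between bands, the 2-byte cache entries, and the choice of 1 GB = 10^9 bytes are all assumptions made for the example. Only the 0.58 GB buffer, the three band names, and the use of the real file size come from this README.

```python
from dataclasses import dataclass

GB = 1e9                 # assumption: decimal gigabytes
SAFETY_BUFFER_GB = 0.58  # safety buffer quoted in this README


@dataclass
class Arch:
    """Hypothetical container for the architecture fields the estimate needs."""
    n_layers: int
    n_kv_heads: int    # with GQA this is smaller than the attention head count
    head_dim: int
    kv_bytes: int = 2  # assumption: fp16/bf16 cache entries


def kv_cache_gb(arch: Arch, context_tokens: int) -> float:
    # Each token stores one key and one value vector per layer per KV head.
    per_token = 2 * arch.n_layers * arch.n_kv_heads * arch.head_dim * arch.kv_bytes
    return per_token * context_tokens / GB


def estimate_gb(file_bytes: int, arch: Arch, context_tokens: int) -> float:
    # Weights use the real file size, then the KV cache and the buffer are added.
    return file_bytes / GB + kv_cache_gb(arch, context_tokens) + SAFETY_BUFFER_GB


def band(needed_gb: float, available_gb: float, headroom: float = 0.8) -> str:
    # The headroom value is an illustrative threshold, not the engine's rule.
    if needed_gb <= available_gb * headroom:
        return "Runs great"
    if needed_gb <= available_gb:
        return "Tight, but works"
    return "Won't fit"


def decode_tokens_per_sec(bandwidth_gb_s: float, file_bytes: int,
                          efficiency: float = 1.0) -> float:
    # Bandwidth-bound decode: every generated token reads all weights once.
    return efficiency * bandwidth_gb_s * GB / file_bytes


arch = Arch(n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_gb(4_000_000_000, arch, context_tokens=4096)
print(band(need, available_gb=8.0))  # Runs great
print(band(need, available_gb=6.0))  # Tight, but works
```

The speed function gives an upper bound when `efficiency` is 1.0; real runs land below it, which is why the bullet above compares the estimate against measured benchmark runs.
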
## What's inside

1. **The catalogue**: `scripts/curation.json` (hand-picked models across
   LLM, vision-language, vision, image/video generation, speech, music,
   embeddings, forecasting) is enriched by `scripts/refresh_catalogue.py` from
   public Hub endpoints into `catalogue.json`. It is refreshed nightly and
   baked in at build time, so the running app is fully offline.
2. **The engine** (`engine/`): pure Python memory math and honest banding.
   Also answers the reverse question: minimum vs comfortable hardware tiers
   for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`): NVIDIA Nemotron 3 Nano 4B running
   in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
   explaining the engine's numbers in plain words. It never does the math, and
   a gate logs any figure it states that isn't in the engine's facts.
4. **The frontend** (`static/`): hand-built HTML/CSS/JS, no framework, served
   by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
   model id and FitCheck walks its finetune/quantized lineage to a known base
   ("if the base runs, your finetune runs"). This is the one clearly labelled
   online feature.

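
One way a gate like the one in item 3 could work is to pull every number out of the explainer's text and flag those that match no engine fact. The sketch below is an assumption about the approach, not the logic in `model_brick.py`: the function names, the logger name, the regular expression, and the tolerance are all hypothetical.

```python
import logging
import re

logger = logging.getLogger("fitcheck.gate")  # hypothetical logger name

# Matches integers and decimals such as "30" or "5.12".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_figures(explanation: str, facts: dict[str, float],
                        tol: float = 0.01) -> list[float]:
    """Return numbers in the text that match no engine fact within tol."""
    allowed = list(facts.values())
    stray = []
    for token in NUMBER.findall(explanation):
        value = float(token)
        if not any(abs(value - a) <= tol for a in allowed):
            stray.append(value)
    return stray


def gate(explanation: str, facts: dict[str, float]) -> list[float]:
    """Log each unsupported figure and return the list for the caller."""
    stray = unsupported_figures(explanation, facts)
    for value in stray:
        logger.warning("explainer stated %s, which is not an engine fact", value)
    return stray


facts = {"weights_gb": 4.0, "total_gb": 5.12}
text = "The model needs 5.12 GB, of which 4.0 GB is weights, at 30 tokens/sec."
print(gate(text, facts))  # [30.0]
```

A plain numeric match is deliberately crude: it also flags digits inside model names such as "4B", so a real gate would likely need to mask known identifiers first.
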
## Run it locally

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). The
activation command above is for Windows; on macOS or Linux, use
`source .venv/bin/activate` instead. Locally, the explainer reports that the
model isn't loaded (it only loads on the Space); everything else works fully
offline.

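
The local-versus-Space behaviour can be sketched as an environment check that defers the expensive load. The `SPACES_ZERO_GPU` variable is named in this README; the function names and the `loader` callable are hypothetical stand-ins, and the actual check in `app.py` may differ.

```python
import os


def on_zero_gpu(env=None) -> bool:
    """True when the SPACES_ZERO_GPU variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("SPACES_ZERO_GPU"))


def load_explainer(loader, env=None):
    """Call loader() only on the Space; return None for local runs.

    `loader` is a hypothetical zero-argument callable that loads the LLM,
    so nothing heavy is imported or downloaded when running locally.
    """
    if not on_zero_gpu(env):
        return None
    return loader()


explainer = load_explainer(lambda: "loaded model", env={})
if explainer is None:
    print("Explainer model is not loaded (it only loads on the Space).")
```

Passing `env` explicitly keeps the check easy to test without changing the real process environment.
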
Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
(Backyard AI track).