---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
- nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---
<!--
ZeroGPU is selected in the Space's Settings (the README can't set it). The
model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
local `python app.py` stays instant.
-->
# FitCheck
**What AI can your computer actually run?** And the other way round: **what
computer do you need for the AI you want to run?**
Tell FitCheck about your machine in plain words. It answers honestly — real
models, real memory figures, real licenses, real copy-paste commands — from
chatbots to object detection, image generation, speech, and robotics.
## Why it's trustworthy
- **A deterministic engine does the math, not an AI.** Verdicts come from a
transparent rules engine over `catalogue.json` — 83 real models verified
against the Hugging Face API. Nothing in the verdict can be hallucinated.
- **Model sizes are exact.** For GGUF models the weights figure is the actual
file size in bytes from the Hub — not a params-times-bits estimate. Chat
memory uses each model's real architecture (GQA-aware), and every estimate
includes a 0.58 GB safety buffer (the 95% load-success margin fitted from
~19,500 community measurements).
- **Provenance on every number.** The UI says whether a figure is an exact
file size, a vendor-published number, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled
on every card — before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
shows where your machine lands among **real community benchmark runs**
([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
A learned predictor — following IBM's
[LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
over hardware features, validated leave-one-accelerator-out) — replaces the
analytical estimate **only if it beats it on hardware it never saw**;
otherwise the labelled baseline ships. Vision and diffusion models are
compute-bound, not bandwidth-bound, so FitCheck reports memory verdicts
only for them instead of inventing speed numbers.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works
/ Won't fit) that would rather under-promise than over-promise.
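
The memory and speed reasoning in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `engine/`: the names (`ModelSpec`, `estimate_memory_gb`, `decode_tokens_per_sec`, `band`), the 16-bit KV cache, the decimal-GB convention, the 80% comfort threshold, and the example model figures are all assumptions made for the sketch. Only the 0.58 GB buffer and the three band labels come from the text above.

```python
from dataclasses import dataclass

SAFETY_BUFFER_GB = 0.58  # safety buffer quoted in this README
BYTES_PER_GB = 1e9       # assumption: decimal gigabytes


@dataclass
class ModelSpec:
    """Hypothetical catalogue entry; field names are illustrative."""
    weights_bytes: int  # exact file size, e.g. of a GGUF file
    n_layers: int
    n_kv_heads: int     # under GQA this is smaller than the attention head count
    head_dim: int


def kv_cache_bytes(m: ModelSpec, context_tokens: int, bytes_per_value: int = 2) -> int:
    # One K and one V vector per layer, per KV head, per token.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_tokens * bytes_per_value


def estimate_memory_gb(m: ModelSpec, context_tokens: int) -> float:
    total_bytes = m.weights_bytes + kv_cache_bytes(m, context_tokens)
    return total_bytes / BYTES_PER_GB + SAFETY_BUFFER_GB


def decode_tokens_per_sec(m: ModelSpec, bandwidth_gb_s: float) -> float:
    # Bandwidth-bound ceiling: each decoded token reads every weight once.
    return bandwidth_gb_s * BYTES_PER_GB / m.weights_bytes


def band(need_gb: float, available_gb: float, comfort: float = 0.8) -> str:
    # The 0.8 comfort threshold is an assumption, not FitCheck's real cut-off.
    if need_gb > available_gb:
        return "Won't fit"
    if need_gb > comfort * available_gb:
        return "Tight, but works"
    return "Runs great"


# Made-up model: 4 GB of weights, 32 layers, 8 KV heads of dimension 128.
example = ModelSpec(weights_bytes=4_000_000_000, n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_memory_gb(example, context_tokens=4096)
print(round(need, 2))                       # 5.12
print(band(need, available_gb=8.0))         # Runs great
print(band(need, available_gb=6.0))         # Tight, but works
print(decode_tokens_per_sec(example, 100))  # 25.0
```

Counting KV heads rather than attention heads is what makes the estimate GQA-aware: a model with 32 attention heads but 8 KV heads needs a quarter of the cache that a naive count would predict.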
## What's inside
1. **The catalogue** — `scripts/curation.json` (hand-picked models across
LLM, vision-language, vision, image/video generation, speech, music,
embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from
public Hub endpoints into `catalogue.json`. Refreshed nightly; baked in at
build time so the running app is fully offline.
2. **The engine** (`engine/`) — pure Python memory math and honest banding.
Also answers the reverse question: minimum vs comfortable hardware tiers
for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running
in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
explaining the engine's numbers in plain words. It never does the math; if
it states a figure that isn't in the engine's facts, the gate logs it.
4. **The frontend** (`static/`) — hand-built HTML/CSS/JS, no framework, served
by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
model id and FitCheck walks its finetune/quantized lineage to a known base
("if the base runs, your finetune runs") — the one clearly-labelled online
feature.
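
The gate described in item 3 could look roughly like the sketch below. It assumes the engine's facts arrive as a flat mapping of names to numbers and that any number in the explanation must match one of them within a small tolerance. The function name, logger name, tolerance, and regular expression are hypothetical and are not taken from `model_brick.py`.

```python
import logging
import re

logger = logging.getLogger("fitcheck.gate")  # hypothetical logger name

# Matches plain integers and decimals such as "8" or "5.12".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_figures(explanation: str, facts: dict[str, float], tol: float = 0.01) -> list[float]:
    """Return every number in the explanation that matches no engine fact."""
    allowed = list(facts.values())
    flagged = []
    for token in NUMBER.findall(explanation):
        value = float(token)
        if not any(abs(value - known) <= tol for known in allowed):
            flagged.append(value)
            logger.warning("explainer stated a figure not in engine facts: %s", token)
    return flagged


facts = {"need_gb": 5.12, "available_gb": 8.0}
text = "This model needs about 5.12 GB and you have 8 GB, leaving 3 GB free."
print(unsupported_figures(text, facts))  # [3.0]
```

In this sketch the gate only reports: it returns and logs the stray figure, which mirrors the "the gate logs it" behaviour described above, and leaves any further handling to the caller.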
## Run it locally
```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```
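Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). The
activation line above is the Windows form; on macOS or Linux use
`source .venv/bin/activate` instead. Locally the explainer reports that the
model isn't loaded (it only loads on the Space); everything else works fully
offline.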
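
One way to get this behaviour is to gate the model load on the `SPACES_ZERO_GPU` environment variable. The sketch below assumes that any non-empty value means the app is running on ZeroGPU. The helper names and the `loader` callable are hypothetical stand-ins, not the actual contents of `model_brick.py`.

```python
import os

_model = None  # cached model handle, filled on first use


def zero_gpu_enabled(env=None) -> bool:
    # Assumption: any non-empty SPACES_ZERO_GPU value means "on the Space".
    env = os.environ if env is None else env
    return bool(env.get("SPACES_ZERO_GPU"))


def get_model(loader, env=None):
    """Return the model on the Space, or None locally without calling loader."""
    global _model
    if not zero_gpu_enabled(env):
        return None
    if _model is None:
        _model = loader()  # the expensive load happens once, and only on the Space
    return _model


# Locally the variable is unset, so the loader is never called.
print(get_model(lambda: "model", env={}))  # None
```

Because the loader is skipped entirely when the variable is unset, local startup never pays the cost of importing or loading the LLM.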
Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
(Backyard AI track).