---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
---

<!--
ZeroGPU is selected in the Space's Settings (the README can't set it). The
model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
local `python app.py` stays instant.
-->

# FitCheck

**What AI can your computer actually run?** And the other way round: **what
computer do you need for the AI you want to run?**

Tell FitCheck about your machine in plain words. It answers honestly — real
models, real memory figures, real licenses, real copy-paste commands — from
chatbots to object detection, image generation, speech, and robotics.

## Why it's trustworthy

- **A deterministic engine does the math, not an AI.** Verdicts come from a
  transparent rules engine over `catalogue.json` — 83 real models verified
  against the Hugging Face API. Nothing in the verdict can be hallucinated.
- **Model sizes are exact.** For GGUF models the weights figure is the actual
  file size in bytes from the Hub — not a params-times-bits estimate. Chat
  memory uses each model's real architecture (GQA-aware), and every estimate
  includes a 0.58 GB safety buffer (the 95% load-success margin fitted from
  ~19,500 community measurements).
- **Provenance on every number.** The UI says whether a figure is an exact
  file size, a vendor-published number, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled
  on every card — before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
  decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
  shows where your machine lands among **real community benchmark runs**
  ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
  A learned predictor — following IBM's
  [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
  over hardware features, validated leave-one-accelerator-out) — replaces the
  analytical estimate **only if it beats it on hardware it never saw**;
  otherwise the labelled baseline ships. Vision and diffusion models are
  compute-bound, not bandwidth-bound, so they get memory verdicts only, with
  no invented speed numbers.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works
  / Won't fit) that would rather under-promise than over-promise.
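
The memory and speed reasoning in the list above can be sketched roughly as follows. This is a minimal illustration, not FitCheck's actual engine: the `ModelSpec` fields, the function names, the decimal-gigabyte convention, and the `comfort` threshold for the bands are all assumptions made for the example. Only the 0.58 GB buffer, the three band names, and the idea that decode speed is bounded by memory bandwidth come from the text.

```python
from dataclasses import dataclass

GB = 1e9                 # assumption: decimal gigabytes
SAFETY_BUFFER_GB = 0.58  # safety buffer quoted in the list above


@dataclass
class ModelSpec:
    """Hypothetical record; the real catalogue.json fields may differ."""
    weights_bytes: int  # exact GGUF file size in bytes
    n_layers: int       # attention layers that hold a KV cache
    n_kv_heads: int     # under GQA this is smaller than the query-head count
    head_dim: int


def kv_cache_bytes(m: ModelSpec, context_tokens: int, bytes_per_value: int = 2) -> int:
    # Keys and values (factor 2), per layer, per KV head, per token.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_tokens * bytes_per_value


def estimate_memory_gb(m: ModelSpec, context_tokens: int) -> float:
    # Weights + KV cache + fixed safety buffer.
    return (m.weights_bytes + kv_cache_bytes(m, context_tokens)) / GB + SAFETY_BUFFER_GB


def band(needed_gb: float, available_gb: float, comfort: float = 0.8) -> str:
    # Assumption: the 'comfort' cutoff is illustrative, not FitCheck's real rule.
    if needed_gb > available_gb:
        return "Won't fit"
    if needed_gb > comfort * available_gb:
        return "Tight, but works"
    return "Runs great"


def decode_tokens_per_sec(m: ModelSpec, bandwidth_gb_s: float) -> float:
    # Bandwidth-bound ceiling: each generated token reads all weights once.
    return bandwidth_gb_s * GB / m.weights_bytes


# Made-up example model, not an entry from the catalogue.
example = ModelSpec(weights_bytes=4_000_000_000, n_layers=32, n_kv_heads=8, head_dim=128)
needed = estimate_memory_gb(example, context_tokens=4096)
print(round(needed, 2), band(needed, 8.0), decode_tokens_per_sec(example, 100.0))
```

Using the real file size for the weights means the only estimated terms are the KV cache and the buffer, which is why a GQA-aware head count matters: it directly scales the cache term.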

## What's inside

1. **The catalogue** — `scripts/curation.json` (hand-picked models across
   LLM, vision-language, vision, image/video generation, speech, music,
   embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from
   public Hub endpoints into `catalogue.json`. Refreshed nightly; baked in at
   build time so the running app is fully offline.
2. **The engine** (`engine/`) — pure Python memory math and honest banding.
   Also answers the reverse question: minimum vs comfortable hardware tiers
   for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running
   in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
   explaining the engine's numbers in plain words. It never does the math; if
   it states a figure that isn't in the engine's facts, the gate logs it.
4. **The frontend** (`static/`) — hand-built HTML/CSS/JS, no framework, served
   by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
   model id and FitCheck walks its finetune/quantized lineage to a known base
   ("if the base runs, your finetune runs") — the one clearly-labelled online
   feature.
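
The gate described in item 3 can be pictured as a simple check of the explainer's text against the engine's facts. The sketch below is an assumption about how such a check might look, not the code in `model_brick.py`: the function names, the logger name, and the flat `engine_facts` dictionary are hypothetical.

```python
import logging
import re

log = logging.getLogger("fitcheck.gate")  # hypothetical logger name

# Matches integers and decimals such as "8" or "5.1".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unverified_figures(explanation: str, engine_facts: dict) -> list[float]:
    """Return numbers stated in the explanation that the engine never produced."""
    allowed = {float(v) for v in engine_facts.values()}
    stated = [float(tok) for tok in NUMBER.findall(explanation)]
    return [x for x in stated if x not in allowed]


def gate(explanation: str, engine_facts: dict) -> list[float]:
    # Log every figure that cannot be traced back to the engine's facts.
    extras = unverified_figures(explanation, engine_facts)
    for x in extras:
        log.warning("explainer stated a figure not in engine facts: %s", x)
    return extras


facts = {"needed_gb": 5.1, "available_gb": 8}  # made-up engine output
text = "This model needs 5.1 GB and you have 8 GB, so expect about 12 tokens per second."
print(gate(text, facts))
```

Here the figure 12 is flagged because it appears nowhere in the engine's facts, while 5.1 and 8 pass.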

## Run it locally

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

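Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). The
activation command above is the Windows form; on macOS or Linux, run
`source .venv/bin/activate` instead. Locally the explainer reports that the
model isn't loaded (it only loads on the Space); everything else works fully
offline.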
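
The local behaviour follows from the LLM loading only when the `SPACES_ZERO_GPU` environment variable is set. A minimal sketch of that kind of switch, assuming hypothetical function names and a placeholder in place of the real model call:

```python
import os


def should_load_llm(env=None) -> bool:
    # The variable name comes from this README; treating any non-empty value
    # as "set" is an assumption of this sketch.
    env = os.environ if env is None else env
    return bool(env.get("SPACES_ZERO_GPU"))


def explain(facts: dict, env=None) -> str:
    if not should_load_llm(env):
        # Local runs skip the model entirely, so startup stays instant.
        return "Explainer model isn't loaded (it only loads on the Space)."
    # Placeholder: on the Space the real app would call the LLM here.
    return f"(explanation of {len(facts)} engine facts would go here)"


print(explain({"needed_gb": 5.1}, env={}))
```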

Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
(Backyard AI track).