	---
	title: FitCheck
	emoji: ✅
	colorFrom: indigo
	colorTo: green
	sdk: gradio
	sdk_version: 6.16.0
	app_file: app.py
	python_version: "3.12"
	pinned: false
	license: mit
	short_description: Honest, plain answers about what AI your computer can run
	models:
	- nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
	---

	<!--
	ZeroGPU is selected in the Space's Settings (the README can't set it). The
	model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
	local `python app.py` stays instant.
	-->

	# FitCheck

	What AI can your computer actually run? And the other way round: **what
	computer do you need for the AI you want to run?**

	Tell FitCheck about your machine in plain words. It answers honestly — real
	models, real memory figures, real licenses, real copy-paste commands — from
	chatbots to object detection, image generation, speech, and robotics.

	## Why it's trustworthy

	- A deterministic engine does the math, not an AI. Verdicts come from a
	transparent rules engine over `catalogue.json` — 83 real models verified
	against the Hugging Face API. Nothing in the verdict can be hallucinated.
	- Model sizes are exact. For GGUF models the weights figure is the actual
	file size in bytes from the Hub — not a params-times-bits estimate. Chat
	memory uses each model's real architecture (GQA-aware), and every estimate
	includes a 0.58 GB safety buffer (the 95% load-success margin fitted from
	~19,500 community measurements).
	- Provenance on every number. The UI says whether a figure is an exact
	file size, a vendor-published number, community-reported, or estimated.
	- Licenses up front. AGPL, non-commercial, and gated models are labelled
	on every card — before you build your project on one.
	- Speed estimates with receipts, not vibes. For LLMs, FitCheck predicts
	decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
	shows where your machine lands among real community benchmark runs
	([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
	A learned predictor — following IBM's
	[LLM-Pilot methodology](https://arxiv.org/abs/2410.02425) (gradient boosting
	over hardware features, validated leave-one-accelerator-out) — replaces the
	analytical estimate only if it beats it on hardware it never saw;
	otherwise the labelled baseline ships. Vision and diffusion models are
	compute-bound, not bandwidth-bound, so they get memory verdicts only,
	with no invented speed numbers.
	- Conservative by design. Three plain bands (Runs great / Tight, but works
	/ Won't fit) that would rather under-promise than over-promise.
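
The bandwidth-bound speed estimate described in the list above can be sketched roughly as follows. This is a minimal illustration, not FitCheck's engine code: the function name `estimate_decode_tps`, the `efficiency` parameter, and its default of 0.7 are assumptions made for this example. The underlying idea is that generating one token streams roughly all of the model's weights from memory once, so tokens/sec is bounded by usable bandwidth divided by weight bytes.

```python
def estimate_decode_tps(weights_bytes: int, bandwidth_gb_s: float,
                        efficiency: float = 0.7) -> float:
    """Rough decode speed for a bandwidth-bound LLM.

    Assumption: each generated token reads all weights from memory once,
    so tokens/sec ~= usable bandwidth / weight bytes. `efficiency` is a
    hypothetical factor for achieved vs. peak bandwidth, not a FitCheck value.
    """
    if weights_bytes <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("weights_bytes and bandwidth_gb_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_bytes_per_s = bandwidth_gb_s * 1e9 * efficiency
    return usable_bytes_per_s / weights_bytes


# Example: a 4 GB weights file on a machine with 100 GB/s memory bandwidth.
tps = estimate_decode_tps(weights_bytes=4_000_000_000, bandwidth_gb_s=100.0)
print(f"{tps:.1f} tokens/sec")  # prints "17.5 tokens/sec"
```

An analytical estimate like this is the kind of labelled baseline a learned predictor would have to beat on unseen hardware before replacing it.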

	## What's inside

	1. The catalogue — `scripts/curation.json` (hand-picked models across
	LLM, vision-language, vision, image/video generation, speech, music,
	embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from
	public Hub endpoints into `catalogue.json`. Refreshed nightly; baked in at
	build time so the running app is fully offline.
	2. The engine (`engine/`) — pure Python memory math and honest banding.
	Also answers the reverse question: minimum vs comfortable hardware tiers
	for a goal ("Help me pick one" mode).
	3. The model brick (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running
	in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
	explaining the engine's numbers in plain words. It never does the math: a
	gate checks its answer and logs any figure that isn't in the engine's facts.
	4. The frontend (`static/`) — hand-built HTML/CSS/JS, no framework, served
	by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
	model id and FitCheck walks its finetune/quantized lineage to a known base
	("if the base runs, your finetune runs") — the one clearly-labelled online
	feature.
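
As a rough illustration of the engine's memory math and banding, the sketch below adds exact weight bytes, a GQA-aware KV cache, and the 0.58 GB safety buffer, then maps the total to one of the three bands. It is a sketch under stated assumptions, not the code in `engine/`: the names `ModelFacts`, `kv_cache_bytes`, `required_gb`, and `verdict` are hypothetical, as are the 2-byte cache element size, the decimal-GB convention, and the 0.85 "comfort" threshold. It also assumes a pure-attention transformer.

```python
from dataclasses import dataclass

GB = 1_000_000_000        # assumption: decimal gigabytes
SAFETY_BUFFER_GB = 0.58   # safety buffer quoted in this README


@dataclass(frozen=True)
class ModelFacts:
    """Hypothetical subset of a catalogue entry."""
    weights_bytes: int  # exact file size from the Hub
    n_layers: int
    n_kv_heads: int     # GQA: usually fewer KV heads than attention heads
    head_dim: int


def kv_cache_bytes(m: ModelFacts, context_len: int, bytes_per_elem: int = 2) -> int:
    # Keys and values (factor 2), per layer, per KV head, per token position.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_len * bytes_per_elem


def required_gb(m: ModelFacts, context_len: int) -> float:
    # Weights + KV cache, converted to GB, plus the fixed safety buffer.
    return (m.weights_bytes + kv_cache_bytes(m, context_len)) / GB + SAFETY_BUFFER_GB


def verdict(need_gb: float, available_gb: float, comfort: float = 0.85) -> str:
    # `comfort` is an illustrative threshold, not FitCheck's actual cut-off.
    if need_gb > available_gb:
        return "Won't fit"
    if need_gb <= comfort * available_gb:
        return "Runs great"
    return "Tight, but works"


model = ModelFacts(weights_bytes=4_000_000_000, n_layers=32, n_kv_heads=8, head_dim=128)
need = required_gb(model, context_len=4096)
print(f"{need:.2f} GB needed -> {verdict(need, available_gb=8.0)}")
# prints "5.12 GB needed -> Runs great"
```

Using `n_kv_heads` rather than the attention head count is what makes the cache estimate GQA-aware: a model with 32 attention heads but 8 KV heads needs a quarter of the cache of its multi-head equivalent.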

	## Run it locally

	```
	python -m venv .venv
	.venv\Scripts\activate
	pip install -r requirements.txt
	python app.py
	```

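	Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). The
	activation command above is for Windows; on macOS or Linux, run
	`source .venv/bin/activate` instead. Locally the explainer reports that the
	model isn't loaded (it only loads on the Space); everything else works
	fully offline.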
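
The local-versus-Space behaviour can be pictured with a small environment check. This is a minimal sketch, not the contents of `model_brick.py`: `SPACES_ZERO_GPU` is the variable named in this README, while `should_load_model`, `explain`, the returned messages, and the choice to treat an empty value as unset are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def should_load_model(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the ZeroGPU marker variable is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("SPACES_ZERO_GPU"))


def explain(facts: dict, env: Optional[Mapping[str, str]] = None) -> str:
    if not should_load_model(env):
        # Local runs return immediately, so startup stays instant.
        return "Explainer model isn't loaded (it only loads on the Space)."
    # Placeholder: a real app would lazily import and load the LLM here.
    return f"(model explanation of {len(facts)} engine facts would go here)"


# Simulate a local run by passing an empty environment.
print(explain({"need_gb": 5.12}, env={}))
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.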

	Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
	(Backyard AI track).