metadata

title: Vision Base
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.18.0
python_version: '3.13'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:tiny-titan
  - achievement:best-demo
  - build-small-hackathon
  - minicpm
  - openbmb
  - computer-vision
  - zerogpu
  - gradio

👁️ Vision Base

Four practical vision tools in one app, powered by a 1.3B-parameter model.

Live demo: https://huggingface.co/spaces/build-small-hackathon/vision-base

What it does

Most people don't carry a nutritionist, a chef, a technician, and a fortune teller in their pocket. Vision Base puts all four there, and does it with a 1.3B-parameter model (about 2.6 GB of weights in bfloat16).

| Tool | What you snap | What you get |
|---|---|---|
| 🔍 Allergen Lens | A food label | Every allergen flagged, dietary status, prep instructions, and a personal safety verdict |
| 🍽️ Fridge Dinner | Your open fridge | Three dinner ideas using only what's visible, plus a "use soon" warning list |
| 🔮 Object Oracle | Anything at all | A tarot-style mystical reading of that object's hidden essence |
| 🛠️ What's That Error? | An appliance error screen | The fault code decoded, root cause explained, and a step-by-step fix, with no manual needed |

Why it matters

Three of these are problems people already reach for their phone to solve. Checking a label when you have an allergy is stressful. Staring at an error code at 10 PM is frustrating. Staring at a half-empty fridge every evening is a genuine daily dilemma. Object Oracle is the playful fourth. Vision Base handles all four in under 5 seconds, with no API keys, no subscription, and no data leaving your session.

The model: MiniCPM-V 4.6

- 1.3 billion parameters: qualifies for the Tiny Titan badge (≤ 4B)
- Built by OpenBMB: qualifies for the OpenBMB sponsor prize
- Runs on ZeroGPU: accessible to anyone, no GPU required from the user
- Handles JSON-structured extraction, multi-image input, and free-form creative generation, all in the same weights
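
For the JSON-structured extraction case, the app has to turn the model's text reply into data before it can render anything. The following is a minimal sketch, assuming the model is prompted to answer with a single JSON object. The `parse_model_json` helper and the `allergens` and `verdict` field names are hypothetical and are not taken from app.py.

```python
import json
import re


def parse_model_json(reply: str) -> dict:
    """Extract the first JSON object from a model reply.

    Models often add a sentence before or after the JSON, so try a strict
    parse first and then fall back to the outermost {...} span.
    """
    text = reply.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if match is None:
            raise ValueError("no JSON object found in model reply")
        parsed = json.loads(match.group(0))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


# Hypothetical reply for an Allergen Lens style prompt.
reply = 'Here is the result: {"allergens": ["milk", "soy"], "verdict": "avoid"} Stay safe!'
result = parse_model_json(reply)
print(result["allergens"])
```

Raising `ValueError` on a malformed reply lets the UI show a clear retry message instead of rendering a half-parsed card.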

The model loads once at startup, runs via @spaces.GPU(duration=120), and offloads to CPU between calls to be a good ZeroGPU citizen.
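
The load-once, offload-between-calls pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in app.py: `StubModel` and `run_inference` are hypothetical stand-ins so the sketch runs without a GPU, and the fallback decorator exists only for machines where the `spaces` package is not installed.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU(duration=120)
except ImportError:
    # Local fallback: a no-op decorator so the sketch runs without `spaces`.
    def gpu(fn):
        return fn


class StubModel:
    """Hypothetical stand-in for the real model; records device moves."""

    def __init__(self):
        self.device = "cpu"
        self.moves = []

    def to(self, device):
        self.device = device
        self.moves.append(device)
        return self

    def generate(self, prompt):
        return f"[{self.device}] {prompt}"


# Created once at import time, not once per request.
MODEL = StubModel()


@gpu
def run_inference(prompt: str) -> str:
    MODEL.to("cuda")
    try:
        return MODEL.generate(prompt)
    finally:
        # Offload so GPU memory is released between calls.
        MODEL.to("cpu")


print(run_inference("describe this label"))
```

Putting the offload in a `finally` block means the model returns to CPU even when generation raises.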

Tech stack

openbmb/MiniCPM-V-4.6   ← vision backbone (1.3B params, bfloat16)
Gradio 6.18              ← UI framework
Hugging Face ZeroGPU     ← A100 GPU on demand
spaces + transformers    ← inference wiring
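
As a rough check on what "1.3B params, bfloat16" means for memory, the weight footprint can be estimated from the two numbers above. This sketch covers weights only. Activations, the KV cache, and framework overhead come on top and are not estimated here.

```python
PARAMS = 1.3e9        # parameter count stated in this README
BYTES_PER_PARAM = 2   # bfloat16 stores each value in 16 bits

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.1f} GB ({weight_gib:.2f} GiB) for weights alone")
```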

Custom UI details: dark gradient hero header, per-tab color theming, shimmer skeleton loading animation, slide-up reveal on output, styled result cards — all in pure Gradio CSS and gr.HTML.
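
A styled result card of the kind listed above could be produced by a small helper that returns an HTML string for `gr.HTML`. This is a sketch under assumptions: the `render_result_card` and `build_demo` names, the `result-card` class, and the hex colors (chosen to echo the indigo-to-purple Space theme) are hypothetical and are not copied from app.py.

```python
import html

# Hypothetical inline style; the real app may use a separate stylesheet.
CARD_STYLE = (
    "border-radius:12px;padding:16px;color:#fff;"
    "background:linear-gradient(135deg,#4f46e5,#9333ea);"
)


def render_result_card(title: str, body: str) -> str:
    """Return an HTML snippet for one result card.

    Model output is untrusted text, so escape it before embedding it.
    """
    return (
        f'<div class="result-card" style="{CARD_STYLE}">'
        f"<h3>{html.escape(title)}</h3>"
        f"<p>{html.escape(body)}</p>"
        "</div>"
    )


def build_demo():
    """Wire the card into a Gradio app (imported lazily; needs gradio)."""
    import gradio as gr

    with gr.Blocks() as demo:
        gr.HTML(render_result_card("Allergen Lens", "Contains: milk & soy"))
    return demo


card = render_result_card("Verdict", "<b>avoid</b>")
print(card)
```

Escaping the model's text keeps stray markup in a reply from breaking the card layout.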

Demo video

🎬 [TODO: paste your demo video URL here — YouTube, Loom, or HF Video]

(Record a 60–90 second walkthrough: snap a food label, open fridge, mystery object, and appliance error — show all four tools in action)

Social post

📣 [TODO: paste your social media post URL here — X/Twitter, LinkedIn, etc.]

(Mention #BuildSmall, link the Space, and show a screenshot or short clip of the app working)

Tracks & prizes this submission targets

| Category | Why we qualify |
|---|---|
| Track: Backyard AI | Four practical everyday tools solving real user problems |
| OpenBMB Sponsor Prize | Core model is `openbmb/MiniCPM-V-4.6`, OpenBMB's own multimodal vision model |
| Tiny Titan Badge | Entire app runs on 1.3B parameters, well under the 4B cap |
| Best Demo Badge | Polished four-in-one app with video and social storytelling |

Running locally

git clone https://huggingface.co/spaces/build-small-hackathon/vision-base
cd vision-base
uv venv .venv && source .venv/bin/activate
uv pip install -r requirements.txt
python app.py

Requires a CUDA GPU locally. On the Space, ZeroGPU handles GPU allocation automatically.