metadata
title: Vision Base
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.18.0
python_version: '3.13'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:tiny-titan
  - achievement:best-demo
  - build-small-hackathon
  - minicpm
  - openbmb
  - computer-vision
  - zerogpu
  - gradio

👁️ Vision Base

Four practical vision tools in one app, powered by a 1.3B-parameter model.

Live demo: build-small-hackathon/vision-base


What it does

Most people don't carry a nutritionist, a chef, a technician, and a fortune teller in their pocket. Vision Base puts all four there, and does it with a single 1.3B-parameter model.

| Tool | What you snap | What you get |
| --- | --- | --- |
| 🔍 Allergen Lens | A food label | Every allergen flagged, dietary status, prep instructions, and a personal safety verdict |
| 🍽️ Fridge Dinner | Your open fridge | Three dinner ideas using only what's visible, plus a "use soon" warning list |
| 🔮 Object Oracle | Anything at all | A tarot-style mystical reading of that object's hidden essence |
| 🛠️ What's That Error? | An appliance error screen | The fault code decoded, root cause explained, and step-by-step fix, no manual needed |

Why it matters

Every one of these is a real problem people reach for their phone to solve. Checking a label when you have an allergy is stressful. Staring at an error code at 10 PM is frustrating. Staring at a half-empty fridge every evening is a genuine daily dilemma. Vision Base handles all four in under 5 seconds: no API keys, no subscription, no data leaving your session.


The model: MiniCPM-V 4.6

  • 1.3 billion parameters: qualifies for the Tiny Titan badge (≤ 4B)
  • Built by OpenBMB: qualifies for the OpenBMB sponsor prize
  • Runs on ZeroGPU: accessible to anyone, no GPU required from the user
  • Handles JSON-structured extraction, multi-image input, and free-form creative generation, all in the same weights
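
As a rough illustration of the JSON-structured extraction mentioned above, the sketch below pulls the first JSON object out of a free-text model reply. It is a minimal example, not the app's actual parsing code: the function name `extract_json` and the field names in the sample reply (`allergens`, `vegan`) are assumptions made for illustration.

```python
import json
import re


def extract_json(reply: str) -> dict:
    """Return the first parseable JSON object found in a model reply.

    Small vision models often wrap JSON in extra prose, so this scans for
    each '{' and tries to decode from there, ignoring any trailing text.
    Raises ValueError if nothing parses.
    """
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", reply):
        try:
            # raw_decode parses one value starting at the given index
            # and returns (object, end_index).
            obj, _end = decoder.raw_decode(reply, match.start())
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model reply")


# Hypothetical reply text; the real model output format may differ.
reply = 'Here is the result:\n{"allergens": ["milk", "soy"], "vegan": false}\nStay safe!'
result = extract_json(reply)
print(result["allergens"])  # ['milk', 'soy']
```

Scanning from each opening brace keeps the parser tolerant of leading and trailing chatter without needing a strict output format from the model.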

The model loads once at startup, runs via @spaces.GPU(duration=120), and offloads to CPU between calls to be a good ZeroGPU citizen.
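
The load-once, run-on-GPU, offload pattern described above can be sketched roughly as follows. This is not the code in app.py: `make_gpu_task` and its arguments are hypothetical names, the model is assumed to expose a torch-style `.to(device)` method, and the `ImportError` fallback exists only so the sketch also runs off-Space where the `spaces` package is not installed.

```python
from typing import Any, Callable

try:
    import spaces  # provided on Hugging Face Spaces

    gpu = spaces.GPU(duration=120)  # request a GPU slot for up to 120 s per call
except ImportError:
    # Assumption for local runs: no ZeroGPU, so leave the function unchanged.
    def gpu(fn: Callable) -> Callable:
        return fn


def make_gpu_task(model: Any, infer: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap `infer(model, *args)` so the model sits on the GPU only during a call.

    `model` is loaded once by the caller (at startup) and captured here;
    it is assumed to have a torch-style `.to(device)` method.
    """

    @gpu
    def task(*args: Any, **kwargs: Any) -> Any:
        model.to("cuda")
        try:
            return infer(model, *args, **kwargs)
        finally:
            # Offload even when inference raises, so the next call starts clean.
            model.to("cpu")

    return task
```

A caller would build the task once after loading the model and then pass it to the UI as an event handler, so every request reuses the same weights.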


Tech stack

openbmb/MiniCPM-V-4.6   ← vision backbone (1.3B params, bfloat16)
Gradio 6.18              ← UI framework
Hugging Face ZeroGPU     ← A100 GPU on demand
spaces + transformers    ← inference wiring

Custom UI details: dark gradient hero header, per-tab color theming, shimmer skeleton loading animation, slide-up reveal on output, and styled result cards, all in pure Gradio CSS and gr.HTML.
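
To give a sense of how a styled result card might be produced for a gr.HTML component, here is a minimal sketch. The helper `render_card`, the `result-card` class name, and the default accent color are all hypothetical; the app's real markup and CSS may look quite different.

```python
from html import escape


def render_card(title: str, rows: dict[str, str], accent: str = "#6366f1") -> str:
    """Build an HTML snippet for one result card.

    All text is escaped so model output cannot inject markup into the page.
    """
    items = "".join(
        f"<li><strong>{escape(label)}:</strong> {escape(value)}</li>"
        for label, value in rows.items()
    )
    return (
        f'<div class="result-card" style="border-left: 4px solid {escape(accent)};">'
        f"<h3>{escape(title)}</h3><ul>{items}</ul></div>"
    )


# Hypothetical content; the returned string would be set as the value of gr.HTML.
card_html = render_card("Allergen check", {"Milk": "present", "Soy": "may contain"})
```

Keeping the card as a plain string builder makes it easy to unit-test separately from the Gradio layout.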


Demo video

🎬 [TODO: paste your demo video URL here (YouTube, Loom, or HF Video)]

(Record a 60–90 second walkthrough: snap a food label, open fridge, mystery object, and appliance error, showing all four tools in action)


Social post

📣 [TODO: paste your social media post URL here (X/Twitter, LinkedIn, etc.)]

(Mention #BuildSmall, link the Space, and show a screenshot or short clip of the app working)


Tracks & prizes this submission targets

| Category | Why we qualify |
| --- | --- |
| Track: Backyard AI | Four practical everyday tools solving real user problems |
| OpenBMB Sponsor Prize | Core model is openbmb/MiniCPM-V-4.6, OpenBMB's own multimodal vision model |
| Tiny Titan Badge | Entire app runs on 1.3B parameters, well under the 4B cap |
| Best Demo Badge | Polished four-in-one app with video and social storytelling |

Running locally

git clone https://huggingface.co/spaces/build-small-hackathon/vision-base
cd vision-base
uv venv .venv && source .venv/bin/activate
uv pip install -r requirements.txt
python app.py

Requires a CUDA GPU locally. On the Space, ZeroGPU handles GPU allocation automatically.
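
Since a local run needs a CUDA GPU, a small pre-flight check can save a confusing startup failure. The helper below is a hypothetical convenience, not part of the repository; it assumes PyTorch is the backend and reports False if `torch` is not installed.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is importable and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if cuda_available():
        print("CUDA GPU detected; python app.py should be able to load the model.")
    else:
        print("No CUDA GPU detected; the app is expected to fail or run very slowly.")
```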