title: Vision Base
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.18.0
python_version: '3.13'
app_file: app.py
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- achievement:tiny-titan
- achievement:best-demo
- build-small-hackathon
- minicpm
- openbmb
- computer-vision
- zerogpu
- gradio
👁️ Vision Base
Four practical vision tools in one app, powered by a 1.3B-parameter model.
Live demo: build-small-hackathon/vision-base
What it does
Most people don't carry a nutritionist, a chef, a technician, and a fortune teller in their pocket. Vision Base puts all four there, and does it with a model smaller than most podcast files.
| Tool | What you snap | What you get |
|---|---|---|
| Allergen Lens | A food label | Every allergen flagged, dietary status, prep instructions, and a personal safety verdict |
| 🍽️ Fridge Dinner | Your open fridge | Three dinner ideas using only what's visible, plus a "use soon" warning list |
| 🔮 Object Oracle | Anything at all | A tarot-style mystical reading of that object's hidden essence |
| 🛠️ What's That Error? | An appliance error screen | The fault code decoded, root cause explained, and a step-by-step fix, with no manual needed |
Why it matters
Each of these is something people already reach for their phone to solve. Checking a label when you have an allergy is stressful. Staring at an error code at 10 PM is frustrating. Staring at a half-empty fridge every evening is a genuine daily dilemma. Vision Base handles all four in under 5 seconds, with no API keys, no subscription, and no data leaving your session.
The model: MiniCPM-V 4.6
- 1.3 billion parameters: qualifies for the Tiny Titan badge (≤ 4B)
- Built by OpenBMB: qualifies for the OpenBMB sponsor prize
- Runs on ZeroGPU: accessible to anyone, no GPU required from the user
- Handles JSON-structured extraction, multi-image input, and free-form creative generation, all in the same weights
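To illustrate the JSON-structured extraction mentioned above: a tool such as Allergen Lens has to turn the model's free-text reply into a dictionary before it can render anything. The sketch below is a minimal, hypothetical helper. The name `extract_json` and the sample reply are illustrative and not taken from app.py, and it assumes the model may surround its JSON with extra prose.

```python
import json


def extract_json(reply: str) -> dict:
    """Return the first JSON object found in a model reply.

    Hypothetical helper: tries every '{' in the text and lets the JSON
    decoder decide where the object ends, so surrounding prose is ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model reply")


# Made-up reply, for illustration only.
sample_reply = 'Sure! {"allergens": ["milk", "soy"], "vegan": false} Stay safe.'
result = extract_json(sample_reply)
print(result["allergens"])
```

Raising `ValueError` when nothing parses lets the UI show a "could not read the label" message instead of rendering a half-empty card.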
The model loads once at startup, runs via @spaces.GPU(duration=120), and offloads to CPU between calls to be a good ZeroGPU citizen.
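The load-once, run-on-GPU, park-on-CPU cycle described above could look roughly like the sketch below. Only `spaces.GPU(duration=120)` comes from this README; `register`, `run_tool`, `on_gpu`, and the `generate` callable are hypothetical names, and the real app.py may be organised differently. The fallback decorator is an assumption that keeps the same file importable on a machine without the `spaces` package.

```python
from contextlib import contextmanager

try:
    import spaces  # provided on Hugging Face Spaces
    gpu_task = spaces.GPU(duration=120)
except ImportError:
    # Assumed fallback for local runs: a decorator that changes nothing.
    def gpu_task(fn):
        return fn


MODEL = None     # set once at startup
GENERATE = None  # hypothetical callable: (model, image, prompt) -> str


def register(model, generate):
    """Store the model and its inference callable at startup."""
    global MODEL, GENERATE
    MODEL, GENERATE = model, generate


@contextmanager
def on_gpu(model, device="cuda", idle_device="cpu"):
    """Move the model to the GPU for one call, then park it on the CPU."""
    model.to(device)
    try:
        yield model
    finally:
        # Runs even if inference raises, so the GPU is always released.
        model.to(idle_device)


@gpu_task
def run_tool(image, prompt):
    """Handle one request while a GPU is allocated."""
    with on_gpu(MODEL) as active:
        return GENERATE(active, image, prompt)
```

Keeping the model in a module-level variable, rather than passing it as an argument, means only the image and prompt travel with each call.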
Tech stack
openbmb/MiniCPM-V-4.6: vision backbone (1.3B params, bfloat16)
Gradio 6.18: UI framework
Hugging Face ZeroGPU: A100 GPU on demand
spaces + transformers: inference wiring
Custom UI details: dark gradient hero header, per-tab color theming, shimmer skeleton loading animation, slide-up reveal on output, and styled result cards, all in pure Gradio CSS and gr.HTML.
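As a rough idea of how a shimmer skeleton and a styled result card can be built as plain HTML strings, here is a minimal sketch. It assumes the strings are handed to a gr.HTML component; the class names (`vb-card`, `vb-skeleton`, `vb-reveal`), colors, and timings are invented for illustration and are not the Space's actual stylesheet.

```python
from html import escape

# Illustrative CSS only; selectors and values are assumptions.
SHIMMER_CSS = """
.vb-skeleton {
  height: 14px; margin: 10px 0; border-radius: 6px;
  background: linear-gradient(90deg, #2a2a3a 25%, #3d3d55 50%, #2a2a3a 75%);
  background-size: 200% 100%;
  animation: vb-shimmer 1.2s linear infinite;
}
.vb-reveal { animation: vb-slide-up 0.35s ease-out; }
@keyframes vb-shimmer { to { background-position: -200% 0; } }
@keyframes vb-slide-up {
  from { opacity: 0; transform: translateY(12px); }
  to { opacity: 1; transform: translateY(0); }
}
"""


def skeleton_html(lines: int = 3) -> str:
    """Placeholder bars to show while the model is still working."""
    bars = "".join('<div class="vb-skeleton"></div>' for _ in range(lines))
    return f'<style>{SHIMMER_CSS}</style><div class="vb-card">{bars}</div>'


def result_card(title: str, body: str) -> str:
    """Result card; model output is escaped so it cannot inject markup."""
    return (
        f'<style>{SHIMMER_CSS}</style>'
        f'<div class="vb-card vb-reveal">'
        f"<h3>{escape(title)}</h3><p>{escape(body)}</p></div>"
    )
```

A handler could first return `skeleton_html()` and then swap in `result_card(...)` once inference finishes.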
Demo video
🎬 [TODO: paste your demo video URL here (YouTube, Loom, or HF Video)]
(Record a 60-90 second walkthrough: snap a food label, open fridge, mystery object, and appliance error, showing all four tools in action)
Social post
📣 [TODO: paste your social media post URL here (X/Twitter, LinkedIn, etc.)]
(Mention #BuildSmall, link the Space, and show a screenshot or short clip of the app working)
Tracks & prizes this submission targets
| Category | Why we qualify |
|---|---|
| Track: Backyard AI | Four practical everyday tools solving real user problems |
| OpenBMB Sponsor Prize | Core model is openbmb/MiniCPM-V-4.6, OpenBMB's own multimodal vision model |
| Tiny Titan Badge | Entire app runs on 1.3B parameters, well under the 4B cap |
| Best Demo Badge | Polished four-in-one app with video and social storytelling |
Running locally
git clone https://huggingface.co/spaces/build-small-hackathon/vision-base
cd vision-base
uv venv .venv && source .venv/bin/activate
uv pip install -r requirements.txt
python app.py
Requires a CUDA GPU locally. On the Space, ZeroGPU handles GPU allocation automatically.