---
title: Vision Base
emoji: 👁️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.18.0
python_version: '3.13'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:tiny-titan
  - achievement:best-demo
  - build-small-hackathon
  - minicpm
  - openbmb
  - computer-vision
  - zerogpu
  - gradio
---

# 👁️ Vision Base

> **Four practical vision tools in one app, powered by a 1.3B-parameter model.**

Live demo: [build-small-hackathon/vision-base](https://huggingface.co/spaces/build-small-hackathon/vision-base)

---

## What it does

Most people don't carry a nutritionist, a chef, a technician, and a fortune teller in their pocket. Vision Base puts all four there, and does it with a model smaller than most podcast files.

| Tool | What you snap | What you get |
|---|---|---|
| 🔍 **Allergen Lens** | A food label | Every allergen flagged, dietary status, prep instructions, and a personal safety verdict |
| 🍽️ **Fridge Dinner** | Your open fridge | Three dinner ideas using only what's visible, plus a "use soon" warning list |
| 🔮 **Object Oracle** | Anything at all | A tarot-style mystical reading of that object's hidden essence |
| 🛠️ **What's That Error?** | An appliance error screen | The fault code decoded, root cause explained, and a step-by-step fix, with no manual needed |

---

## Why it matters

Every one of these is a real problem people reach for their phone to solve. Checking a label with an allergy is stressful. Staring at an error code at 10 PM is frustrating. Staring at a half-empty fridge every evening is a genuine daily dilemma.

Vision Base solves all four in under 5 seconds: no API keys, no subscription, no data leaving your session.

---

## The model: MiniCPM-V 4.6

- **1.3 billion parameters**: qualifies for the Tiny Titan badge (≤ 4B)
- Built by **OpenBMB**: qualifies for the OpenBMB sponsor prize
- Runs on **ZeroGPU**: accessible to anyone, no GPU required from the user
- Handles JSON-structured extraction, multi-image input, and free-form creative generation, all with the same weights

The model loads once at startup, runs via `@spaces.GPU(duration=120)`, and offloads to CPU between calls to be a good ZeroGPU citizen.

---

## Tech stack

```
openbmb/MiniCPM-V-4.6    ← vision backbone (1.3B params, bfloat16)
Gradio 6.18              ← UI framework
Hugging Face ZeroGPU     ← A100 GPU on demand
spaces + transformers    ← inference wiring
```

Custom UI details: dark gradient hero header, per-tab color theming, shimmer skeleton loading animation, slide-up reveal on output, and styled result cards, all in pure Gradio CSS and `gr.HTML`.

---

## Demo video

🎬 **[TODO: paste your demo video URL here (YouTube, Loom, or HF Video)]**

*(Record a 60–90 second walkthrough: snap a food label, open fridge, mystery object, and appliance error, showing all four tools in action)*

---

## Social post

📣 **[TODO: paste your social media post URL here (X/Twitter, LinkedIn, etc.)]**

*(Mention `#BuildSmall`, link the Space, and show a screenshot or short clip of the app working)*

---

## Tracks & prizes this submission targets

| Category | Why we qualify |
|---|---|
| **Track: Backyard AI** | Four practical everyday tools solving real user problems |
| **OpenBMB Sponsor Prize** | Core model is `openbmb/MiniCPM-V-4.6`, OpenBMB's own multimodal vision model |
| **Tiny Titan Badge** | Entire app runs on 1.3B parameters, well under the 4B cap |
| **Best Demo Badge** | Polished four-in-one app with video and social storytelling |

---

## Running locally

```bash
git clone https://huggingface.co/spaces/build-small-hackathon/vision-base
cd vision-base
uv venv .venv && source .venv/bin/activate
uv pip install -r requirements.txt
python app.py
```

> Requires a CUDA GPU locally. On the Space, ZeroGPU handles GPU allocation automatically.
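
To get a rough idea of how much GPU memory a local run needs, the sketch below estimates the footprint of the weights alone from the two figures stated in this README (1.3B parameters, bfloat16). The helper name `weight_memory_gib` and the constants are illustrative and do not come from `app.py`. The estimate ignores activations, the KV cache, image preprocessing buffers, and the CUDA context, so treat it as a lower bound rather than a measured requirement.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumptions: parameter count and dtype are the ones quoted in this README;
# the function and constant names are hypothetical.

PARAMS = 1.3e9          # parameter count as stated in this README
TINY_TITAN_CAP = 4e9    # badge threshold as stated in this README
BYTES_BF16 = 2          # bfloat16 stores each parameter in 2 bytes


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_BF16) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    print(f"bfloat16 weights: ~{weight_memory_gib(PARAMS):.2f} GiB")
    print(f"float32 weights:  ~{weight_memory_gib(PARAMS, 4):.2f} GiB")
    print(f"under the 4B cap: {PARAMS <= TINY_TITAN_CAP}")
```

By this estimate the bfloat16 weights take about 2.4 GiB, so actual usage during inference will be higher once activations and runtime overhead are included.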