title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - backyard-ai
  - well-tuned
  - off-brand
  - sharing-is-caring
  - field-notes

🍲 Cook With Me — Multimodal Sous-Chef

Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.

A closed-loop multimodal cooking assistant built for the Hugging Face Small Models / Big Adventures Hackathon (June 2026).

Contributors

eldinosaur - Carlos Castañeda Mora
Fred1e4 - Fredin Vazquez

🔗 Links

🎥 Demo video: https://youtube.com/shorts/c3PikNvKAjQ
📱 Social post: https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
🤗 Live Space: https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
🧠 Fine-tuned planner: https://huggingface.co/eldinosaur/cook-with-me-planner-8b
📊 SFT dataset: https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft

How it works

📸 Fridge photo  ──▶  [Vision Agent]          identify ingredients
                            │
                            ▼
                      [Recipe Planner]         propose 3 dishes → full recipe JSON
                            │
                            ▼
                      [Nutrition Engine]       per-serving macros (lookup, no hallucination)
                            │
                            ▼
📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict

1. Snap your fridge or pantry: the fine-tuned vision model identifies the ingredients visible in the photo.
2. Pick one of three AI-suggested dishes tailored to what you have.
3. Cook step by step with a generated recipe and per-serving nutrition info.
4. Check your progress by uploading a photo of your pan: the model returns a go, wait, or fix verdict.
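
The check-and-continue part of this flow can be sketched as a small control loop. This is only an illustration: `Verdict`, `StepResult`, `cook`, and the `validate` callback are hypothetical names, and the behaviour after a wait or fix verdict (re-check the same step, up to `max_checks` times) is an assumption, not something this README specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Verdict(Enum):
    GO = "go"      # the step looks done, move on
    WAIT = "wait"  # keep cooking and check again
    FIX = "fix"    # something is off, correct it and check again


@dataclass
class StepResult:
    step: str
    verdicts: List[Verdict]  # every verdict seen while on this step


def cook(
    steps: Iterable[str],
    validate: Callable[[str], Verdict],
    max_checks: int = 3,
) -> List[StepResult]:
    """Walk through the steps, re-checking each one until the validator says GO.

    `validate` stands in for the vision-model call (hypothetical signature):
    it receives the current step text and returns a Verdict. In the real app
    it would also receive the progress photo.
    """
    results = []
    for step in steps:
        seen = []
        for _ in range(max_checks):
            verdict = validate(step)
            seen.append(verdict)
            if verdict is Verdict.GO:
                break  # only GO advances to the next step
        results.append(StepResult(step=step, verdicts=seen))
    return results
```

Passing the validator in as a callable keeps the loop testable without a GPU: a scripted stub can replay a sequence of verdicts, and the real vision call can be swapped in later.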

Models

| Role | Model | Params | Runtime |
| --- | --- | --- | --- |
| Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
| Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → `eldinosaur/cook-with-me-planner-8b` (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator: per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |

Total: ~21.6B parameters (≤ 32B cap ✓)
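
The total can be re-derived from the approximate per-model figures in the table. A minimal check, assuming the SDXL-Turbo fallback is not counted toward the cap (the README does not say either way); the variable names are illustrative.

```python
# Approximate parameter counts in billions, as listed in the Models table.
PARAMS_B = {"vision": 4.6, "planner": 8.0, "illustrator": 9.0}
CAP_B = 32.0  # hackathon parameter cap, in billions

total_b = round(sum(PARAMS_B.values()), 1)
within_cap = total_b <= CAP_B
print(f"total ~{total_b}B, within cap: {within_cap}")
```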

Two models are fine-tuned: the vision model on fridge/pantry photos for ingredient detection, and the planner on 2,046 recipe pairs for reliable recipe-JSON generation. The planner and illustrator run on dedicated Modal GPU endpoints (the planner needs transformers 4.x while the vision model needs 5.x, so they live in separate containers).

Badges targeted

| Badge | Status | How |
| --- | --- | --- |
| 🎯 Well-Tuned | ✓ | Two fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |

Architecture highlights

- Specialized small models, one pipeline: a fine-tuned vision model for ingredients and progress checks, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
- Closed-loop visual validation: the planner writes the steps → the illustrator renders each step → the user cooks → the vision model compares the pan photo and returns go / wait / fix, closing the loop rather than wrapping a single model call.
- Hallucination-free nutrition: macros come from a lookup table, not LLM arithmetic.
- Robust JSON extraction: a multi-strategy parser handles markdown fences, single quotes, and trailing commas, so malformed planner output degrades gracefully instead of crashing the app.
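
A multi-strategy parser of the kind described in the last point could look like the sketch below. It is not the project's actual parser: `extract_json` and the order of the fallbacks are assumptions, chosen to cover the three cases named above (markdown fences, trailing commas, single quotes) using only the standard library.

```python
import ast
import json
import re
from typing import Any, Optional

_TICKS = "`" * 3  # built at runtime so this block contains no literal fence
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL | re.IGNORECASE)
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON object found in model output, or None on failure."""
    # Strategy 1: prefer the contents of a markdown fence if there is one.
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text

    # Narrow to the outermost {...} so surrounding chatter is ignored.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = candidate[start:end + 1]

    # Strategy 2: strict JSON.
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass

    # Strategy 3: drop trailing commas before } or ] and retry.
    # Caveat: this would also touch such a sequence inside a string value.
    cleaned = _TRAILING_COMMA.sub(r"\1", candidate)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Strategy 4: single-quoted output often parses as a Python literal.
    try:
        value = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return None  # give up quietly so the caller can retry or show a fallback
    return value if isinstance(value, dict) else None
```

Returning `None` instead of raising lets the caller decide how to degrade, for example by re-prompting the planner or showing a plain-text recipe.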

Track

Chapter One — Backyard AI · "Build something for someone you actually know."

Submission for the Hugging Face Hackathon · June 5–15, 2026.