title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
- backyard-ai
- well-tuned
- off-brand
- sharing-is-caring
- field-notes
🍲 Cook With Me: Multimodal Sous-Chef
Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.
A closed-loop multimodal cooking assistant built for the Hugging Face Small Models / Big Adventures Hackathon (June 2026).
Contributors
- eldinosaur - Carlos Castañeda Mora
- Fred1e4 - Fredin Vazquez
Links
- 🎥 Demo video: https://youtube.com/shorts/c3PikNvKAjQ
- 📱 Social post: https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
- 🤗 Live Space: https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
- 🧠 Fine-tuned planner: https://huggingface.co/eldinosaur/cook-with-me-planner-8b
- SFT dataset: https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft
How it works
📸 Fridge photo ──▶ [Vision Agent] identify ingredients
│
▼
[Recipe Planner] propose 3 dishes → full recipe JSON
│
▼
[Nutrition Engine] per-serving macros (lookup, no hallucination)
│
▼
📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
- Snap your fridge or pantry: the fine-tuned vision model identifies every ingredient.
- Pick one of three AI-suggested dishes tailored to what you have.
- Cook step by step with a generated recipe and per-serving nutrition info.
- Check your progress by uploading a photo of your pan: the model tells you go, wait, or fix.
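The nutrition step in the pipeline above is described as a lookup rather than model output. A minimal sketch of that idea follows, assuming a per-100 g macro table keyed by ingredient name. The table `MACROS_PER_100G`, its values, and the function `per_serving_macros` are illustrative placeholders and are not taken from the project's code or data.

```python
# Hypothetical per-100 g macro table. The numbers are illustrative
# placeholders, not the project's actual nutrition data.
MACROS_PER_100G = {
    "egg": {"kcal": 140.0, "protein_g": 12.0, "carbs_g": 1.0, "fat_g": 10.0},
    "rice": {"kcal": 130.0, "protein_g": 3.0, "carbs_g": 28.0, "fat_g": 0.0},
}


def per_serving_macros(ingredients, servings, table=MACROS_PER_100G):
    """Sum macros over (name, grams) pairs and divide by the serving count.

    Returns (macros, missing). Ingredients absent from the table are
    reported in `missing` instead of being estimated.
    """
    if servings <= 0:
        raise ValueError("servings must be positive")
    totals = {"kcal": 0.0, "protein_g": 0.0, "carbs_g": 0.0, "fat_g": 0.0}
    missing = []
    for name, grams in ingredients:
        row = table.get(name.strip().lower())
        if row is None:
            missing.append(name)
            continue
        for key in totals:
            # Table values are per 100 g, so scale by the amount used.
            totals[key] += row[key] * grams / 100.0
    per_serving = {key: round(value / servings, 1) for key, value in totals.items()}
    return per_serving, missing


macros, missing = per_serving_macros(
    [("egg", 100), ("rice", 200), ("saffron", 1)], servings=2
)
```

Returning the unknown ingredients separately keeps the numbers honest: anything the table does not cover is flagged rather than guessed.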
Models
| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | openbmb/MiniCPM-V-4.6 (fine-tuned) | ~4.6B | transformers / ZeroGPU |
| Recipe planner: dishes + recipe JSON | openbmb/MiniCPM4.1-8B → eldinosaur/cook-with-me-planner-8b (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator: per-step images | FLUX.2-klein-9B (SDXL-Turbo fallback) | ~9B | Modal (L4) |
Total: ~21.6B parameters (≤ 32B cap ✅)
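The total can be checked directly from the table. This is a small sketch using the approximate parameter counts listed above; the dictionary keys are shorthand labels chosen here, not identifiers from the project.

```python
# Approximate parameter counts in billions, copied from the Models table.
PARAMS_B = {"vision": 4.6, "planner": 8.0, "illustrator": 9.0}
CAP_B = 32.0  # parameter cap stated in the README

total_b = round(sum(PARAMS_B.values()), 1)
within_cap = total_b <= CAP_B
print(f"total: ~{total_b}B, within cap: {within_cap}")
```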
Two models are fine-tuned: the vision model on fridge/pantry photos for ingredient
detection, and the planner on 2,046 recipe pairs for reliable recipe-JSON generation.
The planner and illustrator run on dedicated Modal GPU endpoints (the planner needs
transformers 4.x while the vision model needs 5.x, so they live in separate containers).
Badges targeted
| Badge | Status | How |
|---|---|---|
| 🎯 Well-Tuned | ✅ | Two fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✅ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| 📡 Sharing is Caring | ✅ | Agent traces shared on Hub |
| Field Notes | ✅ | Blog post: "Building a closed-loop visual cooking coach" |
Architecture highlights
- Specialized small models, one pipeline: a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
- Closed-loop visual validation: the planner writes the steps → the illustrator renders each step → user cooks → the vision model compares the pan photo and returns go / wait / fix. This is a real agent loop, not a wrapper.
- Hallucination-free nutrition: macros come from a lookup table, not LLM arithmetic.
- Robust JSON extraction: multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
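The multi-strategy JSON extraction mentioned above could look roughly like the sketch below. It is one possible approach under stated assumptions, not the project's actual parser: the function name `extract_json` and the order of strategies are chosen here for illustration. It tries the raw text, then any fenced block, then the outermost brace span, and for each candidate attempts strict JSON, JSON with trailing commas removed, and finally a Python-literal parse for single-quoted output.

```python
import ast
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)
# Naive: this would also rewrite a ", }" sequence inside a string value.
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _candidates(text):
    """Yield progressively looser substrings that might hold the payload."""
    yield text.strip()
    for match in FENCE_RE.finditer(text):
        yield match.group(1).strip()
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        yield text[start:end + 1]


def extract_json(text):
    """Return the first dict or list parsed from model output, else None."""
    for candidate in _candidates(text):
        no_trailing = TRAILING_COMMA_RE.sub(r"\1", candidate)
        # Strict JSON first, then the same text without trailing commas.
        for attempt in (candidate, no_trailing):
            try:
                result = json.loads(attempt)
            except json.JSONDecodeError:
                continue
            if isinstance(result, (dict, list)):
                return result
        # Last resort: single-quoted, Python-literal style output.
        try:
            result = ast.literal_eval(no_trailing)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(result, (dict, list)):
            return result
    return None


fenced = "Here you go:\n" + FENCE + 'json\n{"dish": "omelette", "steps": ["beat eggs", "fry"],}\n' + FENCE
recipe = extract_json(fenced)
```

Returning `None` instead of raising lets the caller decide how to degrade, for example by retrying the generation or showing a fallback message.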
Track
Chapter One: Backyard AI · "Build something for someone you actually know."
Submission for the Hugging Face Hackathon · June 5–15, 2026.