---
title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
- backyard-ai
- well-tuned
- off-brand
- sharing-is-caring
- field-notes
---
# 🍲 Cook With Me: Multimodal Sous-Chef

> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*

A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.

---
## Contributors

1. **eldinosaur** - Carlos Castañeda Mora
2. **Fred1e4** - Fredin Vazquez

---
## Links

- **Demo video:** https://youtube.com/shorts/c3PikNvKAjQ
- **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
- **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
- **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
- **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft

---
## How it works

```
Fridge photo   ──▶ [Vision Agent]        identify ingredients
                         │
                         ▼
                   [Recipe Planner]      propose 3 dishes → full recipe JSON
                         │
                         ▼
                   [Nutrition Engine]    per-serving macros (lookup, no hallucination)
                         │
                         ▼
Progress photo ──▶ [Progress Validator]  go / wait / fix verdict
```

1. **Snap** your fridge or pantry: the fine-tuned vision model identifies every ingredient.
2. **Pick** one of three AI-suggested dishes tailored to what you have.
3. **Cook** step by step with a generated recipe and per-serving nutrition info.
4. **Check** your progress by uploading a photo of your pan: the model tells you *go*, *wait*, or *fix*.
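
The Nutrition Engine step in the diagram can be pictured as a plain table lookup. The sketch below is a minimal illustration under stated assumptions, not the project's code: the `Macros` type, the `NUTRITION_PER_100G` table, its placeholder values, and the `per_serving_macros` function are all hypothetical. It scales per-100 g entries by weight, divides by the number of servings, and reports unknown ingredients instead of estimating them, which is what keeps a language model out of the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macros:
    kcal: float
    protein_g: float
    carbs_g: float
    fat_g: float


# Placeholder per-100 g values for illustration only (assumed, not the project's table).
NUTRITION_PER_100G = {
    "egg": Macros(kcal=140.0, protein_g=12.0, carbs_g=1.0, fat_g=10.0),
    "rice": Macros(kcal=130.0, protein_g=2.5, carbs_g=28.0, fat_g=0.5),
}


def per_serving_macros(ingredients_g, servings):
    """Sum table entries for a {name: grams} mapping and divide by servings.

    Returns (Macros per serving, list of ingredient names missing from the table).
    """
    if servings <= 0:
        raise ValueError("servings must be positive")

    kcal = protein = carbs = fat = 0.0
    unknown = []
    for name, grams in ingredients_g.items():
        entry = NUTRITION_PER_100G.get(name.strip().lower())
        if entry is None:
            # No guessing: missing ingredients are surfaced to the caller.
            unknown.append(name)
            continue
        scale = grams / 100.0
        kcal += entry.kcal * scale
        protein += entry.protein_g * scale
        carbs += entry.carbs_g * scale
        fat += entry.fat_g * scale

    per_serving = Macros(
        kcal=round(kcal / servings, 1),
        protein_g=round(protein / servings, 1),
        carbs_g=round(carbs / servings, 1),
        fat_g=round(fat / servings, 1),
    )
    return per_serving, unknown
```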
---

## Models

| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
| Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator: per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |

**Total: ~21.6B parameters** (≤ 32B cap ✅)

**Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
`transformers` 4.x while the vision model needs 5.x, so they live in separate containers).

---
## Badges targeted

| Badge | Status | How |
|---|---|---|
| Well-Tuned | ✅ | **Two** fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| Off-Brand | ✅ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| Sharing is Caring | ✅ | Agent traces shared on Hub |
| Field Notes | ✅ | Blog post: "Building a closed-loop visual cooking coach" |

---
## Architecture highlights

- **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
- **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → the user cooks → the vision model compares the pan photo and returns *go / wait / fix*. This is a real agent loop, not a wrapper.
- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
- **Robust JSON extraction:** a multi-strategy parser handles markdown fences, single quotes, and trailing commas, so generation failures degrade gracefully.
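
One way to picture the multi-strategy JSON extraction described above is the sketch below. It is an assumption-laden illustration rather than the parser used in this Space: the `extract_json` name and the order of strategies are hypothetical. It tries a fenced block first, then the outermost braces, then the raw text, and for each candidate attempts strict JSON, JSON with trailing commas removed, and finally a Python-literal parse for single-quoted output. Returning `None` instead of raising lets the caller retry or fall back. Note that the trailing-comma regex is deliberately naive and would also rewrite a `,]` sequence inside a string value.

```python
import ast
import json
import re

# Matches a fenced block, optionally tagged "json" (three backticks written as `{3}).
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)
# Matches a comma that directly precedes a closing brace or bracket.
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _parse_candidate(candidate):
    """Try strict JSON, comma-repaired JSON, then a Python literal."""
    cleaned = _TRAILING_COMMA_RE.sub(r"\1", candidate)
    for variant in (candidate, cleaned):
        try:
            value = json.loads(variant)
        except json.JSONDecodeError:
            continue
        if isinstance(value, (dict, list)):
            return value
    try:
        # Handles single-quoted dicts such as {'dish': 'soup'}.
        value = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, (dict, list)) else None


def extract_json(text):
    """Return the first dict or list recovered from model output, else None."""
    candidates = []
    fenced = _FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    candidates.append(text)

    for candidate in candidates:
        value = _parse_candidate(candidate.strip())
        if value is not None:
            return value
    return None
```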
---

## Track

**Chapter One: Backyard AI** · "Build something for someone you actually know."

Submission for the Hugging Face Hackathon · June 5–15, 2026.