---
title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
- backyard-ai
- well-tuned
- off-brand
- sharing-is-caring
- field-notes
---
# 🍲 Cook With Me: Multimodal Sous-Chef
> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
---
## Contributors
1. **eldinosaur** - Carlos Castañeda Mora
1. **Fred1e4** - Fredin Vazquez
---
## Links
- 🎥 **Demo video:** https://youtube.com/shorts/c3PikNvKAjQ
- 📱 **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
- 🤗 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
- 🧠 **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
- **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft
---
## How it works
```
📸 Fridge photo   ──▶ [Vision Agent] identify ingredients
                            │
                            ▼
                      [Recipe Planner] propose 3 dishes → full recipe JSON
                            │
                            ▼
                      [Nutrition Engine] per-serving macros (lookup, no hallucination)
                            │
                            ▼
📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
```
1. **Snap** your fridge or pantry: the fine-tuned vision model identifies the visible ingredients.
2. **Pick** one of three AI-suggested dishes tailored to what you have.
3. **Cook** step by step with a generated recipe and per-serving nutrition info.
4. **Check** your progress by uploading a photo of your pan: the model returns a *go*, *wait*, or *fix* verdict.
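
The nutrition step can be pictured as a plain table lookup followed by a division by the number of servings. The sketch below is illustrative only: `Macros`, `MACROS_PER_100G`, `per_serving_macros`, and the per-100 g numbers are hypothetical placeholders, not the project's actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macros:
    kcal: float
    protein_g: float
    carbs_g: float
    fat_g: float


# Placeholder per-100 g values for illustration only (assumed, not real data).
MACROS_PER_100G = {
    "egg": Macros(140.0, 12.0, 1.0, 10.0),
    "rice": Macros(130.0, 3.0, 28.0, 0.0),
}


def per_serving_macros(ingredients: dict[str, float], servings: int) -> Macros:
    """Sum table values for {ingredient: grams}, then divide by servings."""
    if servings <= 0:
        raise ValueError("servings must be positive")
    totals = [0.0, 0.0, 0.0, 0.0]
    for name, grams in ingredients.items():
        entry = MACROS_PER_100G.get(name.lower())
        if entry is None:
            continue  # unknown ingredients are skipped rather than guessed
        scale = grams / 100.0
        values = (entry.kcal, entry.protein_g, entry.carbs_g, entry.fat_g)
        for i, value in enumerate(values):
            totals[i] += value * scale
    return Macros(*(round(total / servings, 1) for total in totals))


print(per_serving_macros({"egg": 100, "rice": 200}, servings=2))
```

Because every number comes from the table, an ingredient missing from it contributes nothing instead of an invented value; a real implementation would probably also report which ingredients were skipped.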
---
## Models
| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
| Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator: per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |

**Total: ~21.6B parameters** (≤ 32B cap ✅)
**Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
`transformers` 4.x while the vision model needs 5.x, so they live in separate containers).
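
The parameter total can be checked with a few lines of arithmetic. This is a small sketch using the approximate counts from the table; the names `PARAMS_B` and `CAP_B` are made up for illustration, and the SDXL-Turbo fallback is not counted.

```python
# Approximate parameter counts in billions, copied from the Models table.
PARAMS_B = {"vision": 4.6, "planner": 8.0, "illustrator": 9.0}
CAP_B = 32.0  # hackathon cap stated in this README

total_b = round(sum(PARAMS_B.values()), 1)
print(f"total ~{total_b}B, within cap: {total_b <= CAP_B}")
```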
---
## Badges targeted
| Badge | Status | How |
|---|---|---|
| 🎯 Well-Tuned | ✅ | **Two** fine-tuned models on the Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✅ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| Sharing is Caring | ✅ | Agent traces shared on the Hub |
| Field Notes | ✅ | Blog post: "Building a closed-loop visual cooking coach" |
---
## Architecture highlights
- **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
- **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → the user cooks → the vision model compares the pan photo and returns *go / wait / fix*. This is a real agent loop, not a wrapper.
- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
- **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
---
## Track
**Chapter One: Backyard AI** · "Build something for someone you actually know."
Submission for the Hugging Face Hackathon · June 5–15, 2026.