---

title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - backyard-ai
  - well-tuned
  - off-brand
  - sharing-is-caring
  - field-notes
---


# 🍲 Cook With Me: Multimodal Sous-Chef

> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*

A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.

---

## Contributors

1. **eldinosaur** - Carlos Castañeda Mora
1. **Fred1e4** - Fredin Vazquez

---

## 🔗 Links

- 🎥 **Demo video:** https://youtube.com/shorts/c3PikNvKAjQ
- 📱 **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/

- 🤗 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
- 🧠 **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
- 📊 **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft

---

## How it works

```
📸 Fridge photo  ──▶  [Vision Agent]          identify ingredients
                            │
                            ▼
                      [Recipe Planner]         propose 3 dishes → full recipe JSON
                            │
                            ▼
                      [Nutrition Engine]       per-serving macros (lookup, no hallucination)
                            │
                            ▼
📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict
```
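The Nutrition Engine box above computes macros by table lookup rather than by LLM arithmetic. The sketch below illustrates that idea under stated assumptions: `MACROS_PER_100G`, `Ingredient`, and `per_serving_macros` are hypothetical names, quantities are assumed to be in grams, and the table values are placeholders for illustration, not real nutrition data or the app's actual table.

```python
from dataclasses import dataclass

# Hypothetical per-100 g table. Placeholder numbers for illustration only.
MACROS_PER_100G = {
    "rice": {"kcal": 130.0, "protein_g": 2.5, "carbs_g": 28.0, "fat_g": 0.5},
    "egg": {"kcal": 140.0, "protein_g": 12.0, "carbs_g": 1.0, "fat_g": 10.0},
}


@dataclass
class Ingredient:
    name: str
    grams: float  # assumed unit: grams


def per_serving_macros(ingredients: list[Ingredient], servings: int) -> tuple[dict[str, float], list[str]]:
    """Scale each table row by weight, sum the rows, and divide by servings.

    Ingredients missing from the table are returned separately instead of
    being estimated, so no number is ever made up.
    """
    if servings <= 0:
        raise ValueError("servings must be positive")
    totals = {"kcal": 0.0, "protein_g": 0.0, "carbs_g": 0.0, "fat_g": 0.0}
    unknown: list[str] = []
    for ing in ingredients:
        row = MACROS_PER_100G.get(ing.name.lower())
        if row is None:
            unknown.append(ing.name)  # report it; never guess its macros
            continue
        scale = ing.grams / 100.0
        for key in totals:
            totals[key] += row[key] * scale
    per_serving = {key: round(value / servings, 1) for key, value in totals.items()}
    return per_serving, unknown


# Usage: 200 g rice + 100 g egg for 2 servings; "saffron" is not in the table.
macros, missing = per_serving_macros(
    [Ingredient("rice", 200), Ingredient("egg", 100), Ingredient("saffron", 1)],
    servings=2,
)
print(macros, missing)
# → {'kcal': 200.0, 'protein_g': 8.5, 'carbs_g': 28.5, 'fat_g': 5.5} ['saffron']
```

Returning unknown ingredients alongside the totals lets the UI flag them instead of silently under-counting or inventing values.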

1. **Snap** your fridge or pantry: the fine-tuned vision model identifies the ingredients it can see.
2. **Pick** one of three AI-suggested dishes tailored to what you have.
3. **Cook** step by step with a generated recipe and per-serving nutrition info.
4. **Check** your progress by uploading a photo of your pan: the model tells you *go*, *wait*, or *fix*.
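Step 4 only works if free-form model output is reduced to exactly one of the three verdicts. A minimal sketch, assuming the vision model is prompted to reply with a line such as `VERDICT: wait` followed by a reason; the prompt convention, the `Verdict` enum, and `parse_verdict` are hypothetical, not taken from the app. Anything that does not match falls back to *wait*, so an unclear reply never tells the cook to move on.

```python
import re
from enum import Enum


class Verdict(Enum):
    GO = "go"      # step looks done; move on to the next one
    WAIT = "wait"  # not there yet; keep cooking and check again
    FIX = "fix"    # something is off; the cook should correct it


# Hypothetical reply convention: "VERDICT: fix" (or "verdict = go") anywhere in the text.
_VERDICT_RE = re.compile(r"\bverdict\s*[:=]\s*(go|wait|fix)\b", re.IGNORECASE)


def parse_verdict(reply: str) -> Verdict:
    """Map a model reply to a Verdict, defaulting to WAIT when it is unclear."""
    match = _VERDICT_RE.search(reply)
    if match is None:
        return Verdict.WAIT  # conservative default: never advance by accident
    return Verdict(match.group(1).lower())


print(parse_verdict("VERDICT: Fix\nThe garlic is starting to burn."))  # → Verdict.FIX
print(parse_verdict("Looks nearly done!"))                              # → Verdict.WAIT
```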

---

## Models

| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
| Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator β€” per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |

**Total: ~21.6B parameters** (≤ 32B cap ✓)

**Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
`transformers` 4.x while the vision model needs 5.x, so they live in separate containers).
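Since the planner lives in its own container, the Space calls it over HTTP instead of importing it, which is what keeps the two `transformers` versions apart. A minimal standard-library sketch, assuming a hypothetical endpoint URL (`PLANNER_URL`) that accepts a JSON body with `ingredients` and `dish` fields; the real Modal endpoint's URL, payload, and authentication are not described in this README, and `build_planner_request` / `call_planner` are illustrative names.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; the real Modal URL is not given in this README.
PLANNER_URL = "https://example--planner.modal.run/plan"


def build_planner_request(ingredients: list[str], dish: Optional[str] = None,
                          url: str = PLANNER_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the remote planner.

    Assumed payload: dish=None asks for dish suggestions, a dish name asks
    for the full recipe JSON.
    """
    body = json.dumps({"ingredients": ingredients, "dish": dish}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_planner(req: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply (performs a network call)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_planner_request(["egg", "rice", "spinach"], dish="Fried rice")
print(req.get_method(), req.full_url)  # → POST https://example--planner.modal.run/plan
```

Keeping request construction separate from sending makes the payload easy to test without a live GPU endpoint.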

---

## Badges targeted

| Badge | Status | How |
|---|---|---|
| 🎯 Well-Tuned | ✓ | **Two** fine-tuned models on the Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| 📑 Sharing is Caring | ✓ | Agent traces shared on the Hub |
| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |

---

## Architecture highlights

- **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each running on the runtime it needs (ZeroGPU + Modal endpoints).
- **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → the user cooks → the vision model compares the pan photo with the current step and returns *go / wait / fix*. That makes it a real agent loop, not a wrapper.
- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
- **Robust JSON extraction:** a multi-strategy parser handles markdown fences, single quotes, and trailing commas, so generation failures degrade gracefully.
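The multi-strategy parser from the last bullet can be approximated as a cascade: strict JSON on progressively cleaned-up text (raw, inside a markdown fence, the outermost `{...}` span, trailing commas removed), then Python-literal parsing for single-quoted output. This is a sketch of the technique, not the app's parser; `extract_json` and `_candidates` are hypothetical names. Returning `None` instead of raising lets the caller fall back gracefully.

```python
import ast
import json
import re
from typing import Any, Optional

_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _candidates(text: str) -> list[str]:
    """Progressively cleaned-up versions of the model output, most literal first."""
    out = [text.strip()]
    fenced = _FENCE_RE.search(out[0])
    if fenced:
        out.append(fenced.group(1).strip())  # contents of a ```json fence
    last = out[-1]
    start, end = last.find("{"), last.rfind("}")
    if start != -1 and end > start:
        out.append(last[start:end + 1])  # outermost {...}, chatter dropped
    out.append(_TRAILING_COMMA_RE.sub(r"\1", out[-1]))  # ",}" -> "}", ",]" -> "]"
    return list(dict.fromkeys(out))  # de-duplicate, keep order


def extract_json(text: str) -> Optional[Any]:
    """Return the first successful parse, or None so callers can degrade gracefully."""
    candidates = _candidates(text)
    # Strategy 1: strict JSON on each candidate.
    for cand in candidates:
        try:
            return json.loads(cand)
        except json.JSONDecodeError:
            continue
    # Strategy 2: Python literals, which accept single-quoted strings.
    for cand in candidates:
        # Naive rewrite of JSON literals; it also touches these words inside strings.
        py = re.sub(r"\btrue\b", "True", cand)
        py = re.sub(r"\bfalse\b", "False", py)
        py = re.sub(r"\bnull\b", "None", py)
        try:
            value = ast.literal_eval(py)
        except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
            continue
        if isinstance(value, (dict, list)):
            return value
    return None


print(extract_json('```json\n{"title": "Omelette", "steps": ["beat eggs", "cook",],}\n```'))
# → {'title': 'Omelette', 'steps': ['beat eggs', 'cook']}
print(extract_json("Sure! {'dish': 'Fried rice', 'vegan': false}"))
# → {'dish': 'Fried rice', 'vegan': False}
print(extract_json("Sorry, I can't help with that."))
# → None
```

The `true`/`false`/`null` rewrite is deliberately simple and would also alter those words inside string values; a stricter version would tokenize before substituting.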

---

## Track

**Chapter One: Backyard AI** · "Build something for someone you actually know."

Submission for the Hugging Face Hackathon · June 5–15, 2026.