.gitignore DELETED
@@ -1,24 +0,0 @@
1
- # Python
2
- __pycache__/
3
- *.py[cod]
4
- *.egg-info/
5
- .venv/
6
- venv/
7
-
8
- # Generated data (SFT dataset lives on HF Hub: eldinosaur/cook-with-me-recipes-sft)
9
- data/*.parquet
10
- data/*.jsonl
11
- data/*.png
12
- data/*.npy
13
- data/*.csv
14
-
15
- # Local caches / model weights
16
- *.gguf
17
- .cache/
18
- assets/*.png
19
-
20
- # OS / editor
21
- .DS_Store
22
- Thumbs.db
23
- .idea/
24
- .vscode/
 
README.md CHANGED
@@ -1,80 +1,13 @@
1
  ---
2
  title: Cook With A LLM
3
- emoji: 🍲
4
- colorFrom: red
5
- colorTo: yellow
6
  sdk: gradio
7
  sdk_version: 6.15.2
8
  python_version: '3.12'
9
  app_file: app.py
10
  pinned: false
11
- license: apache-2.0
12
  ---
13
 
14
- # 🍲 Cook With Me Multimodal Sous-Chef
15
-
16
- > *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
17
-
18
- A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
19
-
20
- ---
21
-
22
- ## How it works
23
-
24
- ```
25
- 📸 Fridge photo ──▶ [Vision Agent] identify ingredients
26
-
27
-
28
- [Recipe Planner] propose 3 dishes → full recipe JSON
29
-
30
-
31
- [Nutrition Engine] per-serving macros (lookup, no hallucination)
32
-
33
-
34
- 📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
35
- ```
36
-
37
- 1. **Snap** your fridge or pantry — the fine-tuned vision model identifies every ingredient.
38
- 2. **Pick** one of three AI-suggested dishes tailored to what you have.
39
- 3. **Cook** step by step with a generated recipe and per-serving nutrition info.
40
- 4. **Check** your progress by uploading a photo of your pan — the model tells you *go*, *wait*, or *fix*.
41
-
42
- ---
43
-
44
- ## Models
45
-
46
- | Role | Model | Params | Runtime |
47
- |---|---|---|---|
48
- | Vision + Planner + Validator | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
49
-
50
- **Total: ~4.6B parameters** (≤ 32B cap ✓ — significant headroom)
51
-
52
- The ingredient-identification model is **fine-tuned** on fridge/pantry photos for higher precision.
53
-
54
- ---
55
-
56
- ## Badges targeted
57
-
58
- | Badge | Status | How |
59
- |---|---|---|
60
- | 🎯 Well-Tuned | ✓ | Fine-tuned MiniCPM-V-4.6 for ingredient detection, published to Hub |
61
- | 🎨 Off-Brand | ✓ | Recipe-card UI with custom CSS — Lora serif, warm parchment palette |
62
- | 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
63
- | 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
64
-
65
- ---
66
-
67
- ## Architecture highlights
68
-
69
- - **Single model, three roles:** MiniCPM-V-4.6 handles vision (ingredients + progress) *and* text planning (recipe JSON generation) — no redundant model downloads.
70
- - **Closed-loop visual validation:** Flux generates step targets → user cooks → vision model compares — a real agent loop, not a wrapper.
71
- - **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
72
- - **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
73
-
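The "robust JSON extraction" bullet above can be sketched roughly as follows. This is a minimal illustration, not the parser from the deleted code: the function name `extract_json`, the `FENCE` constant, and the order of the strategies are assumptions made here. It tries strict JSON first, then strips trailing commas, then falls back to a Python-style literal for single-quoted output.

```python
import ast
import json
import re

# Three backticks, built indirectly so this example stays fence-safe.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str):
    """Hypothetical helper: return a dict/list parsed from model output, or None."""
    # Prefer the contents of a markdown fence when the model emitted one.
    fenced = FENCE_RE.search(text)
    candidate = fenced.group(1) if fenced else text

    # Narrow to the outermost {...} or [...] span.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    end = max(candidate.rfind("}"), candidate.rfind("]"))
    if not starts or end <= min(starts):
        return None
    candidate = candidate[min(starts):end + 1]

    # Strategy 1: strict JSON.
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass

    # Strategy 2: drop trailing commas before a closing bracket.
    # Caveat: this regex would also alter a string value containing ",}".
    relaxed = re.sub(r",\s*([}\]])", r"\1", candidate)
    try:
        return json.loads(relaxed)
    except json.JSONDecodeError:
        pass

    # Strategy 3: single-quoted, Python-style literal (no code is executed).
    try:
        value = ast.literal_eval(relaxed)
    except (ValueError, SyntaxError, TypeError):
        return None
    return value if isinstance(value, (dict, list)) else None
```

Returning `None` instead of raising lets the caller decide how to degrade, for example by re-prompting the model or showing a fallback message.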
74
- ---
75
-
76
- ## Track
77
-
78
- **Chapter One — Backyard AI** · "Build something for someone you actually know."
79
-
80
- Submission for the Hugging Face Hackathon · June 5–15, 2026.
 
1
  ---
2
  title: Cook With A LLM
3
+ emoji: 🐠
4
+ colorFrom: pink
5
+ colorTo: pink
6
  sdk: gradio
7
  sdk_version: 6.15.2
8
  python_version: '3.12'
9
  app_file: app.py
10
  pinned: false
 
11
  ---
12
 
13
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
Strategy/arquitectura.html DELETED
@@ -1,668 +0,0 @@
1
- <!DOCTYPE html>
2
- <html lang="es">
3
- <head>
4
- <meta charset="UTF-8" />
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0" />
6
- <title>Cocina Conmigo: visual project plan</title>
7
- <style>
8
- :root {
9
- --bg: #f5ecd9;
10
- --card: #fffbf0;
11
- --ink: #2b2018;
12
- --accent: #a85c2a; /* terracotta */
13
- --accent-soft: #f6dccc;
14
- --accent2: #6b4a2a;
15
- --gold: #c9962b;
16
- --green: #3f7a3a;
17
- --green-soft: #dbe9d8;
18
- --red: #b03a2e;
19
- --red-soft: #f4d6d2;
20
- --gray: #8a7e6f;
21
- --line: #d8c9ad;
22
- }
23
- * { box-sizing: border-box; }
24
- body {
25
- font-family: 'Inter', -apple-system, sans-serif;
26
- background: var(--bg);
27
- color: var(--ink);
28
- margin: 0;
29
- padding: 32px 16px 80px;
30
- line-height: 1.55;
31
- }
32
- .wrap { max-width: 1240px; margin: 0 auto; }
33
-
34
- h1 { font-family: 'Lora', Georgia, serif; font-size: 46px; margin: 0 0 4px;
35
- letter-spacing: -0.5px; font-weight: 700; }
36
- h1 em { color: var(--accent); font-style: italic; }
37
- .subtitle { color: var(--accent2); font-style: italic; margin-bottom: 28px; font-size: 17px; }
38
-
39
- h2 {
40
- margin-top: 56px; border-top: 1px dashed var(--line); padding-top: 24px;
41
- font-size: 26px; font-family: 'Lora', Georgia, serif; letter-spacing: 0.3px;
42
- }
43
- h2 .num {
44
- color: var(--accent); font-family: ui-monospace, monospace;
45
- font-size: 20px; margin-right: 10px;
46
- }
47
- h3 { font-size: 18px; margin-top: 28px; color: var(--accent2); font-family: 'Lora', Georgia, serif; }
48
-
49
- /* Hero */
50
- .hero {
51
- background: var(--card); border: 2px solid var(--ink); border-radius: 14px;
52
- padding: 30px 32px; display: grid; grid-template-columns: 1fr; gap: 18px;
53
- }
54
- @media(min-width: 760px){ .hero { grid-template-columns: 2fr 1fr; align-items: center; } }
55
- .hero h2 { border:0; margin:0 0 6px; padding:0; font-size: 22px; }
56
- .hero .quote {
57
- font-style: italic; font-size: 17px; color: var(--accent2);
58
- border-left: 3px solid var(--accent); padding-left: 14px; margin: 6px 0 0;
59
- }
60
- .hero .target {
61
- background: #fff3cf; border-radius: 12px; padding: 14px 16px;
62
- font-size: 13px; border: 1px solid var(--line); line-height: 1.55;
63
- }
64
- .hero .target strong { color: var(--accent); }
65
-
66
- /* Pills */
67
- .pill {
68
- display: inline-block; padding: 2px 9px; border-radius: 12px;
69
- color: white; font-size: 12px; margin: 2px 4px 2px 0; font-family: ui-monospace, monospace;
70
- }
71
- .pill.user { background: var(--gray); }
72
- .pill.gradio { background: var(--accent); }
73
- .pill.hf { background: var(--gold); }
74
- .pill.modal { background: var(--green); }
75
- .pill.flux { background: #111; }
76
- .pill.openbmb { background: #075e54; }
77
- .pill.cohere { background: #5e3aa3; }
78
- .pill.openai { background: #2c5e8a; }
79
- .pill.llama { background: #6a3d8a; }
80
-
81
- /* Phone/recipe card mockup */
82
- .phone-row {
83
- display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px;
84
- }
85
- @media(min-width: 760px){ .phone-row { grid-template-columns: repeat(4, 1fr); } }
86
- .phone {
87
- background: #111; border-radius: 24px; padding: 8px;
88
- box-shadow: 0 8px 22px rgba(0,0,0,0.18);
89
- }
90
- .phone .screen {
91
- background: #fffbf0; border-radius: 18px; overflow: hidden;
92
- height: 380px; display: flex; flex-direction: column;
93
- }
94
- .phone .topbar {
95
- background: var(--accent); color: white; padding: 10px 14px;
96
- font-size: 13px; font-family: 'Lora', serif;
97
- }
98
- .phone .body { padding: 12px; flex: 1; overflow-y: auto; font-size: 12px; }
99
- .phone .body .illu {
100
- width: 100%; aspect-ratio: 4/3; border-radius: 8px;
101
- background: linear-gradient(135deg, #ffd28b 0%, #c97a3e 100%);
102
- display: flex; align-items: center; justify-content: center;
103
- font-size: 48px; box-shadow: 0 2px 8px rgba(0,0,0,0.1); margin-bottom: 8px;
104
- }
105
- .phone .body p { margin: 6px 0; line-height: 1.5; }
106
- .phone .body .voice {
107
- background: var(--green-soft); border-radius: 6px; padding: 6px 10px;
108
- margin-top: 8px; font-size: 11px; color: var(--green);
109
- }
110
- .phone .body .tip {
111
- background: var(--red-soft); border-radius: 6px; padding: 6px 10px;
112
- margin-top: 6px; font-size: 11px; color: var(--red);
113
- }
114
- .scenario-label {
115
- text-align: center; font-size: 13px; color: var(--accent2);
116
- margin-top: 8px; font-style: italic;
117
- }
118
-
119
- /* SVG */
120
- svg { width: 100%; height: auto; display: block; }
121
- .node-box { fill: var(--card); stroke: var(--ink); stroke-width: 1.5; }
122
- .node-text { font-family: 'Inter', sans-serif; font-size: 14px; fill: var(--ink); }
123
- .node-title { font-weight: 700; font-size: 15px; }
124
- .node-sub { font-size: 11px; fill: var(--accent2); font-style: italic; }
125
- .arrow { stroke: var(--ink); stroke-width: 1.8; fill: none; }
126
- .arrow-label { font-size: 11px; fill: var(--accent2); font-family: ui-monospace, monospace; }
127
- .dashed { stroke-dasharray: 6 4; }
128
- .arrow-loop { stroke: var(--accent); stroke-width: 2.2; fill: none; }
129
-
130
- /* Cards */
131
- .grid-2 { display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px; }
132
- @media(min-width: 880px){ .grid-2 { grid-template-columns: 1fr 1fr; } }
133
- .grid-3 { display: grid; grid-template-columns: 1fr; gap: 14px; margin-top: 14px; }
134
- @media(min-width: 760px){ .grid-3 { grid-template-columns: repeat(3, 1fr); } }
135
-
136
- .card {
137
- background: var(--card); border: 1px solid var(--line);
138
- border-radius: 10px; padding: 18px 20px;
139
- }
140
- .card.pick { border: 2px solid var(--accent); }
141
- .pick-tag {
142
- display: inline-block; background: var(--accent); color: white;
143
- font-family: ui-monospace, monospace; font-size: 11px;
144
- padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
145
- }
146
-
147
- table {
148
- width: 100%; border-collapse: collapse; background: var(--card);
149
- border: 1px solid var(--line); margin-top: 14px; font-size: 14px;
150
- }
151
- th, td { padding: 8px 10px; text-align: left; border-bottom: 1px solid var(--line); vertical-align: top; }
152
- th { background: #efe4cb; font-size: 13px; letter-spacing: 0.5px; text-transform: uppercase; }
153
- code {
154
- background: #efe4cb; border-radius: 3px; padding: 1px 5px; font-size: 13px;
155
- }
156
-
157
- /* Forbidden zone */
158
- .forbidden {
159
- background: var(--red-soft); border: 1px solid var(--red);
160
- border-radius: 8px; padding: 14px 18px; margin-top: 14px;
161
- }
162
- .forbidden strong { color: var(--red); }
163
- .forbidden ul {
164
- columns: 2; column-gap: 28px; margin: 8px 0 0; padding-left: 18px; font-size: 14px;
165
- }
166
-
167
- /* Timeline */
168
- .timeline { position: relative; padding-left: 36px; margin-top: 20px; }
169
- .timeline::before {
170
- content: ""; position: absolute; left: 12px; top: 6px; bottom: 6px;
171
- width: 3px; background: var(--accent); border-radius: 2px;
172
- }
173
- .day {
174
- position: relative; margin-bottom: 14px; background: var(--card);
175
- border: 1px solid var(--line); border-radius: 8px; padding: 12px 16px;
176
- }
177
- .day::before {
178
- content: ""; position: absolute; left: -29px; top: 16px;
179
- width: 13px; height: 13px; background: var(--accent);
180
- border: 2px solid var(--card); border-radius: 50%;
181
- }
182
- .day .lbl {
183
- display: inline-block; background: var(--accent); color: white;
184
- font-family: ui-monospace, monospace; font-size: 11px;
185
- padding: 1px 7px; border-radius: 10px; margin-right: 8px;
186
- }
187
- .day strong { font-size: 15px; }
188
- .day .what { font-size: 13px; color: var(--accent2); margin-top: 2px; }
189
-
190
- /* Award rows */
191
- .award-row {
192
- display: flex; justify-content: space-between;
193
- padding: 8px 12px; border-bottom: 1px solid var(--line); font-size: 14px;
194
- }
195
- .award-row:last-child { border-bottom: 0; }
196
- .prob {
197
- font-family: ui-monospace, monospace; font-size: 12px;
198
- padding: 1px 8px; border-radius: 10px; color: white;
199
- }
200
- .prob-h { background: #2e7d32; }
201
- .prob-m { background: #ef9c2c; }
202
- .prob-l { background: #b03a2e; }
203
-
204
- /* Badges grid */
205
- .badges-grid {
206
- display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
207
- gap: 12px; margin-top: 14px;
208
- }
209
- .badge-card {
210
- background: var(--card); border: 1px solid var(--line);
211
- border-radius: 8px; padding: 12px 14px;
212
- }
213
- .badge-card.skip { opacity: 0.45; border-style: dashed; }
214
- .badge-card .tag {
215
- display: inline-block; background: var(--accent); color: white;
216
- font-family: ui-monospace, monospace; font-size: 11px;
217
- padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
218
- }
219
- .badge-card.skip .tag { background: var(--gray); }
220
- .badge-card strong { font-size: 14px; }
221
- .badge-card p { font-size: 13px; color: var(--accent2); margin: 4px 0 0; }
222
-
223
- .footnote {
224
- margin-top: 30px; padding: 14px 18px;
225
- border-left: 4px solid var(--accent);
226
- background: var(--card); font-size: 14px; border-radius: 4px;
227
- }
228
- </style>
229
- <link rel="preconnect" href="https://fonts.googleapis.com">
230
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
231
- <link href="https://fonts.googleapis.com/css2?family=Lora:wght@400;600;700&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
232
- </head>
233
- <body>
234
- <div class="wrap">
235
-
236
- <h1><em>Cocina Conmigo</em></h1>
237
- <div class="subtitle">Multimodal sous-chef with vision, voice, and Flux.2, for cooking with your mom when her hands are not free</div>
238
-
239
- <div class="hero">
240
- <div>
241
- <h2>The idea in one sentence</h2>
242
- <p>Your mom takes a photo of the fridge, the app suggests what to cook, <strong>shows her what each step should look like</strong> with Flux.2, and <strong>narrates it aloud</strong> while she cooks with her hands full.</p>
243
- <p class="quote">"My mom asked me to teach her how to make ramen. I built her a sous-chef that lives in her tablet."</p>
244
- <div style="margin-top: 14px;">
245
- <span class="pill flux">Flux.2 Klein 9B</span>
246
- <span class="pill openbmb">MiniCPM-V + voice</span>
247
- <span class="pill cohere">Cohere voice</span>
248
- <span class="pill gradio">Gradio Workflows</span>
249
- <span class="pill modal">Modal-powered</span>
250
- <span class="pill llama">llama.cpp</span>
251
- </div>
252
- </div>
253
- <div class="target">
254
- <strong>Track:</strong> Backyard AI<br/>
255
- <strong>Persona:</strong> your mom / partner / neighbor<br/>
256
- <strong>Language:</strong> Mexican Spanish<br/>
257
- <strong>Total params:</strong> ~17B (≤ 32B ✓)<br/>
258
- <strong>Cuisine:</strong> traditional Mexican<br/>
259
- <strong>Storyline:</strong> "So my mom can stop googling"
260
- </div>
261
- </div>
262
-
263
-
264
- <h2><span class="num">01</span>Why this idea, and not the earlier ones</h2>
265
- <table>
266
- <thead><tr><th>Iteration</th><th>Idea</th><th>Why it was dropped</th></tr></thead>
266
- <tbody>
267
- <tr><td>v1</td><td>Abuelita (parent phone helper)</td><td>On OpenBMB's pre-baked idea list → 5-15 teams will build it</td></tr>
268
- <tr><td>v2</td><td>Cuentacuentos (voice storyteller)</td><td>Also on OpenBMB's pre-baked idea list</td></tr>
269
- <tr style="background:#fff3cf;"><td><strong>v3 (this one)</strong></td><td><strong>Cocina Conmigo</strong></td><td>Refinement of your idea #1 · NOT on any pre-baked list · uses Flux.2 + Workflows + voices · everyday + universal</td></tr>
271
- </tbody>
272
- </table>
273
-
274
- <div class="forbidden">
275
- <strong>⛔ The 12 ideas in the no-go zone (OpenBMB cluster):</strong>
276
- <ul>
277
- <li>parent phone helper</li>
278
- <li>receipt / bill explainer</li>
279
- <li>shop menu / repair manual</li>
280
- <li>offline personal assistant / voice companion</li>
281
- <li>voice storyteller</li>
282
- <li>visual mystery box</li>
283
- <li>AI museum (≈ your idea #4)</li>
284
- <li>doodle creature</li>
285
- <li>dream postcard gen</li>
286
- <li>omni-modal adventure</li>
287
- <li>tiny local NPC / character agent</li>
288
- <li>haircuts (your idea #3, already saturated)</li>
289
- </ul>
290
- </div>
291
-
292
-
293
- <h2><span class="num">02</span>The 4 demo stories</h2>
294
- <div class="phone-row">
295
-
296
- <div>
297
- <div class="phone"><div class="screen">
298
- <div class="topbar">📸 This is what's in my fridge</div>
299
- <div class="body">
300
- <div class="illu">🍅🌶🐔🧅</div>
301
- <p><strong>I see:</strong> chicken, tomato, onion, cilantro, tortillas, cheese.</p>
302
- <p style="background:#fff3cf;border-radius:6px;padding:6px 10px;">
303
- <strong>3 options:</strong><br/>
304
- 🌮 Tinga · 🌯 Enchiladas · 🧀 Quesadillas
305
- </p>
306
- <div class="voice">🔊 "What are you in the mood for?"</div>
307
- </div>
308
- </div></div>
309
- <div class="scenario-label">1. Vision + Planner</div>
310
- </div>
311
-
312
- <div>
313
- <div class="phone"><div class="screen">
314
- <div class="topbar">👩‍🍳 Step 2 of 5</div>
315
- <div class="body">
316
- <div class="illu">🍳✨</div>
317
- <p><strong>Sauté the onion in hot oil.</strong></p>
318
- <p style="font-size:11px;color:var(--gray);">⏱ 4 minutes · until it turns translucent</p>
319
- <div class="voice">🔊 OpenBMB voice narrates…</div>
320
- </div>
321
- </div></div>
322
- <div class="scenario-label">2. Voice + target image</div>
323
- </div>
324
-
325
- <div>
326
- <div class="phone"><div class="screen">
327
- <div class="topbar">📸 Am I on track?</div>
328
- <div class="body">
329
- <div class="illu">🍳👀</div>
330
- <p style="background:var(--green-soft);border-radius:6px;padding:6px 10px;color:var(--green);">
331
- <strong>✅ Looking perfect.</strong> The onion already looks translucent.
332
- </p>
333
- <div class="tip">🔊 Cohere voice: "Give it 1 more minute, it's fine!"</div>
334
- </div>
335
- </div></div>
336
- <div class="scenario-label">3. Closed-loop visual</div>
337
- </div>
338
-
339
- <div>
340
- <div class="phone"><div class="screen">
341
- <div class="topbar">🔄 Replan</div>
342
- <div class="body">
343
- <p>User: <em>"I don't have cilantro."</em></p>
344
- <div class="illu" style="background: linear-gradient(135deg,#ffd28b,#a85c2a);">🌮</div>
345
- <p>"No problem. We'll use parsley, or leave it out. It's still tinga."</p>
346
- <div class="voice">🔊 Recipe regenerates · final dish updated</div>
347
- </div>
348
- </div></div>
349
- <div class="scenario-label">4. Live adaptation</div>
350
- </div>
351
-
352
- </div>
353
-
354
-
355
- <h2><span class="num">03</span>Architecture: 5 agents in one Gradio Workflow</h2>
356
-
357
- <svg viewBox="0 0 1240 540" xmlns="http://www.w3.org/2000/svg">
358
- <defs>
359
- <marker id="ar" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
360
- <path d="M0,0 L10,5 L0,10 z" fill="#2b2018"/>
361
- </marker>
362
- <marker id="aro" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
363
- <path d="M0,0 L10,5 L0,10 z" fill="#a85c2a"/>
364
- </marker>
365
- </defs>
366
-
367
- <!-- User input area -->
368
- <rect x="20" y="40" width="200" height="240" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
369
- <text x="40" y="62" class="node-text node-title" fill="#6b4a2a">USER (kitchen)</text>
370
-
371
- <rect class="node-box" x="40" y="80" width="160" height="50" rx="6" fill="#ddd1bd"/>
372
- <text x="120" y="102" class="node-text node-title" text-anchor="middle">📸 Fridge photo</text>
373
- <text x="120" y="118" class="node-text node-sub" text-anchor="middle">initial trigger</text>
374
-
375
- <rect class="node-box" x="40" y="140" width="160" height="50" rx="6" fill="#ddd1bd"/>
376
- <text x="120" y="162" class="node-text node-title" text-anchor="middle">🎙️ Voice question</text>
377
- <text x="120" y="178" class="node-text node-sub" text-anchor="middle">"am I on track?"</text>
378
-
379
- <rect class="node-box" x="40" y="200" width="160" height="50" rx="6" fill="#ddd1bd"/>
380
- <text x="120" y="222" class="node-text node-title" text-anchor="middle">📸 Progress photo</text>
381
- <text x="120" y="238" class="node-text node-sub" text-anchor="middle">closed-loop</text>
382
-
383
- <!-- Output area -->
384
- <rect x="20" y="320" width="200" height="180" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
385
- <text x="40" y="342" class="node-text node-title" fill="#6b4a2a">OUTPUT</text>
386
-
387
- <rect class="node-box" x="40" y="360" width="160" height="50" rx="6" fill="#dbe9d8"/>
388
- <text x="120" y="382" class="node-text node-title" text-anchor="middle">🍽️ Final dish + recipe</text>
389
- <text x="120" y="398" class="node-text node-sub" text-anchor="middle">image + text</text>
390
-
391
- <rect class="node-box" x="40" y="420" width="160" height="50" rx="6" fill="#dbe9d8"/>
392
- <text x="120" y="442" class="node-text node-title" text-anchor="middle">🔊 Voice per step</text>
393
- <text x="120" y="458" class="node-text node-sub" text-anchor="middle">narrator + tips</text>
394
-
395
- <!-- Pipeline center -->
396
- <rect x="260" y="40" width="700" height="460" rx="10" fill="#fffaf0" stroke="#d8c9ad" stroke-width="1.5"/>
397
- <text x="610" y="62" class="node-text node-title" text-anchor="middle" fill="#6b4a2a">HF SPACE: Gradio Workflow (5 agents)</text>
398
-
399
- <!-- Vision (Mise en Place) -->
400
- <rect class="node-box" x="280" y="90" width="200" height="80" rx="6" fill="#e6d5ed"/>
401
- <text x="380" y="110" class="node-text node-title" text-anchor="middle">👁️ MISE EN PLACE</text>
402
- <text x="380" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-V (Q4)</text>
403
- <text x="380" y="142" class="node-text node-sub" text-anchor="middle">~2-4B</text>
404
- <text x="380" y="160" class="node-text node-sub" text-anchor="middle">identifies ingredients</text>
405
-
406
- <!-- Recipe Planner -->
407
- <rect class="node-box" x="510" y="90" width="200" height="80" rx="6" fill="#fbe4d3"/>
408
- <text x="610" y="110" class="node-text node-title" text-anchor="middle">🧠 RECIPE PLANNER</text>
409
- <text x="610" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-4 (LoRA mx)</text>
410
- <text x="610" y="142" class="node-text node-sub" text-anchor="middle">~4B</text>
411
- <text x="610" y="160" class="node-text node-sub" text-anchor="middle">builds recipe JSON · replan</text>
412
-
413
- <!-- Step Illustrator -->
414
- <rect class="node-box" x="740" y="90" width="200" height="80" rx="6" fill="#f6dccc"/>
415
- <text x="840" y="110" class="node-text node-title" text-anchor="middle">🎨 STEP ILLUSTRATOR</text>
416
- <text x="840" y="126" class="node-text node-sub" text-anchor="middle">Flux.2 Klein 9B</text>
417
- <text x="840" y="142" class="node-text node-sub" text-anchor="middle">on Modal GPU L4</text>
418
- <text x="840" y="160" class="node-text node-sub" text-anchor="middle">target image per step</text>
419
-
420
- <!-- Sous-Chef Narrator -->
421
- <rect class="node-box" x="510" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
422
- <text x="610" y="222" class="node-text node-title" text-anchor="middle">🔊 SOUS-CHEF NARRATOR</text>
423
- <text x="610" y="238" class="node-text node-sub" text-anchor="middle">OpenBMB voice (~1B)</text>
424
- <text x="610" y="254" class="node-text node-sub" text-anchor="middle">warm tone</text>
425
-
426
- <!-- Tip Giver -->
427
- <rect class="node-box" x="740" y="200" width="200" height="70" rx="6" fill="#e9d6f5"/>
428
- <text x="840" y="222" class="node-text node-title" text-anchor="middle">🎭 TIP GIVER</text>
429
- <text x="840" y="238" class="node-text node-sub" text-anchor="middle">Cohere voice (~1B)</text>
430
- <text x="840" y="254" class="node-text node-sub" text-anchor="middle">warnings · energetic</text>
431
-
432
- <!-- Progress Validator (closed loop) -->
433
- <rect class="node-box" x="280" y="290" width="220" height="90" rx="6" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="2"/>
434
- <text x="390" y="312" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">✅ PROGRESS VALIDATOR</text>
435
- <text x="390" y="328" class="node-text node-sub" text-anchor="middle">MiniCPM-V (reused)</text>
436
- <text x="390" y="344" class="node-text node-sub" text-anchor="middle">compares user photo vs</text>
437
- <text x="390" y="360" class="node-text node-sub" text-anchor="middle">target image</text>
438
- <text x="390" y="376" class="node-text node-sub" text-anchor="middle">CLOSED LOOP 🔄</text>
439
-
440
- <!-- STT -->
441
- <rect class="node-box" x="280" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
442
- <text x="380" y="222" class="node-text node-title" text-anchor="middle">🎙️ STT (optional)</text>
443
- <text x="380" y="238" class="node-text node-sub" text-anchor="middle">Whisper-tiny (~40M)</text>
444
- <text x="380" y="254" class="node-text node-sub" text-anchor="middle">"am I on track?" hands-free</text>
445
-
446
- <!-- Recipe State -->
447
- <rect class="node-box" x="510" y="290" width="430" height="90" rx="6" fill="#fff3cf"/>
448
- <text x="725" y="312" class="node-text node-title" text-anchor="middle" fill="#8a6a18">📖 RECIPE STATE (dataclass)</text>
449
- <text x="725" y="328" class="node-text node-sub" text-anchor="middle">name · final_dish_image · steps · current_step ·</text>
450
- <text x="725" y="344" class="node-text node-sub" text-anchor="middle">missing_ingredients · substitutes · user_progress_photos</text>
451
- <text x="725" y="362" class="node-text node-sub" text-anchor="middle">every agent reads from and writes to this object</text>
452
-
453
- <!-- Page assembler -->
454
- <rect class="node-box" x="280" y="400" width="660" height="60" rx="6" fill="#f6dccc"/>
455
- <text x="610" y="422" class="node-text node-title" text-anchor="middle">📖 RECIPE CARD ASSEMBLER</text>
456
- <text x="610" y="438" class="node-text node-sub" text-anchor="middle">renders the recipe card + per-step cards + playable audio</text>
457
-
458
- <!-- Modal box -->
459
- <rect x="990" y="40" width="240" height="460" rx="10" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="1.5"/>
460
- <text x="1110" y="62" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">MODAL</text>
461
-
462
- <rect class="node-box" x="1010" y="90" width="200" height="80" rx="6" fill="#fff"/>
463
- <text x="1110" y="112" class="node-text node-title" text-anchor="middle">Flux endpoint</text>
464
- <text x="1110" y="128" class="node-text node-sub" text-anchor="middle">runtime · @app.cls L4</text>
465
- <text x="1110" y="144" class="node-text node-sub" text-anchor="middle">scaledown 180s</text>
466
- <text x="1110" y="160" class="node-text node-sub" text-anchor="middle">~1-3s/image</text>
467
-
468
- <rect class="node-box" x="1010" y="190" width="200" height="80" rx="6" fill="#fff"/>
469
- <text x="1110" y="212" class="node-text node-title" text-anchor="middle">Mexican cooking dataset</text>
470
- <text x="1110" y="228" class="node-text node-sub" text-anchor="middle">offline · 200 recipes</text>
471
- <text x="1110" y="244" class="node-text node-sub" text-anchor="middle">generated with Codex API</text>
472
- <text x="1110" y="260" class="node-text node-sub" text-anchor="middle">~$5</text>
473
-
474
- <rect class="node-box" x="1010" y="290" width="200" height="80" rx="6" fill="#fff"/>
475
- <text x="1110" y="312" class="node-text node-title" text-anchor="middle">LoRA Planner</text>
476
- <text x="1110" y="328" class="node-text node-sub" text-anchor="middle">offline · A10G ~30 min</text>
477
- <text x="1110" y="344" class="node-text node-sub" text-anchor="middle">push GGUF a HF</text>
478
- <text x="1110" y="360" class="node-text node-sub" text-anchor="middle">~$1</text>
479
-
480
- <rect class="node-box" x="1010" y="390" width="200" height="80" rx="6" fill="#fff"/>
481
- <text x="1110" y="412" class="node-text node-title" text-anchor="middle">Eval pipeline</text>
482
- <text x="1110" y="428" class="node-text node-sub" text-anchor="middle">visual consistency</text>
483
- <text x="1110" y="444" class="node-text node-sub" text-anchor="middle">% correct ingredients</text>
484
- <text x="1110" y="460" class="node-text node-sub" text-anchor="middle">~$1</text>
485
-
486
- <!-- Arrows: input → vision -->
487
- <path class="arrow" d="M200 105 L278 130" marker-end="url(#ar)"/>
488
- <text x="200" y="100" class="arrow-label">fridge</text>
489
-
490
- <!-- input → STT -->
491
- <path class="arrow" d="M200 165 L278 235" marker-end="url(#ar)"/>
492
- <text x="205" y="200" class="arrow-label">audio</text>
493
-
494
- <!-- input progress → validator -->
495
- <path class="arrow arrow-loop" d="M200 225 L278 330" marker-end="url(#aro)"/>
496
- <text x="200" y="270" class="arrow-label" style="fill:#a85c2a;">progress</text>
497
-
498
- <!-- Vision → Planner -->
499
- <path class="arrow" d="M480 130 L508 130" marker-end="url(#ar)"/>
500
- <text x="482" y="120" class="arrow-label">ingredients</text>
501
-
502
- <!-- Planner → Illustrator -->
503
- <path class="arrow" d="M710 130 L738 130" marker-end="url(#ar)"/>
504
- <text x="712" y="120" class="arrow-label">visual prompt</text>
505
-
506
- <!-- Illustrator → Modal -->
507
- <path class="arrow dashed" d="M940 130 L1008 130" marker-end="url(#ar)"/>
508
- <text x="945" y="120" class="arrow-label">.remote()</text>
509
-
510
- <!-- Planner → narrator -->
511
- <path class="arrow" d="M610 170 L610 198" marker-end="url(#ar)"/>
512
- <!-- Planner → tip giver -->
513
- <path class="arrow" d="M710 145 C 760 170, 800 180, 800 198" marker-end="url(#ar)"/>
514
-
515
- <!-- Validator → Planner (loop) -->
516
- <path class="arrow arrow-loop" d="M390 290 C 390 240, 470 190, 510 145" marker-end="url(#aro)"/>
517
- <text x="395" y="240" class="arrow-label" style="fill:#a85c2a;">verdict · feedback</text>
518
-
519
- <!-- STT → Validator -->
520
- <path class="arrow dashed" d="M380 270 L380 288" marker-end="url(#ar)"/>
521
-
522
- <!-- Recipe state ↔ all agents -->
523
- <path class="arrow dashed" d="M725 290 L725 270" marker-end="url(#ar)"/>
524
- <path class="arrow dashed" d="M610 290 L610 270" marker-end="url(#ar)"/>
525
-
526
- <!-- All → Assembler -->
527
- <path class="arrow" d="M610 380 L610 398" marker-end="url(#ar)"/>
528
-
529
- <!-- Assembler → output -->
530
- <path class="arrow" d="M280 425 C 240 425, 220 410, 200 385" marker-end="url(#ar)"/>
531
- <path class="arrow" d="M280 440 C 240 440, 220 445, 200 445" marker-end="url(#ar)"/>
532
-
533
- <!-- Modal → Planner (LoRA weights, offline) -->
534
- <path class="arrow dashed" d="M1010 330 C 870 330, 750 280, 710 165" marker-end="url(#ar)"/>
535
- <text x="900" y="280" class="arrow-label">LoRA weights</text>
536
- </svg>
537
- <p style="font-size: 13px; color: var(--accent2); margin-top: 10px;">
538
- <strong>Orange arrow</strong> = visual closed loop (the innovation). The user takes a photo of their progress, MiniCPM-V validates it against the target image, and the Planner adjusts or advances. No recipe app on the market does this.
539
- </p>
540
-
541
-
542
- <h2><span class="num">04</span>The innovative trick: a visual closed loop for the cook</h2>
543
- <div class="grid-3">
544
- <div class="card">
545
- <h3>1. Target image per step</h3>
546
- <p style="font-size:13px;">Flux.2 generates "this is how the pan/plate/pot should look at step N". It is not text and not a stock photo: it is context-aware generation of the desired state.</p>
547
- </div>
548
- <div class="card">
549
- <h3>2. Validation with the user's photo</h3>
550
- <p style="font-size:13px;">The user uploads a photo of their progress. MiniCPM-V compares it against the target image and returns a verdict: <code>go</code> · <code>wait</code> · <code>fix</code>.</p>
551
- </div>
552
- <div class="card">
553
- <h3>3. Adaptive replan</h3>
554
- <p style="font-size:13px;">"I don't have cilantro." → the Planner regenerates the recipe and Flux regenerates the final image. The plan is not static; it evolves with the real state of the kitchen.</p>
555
- </div>
556
- </div>
557
- <p style="margin-top:14px; font-size:14px;">
558
- <strong>Esta es la sección destacada del README</strong> y el blog post de Field Notes badge: <em>"How visual closed-loop cooking guidance works."</em>
559
- </p>
560
-
561
-
562
- <h2><span class="num">05</span>Target badges (5/6)</h2>
- <div class="badges-grid">
- <div class="badge-card"><span class="tag">LLAMA.CPP</span><br/><strong>Llama Champion</strong><p>Vision + Planner via <code>llama-cpp-python</code> with GGUF Q4.</p></div>
- <div class="badge-card"><span class="tag">FINE-TUNED</span><br/><strong>Well-Tuned</strong><p>LoRA on Mexican cuisine · published on HF.</p></div>
- <div class="badge-card"><span class="tag">CUSTOM UI</span><br/><strong>Off-Brand</strong><p>Recipe-card UI · serif · warm palette · XL kitchen mode.</p></div>
- <div class="badge-card"><span class="tag">OPEN TRACE</span><br/><strong>Sharing is Caring</strong><p>Dataset of 150 Mexican recipes + traces + generated recipes on the Hub.</p></div>
- <div class="badge-card"><span class="tag">TENTATIVE</span><br/><strong>Field Notes</strong><p>Blog: "Le construí un sous-chef a mi mamá" ("I built my mom a sous-chef").</p></div>
- <div class="badge-card skip"><span class="tag">LOCAL-FIRST</span><br/><strong>Off the Grid</strong><p>Sacrificed: Flux.2 runs on Modal for quality.</p></div>
- </div>
571
-
572
-
573
- <h2><span class="num">06</span>Target prizes</h2>
- <div class="card">
- <div class="award-row"><span><strong>Backyard AI Track</strong> · $1K–$4K</span><span class="prob prob-h">HIGH</span></div>
- <div class="award-row"><span><strong>Modal Awards</strong> · $3K–$10K credits</span><span class="prob prob-h">HIGH</span></div>
- <div class="award-row"><span><strong>OpenBMB Award</strong> · $1K–$2.5K</span><span class="prob prob-h">HIGH</span></div>
- <div class="award-row"><span><strong>Best Demo</strong> · $1K</span><span class="prob prob-h">HIGH</span></div>
- <div class="award-row"><span><strong>Community Choice</strong> · $2K</span><span class="prob prob-h">HIGH</span></div>
- <div class="award-row"><span><strong>Best Agent</strong> · $1K</span><span class="prob prob-h">HIGH · real multi-agent closed loop</span></div>
- <div class="award-row"><span><strong>Bonus Quest Champion</strong> · $2K</span><span class="prob prob-m">MEDIUM-HIGH · 5/6 badges</span></div>
- <div class="award-row"><span><strong>Off-Brand</strong> · $1.5K</span><span class="prob prob-m">MEDIUM</span></div>
- <div class="award-row"><span><strong>Tiny Titan</strong> · $1.5K</span><span class="prob prob-l">LOW · Flux 9B pushes it out of range</span></div>
- </div>
- <p style="font-size: 14px; margin-top: 8px;"><strong>Reasonable cumulative estimate: $5K–$12K cash + $3K–$10K Modal credits.</strong></p>
586
-
587
-
588
- <h2><span class="num">07</span>10-day timeline</h2>
- <div class="timeline">
- <div class="day"><span class="lbl">D1</span><strong>Setup + Modal Flux endpoint</strong><div class="what">"Hello Flux": prompt → image of a dish. Empty Space deployed.</div></div>
- <div class="day"><span class="lbl">D2</span><strong>Vision: ingredient identification</strong><div class="what">MiniCPM-V Q4 · test with 5 real fridge photos.</div></div>
- <div class="day"><span class="lbl">D3</span><strong>Recipe Planner LLM</strong><div class="what">MiniCPM-4 · structured JSON · 3 options from the ingredients.</div></div>
- <div class="day"><span class="lbl">D4</span><strong>Step Illustrator (Flux + consistency)</strong><div class="what">Final dish image + 5 per-step target images · soft i2i.</div></div>
- <div class="day"><span class="lbl">D5</span><strong>Voice: narrator + tip-giver</strong><div class="what">OpenBMB voice + Cohere voice · pre-rendered audio per step.</div></div>
- <div class="day"><span class="lbl">D6</span><strong>Off-Brand UI: recipe card</strong><div class="what">gr.Blocks + earthy serif CSS · hands-free XL kitchen mode.</div></div>
- <div class="day"><span class="lbl">D7</span><strong>Gradio Workflows showcase</strong><div class="what">Pipeline rewritten as a visible Workflow · separate tab.</div></div>
- <div class="day"><span class="lbl">D8</span><strong>Fine-tune the Planner on Mexican cuisine</strong><div class="what">200 synthetic recipes · LoRA · GGUF · push to HF.</div></div>
- <div class="day"><span class="lbl">D9</span><strong>STT + Progress Validator + eval</strong><div class="what">Whisper · closed loop active · Sharing is Caring badge.</div></div>
- <div class="day"><span class="lbl">D10</span><strong>Demo + README + blog + submit</strong><div class="what">Real mom cooking · 60-90s · EN subtitles · Field Notes blog.</div></div>
- </div>
601
-
602
-
603
- <h2><span class="num">08</span>Plan B (scope cuts)</h2>
- <table>
- <thead><tr><th>#</th><th>Cut</th><th>You lose</th><th>You keep</th></tr></thead>
- <tbody>
- <tr><td>1</td><td>STT (voice questions)</td><td>demo convenience</td><td>text + photo</td></tr>
- <tr><td>2</td><td>2nd voice (Cohere tip-giver)</td><td>1 voice sponsor</td><td>single narrator</td></tr>
- <tr><td>3</td><td>Progress Validator (closed-loop)</td><td><strong>Best Agent</strong> + main innovation</td><td>linear demo</td></tr>
- <tr><td>4</td><td>Planner fine-tune</td><td><strong>Well-Tuned</strong></td><td>remaining badges</td></tr>
- <tr><td>5</td><td>Gradio Workflows showcase</td><td>"fresh" differentiator</td><td>Python pipeline</td></tr>
- <tr><td>6</td><td>Super-custom UI</td><td><strong>Off-Brand</strong></td><td>default UI</td></tr>
- <tr style="background:#fff3cf;"><td>-</td><td><strong>NEVER</strong></td><td colspan="2">Vision + Planner + Illustrator + Narrator + minimal UI + video with a real person cooking</td></tr>
- </tbody>
- </table>
616
-
617
-
618
- <h2><span class="num">09</span>Key risks</h2>
- <table>
- <thead><tr><th>Risk</th><th>Mitigation</th></tr></thead>
- <tbody>
- <tr><td>Flux.2 Klein has no public API/weights when you need it</td><td>Plan B: Flux.1-schnell or SDXL-Lightning. You lose sponsor positioning but the idea survives.</td></tr>
- <tr><td>MiniCPM-V fails to identify Mexican ingredients (chile poblano, nopales)</td><td>Few-shot examples in the prompt; eventually a light fine-tune of the vision model on 50 labeled photos</td></tr>
- <tr><td>Flux.2 generates unappetizing food</td><td>Iterate on prompts ("recipe magazine, warm light, top-down"); use the final image as ref for the steps</td></tr>
- <tr><td>Progress Validator gives false positives</td><td>Conservative: it only says "on track" if similarity is high; the default is "keep going" with no strong judgment</td></tr>
- <tr><td>Recipe latency &gt; 30s</td><td>Progressive streaming; parallelize Flux + TTS</td></tr>
- <tr><td>Modal cold start of ~30-60s on Flux</td><td>Pre-warm 30s before filming · <code>keep_warm=1</code> on demo day</td></tr>
- <tr><td>The demo person gets burned or cooks badly</td><td>Practice the recipe once beforehand · 2-3 candidate recipes ready</td></tr>
- <tr><td>Another team presents a "recipe app with AI"</td><td>Differentiate with: visual closed loop + Spanish + Mexican cuisine + published dataset + real person</td></tr>
- </tbody>
- </table>
632
-
633
-
634
- <h2><span class="num">10</span>How to spend the credits</h2>
- <div class="grid-2">
- <div class="card">
- <h3>Modal · $250</h3>
- <table>
- <tr><td>Flux dev (days 1-9)</td><td>$5-15</td></tr>
- <tr><td>Mexican cuisine dataset</td><td>$3-8</td></tr>
- <tr><td>LoRA + sweeps</td><td>$4-5</td></tr>
- <tr><td>Eval</td><td>$1</td></tr>
- <tr><td>Inference during judges' grading</td><td>$10-25</td></tr>
- <tr><th>Subtotal</th><th>$23-54</th></tr>
- <tr><th>+ Buffer</th><th>$30</th></tr>
- <tr><th>Projected</th><th><strong>~$53-84 / $250</strong></th></tr>
- </table>
- </div>
- <div class="card">
- <h3>OpenAI Codex · $100</h3>
- <table>
- <tr><td>Codex CLI pair-programmer</td><td>$20-40</td></tr>
- <tr><td>200 synthetic Mexican recipes</td><td>$10-25</td></tr>
- <tr><td>Flux prompts per step</td><td>$5-10</td></tr>
- <tr><td>Reserve</td><td>$30</td></tr>
- <tr><th>Projected</th><th><strong>~$65-105 / $100</strong></th></tr>
- </table>
- </div>
- </div>
660
-
661
-
662
- <div class="footnote">
- <strong>Project mantra:</strong> "A mom cooking in front of the camera. A dish that looks appetizing. A voice that keeps her company without judging. One step at a time."
- </div>
665
-
666
- </div>
667
- </body>
668
- </html>
Strategy/estrategia.md DELETED
@@ -1,496 +0,0 @@
1
- # Detailed strategy: "Cocina Conmigo"
-
- > Execution document. Read `plan.md` first for the "what" and the "why".
- > This file is the "how": mental model, multi-agent design, timeline, credit spend, risks, snippets.
5
-
6
- ---
7
-
8
- ## 1. Mental model: the "recipe" as a state object
-
- The app is not a chatbot. It is a **state machine** built around a `Recipe` object that evolves over time and is updated on every turn.
11
-
12
- ```python
- from __future__ import annotations  # lets Recipe reference Step before it is defined
-
- from dataclasses import dataclass
-
- @dataclass
- class Recipe:
-     name: str                          # "Tinga de Pollo"
-     final_dish_image: bytes            # Flux image of the final dish
-     available_ingredients: list[str]   # what the camera saw in the fridge
-     missing_ingredients: list[str]     # what is missing + its substitutes
-     steps: list[Step]                  # 5-7 steps
-     current_step: int                  # which step is in progress
-     user_progress_photos: list[bytes]  # photos the user has taken
-
- @dataclass
- class Step:
-     n: int
-     instruction_text: str              # "Pica la cebolla en cubos chicos"
-     visual_target: bytes               # Flux image: "this is how the pan should look"
-     duration_estimate: str             # "4 minutos"
-     audio_narration: bytes             # pre-rendered narration
-     tip: str | None                    # "no la quemes"
-     tip_audio: bytes | None            # Cohere voice
- ```
33
-
34
- Advantages of this framing:
- - Every Workflow node takes a `Recipe` and returns a modified `Recipe`, which keeps the pipeline composable and observable.
- - The "replan" (no cilantro on hand) is a single function: `recipe.replan(missing="cilantro") → Recipe`.
- - The "validator" takes a `Recipe` plus a `progress_photo` and returns `feedback`.
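The "node takes a `Recipe` and returns a modified `Recipe`" contract can be sketched as follows. This is a minimal illustration, assuming a trimmed-down `Recipe` with only four fields; the `replan` and `advance` helpers are hypothetical names, and the real planner call is left out.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Recipe:
    # Trimmed-down stand-in for the full Recipe state (assumption).
    name: str
    available_ingredients: tuple[str, ...] = ()
    missing_ingredients: tuple[str, ...] = ()
    current_step: int = 0

def replan(recipe: Recipe, missing: str) -> Recipe:
    """Return a new Recipe that records `missing`; the input is left untouched."""
    available = tuple(i for i in recipe.available_ingredients if i != missing)
    already_known = missing in recipe.missing_ingredients
    missing_all = recipe.missing_ingredients + (() if already_known else (missing,))
    return replace(recipe, available_ingredients=available,
                   missing_ingredients=missing_all)

def advance(recipe: Recipe) -> Recipe:
    """Return a new Recipe pointing at the next step."""
    return replace(recipe, current_step=recipe.current_step + 1)

before = Recipe("Tinga de Pollo",
                available_ingredients=("pollo", "cebolla", "cilantro"))
after = advance(replan(before, missing="cilantro"))
```

Because every node returns a fresh object, each intermediate state can be logged or displayed without worrying about later nodes mutating it.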
38
-
39
- ---
40
-
41
- ## 2. The 5 agents (real multi-agent, not simulated)
-
- | Agent | Responsibility | Trigger | Output |
- |---|---|---|---|
- | **Mise en Place** | Identify ingredients in the fridge photo | fridge photo | `available_ingredients` |
- | **Recipe Planner** | Propose 3 feasible recipes · build the chosen one | user picks an idea | `Recipe` with steps |
- | **Step Illustrator** | Generate the target image for each step plus the final dish | new recipe | `Step.visual_target` for each step |
- | **Sous-Chef Narrator** | Narrate the instructions by voice | active step | `Step.audio_narration` |
- | **Progress Validator** | Compare the user's photo against the target image | user uploads a mid-cooking photo | `feedback` (text + voice tip) |
-
- This is a **real multi-agent system**: each agent has its own function and its own model, and they communicate through shared state (`Recipe`). It is not a single agent with tools; it is 5 agents in a pipeline plus a closed loop.
-
- > **Best Agent badge candidate.** Document this in the README with an explicit diagram.
54
-
55
- ---
56
-
57
- ## 3. The innovative trick: visual closed loop
58
-
59
- ```
-         ┌──────────────────────────────────────────────┐
-         │                                              │
-         ▼                                              │
- [Step Illustrator]──▶ visual_target ──▶ [User cooks]   │
-                                              │         │
-                                              ▼         │
-                                     📸 progress_photo  │
-                                              │         │
-                                              ▼         │
-                                    [Progress Validator]│
-                                        (MiniCPM-V)     │
-                                              │         │
-                          ┌───────────────────┤         │
-                          │                   │         │
-                     ✅ on track          ❌ adjust     │
-                          │                   │         │
-                      next step       [Recipe Planner]  │
-                                         replan/tip     │
-                                              │         │
-                                              └──────▶ back to the loop
- ```
81
-
82
- This is **the technical innovation** of the project. Most "recipe apps" are static lists. Cocina Conmigo:
-
- 1. Generates *visually* what each step should look like (not just text).
- 2. Accepts a photo from the user and *compares* it with the target.
- 3. Adapts the plan live if something is off.
-
- Dedicated README section: *"How visual closed-loop cooking guidance works"*. It is also the Field Notes blog post.
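The control side of the loop can be sketched as a small routing function. This is an illustration only: the `go` / `wait` / `fix` verdicts come from the Progress Validator described in this document, while `next_action` and its action names are hypothetical.

```python
def next_action(verdict: str, current_step: int, total_steps: int) -> tuple[str, int]:
    """Map a validator verdict to (action, index of the step to show next)."""
    if verdict == "go":
        if current_step + 1 >= total_steps:
            return ("finish", current_step)   # last step validated
        return ("advance", current_step + 1)
    if verdict == "fix":
        return ("replan", current_step)       # hand control to the Recipe Planner
    # "wait" and any unrecognized verdict: stay on the same step without judging.
    return ("stay", current_step)
```

Treating unknown verdicts as "stay" keeps the loop conservative: a malformed model answer never pushes the cook forward.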
89
-
90
- ---
91
-
92
- ## 4. Schedule: 10 days
-
- > ~50-70 hours of work + 1 human + Codex CLI as pair programmer.
-
- ### Day 1: Setup + Modal Flux endpoint
- - `pip install gradio modal openai huggingface-hub diffusers llama-cpp-python`
- - Run `modal setup` and deploy the Flux endpoint that returns an image for a given prompt.
- - Create an empty Space on HF and make the initial push.
- - **Deliverable:** a Space that shows a Flux image for a given text.
-
- ### Day 2: Vision, ingredient identification
- - Load MiniCPM-V Q4 GGUF locally.
- - Function: `identify_ingredients(fridge_photo) → list[str]`.
- - Test with 5 real fridge photos (yours, your mom's).
- - **Deliverable:** given a fridge photo, returns a correct list covering 80%+ of the visible ingredients.
107
-
108
- ### Day 3: Recipe Planner LLM
- - Load MiniCPM-4 Q4 GGUF.
- - Structured prompt template that returns JSON:
111
- ```json
112
- {
113
- "name": "Tinga de Pollo",
114
- "options": [{"name": "...", "why": "..."}, ...],
115
- "steps": [{"n": 1, "instruction": "...", "duration": "...", "visual": "..."}],
116
- "missing": ["cilantro"],
117
- "substitutes": {"cilantro": ["perejil", "nada"]}
118
- }
119
- ```
120
- - Connect Vision + Planner: fridge photo → 3 recipe options.
- - **Deliverable:** given a photo + a selection, returns a complete structured recipe.
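Since the Planner's JSON drives every later agent, it helps to validate it before use. The sketch below assumes the key names from the JSON template above; `parse_plan` and `REQUIRED_STEP_KEYS` are hypothetical names, not part of the project.

```python
import json

# Keys every step must carry, taken from the JSON template above.
REQUIRED_STEP_KEYS = {"n", "instruction", "duration", "visual"}

def parse_plan(raw: str) -> dict:
    """Parse planner output and reject plans that later agents cannot use."""
    plan = json.loads(raw)
    steps = plan.get("steps", [])
    if not steps:
        raise ValueError("plan has no steps")
    for step in steps:
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            raise ValueError(f"step {step.get('n')} is missing {sorted(missing)}")
    numbers = [s["n"] for s in steps]
    if numbers != list(range(1, len(steps) + 1)):
        raise ValueError(f"steps are not numbered 1..{len(steps)}: {numbers}")
    # Optional fields default to empty containers so callers can rely on them.
    plan.setdefault("missing", [])
    plan.setdefault("substitutes", {})
    return plan

example = (
    '{"name": "Tinga de Pollo", "steps": [{"n": 1, "instruction": '
    '"Pica la cebolla", "duration": "4 min", "visual": "diced onion in a pan"}], '
    '"missing": ["cilantro"]}'
)
plan = parse_plan(example)
```

Failing fast here means a malformed plan never reaches the Step Illustrator or the Narrator.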
122
-
123
- ### Day 4: Step Illustrator (Flux.2 with consistency)
- - For each `Step.visual` in the JSON, call the Flux.2 endpoint with the prompt:
- > *"Top-down view of a kitchen pan with [step.visual]. Mexican cooking style. Warm lighting. Natural ingredients. Photorealistic, recipe magazine style."*
- - To keep the style consistent across steps, use the previous step's image as `ref` with `strength=0.6` (looser than for storybooks, because the content changes a lot between steps).
- - Also generate the final dish image (without `ref`).
- - **Deliverable:** a 5-step recipe, each step with its target image, plus a photo of the final dish.
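The "previous image as `ref`" chaining can be sketched independently of Flux. Here `render` stands in for the remote endpoint (an assumption), and `fake_render` is a stub used only to show which calls receive a reference.

```python
from typing import Callable, Optional

# Signature assumed for the image endpoint: (visual description, optional reference PNG) -> PNG bytes.
RenderFn = Callable[[str, Optional[bytes]], bytes]

def illustrate_steps(visuals: list[str], render: RenderFn) -> list[bytes]:
    """Render one image per step, feeding each result to the next call as reference."""
    images: list[bytes] = []
    previous: Optional[bytes] = None  # the first step has no reference
    for visual in visuals:
        img = render(visual, previous)
        images.append(img)
        previous = img
    return images

def fake_render(visual: str, ref: Optional[bytes]) -> bytes:
    # Stub: encodes whether a reference was supplied instead of generating pixels.
    tag = b"ref" if ref is not None else b"noref"
    return tag + b":" + visual.encode()

images = illustrate_steps(["diced onion", "onion frying"], fake_render)
```

Note that chaining makes the calls sequential; the final dish image has no `ref` and can be rendered in parallel with the first step.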
129
-
130
- ### Day 5: Voice, narrator + tip-giver
- - **OpenBMB voice** for `Step.audio_narration`: instructions in a warm, clear tone.
- - **Cohere Labs voice** for `Step.tip_audio`: a more energetic tone ("¡cuidado!").
- - Generate the audio for all 5 steps ahead of time (no streaming, which avoids annoying cold starts).
- - **Deliverable:** complete recipe with audible narration.
-
- ### Day 6: Off-Brand UI, recipe card
- - `gr.Blocks` + custom CSS.
- - Layout: a hero with the final dish image and a large title; below it, a carousel of steps, each with `target image + text + "done" button`; a hands-free kitchen mode with huge text.
- - Style: elegant serif (`Lora`), warm earth/gold palette.
- - **Deliverable:** a Space that looks like a cooking-magazine card, not like Gradio.
-
- ### Day 7: Gradio Workflows showcase
- - Rewrite the pipeline as a **Gradio Workflow** with visible nodes.
- - Nodes: `📸 Fridge → 👁️ Vision → 🧠 Planner → 🎨 Illustrator → 🔊 Narrator → 📖 Recipe Card`.
- - For the `Progress Validator`, add a branch: `📸 Progress Photo → 👁️ Validator → 💬 Feedback`.
- - A separate tab in the Space shows the Workflow graph running live.
- - **Deliverable:** a visually impressive Workflow on screen. A differentiator for the Gradio judges.
-
- ### Day 8: Fine-tune the Planner on Mexican cuisine
- - **Synthetic dataset on Modal:** the Codex API generates 200 Mexican recipes in structured JSON format (tinga, mole, chiles rellenos, sopes, pozole, etc.). You manually filter the best 150.
- - **LoRA on a Modal A10G:** ~30-60 min of fine-tuning on MiniCPM-4 4B.
- - **GGUF + push to HF:** convert to Q4_K_M and upload to the HF Hub.
- - Replace the Planner with the fine-tuned version.
- - **Deliverable:** model `tu-usuario/cocinaconmigo-4b-mx-Q4_K_M-gguf` published.
155
-
156
- ### Day 9: STT + Progress Validator + eval
- - `faster-whisper tiny` in Spanish: the user asks questions hands-free.
- - Implement the **Progress Validator**: user photo → MiniCPM-V compares it against `Step.visual_target` → generates feedback.
- - Eval: 10 generated recipes, measuring:
-   - % of ingredients correctly identified.
-   - % of steps with a coherent target image.
-   - Subjective validation quality (5 progress photos).
- - Upload traces to the Hub (**Sharing is Caring** badge).
- - **Deliverable:** complete app with voice input, validator, and published traces.
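The first eval metric ("% of ingredients correctly identified") can be computed as recall over a hand-labeled list. This is one possible definition, not necessarily the one the project used; `ingredient_recall` is a hypothetical helper and the photo labels below are made up for illustration.

```python
def ingredient_recall(predicted: list[str], truth: list[str]) -> float:
    """Fraction of hand-labeled ingredients that the vision model also listed."""
    def norm(s: str) -> str:
        return s.strip().lower()

    truth_set = {norm(t) for t in truth}
    if not truth_set:
        return 1.0  # nothing to find counts as a perfect score
    hits = truth_set & {norm(p) for p in predicted}
    return len(hits) / len(truth_set)

# Illustrative labels for a single fridge photo (not real eval data).
labeled = ["pollo", "cebolla", "cilantro", "tomate", "chile"]
model_output = ["Pollo", "cebolla ", "queso"]
recall = ingredient_recall(model_output, labeled)
meets_target = recall >= 0.8  # the ">= 4 of 5 ingredients" bar used in this plan
```

Recall ignores hallucinated extras such as "queso" here; tracking precision alongside it would catch those.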
165
-
166
- ### Day 10: Demo video + README + blog + submit
- - **Film a real person cooking** a recipe suggested by the app, from start to finish.
- - 60-90 seconds: fridge photo → 3 options → pick one → cook with voice → take a mid-cooking photo → app validates → final dish → the person eats.
- - README: declared badges, diagram, link to the video, "How closed-loop visual cooking guidance works" section.
- - Blog post (**Field Notes** badge): "Le construí un sous-chef a mi mamá" ("I built my mom a sous-chef").
- - Submit + social post.
172
-
173
- ---
174
-
175
- ## 5. Explicit technical decisions
-
- ### 5.1 Why Modal at runtime (breaking Off the Grid)
- Same as in earlier plans: running Flux.2 9B on the CPU of a free Space is not viable (gigabytes of RAM and minutes per image). Modal-powered is the forced choice when the core of the app is visual generation.
-
- ### 5.2 Why Mexican cuisine specifically
- - A bounded but rich dataset, coverable in 200 recipes.
- - An automatic cultural differentiator.
- - It fits the "for my mom" audience (if your mom is Latina).
- - If the judges on Discord/Slack are Mexican, +1.
-
- ### 5.3 Why visual_target with Flux.2 instead of stock images
- - Stock photos have an American/European bias. Flux generates a Mexican style if you prompt for it.
- - Stock does not adapt to the exact ingredient you have (Flux does).
- - This is what makes the app unique: it is the wow factor.
190
-
191
- ### 5.4 Why pre-render audio instead of streaming
- - Latency: streaming TTS is slow and looks bad in a demo.
- - Cooking is sequential: you know all 5 steps when the recipe starts, so you can pre-render everything in parallel.
- - If the user replans, you regenerate only the affected steps.
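Parallel pre-rendering with selective regeneration could look like the sketch below. `synthesize` stands in for whichever TTS call is used (an assumption), and `prerender_audio` is a hypothetical helper; the lambda at the end is a stub, not a real voice model.

```python
from concurrent.futures import ThreadPoolExecutor

def prerender_audio(texts, synthesize, only=None, cache=None, max_workers=4):
    """Render narration for every step in parallel.

    only:  indices to (re)generate, e.g. the steps touched by a replan.
    cache: previously rendered clips to reuse for untouched steps.
    """
    audio = list(cache) if cache is not None else [None] * len(texts)
    targets = list(range(len(texts))) if only is None else list(only)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `targets`.
        clips = pool.map(lambda i: synthesize(texts[i]), targets)
        for i, clip in zip(targets, clips):
            audio[i] = clip
    return audio

fake_tts = lambda text: text.upper().encode()  # stub standing in for the TTS model
first = prerender_audio(["pica", "sofríe", "sirve"], fake_tts)
# After a replan that only changed step index 1:
second = prerender_audio(["pica", "hierve", "sirve"], fake_tts, only=[1], cache=first)
```

Threads are enough here because each call mostly waits on a remote or GPU-bound service rather than on Python code.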
195
-
196
- ### 5.5 LoRA rather than full fine-tune
- Same argument as in earlier plans: with 150-200 examples, LoRA with r=16 is enough. ~30 min on an A10G ≈ $1.
-
- ### 5.6 How to spend the $250 of Modal credits
- | Item | Estimate |
- |---|---|
- | Flux.2 dev inference (days 1-9, ~5h GPU L4) | $5-15 |
- | Synthetic Mexican-cuisine dataset generation (~2h) | $3-8 |
- | LoRA fine-tune + sweeps (~3h A10G) | $4-5 |
- | Eval pipeline | $1 |
- | Inference during judges' grading (~10h) | $10-25 |
- | **Subtotal** | **$23-54** |
- | Buffer | $30 |
- | **Projected total** | **~$53-84 / $250** |
210
-
211
- ### 5.7 How to spend the $100 of OpenAI Codex credits
- - Codex CLI as pair programmer for 10 days: $20-40.
- - Generating 200 structured Mexican recipes (Day 8): $10-25.
- - Generating Flux prompts for the steps (Day 4): $5-10.
- - Reserve: $30.
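Adding up the Codex line items as listed shows how tight this budget is. The figures below are copied from the list above; only the variable names are invented.

```python
# (low, high) estimates in USD, copied from the list above.
codex_items = {
    "Codex CLI pair programming": (20, 40),
    "200 structured recipes": (10, 25),
    "Flux prompts per step": (5, 10),
    "Reserve": (30, 30),
}
BUDGET = 100

low = sum(lo for lo, _ in codex_items.values())
high = sum(hi for _, hi in codex_items.values())
headroom_at_high = BUDGET - high

print(f"projected: ${low}-{high} of ${BUDGET}, headroom at high end: ${headroom_at_high}")
```

The range comes out at $65-105, so the high end overshoots the $100 credit by $5 unless part of the reserve goes unspent.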
216
-
217
- ---
218
-
219
- ## 6. Risks and mitigations
-
- | Risk | Impact | Mitigation |
- |---|---|---|
- | Flux.2 Klein has no public API/weights when you need it | Blocks the idea | Plan B: Flux.1-schnell or SDXL-Lightning. You lose the sponsor tag but the idea survives. |
- | MiniCPM-V fails to identify Mexican ingredients (chile poblano, chayote, nopales) | Recipe Planner fails | Add few-shot examples to the prompt; eventually fine-tune the vision model on 50 labeled photos |
- | Flux.2 generates unappetizing/uncanny food | Kills the demo | Iterate on prompts (style="recipe magazine, warm light, top-down"); use the final dish image as ref for the steps |
- | Latency: a complete recipe takes more than 30s to generate | Boring demo | Progressive streaming (show the option + final dish first, steps afterwards); parallelize Flux + TTS |
- | Modal cold start of ~30-60s on Flux | Slow first demo | Pre-warm 30s before filming; `keep_warm=1` on demo day |
- | Progress validator gives false positives ("on track" when it is not) | Confuses the user | Be conservative: only say "on track" if similarity is very high; the default is "keep going" with no strong judgment |
- | Spanish TTS without a Mexican accent | Sounds odd | If OpenBMB has no es-MX voice, use Cohere or Kokoro with a neutral voice; pre-record for the video |
- | The demo user cooks badly or gets burned | Kills the video | Practice the recipe once before filming; have 2-3 candidate recipes ready |
- | Another team presents a "recipe app with AI" | Competes for prizes | Differentiate with: visual closed loop + Spanish + specific Mexican cuisine + published dataset + real person cooking + visible Workflow |
- | Gradio Workflows is unstable (launched yesterday) | Breaks the app | Keep a version without Workflows as backup. Workflows is decoration. |
233
-
234
- ---
235
-
236
- ## 7. Plan B: scope cuts
-
- If on Day 7 you see you will not make it, cut features in this order:
-
- | # | Cut | You lose | You keep |
- |---|---|---|---|
- | 1 | STT (hands-free voice questions) | demo convenience | text + photo input |
- | 2 | 2nd voice (Cohere tip-giver) | one voice sponsor | single narrator |
- | 3 | Progress Validator (closed-loop) | **Best Agent badge** + main innovation | linear demo without the loop |
- | 4 | Planner fine-tune | **Well-Tuned badge** | base model + prompting |
- | 5 | Gradio Workflows showcase | "fresh" differentiator | Python pipeline |
- | 6 | Super-custom UI | **Off-Brand badge** | default UI |
-
- **NEVER cut:**
- - Vision + Planner + Step Illustrator + Narrator + minimal UI + video with a real person cooking.
-
- That alone already makes a strong entry for the Backyard AI track.
253
-
254
- ---
255
-
256
- ## 8. Success metrics (pre-submit self-evaluation)
-
- Before submitting:
-
- - [ ] A real person cooked an entire recipe with the app and ate it.
- - [ ] The video shows a human face and a finished dish for at least 30s of the 90s.
- - [ ] The app correctly identifies ≥4 of 5 ingredients in a typical fridge photo.
- - [ ] The Flux images for the steps look *appetizing* (test: if you show them to someone without context, they say "that looks tasty").
- - [ ] A complete recipe is generated in under 30s (text + 5 images + audio).
- - [ ] The Progress Validator works on at least 5 of 10 real progress photos.
- - [ ] The README has a diagram and the "How closed-loop cooking works" section.
- - [ ] There are 3 pre-rendered recipes ready so judges can view them without waiting.
- - [ ] Total params declared and verified ≤ 32B.
- - [ ] No hardcoded secrets.
-
- If you fail more than 2, do not submit; fix them first.
272
-
273
- ---
274
-
275
- ## 9. What you must NOT do
-
- - **Do not** try to generate video of the dish. A static image looks better than mediocre AI video.
- - **Do not** use more than 7 steps per recipe. The judge's attention span is 60-90s.
- - **Do not** support 100 recipes. Support 20 excellent Mexican recipes and say "more recipes soon".
- - **Do not** upload real fridge photos with identifiable products (brands, personal info). Remove the labels.
- - **Do not** chase Off the Grid. That decision is already made.
- - **Do not** leave the demo video for the last day without practicing the recipe beforehand.
- - **Do not** publish tokens in the repo.
- - **Do not** generate recipes with unusual ingredients that most people lack (accessible cooking > chef cooking).
285
-
286
- ---
287
-
288
- ## 10. README pitch (skeleton)
289
-
290
- ```markdown
291
- # Cocina Conmigo
292
- > A visual sous-chef that sees what's in your fridge,
293
- > shows you what each step should look like, and walks you through it
294
- > with voice — hands-free.
295
-
296
- [60-second demo video embed: your mom cooking tinga]
297
-
298
- ## Why it shouldn't exist (but does)
299
- Every recipe app is a list of steps. Cocina Conmigo is a closed-loop assistant:
300
- it generates the *target image* of each cooking step with Flux.2, listens
301
- when you ask "¿voy bien?", and adapts when you say "no tengo cilantro."
302
-
303
- ## Tech
304
- - 👁️ MiniCPM-V — sees your fridge + validates your progress
305
- - 🧠 MiniCPM-4 4B (LoRA fine-tuned on Mexican cuisine) — recipe planner
306
- - 🎨 Flux.2 Klein 9B (Modal endpoint) — generates target images per step
307
- - 🔊 OpenBMB voice — sous-chef narrator
308
- - 🎭 Cohere voice — tip-giver second voice
309
- - 🎙️ Whisper-tiny — voice input
310
- - ⚙️ Gradio Workflows — visible pipeline of nodes
311
-
312
- Total params: ~17B (≤ 32B ✓)
313
-
314
- ## Badges
315
- ✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
316
-
317
- ## Built for
318
- My mom. She makes great mole. She can never remember tinga.
319
-
320
- ## Try it
321
- [HF Space link]
322
- ```
323
-
324
- ---
325
-
326
- ## 11. Appendix: key snippets
327
-
328
- ### 11.1 Mise en Place agent (vision)
329
- ```python
- import json
- import PIL.Image
-
- # Assumes `mini_cpm_v` (a llama-cpp-python chat model) and the helper
- # `pil_to_data_url` are defined elsewhere.
- def identify_ingredients(image: PIL.Image.Image) -> list[str]:
-     prompt = """Veo esta foto de un refrigerador o despensa.
- Lista TODOS los ingredientes que se ven, en español, en JSON:
- {"ingredients": ["pollo", "cebolla", "cilantro", ...]}
- Solo ingredientes alimentarios, no contenedores."""
-     out = mini_cpm_v.create_chat_completion(messages=[
-         {"role": "user", "content": [
-             {"type": "image_url", "image_url": pil_to_data_url(image)},
-             {"type": "text", "text": prompt}
-         ]}
-     ])
-     # Raises if the model wraps the JSON in extra prose.
-     return json.loads(out["choices"][0]["message"]["content"])["ingredients"]
- ```
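Small models often wrap their JSON in a sentence or two, which makes a bare `json.loads` fail. A tolerant parser is one way to harden the snippet above; `extract_json` is a hypothetical helper built only on the standard library, not something the project is known to include.

```python
import json

def extract_json(text: str) -> dict:
    """Return the first JSON object embedded in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not open valid JSON; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

reply = 'Claro, aquí está: {"ingredients": ["pollo", "cebolla"]} ¡Buen provecho!'
ingredients = extract_json(reply)["ingredients"]
```

The same helper would apply to the Planner and Progress Validator outputs, which are parsed the same way.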
343
-
344
- ### 11.2 Recipe Planner agent (LLM)
345
- ```python
346
- SYS = """Eres un chef mexicano. Generas recetas a partir de ingredientes
347
- disponibles. Prefiere cocina mexicana tradicional, accesible.
348
-
349
- Salida JSON estricta:
350
- {
351
- "name": "...",
352
- "options": [{"name": "...", "why": "..."}],
353
- "steps": [
354
- {"n": 1, "instruction": "...", "duration": "4 min",
355
- "visual": "english visual description for image gen",
356
- "tip": "optional warning or tip"}
357
- ],
358
- "missing": ["cilantro"],
359
- "substitutes": {"cilantro": ["perejil", "nada"]},
360
- "final_dish_visual": "english visual description of the final plated dish"
361
- }
362
- """
363
-
364
- import json
-
- # Assumes `llm` (a llama-cpp-python chat model) is loaded elsewhere.
- def plan_recipe(ingredients, choice=None):
-     msgs = [{"role": "system", "content": SYS}]
-     msgs.append({"role": "user", "content":
-         f"Tengo: {', '.join(ingredients)}.\n"
-         + (f"Quiero hacer: {choice}." if choice else "Propón 3 opciones.")})
-     raw = llm.create_chat_completion(messages=msgs, temperature=0.7)
-     return json.loads(raw["choices"][0]["message"]["content"])
371
- ```
372
-
373
- ### 11.3 Step Illustrator (Flux endpoint)
- ```python
- import modal
-
- app = modal.App("cocina-flux")
- image = modal.Image.debian_slim().pip_install("torch", "diffusers", "transformers", "accelerate", "Pillow")
-
- # min_containers is the newer name for keep_warm, matching scaledown_window.
- @app.cls(image=image, gpu="L4", scaledown_window=180, min_containers=0)
- class FluxKlein:
-     @modal.enter()
-     def load(self):
-         import torch
-         from diffusers import FluxPipeline
-         self.pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.2-klein",
-                                                  torch_dtype=torch.bfloat16).to("cuda")
-
-     @modal.method()
-     def render_step(self, visual: str, ref_png: bytes | None = None) -> bytes:
-         import io
-         from PIL import Image
-         prompt = (f"Top-down photo of a kitchen pan or plate showing {visual}. "
-                   f"Mexican home cooking, warm natural lighting, recipe magazine "
-                   f"style, photorealistic, appetizing.")
-         if ref_png:
-             # NOTE: `image=` and `strength=` are image-to-image arguments. The plain
-             # text-to-image FluxPipeline does not accept them, so this branch needs an
-             # img2img-capable pipeline to be loaded in `load`.
-             ref = Image.open(io.BytesIO(ref_png)).convert("RGB")
-             img = self.pipe(prompt=prompt, image=ref, strength=0.6,
-                             num_inference_steps=4).images[0]
-         else:
-             img = self.pipe(prompt=prompt, num_inference_steps=4).images[0]
-         buf = io.BytesIO()
-         img.save(buf, "PNG")
-         return buf.getvalue()
- ```
401
-
402
- ### 11.4 Progress Validator (closed-loop)
403
- ```python
- import json
- import PIL.Image
-
- # Assumes `mini_cpm_v` and `pil_to_data_url` are defined elsewhere.
- def validate_progress(target_image: PIL.Image.Image, user_photo: PIL.Image.Image,
-                       step_instruction: str) -> dict:
-     prompt = f"""Compara estas dos fotos de cocina.
- Foto 1 (objetivo): cómo debe verse después del paso "{step_instruction}".
- Foto 2 (usuario): cómo va el usuario.
-
- Responde en JSON:
- {{"verdict": "go|wait|fix", "feedback_es": "...", "tip": "..." | null}}
- - "go": va bien, siguiente paso
- - "wait": le falta tiempo
- - "fix": algo se ve mal, sugiere ajuste
- """
-     out = mini_cpm_v.create_chat_completion(messages=[
-         {"role": "user", "content": [
-             {"type": "image_url", "image_url": pil_to_data_url(target_image)},
-             {"type": "image_url", "image_url": pil_to_data_url(user_photo)},
-             {"type": "text", "text": prompt}
-         ]}
-     ])
-     return json.loads(out["choices"][0]["message"]["content"])
- ```
425
-
426
- ### 11.5 UI Off-Brand (recipe card)
427
- ```python
428
- import gradio as gr
429
-
430
- CSS = """
431
- @import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&family=Inter:wght@400;600&display=swap');
432
- .gradio-container {background: #f5ecd9 !important; font-family: 'Inter', sans-serif !important;}
433
- .recipe-hero {background: #fffbf0; border-radius: 14px; padding: 28px;
434
- box-shadow: 0 8px 24px rgba(0,0,0,0.12); border: 1px solid #d8c9ad;}
435
- .recipe-hero h1 {font-family: 'Lora', serif !important; font-size: 36px !important;
436
- margin: 0 0 6px !important; color: #6b4a2a !important;}
437
- .step-card {background: #fffbf0; border-left: 4px solid #a85c2a;
438
- border-radius: 8px; padding: 18px 22px; margin: 12px 0;}
439
- .step-card h3 {font-family: 'Lora', serif !important; margin: 0 !important;}
440
- .step-card p {font-size: 17px !important; line-height: 1.6;}
441
- button.primary {background: #a85c2a !important; font-family: 'Inter', sans-serif !important;
442
- font-weight: 600 !important; font-size: 16px !important; padding: 14px 22px !important;}
443
- """
444
-
445
- with gr.Blocks(css=CSS, title="Cocina Conmigo") as demo:
-     gr.Markdown("# 👩‍🍳 Cocina Conmigo")
-     fridge = gr.Image(label="📸 Foto de tu refri o despensa", type="pil")
-     btn = gr.Button("¿Qué cocino?", variant="primary")
-     with gr.Column(elem_classes=["recipe-hero"]):
-         title = gr.Markdown()
-         final_img = gr.Image(show_label=False)
-     steps_box = gr.Column()
-     progress = gr.Image(label="📸 Tómame foto de tu progreso", type="pil")
-     feedback = gr.Markdown()
-     # callbacks omitted
456
- ```
457
-
458
- ### 11.6 LoRA fine-tune of the Planner on Modal
459
- ```python
460
- @app.function(image=image_train, gpu="A10G", timeout=60*60*2,
461
-               volumes={"/cache": modal.Volume.from_name("hf-cache", create_if_missing=True)})
462
- def train_planner():
463
-     import os; os.environ["HF_HOME"] = "/cache"
464
-     from transformers import AutoModelForCausalLM, AutoTokenizer
465
-     from peft import LoraConfig, get_peft_model
466
-     from trl import SFTTrainer, SFTConfig
467
-     from datasets import load_dataset
468
-
469
-     base = "openbmb/MiniCPM-4-Base"
470
-     tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
471
-     model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True,
472
-                                                  device_map="cuda", torch_dtype="bfloat16")
473
-     model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
474
-                                              target_modules="all-linear"))
475
-     ds = load_dataset("tu-usuario/recetas-mexicanas-sft", split="train")
476
-     SFTTrainer(model=model, tokenizer=tok, train_dataset=ds,
477
-                args=SFTConfig(output_dir="/cache/out", num_train_epochs=2,
478
-                               per_device_train_batch_size=4, learning_rate=2e-4,
479
-                               push_to_hub=True,
480
-                               hub_model_id="tu-usuario/cocinaconmigo-4b-mx")
481
-                ).train()
482
- ```
483
-
484
- ---
485
-
486
- ## 12. Recommended reading before Day 1
487
-
488
- - `Context/guia-tecnologias.md` (section 3 Modal, section 4 llama.cpp).
489
- - HF Black Forest Labs: <https://huggingface.co/black-forest-labs> (confirm the Flux.2 Klein version).
490
- - HF MiniCPM-V: <https://huggingface.co/openbmb> (vision version with GGUF).
491
- - Modal stable-diffusion example: <https://github.com/modal-labs/modal-examples/tree/main/06_gpu_and_ml/stable_diffusion>.
492
- - Diffusers img2img: <https://huggingface.co/docs/diffusers/using-diffusers/img2img>.
493
- - Gradio Workflows: <https://www.gradio.app/guides> (look for the most recent guide).
494
- - Cohere Labs voice: confirm with the sponsor which exact model is available.
495
-
496
- > Cook with your mom once before you start coding. It will tell you more about what your app needs than any brainstorm. Good luck.
 
Strategy/plan.md DELETED
@@ -1,245 +0,0 @@
1
- # Winning plan: "Cocina Conmigo"
2
-
3
- > A multimodal sous-chef that sees what you have in the fridge, tells you what to cook, shows you how each step should look with Flux.2, and narrates everything by voice while you cook with your hands full.
4
- >
5
- > Hackathon "Small models / Big adventures", June 2026.
6
-
7
- ---
8
-
9
- ## TL;DR
10
-
11
- **Chosen idea:** **Cocina Conmigo**, a hands-free kitchen copilot that combines vision, reasoning, real-time image generation, and voice to accompany you from start to finish: from *"what can I cook with this?"* to *"am I on track?"*.
12
-
13
- **Why this one and not another:** it is the only idea that **(1) sits outside the 11 ideas pre-cooked by OpenBMB**, **(2) uses Flux.2 + voices + Workflows as its core**, and **(3) has real, daily, universal usefulness**. Nobody cooks as a hobby; everybody cooks out of necessity.
14
-
15
- ---
16
-
17
- ## Why the plan changed from earlier iterations
18
-
19
- | Iteration | Idea | Why it was dropped |
20
- |---|---|---|
21
- | v1 | Abuelita (parent phone helper) | **It is on OpenBMB's pre-cooked list for Backyard AI.** 5-15 teams will build the same thing. |
22
- | v2 | Cuentacuentos (illustrated storyteller) | **It is on OpenBMB's pre-cooked list for Thousand Token Wood ("voice storyteller").** Same saturation problem. |
23
- | v3 (this one) | **Cocina Conmigo** | A refinement of **your own idea #1**, now genuinely viable thanks to Flux.2. **It is not on any pre-cooked list.** |
24
-
25
- The strategic rule: **use the sponsors' models, do not copy their project templates.**
26
-
27
- ---
28
-
29
- ## The 11 ideas in the forbidden zone (OpenBMB cluster)
30
-
31
- | Backyard AI | Thousand Token Wood |
32
- |---|---|
33
- | Parent phone helper | Voice storyteller |
34
- | Receipt / bill explainer | Visual mystery box |
35
- | Shop menu / repair manual | AI museum |
36
- | Offline personal assistant / voice companion | Doodle creature |
37
- | | Dream postcard gen |
38
- | | Omni-modal adventure |
39
- | | Tiny local NPC / character agent |
40
-
41
- And of your 5 original ideas, these also fall:
42
- - #3 haircuts (you said yourself "it has already been done to death")
43
- - #4 museum Q&A (clashes with "AI museum")
44
-
45
- **Still alive, outside the forbidden zone:**
46
- - #1 Recipes (→ **Cocina Conmigo**, this proposal)
47
- - #2 Intent detector (does not use Flux.2, dull demo)
48
- - #5 Outfits from your wardrobe (alternative B, see the end of the document)
49
-
50
- ---
51
-
52
- ## The product in one sentence
53
-
54
- > *"My mom asked me to teach her how to make ramen. I built her a sous-chef that lives on her tablet."*
55
-
56
- ---
57
-
58
- ## The 4 demo stories
59
-
60
- ### 1. *"This is what I have in the fridge"*
61
- ```
62
- 👩 Mom takes a photo of the open fridge.
63
- 🤖 [MiniCPM-V] "I see: chicken, onion, tomato, cilantro, tortillas, cheese."
64
- 🤖 [LLM] "I can suggest: chicken tinga, enchiladas, or quesadillas. What are you in the mood for?"
65
- 👩 "Tinga."
66
- 🤖 [Flux.2] generates a photo of the final dish: beautiful, Mexican.
67
- 🤖 "Perfect. It will take 35 minutes. Shall we start?"
68
- ```
69
-
70
- ### 2. *"Cook step by step"* (hands-free)
71
- ```
72
- 🤖 [Flux.2] shows: a pot with onion sweating
73
- 🤖 [OpenBMB voice] "Dice the onion finely and put it in hot oil."
74
- 👩 (cooking, messy hands)
75
- 👩 "How long?"
76
- 🤖 [Voice] "Until it looks translucent. About 4 minutes."
77
- ```
78
-
79
- ### 3. *"Am I on track?"* (vision in the loop)
80
- ```
81
- 👩 (takes a photo of the pan with onion)
82
- 🤖 [MiniCPM-V] compares it against the target image.
83
- 🤖 [Cohere voice, the "tip-giver"] "It needs a little longer. Give it 1 more minute; you're doing fine."
84
- ```
85
-
86
- ### 4. *"I have no cilantro"* (adaptive replan)
87
- ```
88
- 👩 "I have no cilantro."
89
- 🤖 [LLM] replans on the fly.
90
- 🤖 [Voice] "No problem. We'll use parsley or nothing at all. It's still tinga."
91
- 🤖 [Flux.2] regenerates the photo of the final dish, now without cilantro.
92
- ```
93
-
94
- All 4 stories use the **same 5 models**. A single pipeline.
95
-
96
- ---
97
-
98
- ## Why this plan **wins** this hackathon
99
-
100
- ### 1. "Build for someone you actually know" → Backyard AI track
101
- The track description literally says: *"Solve a real problem for someone you actually know. Pick a person — a neighbor, a parent, a small-business owner..."*. Your mom. Your sister. Your brother who lives alone. **Everyone** cooks. Few hackathon apps will have a user this close and this recurrent.
102
-
103
- ### 2. Uses **all** the sponsor assets without copying templates
104
- | Asset | How it is used |
105
- |---|---|
106
- | **Flux.2 Klein 9B** (sponsor) | Generates the target image of the dish + "this is what you should see" at each step · i2i for adjustments |
107
- | **MiniCPM-V** (OpenBMB) | Vision: identifies ingredients + validates progress ("am I on track?") |
108
- | **MiniCPM reasoning** (OpenBMB) | Recipe Planner: builds the recipe + adaptive replan |
109
- | **OpenBMB voice / TTS** | Main sous-chef voice (warm, patient) |
110
- | **Cohere Labs voice** (sponsor) | Second voice: tips, warnings ("careful, it's burning!") |
111
- | **Whisper-tiny** | STT: hands-free questions while you cook |
112
- | **Gradio Workflows** | Visible node UI: Vision → Planner → Illustrator → Narrator → Validator |
113
- | **Modal $250** | Hosts Flux.2 on GPU + synthetic dataset + LoRA fine-tune |
114
- | **OpenAI Codex $100** | Pair-programmer and recipe dataset generator |
115
-
116
- Every sponsor covered. Zero copied ideas.
117
-
118
- ### 3. **Concrete technical innovation**: the closed visual loop
119
- Most recipe apps are lists of steps. Cocina Conmigo introduces a **visual closed loop**:
120
-
121
- ```
122
- [Flux.2 shows ideal step] ──▶ [User cooks]
123
-          ▲                         │
124
-          │                         ▼
125
- [LLM adjusts plan] ◀── [MiniCPM-V validates user photo]
126
- ```
127
-
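The loop in the diagram can be read as a small control loop around one recipe step. The sketch below is only an illustration under stated assumptions: `illustrate`, `take_photo`, `validate` and `replan` are hypothetical callables standing in for Flux.2, the camera input, MiniCPM-V and the planner LLM, and the go / wait / fix verdicts are borrowed from the validator prompt used elsewhere in these notes.

```python
from typing import Callable


def run_step_loop(
    step: str,
    illustrate: Callable[[str], str],
    take_photo: Callable[[], str],
    validate: Callable[[str, str], str],
    replan: Callable[[str], str],
    max_checks: int = 3,
) -> tuple[str, list[str]]:
    """Run one recipe step through the show -> cook -> check -> adjust loop.

    Returns the final instruction and the list of verdicts that were seen.
    """
    verdicts: list[str] = []
    for _ in range(max_checks):
        target = illustrate(step)              # ideal image for the current instruction
        verdict = validate(target, take_photo())
        verdicts.append(verdict)
        if verdict == "go":
            break                              # the step looks right, move on
        if verdict == "fix":
            step = replan(step)                # planner adjusts the instruction
        # "wait": keep the same instruction and check again
    return step, verdicts
```

Injecting the four callables keeps the loop testable without any model loaded; `max_checks` is an assumed safety cap so a stubborn "wait" cannot block the demo forever.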
128
- This is a real agent, not a wrapper. The Best Agent badge is in play.
129
-
130
- ### 4. Appetizing demo = viral video
131
- A real person cooking + a warm voice + live illustrations + "mine came out the same!" + a final dish eaten on camera. Best Demo + Community Choice by inertia. **Nobody will remember submission #14 of "voice storyteller"; they will remember the video where your mom makes tinga with AI.**
132
-
133
- ### 5. Sustainable cultural differentiation
134
- - **Mexican-Spanish-first**: a differentiator in a US-centric hackathon.
135
- - **Mexican cuisine** as the fine-tune dataset: territory few teams will touch.
136
- - "For my mom" as the story: emotional + universal.
137
-
138
- ---
139
-
140
- ## Architecture (summary; see `arquitectura.html`)
141
-
142
- 5 nodes in a visible Gradio Workflow:
143
-
144
- ```
145
- [📸/🎙️ Input] ──▶ [👁️ Vision MiniCPM-V] ──▶ [🧠 Recipe Planner] ──▶ [🎨 Step Illustrator Flux.2]
146
- │
147
- ▼
148
- [🔊 Sous-Chef Narrator OpenBMB] + [🎭 Tip-Giver Cohere]
149
- │
150
- ▼
151
- [✅ Progress Validator] ──▶ loop back to the user
152
- ```
153
-
154
- | Node | Model | Size | Role |
155
- |---|---|---|---|
156
- | Vision In | MiniCPM-V 2.6 / 4 (Q4 GGUF) | ~2-4B | Identifies ingredients + validates progress |
157
- | Planner | MiniCPM-4 4B (LoRA on Mexican cuisine) | ~4B | Generates structured recipe JSON · replan |
158
- | Illustrator | Flux.2 Klein 9B (Modal GPU) | 9B | Final image + step-by-step images, i2i for consistency |
159
- | Narrator | OpenBMB voice / Kokoro | ~1B | Main voice: instructions |
160
- | Tip-Giver | Cohere Labs voice | ~1B | Second voice: warnings, encouragement |
161
- | STT (optional) | Whisper-tiny | ~40M | "am I on track?" "how long?" |
162
-
163
- **Total: ~17B parameters** (cap 32B ✓)
164
-
165
- **Where it runs:**
166
- - Vision, Planner, voices, STT → HF Space CPU (llama.cpp + lightweight bindings)
167
- - **Flux.2 → Modal endpoint with an L4 GPU** (the Space CPU cannot handle it)
168
-
169
- > Same tradeoff as in the earlier plans: **we break Off the Grid** intentionally to preserve image quality and latency. In exchange we qualify for the Modal Awards.
170
-
171
- ---
172
-
173
- ## Target badges (5/6)
174
-
175
- | Badge | How |
176
- |---|---|
177
- | ✓ **Llama Champion** | Vision + Planner via `llama-cpp-python` with Q4 GGUF |
178
- | ✓ **Well-Tuned** | Planner LoRA on a Mexican cuisine dataset, published on HF |
179
- | ✓ **Off-Brand** | "Recipe card" style UI + hands-free kitchen mode; does not look like default Gradio |
180
- | ✓ **Sharing is Caring** | Mexican recipe dataset + agent traces + generated recipes, all on the Hub |
181
- | ✓ **Field Notes** | Blog: "I built my mom a sous-chef" |
182
- | ✗ **Off the Grid** | Conscious sacrifice: Flux.2 runs on Modal |
183
-
184
- 5 badges + a strong Modal-powered story = competitive for **Bonus Quest Champion ($2K)**.
185
-
186
- ---
187
-
188
- ## Target prizes (projection)
189
-
190
- | Prize | Probability | Why |
191
- |---|---|---|
192
- | **Backyard AI Track** ($1K–$4K) | **High** | The idea is the literal text of the track. Emotional demo. |
193
- | **Modal Awards** ($3K–$10K credits) | **High** | Flux on Modal at runtime + offline training. Textbook Modal-powered. |
194
- | **OpenBMB Award** ($1K–$2.5K) | **High** | Uses OpenBMB models in 3 roles (vision, planner, voice) without copying a template |
195
- | **Best Demo** ($1K) | **High** | Person cooking + final food + voice = appetizing video |
196
- | **Community Choice** ($2K) | **High** | Appeals to a universal emotional memory (your mom cooking) |
197
- | **Bonus Quest Champion** ($2K) | Medium-high | 5/6 badges is competitive |
198
- | **Best Agent** ($1K) | Medium-high | Real closed-loop multi-agent system (5 agents) |
199
- | **Off-Brand** ($1.5K) | Medium | The recipe-card UI has a good chance |
200
- | **Tiny Titan** ($1.5K) | Low | Flux.2 9B takes us out of the ≤4B range |
201
-
202
- **Reasonable cumulative estimate:** $5K–$12K cash + $3K–$10K Modal credits.
203
-
204
- ---
205
-
206
- ## The 3 conditions set by Idea.md
207
-
208
- | Condition | How it is met |
209
- |---|---|
210
- | **Innovative** | Visual closed loop (Flux generates the ideal → user cooks → vision validates → planner adjusts); this does not exist in recipe apps |
211
- | **Fresh** | Combines Flux.2 (new) + Workflows (launched yesterday) + multi-sponsor voices + hands-free cooking. No submission will have that combination. |
212
- | **Useful** | Cooking is daily, universal, recurrent. The app replaces Google + YouTube + guessing. |
213
-
214
- ---
215
-
216
- ## Decisions you have to make yourself
217
-
218
- | Decision | Recommendation |
219
- |---|---|
220
- | Cocina Conmigo or Mi Espejo (outfits)? | **Cocina.** Lower technical risk (generating dishes with Flux is safer than generating real people wearing clothes). More universal. |
221
- | Mexican cuisine or general cuisine? | **Mexican.** A differentiator + fine-tune on a bounded, rich dataset. |
222
- | Real person for the demo? | **Yes, non-negotiable.** Your mom, your partner, your neighbor. Have them eat on camera at the end. |
223
- | Start with text or with voice/photo? | **Start with a fridge photo + text.** Voice is added on Days 7-9. |
224
- | How many steps per recipe? | 5-7 steps. More is too long for the demo; fewer is not a recipe. |
225
-
226
- ---
227
-
228
- ## Plan B: the "Mi Espejo" alternative
229
-
230
- If for any reason Cocina Conmigo does not progress (e.g. Flux.2 consistently generates ugly dishes), pivot to **"Mi Espejo"** (a refinement of your idea #5):
231
-
232
- - 📸 You upload a photo of yourself + photos of your wardrobe.
233
- - 🧠 A stylist LLM combines outfits by occasion + trend.
234
- - 🎨 **Flux.2 i2i generates you wearing each combination.**
235
- - 🔊 A voice comments on the look.
236
-
237
- Same badges, same track (Backyard), but a higher visual wow factor and higher risk (uncanny valley with real people). **It is plan B**, not plan A.
238
-
239
- ---
240
-
241
- ## Next step
242
-
243
- Read **`estrategia.md`** (10-day timeline, Modal/Codex spend, risks + mitigations, snippets) and **`arquitectura.html`** (system diagram + the 4 demo stories + visual Workflow). Then open Codex CLI and do the Day 1 "hello world": a Modal endpoint that returns a Flux.2 image of a dish given a recipe name.
244
-
245
- > *"Cooking is the last thing AI should be able to help you do well. And that is why it is the best thing you can win by doing."*
 
Strategy/plan_implementacion.md DELETED
@@ -1,674 +0,0 @@
1
- # Implementation Plan — "Cook With Me"
2
-
3
- > Step-by-step implementation guide for developers building the multimodal cooking sous-chef Gradio app for Hugging Face Spaces.
4
- >
5
- > **Hackathon:** Small models / Big adventures — June 2026
6
- > **Read first:** `plan.md` (the *what* and *why*) and `estrategia.md` (the *how* at a strategic level). This document is the *how* at a tactical level — turn this into code.
7
-
8
- ---
9
-
10
- ## 0. Locked decisions (do not re-discuss)
11
-
12
- | Decision | Value | Reason |
13
- |---|---|---|
14
- | UI framework | **Gradio** | Hackathon requirement |
15
- | Hosting | **Hugging Face Space** | Hackathon requirement |
16
- | Inference runtime (text + vision) | **llama.cpp** via `llama-cpp-python` | Runs inside the Space CPU, no external APIs needed for now. Future: migrate to Modal |
17
- | Image generation | **FLUX.2 Klein 9B** (`black-forest-labs/FLUX.2-klein-9B`) | Sponsor model; runs in the Space if a GPU Space is rented (or via `enable_model_cpu_offload()` as fallback). Plan to migrate this specific component to Modal post-hackathon |
18
- | Recipe planner / reasoning | **`openbmb/MiniCPM-V-4`** (GGUF) | Provided requirement |
19
- | Vision (ingredient ID + progress validator) | **`openbmb/MiniCPM-V-4.6`** (GGUF) | Provided requirement |
20
- | Text-to-speech | **OpenBMB VoxCPM2** | Provided requirement |
21
- | Recipe dataset | **`thedevastator/better-recipes-for-a-better-life`** (Kaggle) — international cuisine | Provided requirement; not limited to Mexican food |
22
- | App language | **English only** | Provided requirement |
23
- | Final output | **Recipe + step images + voice + nutritional values** | Provided requirement |
24
- | External API calls at runtime | **None** | "llama.cpp inside the Space" mandate |
25
-
26
- ---
27
-
28
- ## 1. Architecture (final, English-only, llama.cpp-first)
29
-
30
- ```
31
-                        ┌──────────────────────────────────────┐
32
-                        │     Hugging Face Space (Gradio)      │
33
-                        │     (CPU + optional GPU upgrade)     │
34
-                        ├──────────────────────────────────────┤
35
- 📸 Fridge photo ──────▶│ [Vision Agent]                       │
36
-                        │   MiniCPM-V-4.6 GGUF (llama.cpp)     │
37
-                        │   → list[ingredient]                 │
38
-                        │          │                           │
39
-                        │          ▼                           │
40
- 🥘 User picks dish ───▶│ [Recipe Planner]                     │
41
-                        │   MiniCPM-V-4 GGUF (llama.cpp)       │
42
-                        │   + retrieval over Kaggle dataset    │
43
-                        │   → Recipe JSON (steps, nutrition)   │
44
-                        │          │                           │
45
-                        │          ▼                           │
46
-                        │ [Step Illustrator]                   │
47
-                        │   FLUX.2 Klein 9B (diffusers)        │
48
-                        │   → PNG per step + final dish        │
49
-                        │          │                           │
50
-                        │          ▼                           │
51
-                        │ [Narrator]                           │
52
-                        │   VoxCPM2 → MP3 per step             │
53
-                        │          │                           │
54
-                        │          ▼                           │
55
- 📸 Progress photo ────▶│ [Progress Validator]                 │
56
-                        │   MiniCPM-V-4.6 (vision compare)     │
57
-                        │   → "go / wait / fix" + tip          │
58
-                        └──────────────────────────────────────┘
59
- ```
60
-
61
- **Total parameter count (≤ 32B requirement):**
62
- - MiniCPM-V-4 (reasoning) ≈ 4B
63
- - MiniCPM-V-4.6 (vision) ≈ 4.6B
64
- - FLUX.2 Klein ≈ 9B
65
- - VoxCPM2 ≈ 1B (estimate)
66
- - **Total ≈ 18.6B ✓**
67
-
68
- ---
69
-
70
- ## 2. Repository layout
71
-
72
- ```
73
- cook-with-me/
74
- ├── app.py # Gradio entrypoint (Space looks for this)
75
- ├── requirements.txt
76
- ├── packages.txt # apt packages (ffmpeg, libsndfile1)
77
- ├── README.md # Space card (HF requires YAML frontmatter)
78
- ├── .gitignore
79
- ├── src/
80
- │ ├── __init__.py
81
- │ ├── config.py # paths, model IDs, constants
82
- │ ├── models/
83
- │ │ ├── __init__.py
84
- │ │ ├── vision.py # MiniCPM-V-4.6 wrapper (llama-cpp)
85
- │ │ ├── planner.py # MiniCPM-V-4 wrapper (llama-cpp)
86
- │ │ ├── illustrator.py # FLUX.2 Klein wrapper (diffusers)
87
- │ │ ├── narrator.py # VoxCPM2 wrapper
88
- │ │ └── loader.py # lazy singletons + GGUF download
89
- │ ├── agents/
90
- │ │ ├── mise_en_place.py # ingredient identification
91
- │ │ ├── recipe_planner.py # builds Recipe object
92
- │ │ ├── step_illustrator.py # per-step image gen
93
- │ │ ├── narrator.py # per-step TTS
94
- │ │ └── progress_validator.py
95
- │ ├── data/
96
- │ │ ├── recipe_index.py # loads Kaggle dataset, builds retrieval
97
- │ │ └── nutrition.py # USDA-style nutrition computation
98
- │ ├── pipeline.py # Recipe state machine, orchestration
99
- │ ├── prompts/
100
- │ │ ├── vision_prompt.txt
101
- │ │ ├── planner_system.txt
102
- │ │ └── validator_prompt.txt
103
- │ └── ui/
104
- │ ├── theme.py # custom CSS (Off-Brand badge)
105
- │ └── components.py # reusable Gradio Blocks pieces
106
- ├── scripts/
107
- │ ├── download_models.py # pre-warms GGUF + Flux weights at build time
108
- │ ├── build_recipe_index.py # caches Kaggle dataset locally
109
- │ └── smoke_test.py # end-to-end validation before push
110
- └── assets/
111
- ├── sample_fridge_1.jpg
112
- └── sample_progress_1.jpg
113
- ```
114
-
115
- ---
116
-
117
- ## 3. Phase-by-phase plan (10 days)
118
-
119
- > Each phase has: **goal**, **tasks**, **deliverable**, **verification check**. Do not move to the next phase if verification fails.
120
-
121
- ---
122
-
123
- ### Phase 0 — Day 0 (½ day): Account + tooling setup
124
-
125
- **Goal:** every credential and CLI is ready before writing code.
126
-
127
- **Tasks**
128
- 1. Create or confirm Hugging Face account; generate a **write token** (Settings → Access Tokens). Store as `HF_TOKEN` env var locally.
129
- 2. Install Hugging Face CLI: `pip install -U huggingface_hub` then `huggingface-cli login`.
130
- 3. Install Kaggle CLI: `pip install kaggle`. Place `kaggle.json` (Account → API → Create New Token) in `~/.kaggle/kaggle.json` with `chmod 600`.
131
- 4. Install OpenAI Codex CLI (pair-programmer) and verify your $100 credit is active.
132
- 5. Create a local Python 3.11 virtual environment: `python -m venv .venv && source .venv/bin/activate`.
133
- 6. Create the repo locally: `git init cook-with-me && cd cook-with-me`.
134
- 7. Create an empty Hugging Face Space: huggingface.co → New Space → SDK = **Gradio**, Hardware = **CPU basic** (upgrade later if you need GPU for FLUX). Clone it and copy your repo skeleton into it.
135
- 8. Verify model availability: open in a browser and confirm pages exist:
136
-    - `huggingface.co/openbmb/MiniCPM-V-4`
137
-    - `huggingface.co/openbmb/MiniCPM-V-4.6` (confirm the exact slug; the GGUF repo in Phase 2 is written with an underscore, `4_6`)
138
-    - `huggingface.co/openbmb/VoxCPM2` (or the exact repo name, if it differs; search "VoxCPM" on HF)
139
-    - `huggingface.co/black-forest-labs/FLUX.2-klein-9B`
140
-
141
- **Deliverable:** empty Space deployed showing "Hello World" Gradio.
142
-
143
- **Verify:** `https://huggingface.co/spaces/<you>/cook-with-me` loads.
144
-
145
- ---
146
-
147
- ### Phase 1 — Day 1: Project skeleton + recipe dataset ingestion
148
-
149
- **Goal:** the Kaggle dataset is downloaded, parsed, and cached as a local artifact ready for retrieval.
150
-
151
- **Tasks**
152
- 1. Write `requirements.txt` (initial version — packages will be added as phases progress):
153
- ```text
154
- gradio>=4.44
155
- huggingface_hub>=0.24
156
- llama-cpp-python>=0.3.2
157
- numpy
158
- pandas
159
- Pillow
160
- pydantic>=2
161
- sentence-transformers
162
- ```
163
- 2. Write `packages.txt`:
164
- ```text
165
- ffmpeg
166
- libsndfile1
167
- ```
168
- 3. Write `scripts/build_recipe_index.py`:
169
-    - Use `kagglehub.load_dataset(KaggleDatasetAdapter.PANDAS, "thedevastator/better-recipes-for-a-better-life", file_path)` (`pip install kagglehub`; it is not covered by the Phase 0 installs). Discover `file_path` by listing the dataset files first via `kagglehub.dataset_download`.
170
-    - Normalize columns: `name`, `ingredients` (list[str]), `instructions` (list[str]), `cuisine` (str if present, else "international"), `prep_time`, `servings`.
171
-    - Drop rows missing critical fields. Lowercase + strip ingredient strings.
172
-    - Save to `data/recipes.parquet` (~5–50MB depending on dataset size).
173
-    - Build sentence embeddings of the recipe **name + first 3 ingredients** using `sentence-transformers/all-MiniLM-L6-v2` and save to `data/recipes_emb.npy`.
174
-    - This script runs **once locally**; commit the parquet + npy files to the repo (or to a private HF Dataset, then download in `app.py`). If files exceed 100MB, push to a HF Dataset repo: `<you>/cook-with-me-recipes`.
175
- 4. Write `src/data/recipe_index.py`:
176
-    - `class RecipeIndex` with `.search(ingredients: list[str], top_k=5) -> list[RecipeRow]`.
177
-    - Build a query string from ingredients, embed it, cosine-similarity against the cached embeddings, return top-k.
178
-
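The retrieval step described in task 4 (build a query string, embed it, rank by cosine similarity, return the top-k) can be sketched as follows. This is a dependency-free illustration, not the planned `RecipeIndex` class: the `embed` callable is an assumed stand-in for the sentence-transformers encoder, and plain Python lists replace the cached `.npy` matrix so the snippet runs without NumPy.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    ingredients: list[str],
    names: list[str],
    embeddings: list[Vector],
    embed: Callable[[str], Vector],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rank cached recipe embeddings against a query built from the ingredients."""
    # Normalise and sort so the same ingredient set always yields the same query.
    query_text = ", ".join(sorted(i.lower().strip() for i in ingredients))
    query = embed(query_text)
    scored = [(name, cosine(query, emb)) for name, emb in zip(names, embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With the real encoder, `embed` would wrap the model's encode call and `embeddings` would come from `data/recipes_emb.npy`; a vectorised matrix product is the natural replacement for the Python loop once the index grows beyond a few thousand rows.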
179
- **Deliverable:** `python -c "from src.data.recipe_index import RecipeIndex; r=RecipeIndex(); print(r.search(['chicken','onion','tomato']))"` prints 5 sensible recipes.
180
-
181
- **Verify:** at least 3 of the top-5 results contain ≥2 of the input ingredients.
182
-
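The verification rule above (at least 3 of the top-5 results share 2 or more input ingredients) is easy to automate in the smoke test. A minimal sketch, assuming each result is available as a list of ingredient strings; `passes_overlap_check` is a hypothetical helper name, and the substring match is a deliberate simplification so that "onion" also matches "red onion".

```python
def passes_overlap_check(
    query: list[str],
    results: list[list[str]],
    min_shared: int = 2,
    min_hits: int = 3,
) -> bool:
    """True when at least `min_hits` results share `min_shared` query ingredients."""
    wanted = {item.lower().strip() for item in query}
    hits = 0
    for ingredients in results:
        text = " ".join(ingredients).lower()
        # Substring match: "onion" counts as present in "red onion".
        shared = sum(1 for item in wanted if item in text)
        if shared >= min_shared:
            hits += 1
    return hits >= min_hits
```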
183
- ---
184
-
185
- ### Phase 2 — Day 2: Vision agent (Mise en Place) — MiniCPM-V-4.6 via llama.cpp
186
-
187
- **Goal:** given a fridge photo, return a clean list of English ingredient names.
188
-
189
- **Background:** llama.cpp supports multimodal models through a vision projector (`mmproj-*.gguf`) plus the language model GGUF. MiniCPM-V family ships both files on the Hub.
190
-
191
- **Tasks**
192
- 1. Find the GGUF release of MiniCPM-V-4.6. Search HF for `MiniCPM-V-4_6-gguf` or `openbmb/MiniCPM-V-4_6-gguf`. You need **two** files:
193
-    - `Model-Q4_K_M.gguf` (or similar quant)
194
-    - `mmproj-model-f16.gguf` (the vision projector)
195
- 2. Write `src/models/loader.py`:
196
- ```python
197
- from huggingface_hub import hf_hub_download
198
- from llama_cpp import Llama
199
- from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler # or matching handler
200
-
201
- _vision = None
202
-
203
- def get_vision_model():
204
-     global _vision
205
-     if _vision is None:
206
-         model_path = hf_hub_download(
207
-             repo_id="openbmb/MiniCPM-V-4_6-gguf",  # confirm exact repo
208
-             filename="Model-Q4_K_M.gguf",
209
-         )
210
-         mmproj_path = hf_hub_download(
211
-             repo_id="openbmb/MiniCPM-V-4_6-gguf",
212
-             filename="mmproj-model-f16.gguf",
213
-         )
214
-         handler = MiniCPMv26ChatHandler(clip_model_path=mmproj_path)
215
-         _vision = Llama(
216
-             model_path=model_path,
217
-             chat_handler=handler,
218
-             n_ctx=4096,
219
-             n_threads=4,
220
-             verbose=False,
221
-         )
222
-     return _vision
223
- ```
224
- 3. Write `src/agents/mise_en_place.py`:
225
- ```python
226
- import base64, io, json
227
- from PIL import Image
228
- from src.models.loader import get_vision_model
229
-
230
- PROMPT = (
231
-     "You are an ingredient detector. Look at the fridge/pantry photo and "
232
-     "list every edible ingredient you can identify. Return strict JSON: "
233
-     '{"ingredients": ["chicken", "onion", "tomato", ...]} '
234
-     "Lowercase, English, no brand names, no containers."
235
- )
236
-
237
- def _img_to_data_url(img: Image.Image) -> str:
238
-     buf = io.BytesIO(); img.convert("RGB").save(buf, "JPEG", quality=85)  # RGB: JPEG has no alpha channel
239
-     b64 = base64.b64encode(buf.getvalue()).decode()
240
-     return f"data:image/jpeg;base64,{b64}"
241
-
242
- def identify_ingredients(image: Image.Image) -> list[str]:
243
-     llm = get_vision_model()
244
-     out = llm.create_chat_completion(messages=[
245
-         {"role": "user", "content": [
246
-             {"type": "image_url", "image_url": {"url": _img_to_data_url(image)}},
247
-             {"type": "text", "text": PROMPT},
248
-         ]}
249
-     ], temperature=0.2, response_format={"type": "json_object"})
250
-     data = json.loads(out["choices"][0]["message"]["content"])
251
-     return [s.lower().strip() for s in data["ingredients"]]
252
- ```
253
- 4. Test locally with 5 sample fridge photos.
254
-
255
- **Deliverable:** the function returns a non-empty English list with ≥80% precision on a clean fridge photo.
256
-
257
- **Verify:** stash these 5 results in `tests/vision_smoke.json` for regression checks.
258
-
259
- ---
260
-
261
- ### Phase 3 — Day 3: Recipe Planner — MiniCPM-V-4 via llama.cpp + retrieval
262
-
263
- **Goal:** given a list of ingredients (and optionally a chosen dish), return a fully structured `Recipe` JSON including steps, durations, visual descriptions, and nutritional values.
264
-
265
- **Tasks**
266
- 1. Find or convert MiniCPM-V-4 to GGUF. Likely repo: `openbmb/MiniCPM-V-4-gguf` or community quants. Pick `Q4_K_M`.
267
- 2. Add to `src/models/loader.py` a `get_planner_model()` (same pattern as vision but without `chat_handler`).
268
- 3. Write `src/agents/recipe_planner.py`:
269
-    - **Step A (propose):** call planner with `I have: [ingredients]. Propose 3 dish options that fit. Reply JSON.`
270
-    - **Step B (retrieve):** for the chosen dish name, call `RecipeIndex.search(...)` and pick the closest match. Use it as a *grounded reference*.
271
-    - **Step C (restructure):** prompt the planner with both the user's available ingredients and the retrieved reference recipe, asking it to output the canonical `Recipe` JSON schema below. The retrieval grounds the model and reduces the risk of hallucinated steps.
272
-    - **Step D (nutrition):** from the recipe ingredients, compute approximate nutritional values per serving. See Phase 3.5.
273
- 4. Define the canonical schema in `src/pipeline.py` using Pydantic:
274
- ```python
275
- from pydantic import BaseModel
276
- from typing import Optional
277
-
278
- class Step(BaseModel):
279
-     n: int
280
-     instruction: str  # English, imperative
281
-     duration: str  # "4 minutes"
282
-     visual: str  # English visual description for FLUX prompt
283
-     tip: Optional[str] = None
284
-
285
- class Nutrition(BaseModel):
286
-     calories: int  # per serving
287
-     protein_g: float
288
-     carbs_g: float
289
-     fat_g: float
290
-     fiber_g: float
291
-
292
- class Recipe(BaseModel):
293
-     name: str
294
-     cuisine: str
295
-     servings: int
296
-     total_time_minutes: int
297
-     options: list[dict] = []  # only populated on "propose" call, so it needs a default
298
-     ingredients_have: list[str]
299
-     ingredients_missing: list[str]
300
-     substitutes: dict[str, list[str]]
301
-     steps: list[Step]
302
-     final_dish_visual: str
303
-     nutrition_per_serving: Nutrition
304
- ```
305
- 5. Write the system prompt (`src/prompts/planner_system.txt`):
306
-    - Persona: international chef
307
-    - Hard rule: output JSON only, matching schema
308
-    - Hard rule: prefer dishes feasible with available ingredients
309
-    - Hard rule: 5–7 steps, each ≤ 25 words, each with a concrete `visual` field for image generation
310
-    - Hard rule: include `nutrition_per_serving` (model is allowed to estimate; you'll override with `data/nutrition.py` for accuracy)
311
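- 6. Use `response_format={"type": "json_object"}` in the chat completion call. Set `temperature=0.7, top_p=0.95` for the propose step (creative) and `temperature=0.4` for the structured-output step (more conservative, though still not deterministic). `enable_thinking=True` is a chat-template option on some runtimes, not a documented `create_chat_completion` keyword in `llama-cpp-python`; confirm how to enable it before relying on it for the propose step.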
312
-
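The hard rules in the system prompt (JSON only, 5–7 steps, each at most 25 words, each with a `visual` field) are cheap to check before the draft reaches Pydantic or the illustrator. The sketch below is one possible pre-check, not the planned implementation: `check_recipe_rules` is a hypothetical helper, and it only inspects the `steps` list from the schema above.

```python
import json

REQUIRED_STEP_FIELDS = ("n", "instruction", "duration", "visual")


def check_recipe_rules(
    raw_json: str,
    min_steps: int = 5,
    max_steps: int = 7,
    max_words: int = 25,
) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    try:
        recipe = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(recipe, dict):
        return ["top level is not a JSON object"]
    steps = recipe.get("steps")
    if not isinstance(steps, list):
        return ["missing 'steps' list"]

    problems: list[str] = []
    if not min_steps <= len(steps) <= max_steps:
        problems.append(f"expected {min_steps}-{max_steps} steps, got {len(steps)}")
    for idx, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            problems.append(f"step {idx}: not an object")
            continue
        # Empty strings count as missing, so a blank `visual` is reported too.
        missing = [field for field in REQUIRED_STEP_FIELDS if not step.get(field)]
        if missing:
            problems.append(f"step {idx}: missing {', '.join(missing)}")
        words = len(str(step.get("instruction", "")).split())
        if words > max_words:
            problems.append(f"step {idx}: instruction has {words} words")
    return problems
```

A non-empty result can be fed back to the planner as a short correction prompt for one retry, which tends to be cheaper than regenerating the whole recipe from scratch.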
313
- **Deliverable:** for `["chicken","onion","tomato","tortilla","cheese"]` and chosen dish "chicken tinga", the function returns a valid `Recipe` Pydantic object with 5–7 steps.
314
-
315
- **Verify:** the JSON parses, each step has all required fields, and total inference time on Space CPU < 60 seconds.
316
-
317
- ---
318
-
319
- ### Phase 3.5 — Day 3 (afternoon): Nutritional values
320
-
321
- **Goal:** the recipe ends with reliable per-serving nutrition (not hallucinated by the LLM).
322
-
323
- **Approach:** small, embedded reference table beats LLM math.
324
-
325
- **Tasks**
326
- 1. Bundle `data/nutrition_table.csv` — a 200-row CSV mapping common English ingredient names to per-100g macros (kcal, protein, carbs, fat, fiber). Source: USDA FoodData Central CSV download (free, public domain). Trim columns; commit to repo.
327
- 2. Write `src/data/nutrition.py`:
328
-    - `parse_quantity(line: str) -> tuple[float, str]` returning `(grams, ingredient_name)`: handle "2 cups flour", "200 g chicken", "1 tbsp olive oil". Use a small regex + a unit-to-grams table (cup=240, tbsp=15, tsp=5, oz=28.35; the volume entries are water-equivalent approximations).
329
-    - `compute_nutrition(ingredient_lines: list[str], servings: int) -> Nutrition`: sum per-100g values weighted by grams, divide by servings.
330
-    - If a line cannot be parsed, skip it and log; don't crash.
331
- 3. After the planner returns a recipe, **overwrite** `recipe.nutrition_per_serving` with the computed value. Keep the LLM's value only as a fallback when the parser yields zero.
332
-
333
- **Deliverable:** for a known recipe (e.g., spaghetti with tomato sauce, 4 servings), computed calories per serving is within ±25% of online references.
334
-
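The parser and the weighted sum described in the tasks above could look roughly like this. It is a sketch under stated assumptions: the `table` argument stands in for the bundled CSV (its shape is hypothetical), the function returns a plain `dict` rather than the plan's `Nutrition` model, and only the units named in the plan (plus `g` and `kg`) are recognized:

```python
import logging
import re
from typing import Optional

log = logging.getLogger(__name__)

# Unit-to-grams factors from the plan; volume units are approximated as water.
UNIT_GRAMS = {"g": 1.0, "kg": 1000.0, "cup": 240.0, "tbsp": 15.0, "tsp": 5.0, "oz": 28.35}
MACROS = ("kcal", "protein", "carbs", "fat", "fiber")

# "<number> <unit> <ingredient name>", e.g. "2 cups flour" or "200g chicken".
_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s+(.+?)\s*$")


def parse_quantity(line: str) -> Optional[tuple[float, str]]:
    """Return (grams, ingredient_name), or None if the line cannot be parsed."""
    m = _LINE.match(line)
    if not m:
        return None
    amount, unit, name = float(m.group(1)), m.group(2).lower(), m.group(3).lower()
    # Accept simple plurals such as "cups".
    if unit.endswith("s") and unit[:-1] in UNIT_GRAMS:
        unit = unit[:-1]
    if unit not in UNIT_GRAMS:
        return None
    return amount * UNIT_GRAMS[unit], name


def compute_nutrition(lines: list[str], servings: int,
                      table: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum per-100g macros weighted by grams, then divide by servings."""
    totals = dict.fromkeys(MACROS, 0.0)
    for line in lines:
        parsed = parse_quantity(line)
        if parsed is None or parsed[1] not in table:
            log.warning("skipping ingredient line: %r", line)  # skip and log
            continue
        grams, name = parsed
        for macro in MACROS:
            totals[macro] += table[name][macro] * grams / 100.0
    servings = max(1, servings)
    return {macro: round(value / servings, 1) for macro, value in totals.items()}
```

Exact-name lookup is the simplest possible matching strategy; a real table would likely need aliasing or fuzzy matching for names like "chicken breast".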
335
- ---
336
-
337
- ### Phase 4 — Day 4: Step Illustrator — FLUX.2 Klein 9B
338
-
339
- **Goal:** generate an appetizing image for the final dish + one image per step.
340
-
341
- **Constraint:** FLUX.2 Klein on CPU is impractical; on a free Space CPU it would take ~10 minutes per image. Two paths:
342
- - **Path A (recommended for the hackathon):** upgrade the Space to a GPU instance (T4 or A10G — paid, but $20 HF credits cover it for a week of development). Code stays unchanged.
343
- - **Path B (fallback):** run FLUX in `enable_model_cpu_offload()` mode with `num_inference_steps=4` and accept ~3 min/image — only feasible for pre-rendered demo recipes, not live runs.
344
-
345
- **Tasks**
346
- 1. Add to `requirements.txt`:
347
- ```text
348
- diffusers>=0.31
349
- transformers>=4.45
350
- accelerate
351
- torch
352
- safetensors
353
- ```
354
- 2. Write `src/models/illustrator.py`:
355
- ```python
356
- import torch
- from diffusers import Flux2KleinPipeline
-
- _pipe = None  # lazily created pipeline singleton
-
-
- def get_flux():
-     """Load FLUX.2 Klein once and reuse it across calls."""
-     global _pipe
-     if _pipe is None:
-         _pipe = Flux2KleinPipeline.from_pretrained(
-             "black-forest-labs/FLUX.2-klein-9B",
-             torch_dtype=torch.bfloat16,
-         )
-         # Model CPU offload needs an accelerator; skip it on CPU-only hosts.
-         if torch.cuda.is_available():
-             _pipe.enable_model_cpu_offload()
-     return _pipe
-
-
- def render(prompt: str, seed: int = 0) -> "PIL.Image.Image":
-     """Render one 1024x1024 image for `prompt`, seeded for reproducibility."""
-     pipe = get_flux()
-     device = "cuda" if torch.cuda.is_available() else "cpu"
-     img = pipe(
-         prompt=prompt,
-         height=1024, width=1024,
-         guidance_scale=1.0,
-         num_inference_steps=4,
-         generator=torch.Generator(device=device).manual_seed(seed),
-     ).images[0]
-     return img
383
- ```
384
- 3. Write `src/agents/step_illustrator.py`:
385
- - For each `Step.visual`, build a prompt like:
386
- > `f"Top-down photo of a kitchen pan or plate showing {visual}. {cuisine} home cooking, warm natural lighting, recipe magazine style, photorealistic, appetizing."`
387
- - Generate the **final dish image first**, then the per-step images, all in **one Python loop** (no parallelism — FLUX holds the GPU).
388
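- - Cache results on disk keyed by a SHA-256 digest of the prompt to avoid re-renders on re-runs (Python's built-in `hash()` is randomized per process for strings, so it cannot serve as a stable key).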
389
- - Emit Gradio progress updates so the UI doesn't appear frozen.
390
- 4. **Critical tuning:** keep `num_inference_steps=4` (Klein is distilled). Higher counts blow latency and offer minimal quality gain at this scale.
391
-
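The on-disk cache from the task list above can be sketched independently of FLUX. This is an illustrative helper, not the repo's implementation: `cached_render` and its `render_fn` callback (assumed to return encoded PNG bytes) are hypothetical names. The key is a SHA-256 digest because Python's built-in `hash()` changes between processes for strings.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_render(prompt: str, render_fn: Callable[[str], bytes],
                  cache_dir: Path) -> bytes:
    """Return image bytes for `prompt`, calling `render_fn` only on a cache miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.png"
    if path.exists():
        return path.read_bytes()  # cache hit: no render
    data = render_fn(prompt)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Keying on the prompt alone means a changed seed or step count would still hit the old entry; folding those parameters into the hashed string would avoid that.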
392
- **Deliverable:** for a 5-step recipe, all 6 images (final + 5 steps) render in:
393
- - < 80 seconds on a T4 GPU Space (6 images at < 12 s each)
394
- - < 8 minutes on CPU offload (acceptable only for pre-cached demos)
395
-
396
- **Verify:** show the 6 images to an unprompted human; ≥4 should be described as "appetizing".
397
-
398
- ---
399
-
400
- ### Phase 5 — Day 5: Narrator — VoxCPM2
401
-
402
- **Goal:** every step's instruction is rendered to an MP3 in a warm, clear English voice.
403
-
404
- **Tasks**
405
- 1. Confirm the exact VoxCPM2 repo name on HF (`openbmb/VoxCPM2` or similar). Read its README for the inference snippet — TTS APIs vary widely between models.
406
- 2. Add to `requirements.txt`: `soundfile`, `torchaudio`, `numpy`. If VoxCPM2 ships GGUF, use it via `llama-cpp-python` audio extension (if available); otherwise load via `transformers` directly.
407
- 3. Write `src/models/narrator.py`:
408
- ```python
409
- _tts = None  # lazily created TTS singleton
-
-
- def get_tts():
-     global _tts
-     if _tts is None:
-         # placeholder: replace with the exact VoxCPM2 loading code from its README
-         from transformers import AutoModel, AutoProcessor
-         _tts = ...  # load on CPU; VoxCPM2 is small (~1B)
-     return _tts
-
-
- def synthesize(text: str, voice: str = "warm_female_en") -> bytes:
-     """Returns MP3 bytes."""
-     tts = get_tts()
-     wav = tts.generate(text, voice=voice)  # API depends on VoxCPM2
-     # placeholder: encode wav -> mp3 with soundfile + ffmpeg-python or pydub
-     mp3_bytes = ...
-     return mp3_bytes
425
- ```
426
- 4. Write `src/agents/narrator.py`:
427
- - For each step, synthesize `step.instruction`. If `step.tip` is set, synthesize a separate "tip" clip.
428
- - Save MP3 files in a per-recipe temp directory; return file paths to Gradio.
429
- 5. Pre-render all step audio when the recipe is finalized — never stream per-step in the demo (too much UI lag).
430
-
431
- **Deliverable:** clicking "Play" on step 1 in the UI plays clear English narration.
432
-
433
- **Verify:** on a 5-step recipe, total TTS rendering time < 30 seconds on CPU.
434
-
435
- ---
436
-
437
- ### Phase 6 — Day 6: Gradio UI (Off-Brand)
438
-
439
- **Goal:** the Space looks like a recipe magazine, not stock Gradio.
440
-
441
- **Tasks**
442
- 1. Write `src/ui/theme.py`:
443
- ```python
444
- import gradio as gr
445
-
446
- theme = gr.themes.Soft(
447
- primary_hue="orange",
448
- neutral_hue="stone",
449
- font=[gr.themes.GoogleFont("Inter"), "sans-serif"],
450
- font_mono=[gr.themes.GoogleFont("JetBrains Mono"), "monospace"],
451
- )
452
-
453
- CSS = """
454
- .gradio-container { background: #f5ecd9 !important; }
455
- .recipe-hero { background:#fffbf0; border-radius:14px; padding:28px; }
456
- .recipe-hero h1 { font-family:'Lora',serif!important; font-size:36px!important; color:#6b4a2a!important; }
457
- .step-card { background:#fffbf0; border-left:4px solid #a85c2a; border-radius:8px; padding:18px 22px; margin:12px 0; }
458
- .nutri-grid { display:grid; grid-template-columns:repeat(5,1fr); gap:12px; margin-top:24px; }
459
- .nutri-cell { background:#fffbf0; border:1px solid #d8c9ad; border-radius:10px; padding:12px; text-align:center; }
460
- """
461
- ```
462
- 2. Write `app.py` with three tabs:
463
- - **Tab 1 — Cook**: fridge photo input → ingredient chips → 3 dish options → selected recipe card with hero image, steps (image + text + audio play button each), nutrition grid at the bottom.
464
- - **Tab 2 — Check Progress**: upload a progress photo + select active step → validator returns badge (`go/wait/fix`) + tip + audio.
465
- - **Tab 3 — About / Tech**: README-style explanation, badges, model list.
466
- 3. Use `gr.Blocks` with `gr.State` to hold the current recipe across UI events. Store it as a plain `dict` (`recipe.model_dump()`) and rebuild it with `Recipe.model_validate(...)` when needed; each callback should take `state` as an input and return the updated dict as an output rather than assigning to `state.value`.
467
- 4. Wire callbacks:
468
- - `btn_propose.click(fn=on_propose, inputs=[fridge_photo], outputs=[ingredient_chips, dish_options, state])`
469
- - `dish_options.select(fn=on_pick_dish, inputs=[state, picked_dish], outputs=[recipe_card, hero_img, steps_column, nutrition_grid, state])`
470
- - `progress_image.upload(fn=on_validate, inputs=[state, current_step_idx, progress_image], outputs=[verdict_md, tip_audio])`
471
-
472
- **Deliverable:** end-to-end run from a sample fridge photo to a fully rendered recipe card with audio and nutrition. No Gradio default look anywhere.
473
-
474
- ---
475
-
476
- ### Phase 7 — Day 7: Progress Validator (closed loop)
477
-
478
- **Goal:** user uploads a progress photo, app says "go / wait / fix" with a voiced tip.
479
-
480
- **Tasks**
481
- 1. Write `src/agents/progress_validator.py`:
482
- ```python
483
- PROMPT = """Compare these two cooking photos.
- Photo 1 (target): how it should look after the step "{instruction}".
- Photo 2 (user's pan/plate): the user's current progress.
- Reply strict JSON: {{"verdict": "go|wait|fix", "feedback": "...", "tip": "..."}}
- - "go": looks right, move to next step
- - "wait": needs more time, do not change anything yet
- - "fix": something is off; suggest a concrete adjustment in one sentence
- """  # literal JSON braces are doubled so PROMPT.format(instruction=...) works
-
-
- def validate(target_img, user_img, step_instruction): ...
492
- ```
493
- 2. Use the same vision model singleton as Phase 2 — both calls share weights.
494
- 3. Render the verdict as a colored badge (green/amber/red) and play the tip via VoxCPM2.
495
-
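Vision models often wrap the requested JSON in extra prose, so the verdict needs defensive parsing. A minimal sketch, assuming the reply is a string that may contain one JSON object; `parse_verdict` is a hypothetical helper, and falling back to `"wait"` on any parse failure is a design choice made here, not something the plan specifies:

```python
import json
import re

VALID_VERDICTS = {"go", "wait", "fix"}


def parse_verdict(raw: str) -> dict:
    """Extract {"verdict", "feedback", "tip"} from a model reply, with a safe fallback."""
    fallback = {"verdict": "wait", "feedback": "Could not read the model reply.", "tip": ""}
    # Grab the outermost {...} span, ignoring any prose around it.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    verdict = str(data.get("verdict", "")).strip().lower()
    if verdict not in VALID_VERDICTS:
        return fallback
    return {
        "verdict": verdict,
        "feedback": str(data.get("feedback", "")),
        "tip": str(data.get("tip", "")),
    }
```

Defaulting to `"wait"` is the conservative option: the user is never told to move on or change something based on an unreadable reply.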
496
- **Deliverable:** running the validator on 5 real progress photos returns the correct verdict on ≥3.
497
-
498
- ---
499
-
500
- ### Phase 8 — Day 8: Fine-tune the Planner on the Kaggle dataset (Well-Tuned badge)
501
-
502
- > **Important caveat:** The user instruction says "for now keep inference on llama.cpp inside HF Space, future migration to Modal." Fine-tuning still **requires GPU**, so training itself happens on Modal (one-shot, offline) or on a rented Colab/Lambda GPU. Inference of the resulting model stays on llama.cpp inside the Space (as GGUF). This does **not** violate the runtime constraint — only the build pipeline touches a GPU.
503
-
504
- **Goal:** publish a fine-tuned Planner GGUF to the Hub and load it from the Space.
505
-
506
- **Tasks**
507
- 1. **Build SFT dataset** (`scripts/build_sft_dataset.py`):
508
- - Load Kaggle `better-recipes` dataset.
509
- - For each recipe, build a `(prompt, completion)` pair where `prompt` is `"Available ingredients: X, Y, Z. Propose recipe."` and `completion` is the full canonical `Recipe` JSON.
510
- - Generate ~1000 pairs, push to `<you>/cook-with-me-sft` HF Dataset.
511
- 2. **LoRA training** (`scripts/train_planner.py` — to be run on a GPU machine, not the Space):
512
- ```python
513
- # peft + trl SFTTrainer, base = openbmb/MiniCPM-V-4
514
- # r=16, alpha=32, lr=2e-4, epochs=2, batch=4
515
- # push_to_hub=True, hub_model_id="<you>/cook-with-me-planner-4b"
516
- ```
517
- 3. **Convert to GGUF** (Day 8 evening):
518
- - Use `llama.cpp/convert_hf_to_gguf.py`, then the `llama-quantize` tool (named `quantize` in older llama.cpp builds) to produce `Q4_K_M`.
519
- - Push GGUF to `<you>/cook-with-me-planner-4b-gguf`.
520
- 4. Update `src/models/loader.py` to point at your GGUF instead of the base model.
521
-
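The `(prompt, completion)` pairs from task 1 can be built with the standard library alone. A rough sketch, assuming each source recipe has already been normalized into an ingredient list and a `Recipe`-shaped dict; `build_pair` and `to_jsonl` are hypothetical helper names:

```python
import json


def build_pair(ingredients: list[str], recipe: dict) -> dict[str, str]:
    """Build one SFT example in the prompt format described in the plan."""
    prompt = f"Available ingredients: {', '.join(ingredients)}. Propose recipe."
    # sort_keys gives the model a stable, canonical key order to imitate.
    completion = json.dumps(recipe, ensure_ascii=False, sort_keys=True)
    return {"prompt": prompt, "completion": completion}


def to_jsonl(pairs: list[dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```

JSON Lines is used here only because it is easy to inspect and upload; any format the training script can read would do.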
522
- **Deliverable:** the Space loads your fine-tuned Planner GGUF and produces JSON recipes that are noticeably better-formatted than the base model on a held-out test set.
523
-
524
- ---
525
-
526
- ### Phase 9 — Day 9: End-to-end test, performance pass, pre-warm cache
527
-
528
- **Goal:** the Space loads in <60s and a full recipe (text + 6 images + 5 audio clips + nutrition) renders in <2 minutes on the chosen hardware.
529
-
530
- **Tasks**
531
- 1. Write `scripts/smoke_test.py` that runs the full pipeline on 3 sample fridge photos and asserts:
532
- - Each ingredient list is non-empty
533
- - Each recipe has 5–7 steps
534
- - Each step has a non-empty image and audio path
535
- - Nutrition has all 5 macros set
536
- 2. Implement **on-disk caching** for FLUX outputs (key = SHA256 of prompt) so re-runs of the same recipe are instant. Save to `~/.cache/cook-with-me/flux/`.
537
- 3. Pre-render and commit **3 fully-prepared demo recipes** (chicken tinga, pasta carbonara, chicken tikka) so judges see results in <5s on first click.
538
- 4. Add error handling at every UI boundary: a model failure should display a friendly message, not a stack trace.
539
- 5. Add a "Loading models..." progress bar on first request — first cold start can take 90s.
540
-
541
- **Deliverable:** smoke test passes on the live Space.
542
-
543
- ---
544
-
545
- ### Phase 10 — Day 10: README, demo video, social post, submit
546
-
547
- **Tasks**
548
- 1. Write `README.md` with the required HF Space frontmatter:
549
- ```yaml
550
- ---
551
- title: Cook With Me
552
- emoji: 🍲
553
- colorFrom: orange
554
- colorTo: yellow
555
- sdk: gradio
556
- sdk_version: 4.44.0
557
- app_file: app.py
558
- pinned: false
559
- license: apache-2.0
560
- ---
561
- ```
562
- Followed by:
563
- - One-paragraph pitch
564
- - 60-second demo video embed
565
- - Architecture diagram (export from `arquitectura.html` as PNG)
566
- - Section: "How closed-loop visual cooking guidance works"
567
- - Models used (with HF links + total parameter count)
568
- - Badges declared
569
- - Build / run instructions
570
- 2. Record a 60–90 second demo video: real person cooks a recipe end-to-end with the app guiding via voice, ending with the cooked plate on camera.
571
- 3. Write the Field Notes blog post: one of the engineering surprises (e.g., "FLUX.2 step images at 4 steps look better than 8 — here's why" or "Closed-loop validation needs the same vision model on both sides").
572
- 4. Social post on X / LinkedIn with the demo video.
573
- 5. Submit on the hackathon platform.
574
-
575
- ---
576
-
577
- ## 4. Tools usage matrix (when to reach for what)
578
-
579
- | Phase | Primary tools | Why |
580
- |---|---|---|
581
- | 0 — setup | HF CLI, Kaggle CLI, OpenAI Codex CLI | one-shot config |
582
- | 1 — data | `kagglehub`, `pandas`, `sentence-transformers` | offline dataset prep |
583
- | 2 — vision | `llama-cpp-python` + `MiniCPMv26ChatHandler` | runs inside Space, badge: Llama Champion |
584
- | 3 — planner | `llama-cpp-python` + retrieval over local parquet | grounded JSON output |
585
- | 3.5 — nutrition | local CSV + regex parser | reliable, no LLM math |
586
- | 4 — illustrator | `diffusers` + `Flux2KleinPipeline` | sponsor model showcase |
587
- | 5 — narrator | VoxCPM2 via `transformers` (or its native API) | local TTS |
588
- | 6 — UI | `gradio` + custom CSS theme | Off-Brand badge |
589
- | 7 — validator | same vision singleton as phase 2 | closed-loop innovation, Best Agent |
590
- | 8 — fine-tune | `peft`, `trl`, `llama.cpp` convert/quantize, on a GPU machine | Well-Tuned badge |
591
- | 9 — test/cache | `pytest`, `hashlib`, on-disk FLUX cache | demo reliability |
592
- | 10 — submit | HF Spaces, video tool, social | shipping |
593
-
594
- ---
595
-
596
- ## 5. Performance budget on the HF Space
597
-
598
- | Operation | Target latency | Hardware needed |
599
- |---|---|---|
600
- | Vision: ingredient ID | < 8 s | CPU 4-thread |
601
- | Planner: propose 3 dishes | < 12 s | CPU 4-thread |
602
- | Planner: build full recipe JSON | < 20 s | CPU 4-thread |
603
- | Nutrition computation | < 0.1 s | CPU |
604
- | FLUX: 1 image (4 steps) | < 12 s on T4 / < 90 s on CPU offload | GPU strongly recommended |
605
- | FLUX: 6 images (final + 5 steps) | < 80 s on T4 | GPU |
606
- | VoxCPM2: 1 step narration | < 5 s | CPU |
607
- | Validator: 1 progress check | < 8 s | CPU |
608
- | **Full recipe end-to-end** | **< 2 min on T4 Space** | — |
609
-
610
- **Hardware decision:** rent a T4 Space (~$0.40/hr) for the demo week. The $20 HF credits cover ~50 hours.
611
-
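Summing the per-operation caps is a quick sanity check on the end-to-end target. A rough sketch using only the numbers listed in this section; the dictionary keys are illustrative labels, and five narrated steps are assumed:

```python
# Upper bounds in seconds, copied from the budget table (T4 figure for FLUX).
BUDGET_S = {
    "vision": 8.0,
    "propose_dishes": 12.0,
    "build_recipe": 20.0,
    "nutrition": 0.1,
    "flux_6_images": 80.0,
    "tts_5_steps": 5 * 5.0,  # five steps at < 5 s each
}

total_s = sum(BUDGET_S.values())
credit_hours = 20 / 0.40  # $20 of credits at ~$0.40 per T4 hour

print(f"worst-case end-to-end: {total_s:.1f} s")
print(f"T4 hours covered by $20: {credit_hours:.0f}")
```

The caps add up to roughly 145 s, which is above the 2-minute goal, so the end-to-end target presumably relies on stages finishing under their caps or on overlapping narration with image rendering.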
612
- ---
613
-
614
- ## 6. Risks and mitigations (delta from `estrategia.md`)
615
-
616
- | Risk | Mitigation |
617
- |---|---|
618
- | MiniCPM-V-4 has no public GGUF | Convert yourself with `llama.cpp/convert_hf_to_gguf.py`. Allow a half-day buffer in Phase 2. |
619
- | llama-cpp-python's MiniCPM-V chat handler version mismatch | Require `llama-cpp-python>=0.3.2`; test the handler import on Day 2. If it fails, fall back to MiniCPM-V-2.6 GGUF (well-supported) for vision and document the swap. |
620
- | FLUX.2 Klein 9B too slow on free CPU Space | Upgrade to a paid GPU Space (~$10 for the demo week). Document this in the README so judges expect it. |
621
- | VoxCPM2 docs sparse | Drop to Kokoro-82M or Piper TTS as a backup. Lose the OpenBMB voice angle but keep the audio. |
622
- | Kaggle dataset has format quirks (HTML in instructions, missing fields) | The Phase 1 normalization step handles this; budget 2 hours. |
623
- | Nutrition CSV missing exotic ingredients | Skip-and-log strategy already designed; demo-day recipes use common ingredients only. |
624
- | Total params >32B if VoxCPM2 turns out to be 7B | Check size in Phase 0; if too large, drop to a smaller TTS. |
625
-
626
- ---
627
-
628
- ## 7. "Day-1 hello world" checklist
629
-
630
- Before writing any agent code, get this minimal end-to-end loop working — it proves your stack:
631
-
632
- 1. ☐ Empty Gradio Space deployed, shows "Hello"
633
- 2. ☐ `huggingface-cli login` works locally
634
- 3. ☐ `kaggle datasets download thedevastator/better-recipes-for-a-better-life` succeeds
635
- 4. ☐ `from llama_cpp import Llama` runs in your venv
636
- 5. ☐ Download one tiny GGUF (e.g., TinyLlama Q4) and call it from a Gradio textbox round-trip
637
- 6. ☐ Push the round-trip to the Space; confirm it answers in the cloud
638
-
639
- **Only after all 6 are checked, start Phase 1.**
640
-
641
- ---
642
-
643
- ## 8. Where this plan differs from `estrategia.md` (deltas to communicate)
644
-
645
- | Topic | `estrategia.md` (Spanish, Mexican-cuisine focus) | This document (current requirements) |
646
- |---|---|---|
647
- | Language | Spanish-first | **English only** |
648
- | Cuisine | Mexican | **International** (Kaggle dataset) |
649
- | Voice models | OpenBMB voice + Cohere Labs | **VoxCPM2** only (single voice) |
650
- | Vision model | MiniCPM-V 2.6 / 4 | **MiniCPM-V-4.6** |
651
- | Reasoning model | MiniCPM-4 4B | **MiniCPM-V-4** |
652
- | FLUX runtime | Modal endpoint | **Inside Space (llama.cpp principle)**; Modal kept as a future migration target only |
653
- | External APIs at runtime | Allowed (Modal, OpenAI optional) | **None** — full local inference inside Space |
654
- | Nutritional info | Not specified | **Required** at end of recipe |
655
- | Fine-tune dataset | 200 synthetic Mexican recipes | **Kaggle better-recipes (international)** |
656
-
657
- If anything in `plan.md` or `estrategia.md` conflicts with this document, **this document wins** — it reflects the latest user requirements.
658
-
659
- ---
660
-
661
- ## 9. Definition of done
662
-
663
- The implementation is complete when **all** of these are true:
664
-
665
- - [ ] Public HF Space `https://huggingface.co/spaces/<you>/cook-with-me` loads
666
- - [ ] App is fully in English
667
- - [ ] Fridge photo → ingredient list → 3 dish options → full recipe with images, audio, and nutrition works end-to-end
668
- - [ ] Progress validator returns sensible verdicts on 3+ test photos
669
- - [ ] All inference (vision, planner, TTS) runs through llama.cpp / local diffusers — **no external API calls at runtime**
670
- - [ ] Total parameters declared in README ≤ 32B
671
- - [ ] Fine-tuned Planner GGUF published to HF Hub (Well-Tuned badge)
672
- - [ ] Demo video (60–90s) recorded with a real person cooking
673
- - [ ] Field Notes blog post published
674
- - [ ] Submitted on the hackathon platform before deadline
 
app.py CHANGED
@@ -1,5 +1,4 @@
1
  import logging
2
- logging.basicConfig(level=logging.INFO)
3
  log = logging.getLogger(__name__)
4
 
5
  from typing import Any
@@ -7,11 +6,12 @@ from typing import Any
7
  import gradio as gr
8
  from PIL import Image
9
 
 
10
  from src.agents.mise_en_place import identify_ingredients
11
- from src.agents.progress_validator import validate
12
- from src.agents.recipe_planner import plan_recipe, propose_dishes
13
- from src.agents.step_illustrator import illustrate_recipe
14
- from src.data.nutrition import compute_nutrition
15
  from src.ui.components import (
16
  DishOptions,
17
  IngredientChips,
@@ -19,265 +19,135 @@ from src.ui.components import (
19
  RecipeHero,
20
  StepCard,
21
  VerdictBadge,
 
22
  )
23
  from src.ui.theme import CSS, theme
24
 
25
-
26
- # ---------------------------------------------------------------------------
27
- # Callbacks
28
- # ---------------------------------------------------------------------------
29
-
30
- def _clean_ingredients(items: list | None) -> list[str]:
31
- """Normalize a raw ingredient list (dedup, lowercase, strip empties)."""
32
- out, seen = [], set()
33
- for it in (items or []):
34
- name = str(it).strip().lower()
35
- if name and name not in seen:
36
- seen.add(name)
37
- out.append(name)
38
- return out
39
-
40
-
41
- def on_propose(fridge_image: Image.Image | None, state: dict | None):
42
- """Photo → ingredients → 3 dish options (and fill the editable list)."""
43
  state = state or {}
44
- if fridge_image is None:
45
- return (
46
- IngredientChips.render({}),
47
- DishOptions.render({}),
48
- gr.update(choices=[], value=None),
49
- state,
50
- gr.update(choices=[], value=[]),
51
- )
52
-
53
  ingredients = identify_ingredients(fridge_image)
54
- options = propose_dishes(ingredients)
55
 
56
- state.update({
57
- "ingredients_have": ingredients,
58
- "options": [o.model_dump() for o in options],
59
- })
 
 
 
 
 
 
 
60
 
61
- radio_choices = [o.name for o in options]
62
- return (
63
- IngredientChips.render({"have": ingredients, "missing": []}),
64
- DishOptions.render({"options": state["options"]}),
65
- gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
66
- state,
67
- gr.update(choices=ingredients, value=ingredients),
68
- )
69
-
70
-
71
- def on_update_ingredients(state: dict | None, ingredients: list | None):
72
- """Manual edit of the ingredient list → refresh chips + re-propose dishes."""
73
- state = state or {}
74
- ingredients = _clean_ingredients(ingredients)
75
- state["ingredients_have"] = ingredients
76
-
77
- if not ingredients:
78
- state["options"] = []
79
- return (
80
- IngredientChips.render({}),
81
- DishOptions.render({}),
82
- gr.update(choices=[], value=None),
83
- state,
84
- )
85
 
86
- options = propose_dishes(ingredients)
87
- state["options"] = [o.model_dump() for o in options]
88
- radio_choices = [o.name for o in options]
89
- return (
90
- IngredientChips.render({"have": ingredients, "missing": []}),
91
- DishOptions.render({"options": state["options"]}),
92
- gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
93
- state,
94
- )
95
-
96
-
97
- def on_cook(state: dict | None, dish_name: str | None, illustrate: bool, ingredients: list | None):
98
- """Chosen dish → full recipe + nutrition (+ FLUX images if requested)."""
99
- state = state or {}
100
- if not dish_name:
101
- return (
102
- RecipeHero.render({}),
103
- StepCard.render({}),
104
- NutritionGrid.render({"nutrition": {}}),
105
- state,
106
- )
107
-
108
- # Prefer the (possibly hand-edited) ingredient list from the editor.
109
- ingredients = _clean_ingredients(ingredients) or state.get("ingredients_have", [])
110
- state["ingredients_have"] = ingredients
111
- recipe = plan_recipe(dish_name, ingredients)
112
-
113
- nutrition = compute_nutrition(ingredients, recipe.servings)
114
- recipe.nutrition = nutrition
115
- state["recipe"] = recipe.model_dump()
116
-
117
- if illustrate:
118
- log.info("Generating FLUX step images via Modal...")
119
- recipe = illustrate_recipe(recipe)
120
- state["recipe"] = recipe.model_dump()
121
-
122
- return (
123
- RecipeHero.render(recipe.model_dump()),
124
- StepCard.render({"steps": [s.model_dump() for s in recipe.steps]}),
125
- NutritionGrid.render({"nutrition": nutrition}),
126
- state,
127
- )
128
-
129
-
130
- def on_validate(state: dict | None, step_idx: float, progress_image: Image.Image | None):
131
- """Progress photo + step number → verdict badge."""
132
- state = state or {}
133
- recipe = state.get("recipe", {})
134
- steps = recipe.get("steps", [])
135
- idx = max(0, int(step_idx) - 1)
136
- instruction = steps[idx]["instruction"] if idx < len(steps) else "Cook the dish properly."
137
- result = validate(progress_image, instruction)
138
- return VerdictBadge.render(result)
139
-
140
-
141
- # ---------------------------------------------------------------------------
142
- # UI
143
- # ---------------------------------------------------------------------------
144
 
 
 
 
145
  def build_ui() -> gr.Blocks:
146
  initial_state: dict[str, Any] = {}
147
 
148
- with gr.Blocks(title="Cook With Me", theme=theme, css=CSS) as demo:
149
  gr.Markdown(
150
  "# 🍲 Cook With Me\n"
151
- "_Snap your fridge · Pick a dish · Cook step by step · Check your progress._"
152
  )
153
 
154
  state = gr.State(initial_state)
155
 
156
  with gr.Tabs():
157
- # ----------------------------------------------------------------
158
- # Tab 1 — Cook
159
- # ----------------------------------------------------------------
160
- with gr.Tab("🍳 Cook"):
161
  with gr.Row():
162
- # Left — inputs
163
  with gr.Column(scale=1):
164
  fridge_input = gr.Image(
165
  label="📸 Photo of your fridge or pantry",
166
  type="pil",
167
- height=300,
168
  )
169
- propose_btn = gr.Button("🔍 What can I cook?", variant="primary")
170
 
171
  gr.Markdown("### Ingredients I see")
172
  chips = gr.HTML(IngredientChips.render({}))
173
 
174
- ingredient_editor = gr.Dropdown(
175
- choices=[],
176
- value=[],
177
- multiselect=True,
178
- allow_custom_value=True,
179
- label="✏️ Add or remove ingredients (type + Enter to add, ✕ to remove)",
180
- interactive=True,
181
- )
182
- update_btn = gr.Button("🔄 Update ingredients & dishes")
183
-
184
  gr.Markdown("### Pick a dish")
185
- dish_options_html = gr.HTML(DishOptions.render({}))
186
- dish_radio = gr.Radio(
187
- choices=[],
188
- label="Choose one",
189
- interactive=True,
190
- )
191
 
192
- with gr.Accordion("⚙️ Generation options", open=False):
193
- illustrate_chk = gr.Checkbox(
194
- value=False,
195
- label="🎨 Generate step images with FLUX.2 (requires Modal deployment)",
196
- )
197
 
198
- cook_btn = gr.Button("👨‍🍳 Build my recipe", variant="primary")
199
 
200
- # Right — recipe output
201
  with gr.Column(scale=2):
202
  hero = gr.HTML(RecipeHero.render({}))
203
  steps_panel = gr.HTML(StepCard.render({}))
204
  nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
205
 
206
- # ----------------------------------------------------------------
207
- # Tab 2 — Check Progress
208
- # ----------------------------------------------------------------
209
- with gr.Tab("📷 Check Progress"):
210
- gr.Markdown(
211
- "Upload a photo of your pan or plate. The vision model compares it "
212
- "against the current recipe step and tells you if you can move on."
213
- )
214
  with gr.Row():
215
  with gr.Column():
216
  step_idx = gr.Number(value=1, precision=0, label="Active step #")
217
- progress_input = gr.Image(
218
- label="📸 Your pan / plate",
219
- type="pil",
220
- height=300,
221
- )
222
- validate_btn = gr.Button("✅ How am I doing?", variant="primary")
223
  with gr.Column():
224
  verdict_panel = gr.HTML(VerdictBadge.render({}))
 
225
 
226
- # ----------------------------------------------------------------
227
- # Tab 3 — About
228
- # ----------------------------------------------------------------
229
- with gr.Tab("ℹ️ About"):
230
  gr.Markdown(
231
  """
232
- ### How it works
233
- 1. **Snap** your fridge: the fine-tuned vision model (MiniCPM-V-4.6) identifies every ingredient.
234
- 2. **Pick** one of three AI-suggested dishes tailored to what you have.
235
- 3. **Cook** step by step with a generated recipe, per-serving nutrition, and optional FLUX.2 step images.
236
- 4. **Check** your progress: upload a photo of your pan and get a *go / wait / fix* verdict.
237
-
238
- ### Models
239
- | Role | Model | Params |
240
- |---|---|---|
241
- | Vision (ingredients + validator) | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B |
242
- | Recipe Planner | `openbmb/MiniCPM4.1-8B` (fine-tuned on Kaggle recipes) | ~8B |
243
- | Step Illustrator | `FLUX.2-klein-9B` via Modal | ~9B |
244
-
245
- **Total ≤ 21.6B params** (cap: 32B ✓)
246
-
247
- ### Badges targeted
248
- ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
249
-
250
- ### Hackathon
251
- Hugging Face Small Models / Big Adventures · June 2026 · Track: Backyard AI
 
 
 
 
 
252
  """
253
  )
254
 
255
- # --------------------------------------------------------------------
256
- # Wire callbacks
257
- # --------------------------------------------------------------------
258
  propose_btn.click(
259
  fn=on_propose,
260
  inputs=[fridge_input, state],
261
- outputs=[chips, dish_options_html, dish_radio, state, ingredient_editor],
262
- )
263
-
264
- update_btn.click(
265
- fn=on_update_ingredients,
266
- inputs=[state, ingredient_editor],
267
- outputs=[chips, dish_options_html, dish_radio, state],
268
- )
269
-
270
- cook_btn.click(
271
- fn=on_cook,
272
- inputs=[state, dish_radio, illustrate_chk, ingredient_editor],
273
- outputs=[hero, steps_panel, nutrition_panel, state],
274
- )
275
-
276
- validate_btn.click(
277
- fn=on_validate,
278
- inputs=[state, step_idx, progress_input],
279
- outputs=[verdict_panel],
280
  )
 
 
 
 
 
 
 
 
 
 
281
 
282
  return demo
283
 
@@ -289,4 +159,6 @@ if __name__ == "__main__":
289
  server_port=int(__import__("os").environ.get("PORT", 7860)),
290
  show_error=True,
291
  inbrowser=True,
292
- )
 
 
 
1
  import logging
 
2
  log = logging.getLogger(__name__)
3
 
4
  from typing import Any
 
6
  import gradio as gr
7
  from PIL import Image
8
 
9
+ # from src import config
10
  from src.agents.mise_en_place import identify_ingredients
11
+ # from src.agents.progress_validator import validate
12
+ # from src.agents.recipe_planner import plan_recipe, propose_dishes
13
+ # from src.data.nutrition import compute_nutrition
14
+ # from src.pipeline import Recipe
15
  from src.ui.components import (
16
  DishOptions,
17
  IngredientChips,
 
19
  RecipeHero,
20
  StepCard,
21
  VerdictBadge,
22
+ recipe_to_state,
23
  )
24
  from src.ui.theme import CSS, theme
25
 
26
+ def on_propose(fridge_image: Image.Image | None, state: dict | None) -> tuple[str, str, list[str], dict]:
27
+ """Photo → ingredients → 3 dish options."""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
  state = state or {}
 
 
 
 
 
 
 
 
 
29
  ingredients = identify_ingredients(fridge_image)
30
+ # options = propose_dishes(ingredients)
31
 
32
+ # state.update({
33
+ # "ingredients_have": ingredients,
34
+ # "ingredients_missing": [],
35
+ # "options": [o.model_dump() for o in options],
36
+ # })
37
+ chips_html = IngredientChips.render({"have": ingredients, "missing": []})
38
+ log.info(ingredients)
39
+ # options_html = DishOptions.render({"options": state["options"]})
40
+ # radio_choices = [o.name for o in options]
41
+ # return chips_html, options_html, gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None), state
42
+ return chips_html
43
 
44
 
45
 
46
+ # ----------------
47
+ # UI definition
48
+ # ----------------
49
  def build_ui() -> gr.Blocks:
50
  initial_state: dict[str, Any] = {}
51
 
52
+ with gr.Blocks(title="Cook With Me") as demo:
53
  gr.Markdown(
54
  "# 🍲 Cook With Me\n"
55
+ "_A multimodal sous-chef. See it. Plan it. Show it. Cook it._"
56
  )
57
 
58
  state = gr.State(initial_state)
59
 
60
  with gr.Tabs():
61
+ # --- Tab 1: Cook ------------------------------------------------
62
+ with gr.Tab("Cook"):
 
 
63
  with gr.Row():
 
64
  with gr.Column(scale=1):
65
  fridge_input = gr.Image(
66
  label="📸 Photo of your fridge or pantry",
67
  type="pil",
68
+ height=320,
69
  )
70
+ propose_btn = gr.Button("What can I cook?", variant="primary")
71
 
72
  gr.Markdown("### Ingredients I see")
73
  chips = gr.HTML(IngredientChips.render({}))
74
 
75
  gr.Markdown("### Pick a dish")
76
+ options = gr.HTML(DishOptions.render({}))
77
+ dish_radio = gr.Radio(choices=[], label="Choose one", interactive=True)
 
 
 
 
78
 
79
+ with gr.Accordion("Generation options", open=False):
80
+ illustrate_chk = gr.Checkbox(value=False, label="Render step images (FLUX, slow on CPU)")
81
+ narrate_chk = gr.Checkbox(value=False, label="Generate voice narration (VoxCPM2)")
 
 
82
 
83
+ cook_btn = gr.Button("Build recipe", variant="primary")
84
 
 
85
  with gr.Column(scale=2):
86
  hero = gr.HTML(RecipeHero.render({}))
87
  steps_panel = gr.HTML(StepCard.render({}))
88
  nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
89
 
90
+ # --- Tab 2: Check Progress -------------------------------------
91
+ with gr.Tab("Check Progress"):
92
+ gr.Markdown("Upload a photo of your pan or plate; the same vision model that planned your recipe will compare it against the target step.")
 
 
 
 
 
93
  with gr.Row():
94
  with gr.Column():
95
  step_idx = gr.Number(value=1, precision=0, label="Active step #")
96
+ progress_input = gr.Image(label="📸 Your pan / plate", type="pil", height=320)
97
+ validate_btn = gr.Button("How am I doing?", variant="primary")
 
 
 
 
98
  with gr.Column():
99
  verdict_panel = gr.HTML(VerdictBadge.render({}))
100
+ verdict_audio = gr.Audio(label="Tip (voice)", autoplay=False)
101
 
102
+ # --- Tab 3: About ----------------------------------------------
103
+ with gr.Tab("About"):
 
 
104
  gr.Markdown(
105
  """
106
+ ### Models
107
+ - **Vision** — `openbmb/MiniCPM-V-4_6-gguf` via `llama-cpp-python` (~4.6B)
108
+ - **Planner** `openbmb/MiniCPM-V-4-gguf` via `llama-cpp-python` (~4B)
109
+ - **Illustrator** `black-forest-labs/FLUX.2-klein-9B` via `diffusers` (9B)
110
+ - **Narrator** — `openbmb/VoxCPM2` via `transformers` (~1B)
111
+ - **Retrieval** — `sentence-transformers/all-MiniLM-L6-v2` (22M)
112
+ **Total ≈ 18.6B params** (≤ 32B requirement ✓).
113
+ ### Pipeline
114
+ ```
115
+ Fridge photo → Vision → ingredients
116
+
117
+
118
+ Planner (+ Kaggle retrieval) → Recipe JSON
119
+
120
+
121
+ Illustrator (FLUX) → hero + per-step images
122
+
123
+
124
+ Narrator (VoxCPM2) → MP3 per step
125
+
126
+
127
+ Progress photo → Validator (same vision model) → go|wait|fix
128
+ ```
129
+ ### Badges targeted
130
+ ✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
131
  """
132
  )
133
 
134
+ # Wire callbacks ----------------------------------------------------
 
 
135
  propose_btn.click(
136
  fn=on_propose,
137
  inputs=[fridge_input, state],
138
+ # outputs=[chips, options, dish_radio, state],
139
+ outputs=[chips],
140
  )
141
+ # cook_btn.click(
142
+ # fn=on_pick_dish,
143
+ # inputs=[state, dish_radio, illustrate_chk, narrate_chk],
144
+ # outputs=[hero, steps_panel, nutrition_panel, chips, state],
145
+ # )
146
+ # validate_btn.click(
147
+ # fn=on_validate,
148
+ # inputs=[state, step_idx, progress_input],
149
+ # outputs=[verdict_panel, verdict_audio],
150
+ # )
151
 
152
  return demo
153
 
 
159
  server_port=int(__import__("os").environ.get("PORT", 7860)),
160
  show_error=True,
161
  inbrowser=True,
162
+ theme=theme,
163
+ css=CSS
164
+ )
modal_app/__init__.py DELETED
File without changes
modal_app/flux_endpoint.py DELETED
@@ -1,124 +0,0 @@
- """Modal FLUX.2 Klein endpoint.
-
- Deploy once with:
-     modal deploy modal_app/flux_endpoint.py
-
- Then the HF Space calls it via modal.Function.lookup().
- """
- import io
- import modal
-
- # ---------------------------------------------------------------------------
- # App & image
- # ---------------------------------------------------------------------------
-
- app = modal.App("cook-with-me-flux")
-
- image = (
-     modal.Image.debian_slim(python_version="3.12")
-     .pip_install(
-         "torch==2.7.0", # >=2.5 needed: diffusers custom-op schema uses PEP604 unions
-         "torchvision==0.22.0", # matches torch 2.7.0; silences diffusers image-processor fallback
-         "diffusers>=0.38", # FLUX.2 support
-         "transformers>=4.45",
-         "accelerate",
-         "safetensors",
-         "Pillow",
-         "huggingface_hub>=1.17",
-         "sentencepiece",
-     )
- )
-
- # HF token secret so Modal can pull gated/private model weights
- hf_secret = modal.Secret.from_name("huggingface-secret")
-
- # Tried in order. FLUX models are gated (need license acceptance on HF);
- # SDXL-Turbo is public and always works, so it's the guaranteed fallback.
- FLUX_MODEL = "black-forest-labs/FLUX.2-klein-9B"
- FLUX_FALLBACK = "black-forest-labs/FLUX.1-schnell"
- SDXL_TURBO = "stabilityai/sdxl-turbo" # non-gated, fast (1-2 steps)
-
- # ---------------------------------------------------------------------------
- # GPU class
- # ---------------------------------------------------------------------------
-
- @app.cls(
-     image=image,
-     gpu="L4",
-     scaledown_window=180, # keep warm 3 min after last request
-     secrets=[hf_secret],
- )
- class FluxKlein:
-     @modal.enter()
-     def load(self):
-         import torch
-
-         dtype = torch.bfloat16
-         self.steps = 4
-
-         # 1) FLUX.2-klein (gated) ------------------------------------------------
-         try:
-             from diffusers import FluxPipeline
-             self.pipe = FluxPipeline.from_pretrained(FLUX_MODEL, torch_dtype=dtype).to("cuda")
-             self.guidance, self.steps, self.backend = 1.0, 4, "FLUX.2-klein-9B"
-             print(f"Loaded {self.backend}")
-             return
-         except Exception as e:
-             print(f"FLUX.2-klein unavailable ({type(e).__name__}); trying FLUX.1-schnell...")
-
-         # 2) FLUX.1-schnell (gated) ---------------------------------------------
-         try:
-             from diffusers import FluxPipeline
-             self.pipe = FluxPipeline.from_pretrained(FLUX_FALLBACK, torch_dtype=dtype).to("cuda")
-             self.guidance, self.steps, self.backend = 0.0, 4, "FLUX.1-schnell"
-             print(f"Loaded {self.backend}")
-             return
-         except Exception as e:
-             print(f"FLUX.1-schnell unavailable ({type(e).__name__}); falling back to SDXL-Turbo...")
-
-         # 3) SDXL-Turbo (public, always works) ----------------------------------
-         from diffusers import AutoPipelineForText2Image
-         self.pipe = AutoPipelineForText2Image.from_pretrained(
-             SDXL_TURBO, torch_dtype=torch.float16, variant="fp16"
-         ).to("cuda")
-         self.guidance, self.steps, self.backend = 0.0, 2, "SDXL-Turbo"
-         print(f"Loaded {self.backend}")
-
-     @modal.method()
-     def render_step(self, prompt: str, seed: int = 42) -> bytes:
-         """Generate a 512×512 PNG and return its raw bytes."""
-         import torch
-
-         img = self.pipe(
-             prompt=prompt,
-             height=512,
-             width=512,
-             guidance_scale=self.guidance,
-             num_inference_steps=self.steps,
-             generator=torch.Generator(device="cuda").manual_seed(seed),
-         ).images[0]
-
-         buf = io.BytesIO()
-         img.save(buf, format="PNG")
-         return buf.getvalue()
-
-
- # ---------------------------------------------------------------------------
- # Local test entrypoint
- # ---------------------------------------------------------------------------
-
- @app.local_entrypoint()
- def test():
-     import os
-     flux = FluxKlein()
-     png = flux.render_step.remote(
-         "Top-down photo of a kitchen pan with sautéed onions. "
-         "Mexican cooking. Warm lighting. Photorealistic.",
-         seed=0,
-     )
-     out = os.path.join(os.path.dirname(__file__), "..", "data", "test_flux.png")
-     out = os.path.abspath(out)
-     os.makedirs(os.path.dirname(out), exist_ok=True)
-     with open(out, "wb") as f:
-         f.write(png)
-     print(f"Saved {out} ({len(png)} bytes)")
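The `load` method in the deleted `flux_endpoint.py` tries three backends in order and keeps the first one that loads. The pattern can be sketched without any model dependencies as follows. This is only an illustration: `load_first_available` and the stand-in loaders are hypothetical names that do not appear in the repository, and no real pipeline is loaded.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def load_first_available(loaders: Sequence[tuple[str, Callable[[], T]]]) -> tuple[str, T]:
    """Return (name, object) for the first loader that does not raise."""
    failures: list[str] = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # broad on purpose: any load error triggers the next fallback
            failures.append(f"{name} ({type(exc).__name__})")
    raise RuntimeError("no backend could be loaded: " + ", ".join(failures))


def _gated_model() -> str:
    # Hypothetical stand-in for a gated checkpoint whose license was not accepted.
    raise PermissionError("gated repository")


backend, pipe = load_first_available(
    [
        ("FLUX.2-klein-9B", _gated_model),
        ("FLUX.1-schnell", _gated_model),
        ("SDXL-Turbo", lambda: "sdxl-turbo-pipeline"),  # public model, assumed to load
    ]
)
print(f"Loaded {backend}")
```

Compared with three copy-pasted `try` blocks, a table of loaders keeps the order of preference in one place and reports every failure if nothing loads.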
modal_app/planner_endpoint.py DELETED
@@ -1,117 +0,0 @@
- """Modal endpoint for the fine-tuned MiniCPM4.1-8B recipe planner.
-
- Runs in its OWN container because MiniCPM4.1's custom code requires
- transformers 4.x (CacheLayerMixin + is_torch_fx_available), which conflicts
- with the MiniCPM-V-4.6 vision model in the main app (needs transformers 5.x).
-
- Deploy:
-     modal deploy modal_app/planner_endpoint.py
-
- The Gradio app calls it via modal.Cls.from_name("cook-with-me-planner",
- "Planner").infer.remote(prompt, ...).
- """
- from __future__ import annotations
-
- import os
-
- import modal
-
- app = modal.App("cook-with-me-planner")
-
- # 8B bf16 weights cached on a volume so cold starts don't re-download ~16GB.
- hf_cache = modal.Volume.from_name("cook-with-me-planner-cache", create_if_missing=True)
- hf_secret = modal.Secret.from_name("huggingface-secret")
-
- image = (
-     modal.Image.debian_slim(python_version="3.12")
-     .pip_install(
-         "torch==2.4.0",
-         # MiniCPM4.1 custom code needs BOTH CacheLayerMixin (>=4.54) and
-         # is_torch_fx_available (removed in 5.0) — only 4.54..4.x has both.
-         "transformers>=4.54,<5.0",
-         "huggingface_hub>=0.26,<1.0",
-         "accelerate",
-         "sentencepiece",
-         "safetensors",
-     )
-     .env({"HF_HOME": "/cache/hf"})
- )
-
- # Fine-tuned weights; tokenizer pulled from base (FT tokenizer_config was saved
- # by transformers 5.x and is not readable by 4.x).
- PLANNER_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "eldinosaur/cook-with-me-planner-8b")
- BASE_REPO = "openbmb/MiniCPM4.1-8B"
-
-
- @app.cls(
-     image=image,
-     gpu="L4",
-     volumes={"/cache": hf_cache},
-     secrets=[hf_secret],
-     scaledown_window=240,
-     timeout=600,
- )
- class Planner:
-     @modal.enter()
-     def load(self):
-         import torch
-         from transformers import AutoModelForCausalLM, AutoTokenizer
-
-         print(f"Loading planner weights from {PLANNER_REPO}...")
-         self.tokenizer = AutoTokenizer.from_pretrained(BASE_REPO, trust_remote_code=True)
-         if self.tokenizer.pad_token is None:
-             self.tokenizer.pad_token = self.tokenizer.eos_token
-         self.model = AutoModelForCausalLM.from_pretrained(
-             PLANNER_REPO,
-             torch_dtype=torch.bfloat16,
-             trust_remote_code=True,
-             device_map="cuda",
-         ).eval()
-         print("Planner ready.")
-
-     @modal.method()
-     def infer(self, prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
-         import torch
-
-         messages = [{"role": "user", "content": prompt}]
-         # enable_thinking=False -> direct JSON, no <think> reasoning preamble
-         try:
-             enc = self.tokenizer.apply_chat_template(
-                 messages,
-                 add_generation_prompt=True,
-                 tokenize=True,
-                 return_tensors="pt",
-                 return_dict=True,
-                 enable_thinking=False,
-             )
-         except TypeError:
-             enc = self.tokenizer.apply_chat_template(
-                 messages, add_generation_prompt=True, tokenize=True,
-                 return_tensors="pt", return_dict=True,
-             )
-
-         input_ids = enc["input_ids"].to(self.model.device)
-         input_len = input_ids.shape[1]
-         gen_inputs = {"input_ids": input_ids}
-         if enc.get("attention_mask") is not None:
-             gen_inputs["attention_mask"] = enc["attention_mask"].to(self.model.device)
-
-         gen_kwargs = dict(max_new_tokens=max_new_tokens, repetition_penalty=1.05)
-         if temperature and temperature > 0:
-             gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.9)
-         else:
-             gen_kwargs.update(do_sample=False)
-
-         with torch.no_grad():
-             out = self.model.generate(**gen_inputs, **gen_kwargs)
-         return self.tokenizer.decode(out[0][input_len:], skip_special_tokens=True)
-
-
- @app.local_entrypoint()
- def test():
-     prompt = (
-         "You are a creative chef. Available ingredients: tomato, onion, garlic, pasta, olive oil.\n"
-         'Respond ONLY with JSON: {"options": [{"name": "...", "why": "..."}, {"name": "...", "why": "..."}, {"name": "...", "why": "..."}]}'
-     )
-     out = Planner().infer.remote(prompt, max_new_tokens=400)
-     print("OUTPUT:\n", out)
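`Planner.infer` in the deleted `planner_endpoint.py` switches between greedy decoding and nucleus sampling depending on `temperature`. The selection logic can be isolated as a small pure function, sketched below. `build_gen_kwargs` is a hypothetical helper name introduced for illustration; the keys and values mirror the ones in the diff.

```python
def build_gen_kwargs(max_new_tokens: int = 1024, temperature: float = 0.0) -> dict:
    """Choose greedy decoding at temperature 0 and nucleus sampling otherwise."""
    kwargs = {"max_new_tokens": max_new_tokens, "repetition_penalty": 1.05}
    if temperature and temperature > 0:
        kwargs.update(do_sample=True, temperature=temperature, top_p=0.9)
    else:
        # Greedy decoding: temperature and top_p are not passed at all.
        kwargs.update(do_sample=False)
    return kwargs


greedy = build_gen_kwargs(max_new_tokens=400)
sampled = build_gen_kwargs(temperature=0.7)
print(greedy)
print(sampled)
```

Leaving the sampling keys out in the greedy case, rather than passing `temperature=0.0`, presumably avoids generation-config warnings about unused sampling parameters; that motivation is an inference, not something the diff states.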
modal_app/serve_app.py DELETED
@@ -1,102 +0,0 @@
- """Serve the full Cook With Me Gradio app on Modal GPU.
-
- This gives a permanent public URL (*.modal.run) that runs the real models:
-   - MiniCPM-V-4.6 (vision: ingredients + progress validation)
-   - MiniCPM4.1-8B (planner: dish proposals + recipes)
-   - FLUX.2-klein (step images, via the separate cook-with-me-flux endpoint)
-
- Deploy with:
-     modal deploy modal_app/serve_app.py
- Or run a temporary dev session (auto-stops on Ctrl-C):
-     modal serve modal_app/serve_app.py
-
- Both models live in one A100-40GB container (~25GB VRAM total).
- Set the fine-tuned planner repo via the COOK_WITH_ME_PLANNER_FT_REPO env
- on the Modal function once training finishes.
- """
- from __future__ import annotations
-
- from pathlib import Path
-
- import modal
-
- LOCAL_ROOT = Path(__file__).resolve().parent.parent
- REMOTE_ROOT = "/root/cook"
-
- app = modal.App("cook-with-me-app")
-
- # HF model cache persisted across restarts (avoids re-downloading ~25GB)
- hf_cache = modal.Volume.from_name("cook-with-me-hf-cache", create_if_missing=True)
- hf_secret = modal.Secret.from_name("huggingface-secret")
-
- image = (
-     modal.Image.debian_slim(python_version="3.12")
-     .pip_install(
-         "torch==2.4.0",
-         "torchvision==0.19.0",
-         "transformers>=5.0",
-         "accelerate",
-         "safetensors",
-         "sentencepiece",
-         "Pillow",
-         "av",
-         "pydantic>=2",
-         "gradio==6.15.2",
-         "huggingface_hub>=1.17",
-         "modal",
-     )
-     .env({
-         "COOK_WITH_ME_CACHE": "/cache/cook",
-         # Use the fine-tuned planner pushed by scripts/train_planner.py
-         "COOK_WITH_ME_PLANNER_FT_REPO": "eldinosaur/cook-with-me-planner-8b",
-     })
-     .add_local_dir(
-         str(LOCAL_ROOT),
-         REMOTE_ROOT,
-         ignore=[
-             "data/*", ".git/*", "**/__pycache__", "**/*.pyc",
-             "assets/*", ".venv/*", "venv/*",
-         ],
-     )
- )
-
-
- @app.function(
-     image=image,
-     gpu="L40S",
-     secrets=[hf_secret],
-     volumes={"/cache": hf_cache},
-     timeout=3600,
-     scaledown_window=300, # stay warm 5 min after last request
-     max_containers=1,
- )
- @modal.concurrent(max_inputs=20)
- @modal.asgi_app()
- def serve():
-     import os
-     import sys
-     import types
-
-     # --- env: cache model downloads on the volume, before any HF import ---
-     os.environ["HF_HOME"] = "/cache/hf"
-     os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "0")
-
-     # --- mock `spaces` so @spaces.GPU becomes a no-op (we're already on GPU) ---
-     spaces_mock = types.ModuleType("spaces")
-     spaces_mock.GPU = lambda *a, **k: (lambda fn: fn)
-     sys.modules["spaces"] = spaces_mock
-
-     # --- make the mounted project importable ---
-     sys.path.insert(0, REMOTE_ROOT)
-
-     import gradio as gr
-     from fastapi import FastAPI
-
-     # Importing app triggers the vision model load (module-level singleton).
-     from app import build_ui
-
-     demo = build_ui()
-     demo.queue(max_size=20)
-
-     fastapi_app = FastAPI()
-     return gr.mount_gradio_app(app=fastapi_app, blocks=demo, path="/")
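The `spaces` stub in `serve()` returns a pass-through decorator, which covers the parameterised form `@spaces.GPU(...)`. With that stub, a bare `@spaces.GPU` would replace the decorated function with the inner pass-through lambda. Whether the project ever uses the bare form is not visible in this diff, so the variant below is only a sketch of a stub that tolerates both spellings; `_gpu`, `bare` and `parameterised` are hypothetical names.

```python
import sys
import types


def _gpu(*args, **kwargs):
    """No-op replacement for the ZeroGPU decorator, usable with or without arguments."""
    # Bare usage (@spaces.GPU): the decorator receives the function itself.
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]
    # Parameterised usage (@spaces.GPU(duration=60)): return a pass-through decorator.
    return lambda fn: fn


spaces_mock = types.ModuleType("spaces")
spaces_mock.GPU = _gpu
sys.modules["spaces"] = spaces_mock  # must be registered before anything imports `spaces`

import spaces  # resolves to the stub registered above


@spaces.GPU
def bare(x: int) -> int:
    return x + 1


@spaces.GPU(duration=60)
def parameterised(x: int) -> int:
    return x * 2


print(bare(1), parameterised(3))
```

Registering the stub in `sys.modules` before the first `import spaces` is what makes this work; the import system returns the cached module instead of searching for the real package.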
packages.txt DELETED
@@ -1,2 +0,0 @@
- ffmpeg
- libsndfile1
requirements.txt CHANGED
@@ -1,7 +1,10 @@
 
 
1
  gradio==6.15.2
2
  huggingface_hub>=1.17
3
 
4
- # Vision model
 
5
  torch
6
  torchvision
7
  spaces
@@ -9,7 +12,4 @@ Pillow
9
  transformers>=4.45
10
  accelerate
11
  safetensors
12
- av
13
-
14
- # Pipeline & data
15
- pydantic>=2
 
1
+ # --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
2
+ # llama-cpp-python
3
  gradio==6.15.2
4
  huggingface_hub>=1.17
5
 
6
+
7
+ # --- Libraries added and unlocked for MiniCPM-V-4.6 ---
8
  torch
9
  torchvision
10
  spaces
 
12
  transformers>=4.45
13
  accelerate
14
  safetensors
15
+ av
 
 
 
scripts/build_recipe_dataset.py DELETED
@@ -1,281 +0,0 @@
- """Build the SFT dataset for the MiniCPM4.1-8B recipe planner.
-
- Reads the Kaggle "better-recipes-for-a-better-life" dataset and produces
- supervised fine-tuning pairs for BOTH planner tasks, matching the exact
- prompt formats the app uses (src/prompts/planner_propose.txt and
- planner_recipe.txt):
-
-   1. propose : ingredients -> {"options": [{name, why} x3]}
-   2. recipe : dish + ingredients -> {"name", "cuisine", "servings",
-      "total_time_minutes", "final_dish_visual", "steps":[...]}
-
- Run locally (once) before fine-tuning:
-     python scripts/build_recipe_dataset.py
-
- Requires:
-     pip install kagglehub pandas pyarrow datasets huggingface_hub tqdm
-     ~/.kaggle/kaggle.json with your credentials
- """
- from __future__ import annotations
-
- import json
- import random
- import re
- import sys
- from pathlib import Path
-
- ROOT = Path(__file__).resolve().parent.parent
- sys.path.insert(0, str(ROOT))
-
- import pandas as pd
- from tqdm import tqdm
-
- from src import config
-
- random.seed(42)
-
- HF_DATASET_REPO = "eldinosaur/cook-with-me-recipes-sft"
-
- # ---------------------------------------------------------------------------
- # 1. Download (use ONLY recipes.csv — test_recipes.csv has a different schema
- #    whose capitalized columns shadowed the real data in the old version)
- # ---------------------------------------------------------------------------
-
- print("Pulling Kaggle dataset…")
- import kagglehub
-
- raw_path = Path(kagglehub.dataset_download(config.KAGGLE_DATASET))
- main_csv = raw_path / "recipes.csv"
- print(f"Reading {main_csv}")
-
- # cp1252 decodes the fraction/symbol bytes that show up as � under utf-8
- try:
-     raw_df = pd.read_csv(main_csv, encoding="cp1252", on_bad_lines="skip")
- except Exception:
-     raw_df = pd.read_csv(main_csv, encoding="utf-8", on_bad_lines="skip")
-
- print(f"Rows: {len(raw_df)} columns: {list(raw_df.columns)}")
-
-
- # ---------------------------------------------------------------------------
- # 2. Cleaning helpers
- # ---------------------------------------------------------------------------
-
- _UNIT = (
-     r"(cups?|tablespoons?|tbsps?|teaspoons?|tsps?|pounds?|lbs?|ounces?|ozs?|"
-     r"grams?|kgs?|mls?|liters?|pinch(?:es)?|dash(?:es)?|cloves?|cans?|"
-     r"packages?|pkgs?|sheets?|slices?|sticks?|quarts?|pints?|jars?|bunch(?:es)?|"
-     r"heads?|stalks?|sprigs?|pieces?|fillets?)"
- )
- _PREP_WORDS = {
-     "peeled", "chopped", "diced", "sliced", "minced", "cored", "thawed",
-     "drained", "rinsed", "softened", "melted", "beaten", "divided", "cubed",
-     "to taste", "optional", "or more", "plus more", "for garnish", "for serving",
-     "lightly beaten", "room temperature", "at room temperature", "finely chopped",
-     "thinly sliced", "cut into", "more", "and", "or other", "such as",
- }
-
-
- def _clean_text(val: str) -> str:
-     if not isinstance(val, str):
-         return ""
-     # drop any remaining replacement chars and collapse whitespace
-     val = val.replace("�", " ")
-     return re.sub(r"[ \t]+", " ", val).strip()
-
-
- def _simplify_ingredient(raw: str) -> str:
-     s = re.sub(r"\([^)]*\)", "", raw) # remove parentheticals
-     s = _clean_text(s).lower()
-     s = re.sub(r"^[\d\s./¼½¾⅓⅔⅛+-]+", "", s) # leading quantities
-     s = re.sub(rf"^{_UNIT}\b\.?\s*", "", s) # leading unit word
-     s = re.sub(r"^(of|the|a|an)\s+", "", s)
-     s = s.split(",")[0] # drop trailing prep clause
-     s = re.sub(r"[^a-z\s-]", "", s) # keep letters only
-     s = re.sub(r"\s+", " ", s).strip()
-     return s
-
-
- def _ingredient_list(raw: str) -> list[str]:
-     if not isinstance(raw, str):
-         return []
-     out, seen = [], set()
-     for part in raw.split(","):
-         name = _simplify_ingredient(part)
-         if not name or len(name) < 3 or len(name.split()) > 4:
-             continue
-         if name in _PREP_WORDS or name in seen:
-             continue
-         seen.add(name)
-         out.append(name)
-     return out
-
-
- def _steps_from_directions(raw: str) -> list[str]:
-     if not isinstance(raw, str):
-         return []
-     raw = _clean_text(raw.replace("\r", "\n"))
-     # Prefer explicit newlines; otherwise split into sentences.
-     parts = [p.strip() for p in raw.split("\n") if p.strip()]
-     if len(parts) < 2:
-         parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+(?=[A-Z])", raw) if p.strip()]
-     # merge very short fragments into the previous step
-     steps: list[str] = []
-     for p in parts:
-         if steps and len(p) < 25:
-             steps[-1] = steps[-1] + " " + p
-         else:
-             steps.append(p)
-     return [s for s in steps if len(s) > 15]
-
-
- def _minutes(row) -> int:
-     for col in ("total_time", "cook_time", "prep_time"):
-         v = row.get(col)
-         if isinstance(v, str):
-             h = re.search(r"(\d+)\s*hr", v)
-             m = re.search(r"(\d+)\s*min", v)
-             total = (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)
-             if total:
-                 return total
-     return 0
-
-
- def _cuisine(row) -> str:
-     cp = row.get("cuisine_path")
-     if isinstance(cp, str):
-         segs = [s for s in cp.split("/") if s]
-         if segs:
-             return segs[0].replace("-", " ").strip().title()
-     return "International"
-
-
- def _distribute(total: int, n: int) -> list[int]:
-     if n <= 0:
-         return []
-     if total <= 0:
-         total = n * 6
-     base = max(2, total // n)
-     durs = [base] * n
-     durs[-1] = max(2, total - base * (n - 1))
-     return durs
-
-
- # ---------------------------------------------------------------------------
- # 3. Normalize into clean recipe records
- # ---------------------------------------------------------------------------
-
- recipes: list[dict] = []
- for _, r in tqdm(raw_df.iterrows(), total=len(raw_df), desc="Normalizing"):
-     name = _clean_text(r.get("recipe_name", ""))
-     ings = _ingredient_list(r.get("ingredients", ""))
-     steps = _steps_from_directions(r.get("directions", ""))
-     if not name or len(ings) < 3 or len(steps) < 2:
-         continue
-     steps = steps[:7]
-     if len(steps) < 4 and len(steps) >= 2:
-         pass # keep short recipes too, 2-3 steps is fine
-     minutes = _minutes(r) or len(steps) * 6
-     try:
-         servings = int(float(str(r.get("servings", "2")).split()[0]))
-     except Exception:
-         servings = 2
-     servings = min(max(servings, 1), 12)
-     recipes.append({
-         "name": name,
-         "ingredients": ings[:14],
-         "steps": steps,
-         "cuisine": _cuisine(r),
-         "minutes": int(minutes),
-         "servings": servings,
-     })
-
- print(f"\nClean recipes: {len(recipes)}")
-
- config.DATA_DIR.mkdir(parents=True, exist_ok=True)
- pd.DataFrame(recipes).to_parquet(config.RECIPES_PARQUET, index=False)
- print(f"Saved -> {config.RECIPES_PARQUET}")
-
-
- # ---------------------------------------------------------------------------
- # 4. Build SFT pairs matching the app's exact prompt formats
- # ---------------------------------------------------------------------------
-
- PROPOSE_TMPL = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
- RECIPE_TMPL = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
-
- _WHY = [
-     "Uses your {a} and {b} for a quick, satisfying result.",
-     "A fresh way to combine {a} with {b}.",
-     "Turns {a} and {b} into a comforting classic.",
-     "Light and flavorful, built around {a} and {b}.",
-     "Makes the most of {a}, {b} and a few pantry staples.",
- ]
-
-
- def _recipe_json(rec: dict) -> str:
-     durs = _distribute(rec["minutes"], len(rec["steps"]))
-     steps = [
-         {"n": i + 1, "instruction": s, "duration": f"{d} min", "tip": None}
-         for i, (s, d) in enumerate(zip(rec["steps"], durs))
-     ]
-     obj = {
-         "name": rec["name"],
-         "cuisine": rec["cuisine"],
-         "servings": rec["servings"],
-         "total_time_minutes": rec["minutes"],
-         "final_dish_visual": f"A beautifully plated {rec['name'].lower()}, ready to serve.",
-         "steps": steps,
-     }
-     return json.dumps(obj, ensure_ascii=False)
-
-
- def _propose_json(rec: dict, others: list[dict]) -> str:
-     a = rec["ingredients"][0] if rec["ingredients"] else "your ingredients"
-     b = rec["ingredients"][1] if len(rec["ingredients"]) > 1 else "pantry staples"
-     options = [{"name": rec["name"], "why": random.choice(_WHY).format(a=a, b=b)}]
-     for o in others:
-         oa = o["ingredients"][0] if o["ingredients"] else a
-         ob = o["ingredients"][1] if len(o["ingredients"]) > 1 else b
-         options.append({"name": o["name"], "why": random.choice(_WHY).format(a=oa, b=ob)})
-     return json.dumps({"options": options}, ensure_ascii=False)
-
-
- sft_path = config.DATA_DIR / "recipes_sft.jsonl"
- n_recipe = n_propose = 0
- with open(sft_path, "w", encoding="utf-8") as f:
-     for idx, rec in enumerate(tqdm(recipes, desc="Building SFT")):
-         ing_str = ", ".join(rec["ingredients"])
-
-         # --- recipe task ---
-         user_recipe = RECIPE_TMPL.replace("{dish_name}", rec["name"]).replace("{ingredients}", ing_str)
-         f.write(json.dumps({"messages": [
-             {"role": "user", "content": user_recipe},
-             {"role": "assistant", "content": _recipe_json(rec)},
-         ]}, ensure_ascii=False) + "\n")
-         n_recipe += 1
-
-         # --- propose task (use two other recipes as alternative options) ---
-         others = [recipes[(idx + 7) % len(recipes)], recipes[(idx + 53) % len(recipes)]]
-         user_propose = PROPOSE_TMPL.replace("{ingredients}", ing_str)
-         f.write(json.dumps({"messages": [
-             {"role": "user", "content": user_propose},
-             {"role": "assistant", "content": _propose_json(rec, others)},
-         ]}, ensure_ascii=False) + "\n")
-         n_propose += 1
-
- print(f"\nSFT pairs: {n_recipe} recipe + {n_propose} propose = {n_recipe + n_propose} -> {sft_path}")
-
-
- # ---------------------------------------------------------------------------
- # 5. Push to HF Hub
- # ---------------------------------------------------------------------------
-
- if HF_DATASET_REPO:
-     from datasets import load_dataset
-
-     ds = load_dataset("json", data_files=str(sft_path), split="train")
-     ds.push_to_hub(HF_DATASET_REPO)
-     print(f"Pushed {len(ds)} rows to {HF_DATASET_REPO}")
-
- print("\nDone.")
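The time handling in the deleted `build_recipe_dataset.py` is easiest to follow with concrete numbers. The sketch below restates the two helpers as standalone functions: `parse_minutes` is a simplified, single-string adaptation of `_minutes`, and `distribute` has the same body as `_distribute`. The sample time strings are assumptions about how the Kaggle columns are formatted, not values taken from the dataset.

```python
import re


def parse_minutes(text: str) -> int:
    """Convert strings such as '1 hrs 30 mins' to minutes (0 if nothing matches)."""
    hours = re.search(r"(\d+)\s*hr", text)
    mins = re.search(r"(\d+)\s*min", text)
    return (int(hours.group(1)) * 60 if hours else 0) + (int(mins.group(1)) if mins else 0)


def distribute(total: int, n: int) -> list[int]:
    """Split `total` minutes across `n` steps, with a floor of 2 minutes per step."""
    if n <= 0:
        return []
    if total <= 0:
        total = n * 6  # unknown total: assume 6 minutes per step
    base = max(2, total // n)
    durs = [base] * n
    durs[-1] = max(2, total - base * (n - 1))  # last step absorbs the remainder
    return durs


print(parse_minutes("1 hrs 30 mins"))  # 90
print(distribute(30, 4))               # [7, 7, 7, 9]
print(distribute(0, 3))                # [6, 6, 6]
print(distribute(5, 4))                # [2, 2, 2, 2]
```

The last call shows a side effect of the 2-minute floor: when the total is small relative to the number of steps, the per-step durations sum to more than `total_time_minutes` (8 versus 5 here), so the generated training targets can be internally inconsistent for very short recipes.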
scripts/diag_planner.py DELETED
@@ -1,73 +0,0 @@
- """Diagnose why the fine-tuned planner produces empty generations.
-
-     modal run scripts/diag_planner.py
- """
- import modal
-
- app = modal.App("cook-with-me-diag")
-
- image = (
-     modal.Image.debian_slim(python_version="3.12")
-     .pip_install(
-         "torch==2.4.0",
-         "transformers>=4.54,<5.0", # window with BOTH CacheLayerMixin and is_torch_fx_available
-         "huggingface_hub>=0.26,<1.0",
-         "accelerate",
-         "sentencepiece",
-     )
- )
- hf_secret = modal.Secret.from_name("huggingface-secret")
-
- MODEL_ID = "eldinosaur/cook-with-me-planner-8b" # fine-tuned model under transformers 4.x
-
-
- @app.function(image=image, gpu="L4", secrets=[hf_secret], timeout=900)
- def diag():
-     import torch
-     import transformers
-     print("transformers version:", transformers.__version__)
-
-     from transformers import AutoModelForCausalLM, AutoTokenizer
-
-     print("Loading tokenizer (from base) + model (from FT)...")
-     tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM4.1-8B", trust_remote_code=True)
-     model = AutoModelForCausalLM.from_pretrained(
-         MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cuda"
-     ).eval()
-     print("has generate:", hasattr(model, "generate"))
-     print("class mro:", [c.__name__ for c in type(model).__mro__])
-
-     prompt = (
-         "You are a chef. Given ingredients: tomato, onion, garlic, pasta, olive oil.\n"
-         'Return ONLY JSON: {"options": [{"name": "...", "why": "..."}, ...]} with 3 dish ideas.'
-     )
-     messages = [{"role": "user", "content": prompt}]
-
-     # Mirror the fixed planner.py path
-     try:
-         enc = tok.apply_chat_template(
-             messages, add_generation_prompt=True, tokenize=True,
-             return_tensors="pt", return_dict=True,
-         )
-         input_ids = enc["input_ids"].to("cuda")
-         input_len = input_ids.shape[1]
-         gen_inputs = {"input_ids": input_ids}
-         if enc.get("attention_mask") is not None:
-             gen_inputs["attention_mask"] = enc["attention_mask"].to("cuda")
-         print("input length:", input_len)
-         with torch.no_grad():
-             out = model.generate(**gen_inputs, max_new_tokens=400, do_sample=False)
-         text = tok.decode(out[0][input_len:], skip_special_tokens=True)
-         print("=== GENERATION OK (transformers 4.x, cache on) ===")
-         print("OUTPUT:", repr(text[:1000]))
-     except Exception as e:
-         import traceback
-         print("=== GENERATION FAILED ===")
-         print("Exception type:", type(e).__name__)
-         print("Exception repr:", repr(e))
-         traceback.print_exc()
-
-
- @app.local_entrypoint()
- def main():
-     diag.remote()
scripts/train_planner.py DELETED
@@ -1,172 +0,0 @@
- """Fine-tune MiniCPM4.1-8B on the recipe SFT dataset via Modal (A10G GPU).
-
- Usage:
-     modal run scripts/train_planner.py
-
- After training, the adapter is merged and the full model is pushed to HF Hub
- as <HF_USERNAME>/cook-with-me-planner-8b
-
- Set HF_USERNAME below (or export HF_TOKEN env var before running).
- """
- from __future__ import annotations
-
- import modal
-
- # ---------------------------------------------------------------------------
- # Config — change these two values
- # ---------------------------------------------------------------------------
- HF_USERNAME = "eldinosaur"
- SFT_DATASET_REPO = f"{HF_USERNAME}/cook-with-me-recipes-sft"
- OUTPUT_REPO = f"{HF_USERNAME}/cook-with-me-planner-8b"
- BASE_MODEL = "openbmb/MiniCPM4.1-8B"
- # ---------------------------------------------------------------------------
-
- app = modal.App("cook-with-me-train")
-
- volume = modal.Volume.from_name("cook-with-me-train-vol", create_if_missing=True)
-
- train_image = (
-     modal.Image.debian_slim(python_version="3.12")
-     .pip_install(
-         "torch==2.4.0",
-         "transformers>=5.0",
-         "peft>=0.12",
-         "trl>=0.10",
-         "accelerate",
-         "datasets",
-         "huggingface_hub>=1.17",
-         "bitsandbytes",
-         "sentencepiece",
-         "safetensors",
-     )
- )
-
- hf_secret = modal.Secret.from_name("huggingface-secret")
-
-
- @app.function(
-     image=train_image,
-     gpu="A10G",
-     timeout=60 * 60 * 3, # 3-hour hard cap
-     secrets=[hf_secret],
-     volumes={"/vol": volume},
- )
- def train():
-     import os
-     import torch
-     from datasets import load_dataset
-     from peft import LoraConfig, get_peft_model, TaskType
-     from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
-     from trl import SFTTrainer, SFTConfig
-
-     os.environ.setdefault("HF_HOME", "/vol/hf_cache")
-
-     # MiniCPM4.1-8B custom code references is_torch_fx_available which was
-     # removed in transformers 5.x. Patch it back before loading the model.
-     import transformers.utils.import_utils as _iutils
-     if not hasattr(_iutils, "is_torch_fx_available"):
-         def _is_torch_fx_available():
-             try:
-                 import torch.fx # noqa: F401
-                 return True
-             except ImportError:
-                 return False
-         _iutils.is_torch_fx_available = _is_torch_fx_available
-
-     # ---- Load tokenizer & model ----
-     print(f"Loading {BASE_MODEL}…")
-     tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
-     if tokenizer.pad_token is None:
80
- tokenizer.pad_token = tokenizer.eos_token
81
-
82
- model = AutoModelForCausalLM.from_pretrained(
83
- BASE_MODEL,
84
- torch_dtype=torch.bfloat16,
85
- trust_remote_code=True,
86
- device_map="cuda",
87
- )
88
-
89
- # ---- LoRA config ----
90
- lora_cfg = LoraConfig(
91
- task_type=TaskType.CAUSAL_LM,
92
- r=16,
93
- lora_alpha=32,
94
- lora_dropout=0.05,
95
- target_modules="all-linear",
96
- bias="none",
97
- )
98
- model = get_peft_model(model, lora_cfg)
99
- model.print_trainable_parameters()
100
-
101
- # ---- Dataset ----
102
- print(f"Loading dataset {SFT_DATASET_REPO}…")
103
- ds = load_dataset(SFT_DATASET_REPO, split="train")
104
-
105
- def _format(example):
106
- return {"text": tokenizer.apply_chat_template(
107
- example["messages"], tokenize=False, add_generation_prompt=False
108
- )}
109
-
110
- ds = ds.map(_format, remove_columns=ds.column_names)
111
-
112
- # ---- Training ----
113
- output_dir = "/vol/planner_out"
114
- trainer = SFTTrainer(
115
- model=model,
116
- processing_class=tokenizer,
117
- train_dataset=ds,
118
- args=SFTConfig(
119
- output_dir=output_dir,
120
- num_train_epochs=3, # 2046 examples — 3 epochs converges without overfitting
121
- per_device_train_batch_size=2,
122
- gradient_accumulation_steps=4,
123
- learning_rate=2e-4,
124
- lr_scheduler_type="cosine",
125
- warmup_ratio=0.05,
126
- bf16=True,
127
- logging_steps=20,
128
- save_steps=200,
129
- max_length=2048,
130
- dataset_text_field="text",
131
- ),
132
- )
133
- trainer.train()
134
- trainer.save_model(output_dir)
135
-
136
- # ---- Merge LoRA + push ----
137
- print("Merging LoRA adapter…")
138
- from peft import PeftModel
139
-
140
- base = AutoModelForCausalLM.from_pretrained(
141
- BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cpu"
142
- )
143
- merged = PeftModel.from_pretrained(base, output_dir)
144
- merged = merged.merge_and_unload()
145
-
146
- # MiniCPM custom code declares `_tied_weights_keys` as a list, but
147
- # transformers 5.x's save path calls `.keys()` on it. Patch the walker
148
- # to tolerate both list and dict formats before saving/pushing.
149
- import transformers.modeling_utils as _mu
150
-
151
- def _safe_get_tied_weight_keys(model, *args, **kwargs):
152
- keys = []
153
- for module_name, module in model.named_modules():
154
- tied = getattr(module, "_tied_weights_keys", None)
155
- if not tied:
156
- continue
157
- names = tied.keys() if isinstance(tied, dict) else tied
158
- for k in names:
159
- keys.append(f"{module_name}.{k}" if module_name else k)
160
- return keys
161
-
162
- _mu._get_tied_weight_keys = _safe_get_tied_weight_keys
163
-
164
- print(f"Pushing merged model to {OUTPUT_REPO}…")
165
- merged.push_to_hub(OUTPUT_REPO, private=False)
166
- tokenizer.push_to_hub(OUTPUT_REPO, private=False)
167
- print("Done.")
168
-
169
-
170
- @app.local_entrypoint()
171
- def main():
172
- train.remote()
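The schedule implied by the `SFTConfig` values above can be worked out by hand. The sketch below is only arithmetic on the numbers quoted in the script (2046 examples, 3 epochs, batch size 2, accumulation 4). It assumes a single GPU, as `gpu="A10G"` suggests, and that no examples are dropped or packed. The trainer's own rounding can differ by a step or two depending on the library version.

```python
import math

# Values copied from the SFTConfig in the deleted script.
num_examples = 2046
num_train_epochs = 3
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
warmup_ratio = 0.05
save_steps = 200
num_gpus = 1  # assumption: the Modal function runs on one A10G

# Examples consumed per optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

# Warmup length under the cosine schedule, and the steps at which a checkpoint is written.
warmup_steps = math.ceil(total_steps * warmup_ratio)
checkpoints = list(range(save_steps, total_steps + 1, save_steps))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, checkpoints)
```

Under these assumptions the run is short enough that `save_steps=200` yields only a handful of intermediate checkpoints before the final `save_model` call.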
src/agents/progress_validator.py DELETED
@@ -1,84 +0,0 @@
- """Progress validation agent: compare cooking photo against target step."""
- from __future__ import annotations
-
- import logging
- from typing import Optional
-
- import spaces
- import torch
- from PIL import Image
-
- from src import config
- from src.agents.mise_en_place import model, processor
- from src.agents.recipe_planner import _extract_json
-
- log = logging.getLogger(__name__)
-
- _VALIDATOR_PROMPT = (config.PROMPTS_DIR / "validator_prompt.txt").read_text(encoding="utf-8")
-
-
- @spaces.GPU(duration=45)
- def validate(image: Optional[Image.Image], step_instruction: str) -> dict:
-     """Compare a cooking-progress photo to the target step description.
-
-     Returns a dict with keys: verdict ('go'|'wait'|'fix'), feedback, tip.
-     """
-     if image is None:
-         return {
-             "verdict": "wait",
-             "feedback": "No image provided.",
-             "tip": "Upload a photo of your cooking progress to get feedback.",
-         }
-     try:
-         img = image.convert("RGB")
-         prompt = _VALIDATOR_PROMPT.replace("{step_instruction}", step_instruction)
-
-         messages = [{"role": "user", "content": [
-             {"type": "image", "image": img},
-             {"type": "text", "text": prompt},
-         ]}]
-
-         inputs = processor.apply_chat_template(
-             messages,
-             add_generation_prompt=True,
-             tokenize=True,
-             return_dict=True,
-             return_tensors="pt",
-             enable_thinking=False,
-             processor_kwargs={"downsample_mode": "16x", "max_slice_nums": 9, "use_image_id": True},
-         )
-         device = model.device
-         inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
-         for k, v in inputs.items():
-             if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
-                 inputs[k] = v.to(dtype=torch.bfloat16)
-
-         with torch.no_grad():
-             generated_ids = model.generate(
-                 **inputs,
-                 max_new_tokens=256,
-                 do_sample=False,
-                 downsample_mode="16x",
-             )
-
-         trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
-         raw = processor.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-         log.info("validate raw: %s", raw[:400])
-
-         data = _extract_json(raw)
-         verdict = str(data.get("verdict", "wait"))
-         if verdict not in ("go", "wait", "fix"):
-             verdict = "wait"
-
-         return {
-             "verdict": verdict,
-             "feedback": str(data.get("feedback", "")),
-             "tip": str(data.get("tip", "")),
-         }
-     except Exception as exc:
-         log.warning("validate failed: %s", exc)
-         return {
-             "verdict": "wait",
-             "feedback": "Could not analyse the photo.",
-             "tip": "Make sure the image is well-lit and in focus.",
-         }
src/agents/recipe_planner.py DELETED
@@ -1,167 +0,0 @@
1
- """Recipe planner agent: propose dishes + generate step-by-step recipe.
2
-
3
- Uses openbmb/MiniCPM4.1-8B (text-only) as the primary planner.
4
- Falls back to the shared vision model (MiniCPM-V-4.6) when the planner
5
- model is unavailable (e.g. insufficient RAM on the Space).
6
- """
7
- from __future__ import annotations
8
-
9
- import json
10
- import logging
11
- import re
12
-
13
- import spaces
14
- import torch
15
-
16
- from src import config
17
- from src.pipeline import DishOption, Recipe, RecipeStep
18
-
19
- log = logging.getLogger(__name__)
20
-
21
- _PROPOSE_PROMPT = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
22
- _RECIPE_PROMPT = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
23
-
24
-
25
- # ---------------------------------------------------------------------------
26
- # JSON extraction helpers
27
- # ---------------------------------------------------------------------------
28
-
29
- def _extract_json(text: str) -> dict:
30
- """Robustly extract the first JSON object from raw model output."""
31
- text = text.strip()
32
- try:
33
- return json.loads(text)
34
- except Exception:
35
- pass
36
- # Markdown code-block
37
- m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
38
- if m:
39
- try:
40
- return json.loads(m.group(1))
41
- except Exception:
42
- pass
43
- # First {...} block with minor auto-fixes
44
- m = re.search(r"\{.*\}", text, re.DOTALL)
45
- if m:
46
- candidate = m.group(0)
47
- candidate = candidate.replace("'", '"')
48
- candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
49
- try:
50
- return json.loads(candidate)
51
- except Exception:
52
- pass
53
- log.warning("Could not extract JSON from output (first 300 chars): %.300s", text)
54
- return {}
55
-
56
-
57
- # ---------------------------------------------------------------------------
58
- # Inference dispatcher
59
- # ---------------------------------------------------------------------------
60
-
61
- def _infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
62
- """Run text inference.
63
-
64
- Primary: the dedicated MiniCPM4.1-8B planner Modal endpoint (transformers
65
- 4.x). Falls back to the local vision model (text-only) if the endpoint is
66
- unavailable or returns nothing.
67
- """
68
- try:
69
- import modal
70
- cls = modal.Cls.from_name(config.PLANNER_MODAL_APP, config.PLANNER_MODAL_CLS)
71
- out = cls().infer.remote(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
72
- if out and out.strip():
73
- return out
74
- log.warning("Planner endpoint returned empty — falling back to vision model.")
75
- except Exception as exc:
76
- log.warning("Planner endpoint call failed: %s — falling back to vision model.", exc)
77
-
78
- # Fallback: use the vision model in text-only mode
79
- log.warning("Using vision model as text fallback.")
80
- from src.agents.mise_en_place import model as vis_model, processor as vis_proc
81
-
82
- messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
83
- inputs = vis_proc.apply_chat_template(
84
- messages,
85
- add_generation_prompt=True,
86
- tokenize=True,
87
- return_dict=True,
88
- return_tensors="pt",
89
- enable_thinking=False,
90
- )
91
- device = vis_model.device
92
- inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
93
- for k, v in inputs.items():
94
- if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
95
- inputs[k] = v.to(dtype=torch.bfloat16)
96
-
97
- with torch.no_grad():
98
- generated_ids = vis_model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
99
-
100
- trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
101
- return vis_proc.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
102
-
103
-
104
- # ---------------------------------------------------------------------------
105
- # Public agent functions
106
- # ---------------------------------------------------------------------------
107
-
108
- @spaces.GPU(duration=90)
109
- def propose_dishes(ingredients: list[str]) -> list[DishOption]:
110
- """Given detected ingredients, return up to 3 dish proposals."""
111
- try:
112
- prompt = _PROPOSE_PROMPT.replace("{ingredients}", ", ".join(ingredients))
113
- raw = _infer(prompt, max_new_tokens=512, temperature=0.7)
114
- log.info("propose_dishes raw: %.500s", raw)
115
- data = _extract_json(raw)
116
- options = data.get("options", [])
117
- return [
118
- DishOption(name=str(o.get("name", "Dish")), why=str(o.get("why", "")))
119
- for o in options[:3]
120
- if o.get("name")
121
- ] or [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
122
- except Exception as exc:
123
- log.warning("propose_dishes failed: %s", exc)
124
- return [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
125
-
126
-
127
- @spaces.GPU(duration=120)
128
- def plan_recipe(dish_name: str, ingredients: list[str]) -> Recipe:
129
- """Generate a full step-by-step recipe for the chosen dish."""
130
- try:
131
- prompt = (
132
- _RECIPE_PROMPT
133
- .replace("{dish_name}", dish_name)
134
- .replace("{ingredients}", ", ".join(ingredients))
135
- )
136
- raw = _infer(prompt, max_new_tokens=1024, temperature=0.0)
137
- log.info("plan_recipe raw: %.800s", raw)
138
- data = _extract_json(raw)
139
-
140
- raw_steps = data.get("steps", [])
141
- steps = []
142
- for i, s in enumerate(raw_steps, start=1):
143
- if not s.get("instruction"):
144
- continue
145
- tip_val = s.get("tip")
146
- steps.append(RecipeStep(
147
- n=int(s.get("n", i)),
148
- instruction=str(s["instruction"]),
149
- duration=str(s.get("duration", "5 min")),
150
- tip=str(tip_val) if tip_val and str(tip_val).lower() not in ("null", "none") else None,
151
- visual=str(s.get("visual", "")),
152
- ))
153
-
154
- return Recipe(
155
- name=str(data.get("name", dish_name)),
156
- cuisine=str(data.get("cuisine", "International")),
157
- servings=int(data.get("servings", 2)),
158
- total_time_minutes=int(data.get("total_time_minutes", 30)),
159
- final_dish_visual=str(data.get("final_dish_visual", "")),
160
- steps=steps or [RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
161
- )
162
- except Exception as exc:
163
- log.warning("plan_recipe failed: %s", exc)
164
- return Recipe(
165
- name=dish_name,
166
- steps=[RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
167
- )
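The three-stage fallback in `_extract_json` above (plain parse, then a fenced block, then the first `{...}` span with light repairs) is easier to follow on concrete inputs. The sketch below is a condensed re-statement for illustration, not the removed implementation. The name `extract_json` and the sample strings are hypothetical, and it catches `json.JSONDecodeError` where the original catches any `Exception`.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence


def extract_json(text: str) -> dict:
    """Return the first JSON object found in raw model output, or {} if none parses."""
    text = text.strip()
    # Stage 1: the whole output is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Stage 2: a JSON object wrapped in a Markdown code fence.
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Stage 3: widest {...} span, with single quotes and trailing commas repaired.
    span = re.search(r"\{.*\}", text, re.DOTALL)
    if span:
        candidate = span.group(0).replace("'", '"')
        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    return {}


plain = extract_json('{"verdict": "go"}')
fenced = extract_json("Sure!\n" + FENCE + 'json\n{"options": []}\n' + FENCE)
repaired = extract_json("Here you go: {'name': 'Stir-fry', 'steps': [1, 2,],}")
nothing = extract_json("no json here")
# The quote repair in stage 3 is blunt: an apostrophe inside a string value breaks it.
apostrophe = extract_json('Note: {"tip": "Don\'t overcook"}')
```

The last call shows a limitation of the quote replacement: valid JSON preceded by stray text is damaged when a value contains an apostrophe, so the function returns `{}` and the callers fall back to their default dish or recipe.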
src/agents/step_illustrator.py DELETED
@@ -1,81 +0,0 @@
- """Step image generator — delegates to the deployed Modal FLUX.2 endpoint."""
- from __future__ import annotations
-
- import base64
- import logging
- from typing import Optional
-
- from src import config
- from src.pipeline import Recipe, RecipeStep
-
- log = logging.getLogger(__name__)
-
-
- # ---------------------------------------------------------------------------
- # Helpers
- # ---------------------------------------------------------------------------
-
- def _b64(png_bytes: bytes) -> str:
-     return base64.b64encode(png_bytes).decode()
-
-
- def _step_prompt(visual: str, cuisine: str, n: int) -> str:
-     desc = visual.strip() or f"cooking step {n}"
-     return (
-         f"Top-down photo of a kitchen pan or plate showing {desc}. "
-         f"{cuisine} home cooking. Warm natural lighting. "
-         "Recipe magazine style. Photorealistic. Appetizing."
-     )
-
-
- def _dish_prompt(visual: str, cuisine: str) -> str:
-     desc = visual.strip() or "the finished plated dish, garnished and beautifully presented"
-     return (
-         f"Top-down photo of a {desc} on a rustic wooden table. "
-         f"{cuisine} home cooking. Warm natural lighting. "
-         "Recipe magazine style. Photorealistic. Appetizing."
-     )
-
-
- # ---------------------------------------------------------------------------
- # Modal call
- # ---------------------------------------------------------------------------
-
- def _call_modal(prompt: str, seed: int = 42) -> Optional[bytes]:
-     """Call the deployed Modal FLUX endpoint. Returns PNG bytes or None."""
-     try:
-         import modal
-         cls = modal.Cls.from_name(config.MODAL_APP_NAME, config.MODAL_CLS_NAME)
-         return cls().render_step.remote(prompt, seed=seed)
-     except Exception as exc:
-         log.warning("Modal FLUX call failed: %s", exc)
-         return None
-
-
- # ---------------------------------------------------------------------------
- # Public function
- # ---------------------------------------------------------------------------
-
- def illustrate_recipe(recipe: Recipe) -> Recipe:
-     """Generate FLUX images for every step + final dish.
-
-     Mutates and returns the same Recipe with image_b64 fields populated
-     (or left as None when Modal is unavailable).
-     """
-     cuisine = recipe.cuisine or "International"
-
-     # Final dish hero image
-     final_bytes = _call_modal(_dish_prompt(recipe.final_dish_visual, cuisine), seed=0)
-     if final_bytes:
-         recipe.final_dish_image_b64 = _b64(final_bytes)
-         log.info("Generated final dish image.")
-
-     # Per-step images (sequential to respect GPU limits on Modal)
-     for step in recipe.steps:
-         prompt = _step_prompt(step.visual, cuisine, step.n)
-         step_bytes = _call_modal(prompt, seed=step.n)
-         if step_bytes:
-             step.image_b64 = _b64(step_bytes)
-             log.info("Generated image for step %d.", step.n)
-
-     return recipe
src/config.py CHANGED
@@ -21,21 +21,10 @@ VISION_REPO = "openbmb/MiniCPM-V-4_6-GGUF"
  VISION_MODEL_FILE = "MiniCPM-V-4_6-Q4_K_M.gguf"
  VISION_MMPROJ_FILE = "mmproj-model-f16.gguf"

- # Base model; set COOK_WITH_ME_PLANNER_REPO to point at a fine-tuned HF repo
- PLANNER_REPO = os.environ.get("COOK_WITH_ME_PLANNER_REPO", "openbmb/MiniCPM4.1-8B")
- PLANNER_FINETUNED_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "") # set after fine-tune
+ PLANNER_REPO = "openbmb/MiniCPM-V-4-gguf"
+ PLANNER_MODEL_FILE = "Model-Q4_K_M.gguf"

- # Modal app names
- MODAL_APP_NAME = "cook-with-me-flux"
- MODAL_CLS_NAME = "FluxKlein"
-
- # Planner runs in its own Modal app (transformers 4.x, conflicts with the
- # vision model's transformers 5.x — so it can't live in the same container).
- PLANNER_MODAL_APP = "cook-with-me-planner"
- PLANNER_MODAL_CLS = "Planner"
-
- FLUX_REPO = os.environ.get("COOK_WITH_ME_FLUX_REPO", "black-forest-labs/FLUX.2-klein-9B")
- FLUX_FALLBACK_REPO = "black-forest-labs/FLUX.1-schnell"
+ FLUX_REPO = "black-forest-labs/FLUX.2-klein-9B"
  NARRATOR_REPO = "openbmb/VoxCPM2"
  EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
 
src/data/__init__.py DELETED
File without changes
src/data/nutrition.py DELETED
@@ -1,112 +0,0 @@
1
- """Per-serving macro estimator — ingredient lookup, no extra model call needed."""
2
- from __future__ import annotations
3
-
4
- # (calories kcal, protein g, carbs g, fat g, fiber g) per 100 g
5
- _MACROS: dict[str, tuple[float, float, float, float, float]] = {
6
- # proteins
7
- "chicken": (165, 31, 0, 3.6, 0),
8
- "beef": (250, 26, 0, 16, 0),
9
- "pork": (242, 27, 0, 14, 0),
10
- "fish": (130, 20, 0, 5, 0),
11
- "salmon": (208, 20, 0, 13, 0),
12
- "tuna": (130, 29, 0, 0.5, 0),
13
- "shrimp": (99, 24, 0, 0.3, 0),
14
- "egg": (155, 13, 1.1, 11, 0),
15
- "eggs": (155, 13, 1.1, 11, 0),
16
- "tofu": (76, 8, 1.9, 4.8, 0.3),
17
- # dairy
18
- "milk": (61, 3.2, 4.8, 3.3, 0),
19
- "cheese": (402, 25, 1.3, 33, 0),
20
- "butter": (717, 0.9, 0.1, 81, 0),
21
- "yogurt": (59, 3.5, 4.7, 3.3, 0),
22
- "cream": (340, 2.1, 2.8, 36, 0),
23
- # starches
24
- "rice": (130, 2.7, 28, 0.3, 0.4),
25
- "pasta": (158, 5.8, 31, 0.9, 1.8),
26
- "bread": (265, 9, 49, 3.2, 2.7),
27
- "potato": (77, 2, 17, 0.1, 2.2),
28
- "potatoes": (77, 2, 17, 0.1, 2.2),
29
- "flour": (364, 10, 76, 1, 2.7),
30
- "oats": (389, 17, 66, 7, 10.6),
31
- "quinoa": (120, 4.1, 21, 1.9, 2.8),
32
- "lentils": (116, 9, 20, 0.4, 7.9),
33
- "beans": (347, 21, 60, 1.2, 15),
34
- "chickpeas": (164, 8.9, 27, 2.6, 7.6),
35
- # vegetables
36
- "tomato": (18, 0.9, 3.9, 0.2, 1.2),
37
- "tomatoes": (18, 0.9, 3.9, 0.2, 1.2),
38
- "onion": (40, 1.1, 9.3, 0.1, 1.7),
39
- "onions": (40, 1.1, 9.3, 0.1, 1.7),
40
- "garlic": (149, 6.4, 33, 0.5, 2.1),
41
- "carrot": (41, 0.9, 10, 0.2, 2.8),
42
- "carrots": (41, 0.9, 10, 0.2, 2.8),
43
- "broccoli": (34, 2.8, 7, 0.4, 2.6),
44
- "spinach": (23, 2.9, 3.6, 0.4, 2.2),
45
- "pepper": (31, 1, 6, 0.3, 2.1),
46
- "peppers": (31, 1, 6, 0.3, 2.1),
47
- "mushroom": (22, 3.1, 3.3, 0.3, 1),
48
- "mushrooms": (22, 3.1, 3.3, 0.3, 1),
49
- "zucchini": (17, 1.2, 3.1, 0.3, 1),
50
- "corn": (86, 3.3, 19, 1.4, 2.7),
51
- "lettuce": (15, 1.4, 2.9, 0.2, 1.3),
52
- "cucumber": (16, 0.7, 3.6, 0.1, 0.5),
53
- "eggplant": (25, 1, 5.9, 0.2, 3),
54
- "cabbage": (25, 1.3, 5.8, 0.1, 2.5),
55
- "celery": (16, 0.7, 3, 0.2, 1.6),
56
- "leek": (61, 1.5, 14, 0.3, 1.8),
57
- # fruits
58
- "apple": (52, 0.3, 14, 0.2, 2.4),
59
- "banana": (89, 1.1, 23, 0.3, 2.6),
60
- "lemon": (29, 1.1, 9.3, 0.3, 2.8),
61
- "lime": (30, 0.7, 10.5, 0.2, 2.8),
62
- "orange": (47, 0.9, 12, 0.1, 2.4),
63
- # fats & condiments
64
- "olive oil": (884, 0, 0, 100, 0),
65
- "oil": (884, 0, 0, 100, 0),
66
- "soy sauce": (53, 8.1, 4.9, 0.1, 0.8),
67
- "honey": (304, 0.3, 82, 0, 0.2),
68
- "sugar": (387, 0, 100, 0, 0),
69
- "salt": (0, 0, 0, 0, 0),
70
- "vinegar": (18, 0, 0.9, 0, 0),
71
- }
72
-
73
- # Typical portion weight per ingredient (grams)
74
- _GRAMS: dict[str, int] = {
75
- "egg": 50, "eggs": 100,
76
- "butter": 15,
77
- "olive oil": 14, "oil": 14,
78
- "soy sauce": 15,
79
- "salt": 3,
80
- "garlic": 10,
81
- "honey": 21,
82
- "sugar": 12,
83
- "lemon": 30, "lime": 30,
84
- }
85
- _DEFAULT_GRAMS = 80
86
-
87
-
88
- def compute_nutrition(ingredients: list[str], servings: int = 2) -> dict[str, float]:
89
- """Return per-serving macro estimates keyed to the NutritionGrid format."""
90
- cal = prot = carb = fat = fib = 0.0
91
- for ing in ingredients:
92
- key = ing.lower().strip()
93
- row = _MACROS.get(key) or _MACROS.get(key.split()[0]) if key else None
94
- if row is None:
95
- continue
96
- grams = _GRAMS.get(key, _DEFAULT_GRAMS)
97
- f = grams / 100
98
- c, p, cb, ft, fb = row
99
- cal += c * f
100
- prot += p * f
101
- carb += cb * f
102
- fat += ft * f
103
- fib += fb * f
104
-
105
- sv = max(servings, 1)
106
- return {
107
- "calories": round(cal / sv),
108
- "protein_g": round(prot / sv, 1),
109
- "carbs_g": round(carb / sv, 1),
110
- "fat_g": round(fat / sv, 1),
111
- "fiber_g": round(fib / sv, 1),
112
- }
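A worked example makes the lookup-based estimate in `compute_nutrition` above concrete. The sketch below is a condensed illustration rather than the removed module. It copies three rows and one portion weight from the tables above, and the names `MACROS`, `GRAMS`, and `per_serving` are hypothetical. As in the original, each ingredient is charged a fixed typical portion (80 g unless listed), unknown ingredients are skipped, and a multi-word name falls back to its first word.

```python
# (kcal, protein g, carbs g, fat g, fiber g) per 100 g, copied from the table above.
MACROS = {
    "chicken": (165, 31, 0, 3.6, 0),
    "rice": (130, 2.7, 28, 0.3, 0.4),
    "olive oil": (884, 0, 0, 100, 0),
}
GRAMS = {"olive oil": 14}  # typical portion in grams; everything else uses the default
DEFAULT_GRAMS = 80


def per_serving(ingredients: list[str], servings: int = 2) -> dict[str, float]:
    """Sum fixed-portion macros for known ingredients, then divide by servings."""
    totals = [0.0] * 5
    for name in ingredients:
        key = name.lower().strip()
        if not key:
            continue
        # Exact match first, then the first word ("chicken breast" -> "chicken").
        row = MACROS.get(key) or MACROS.get(key.split()[0])
        if row is None:
            continue  # unknown ingredients contribute nothing
        factor = GRAMS.get(key, DEFAULT_GRAMS) / 100
        totals = [t + value * factor for t, value in zip(totals, row)]
    sv = max(servings, 1)
    cal, prot, carb, fat, fib = (t / sv for t in totals)
    return {
        "calories": round(cal),
        "protein_g": round(prot, 1),
        "carbs_g": round(carb, 1),
        "fat_g": round(fat, 1),
        "fiber_g": round(fib, 1),
    }


result = per_serving(["Chicken breast", "rice", "olive oil", "saffron"], servings=2)
print(result)
```

Here the totals before division are about 360 kcal (132 from chicken, 104 from rice, 124 from oil), so each of the two servings comes to roughly 180 kcal. Because the portion weights are fixed, the figures are rough estimates that do not reflect the quantities a generated recipe actually calls for.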
src/models/planner.py DELETED
@@ -1,103 +0,0 @@
1
- """MiniCPM4.1-8B text-only planner — lazy singleton."""
2
- from __future__ import annotations
3
-
4
- import logging
5
- import os
6
- from typing import Any, Optional, Tuple
7
-
8
- import torch
9
-
10
- from src import config
11
-
12
- log = logging.getLogger(__name__)
13
-
14
- _model: Any = None
15
- _tokenizer: Any = None
16
-
17
-
18
- def get_planner() -> Tuple[Optional[Any], Optional[Any]]:
19
- """Return (model, tokenizer). Loads once; returns (None, None) on failure."""
20
- global _model, _tokenizer
21
- if _model is not None:
22
- return _model, _tokenizer
23
-
24
- # Prefer fine-tuned repo when available
25
- model_id = config.PLANNER_FINETUNED_REPO or config.PLANNER_REPO
26
- try:
27
- # MiniCPM4.1 custom code imports is_torch_fx_available, which was
28
- # removed in transformers 5.x. Patch it back before loading.
29
- import transformers.utils.import_utils as _iutils
30
- if not hasattr(_iutils, "is_torch_fx_available"):
31
- def _is_torch_fx_available():
32
- try:
33
- import torch.fx # noqa: F401
34
- return True
35
- except ImportError:
36
- return False
37
- _iutils.is_torch_fx_available = _is_torch_fx_available
38
-
39
- from transformers import AutoModelForCausalLM, AutoTokenizer
40
-
41
- device_map = "auto" if os.environ.get("SPACE_ID") else (
42
- "cuda" if torch.cuda.is_available() else "cpu"
43
- )
44
- log.info("Loading planner model %s (device_map=%s)...", model_id, device_map)
45
- _tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
46
- _model = AutoModelForCausalLM.from_pretrained(
47
- model_id,
48
- torch_dtype=torch.bfloat16,
49
- trust_remote_code=True,
50
- device_map=device_map,
51
- ).eval()
52
- log.info("Planner model ready.")
53
- except Exception as exc:
54
- log.error("Could not load planner model '%s': %s", model_id, exc)
55
- _model = None
56
- _tokenizer = None
57
-
58
- return _model, _tokenizer
59
-
60
-
61
- def infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
62
- """Run text inference with the planner model.
63
-
64
- Returns empty string if the model is unavailable.
65
- """
66
- model, tokenizer = get_planner()
67
- if model is None or tokenizer is None:
68
- return ""
69
-
70
- try:
71
- messages = [{"role": "user", "content": prompt}]
72
-
73
- # return_dict=True yields a BatchEncoding (dict-like) with input_ids +
74
- # attention_mask. NOTE: BatchEncoding is NOT a `dict` instance, so we
75
- # must access it via mapping keys, never via tensor attrs like .shape.
76
- enc = tokenizer.apply_chat_template(
77
- messages,
78
- add_generation_prompt=True,
79
- tokenize=True,
80
- return_tensors="pt",
81
- return_dict=True,
82
- )
83
- input_ids = enc["input_ids"].to(model.device)
84
- input_len = input_ids.shape[1]
85
-
86
- gen_inputs = {"input_ids": input_ids}
87
- attn = enc.get("attention_mask")
88
- if attn is not None:
89
- gen_inputs["attention_mask"] = attn.to(model.device)
90
-
91
- gen_kwargs: dict = dict(max_new_tokens=max_new_tokens, do_sample=False)
92
- if temperature > 0:
93
- gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.95)
94
-
95
- with torch.no_grad():
96
- output = model.generate(**gen_inputs, **gen_kwargs)
97
-
98
- token_ids = output[0][input_len:]
99
- return tokenizer.decode(token_ids, skip_special_tokens=True)
100
-
101
- except Exception as exc:
102
- log.error("Planner inference error: %r", exc, exc_info=True)
103
- return ""
src/pipeline.py DELETED
@@ -1,32 +0,0 @@
- """Shared data models for the Cook-with-Me pipeline."""
- from __future__ import annotations
-
- from typing import Optional
- from pydantic import BaseModel, Field
-
-
- class DishOption(BaseModel):
-     name: str
-     why: str = ""
-
-
- class RecipeStep(BaseModel):
-     n: int = 1
-     instruction: str
-     duration: str = "5 min"
-     tip: Optional[str] = None
-     visual: str = ""
-     image_path: Optional[str] = None
-     image_b64: Optional[str] = None  # base64 PNG from FLUX
-
-
- class Recipe(BaseModel):
-     name: str
-     cuisine: str = "International"
-     servings: int = 2
-     total_time_minutes: int = 30
-     steps: list[RecipeStep] = Field(default_factory=list)
-     nutrition: dict = Field(default_factory=dict)
-     final_dish_visual: str = ""
-     final_dish_image_path: Optional[str] = None
-     final_dish_image_b64: Optional[str] = None  # base64 PNG from FLUX
src/prompts/planner_propose.txt DELETED
@@ -1,11 +0,0 @@
- You are a creative chef assistant. Given a list of available ingredients, suggest exactly 3 diverse and delicious dishes.
-
- Available ingredients: {ingredients}
-
- Rules:
- - Each dish must be realistic to make with the listed ingredients
- - Vary the style: aim for different cuisines or preparations
- - Be specific with dish names (e.g., "Garlic Butter Shrimp Pasta" not "Pasta")
-
- Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
- {"options": [{"name": "Dish Name 1", "why": "One sentence on why this works with the ingredients"}, {"name": "Dish Name 2", "why": "..."}, {"name": "Dish Name 3", "why": "..."}]}
src/prompts/planner_recipe.txt DELETED
@@ -1,11 +0,0 @@
- You are a professional chef writing a clear, detailed recipe.
-
- Dish to prepare: {dish_name}
- Available ingredients: {ingredients}
-
- Create a complete recipe with 4 to 7 steps. Each step must be specific and actionable.
-
- Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
- {"name": "Full Recipe Title", "cuisine": "Cuisine type", "servings": 2, "total_time_minutes": 30, "final_dish_visual": "One evocative sentence describing how the finished dish looks and smells", "steps": [{"n": 1, "instruction": "Detailed step description.", "duration": "5 min", "tip": "Optional chef tip or null"}, {"n": 2, "instruction": "...", "duration": "3 min", "tip": null}]}
-
- Important: tip must be a string or null, never omit it.
src/prompts/validator_prompt.txt DELETED
@@ -1,14 +0,0 @@
- You are a supportive cooking coach reviewing a student's progress photo.
-
- The step they are working on:
- "{step_instruction}"
-
- Look carefully at the photo and decide:
- - "go" → the step is correctly completed, they can move on
- - "wait" → it's progressing but needs more time (undercooked, still mixing, etc.)
- - "fix" → there is a clear mistake that needs correction right now
-
- Respond ONLY with valid JSON and nothing else:
- {"verdict": "go", "feedback": "One sentence describing exactly what you see in the photo.", "tip": "One specific, actionable piece of advice for the cook."}
-
- verdict must be exactly one of: go, wait, fix.
src/ui/components.py CHANGED
@@ -80,7 +80,7 @@ class TemplatedHTML(gr.HTML):
80
  class RecipeHero(TemplatedHTML):
81
  css_template = """
82
  .cwm-hero {
83
- background: #fffbf0 !important;
84
  border: 1px solid #d8c9ad;
85
  border-radius: 16px;
86
  padding: 32px;
@@ -94,15 +94,15 @@ class RecipeHero(TemplatedHTML):
94
  background: #efe3c8;
95
  }
96
  .cwm-hero h1 {
97
- font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a !important;
98
  margin: 0 0 8px;
99
  }
100
  .cwm-hero .meta {
101
- color: #8a6a3a !important; font-size: 14px; letter-spacing: 0.04em;
102
  text-transform: uppercase; margin-bottom: 18px;
103
  }
104
  .cwm-hero .visual {
105
- font-family: 'Lora', serif; font-style: italic; color: #6b4a2a !important;
106
  font-size: 17px; line-height: 1.55;
107
  }
108
  @media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
@@ -115,14 +115,11 @@ class RecipeHero(TemplatedHTML):
115
  servings = state.get("servings") or 0
116
  time = state.get("total_time_minutes") or 0
117
  visual = html.escape(state.get("final_dish_visual") or "")
118
- img_b64 = state.get("final_dish_image_b64") or ""
119
- img_path = state.get("final_dish_image_path") or ""
120
- if img_b64:
121
- img_tag = f'<img src="data:image/png;base64,{img_b64}" alt="final dish"/>'
122
- elif img_path:
123
- img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
124
- else:
125
- img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
126
  return f"""
127
  <div class="cwm-hero">
128
  <div>{img_tag}</div>
@@ -189,15 +186,15 @@ class IngredientChips(TemplatedHTML):
189
  class DishOptions(TemplatedHTML):
190
  css_template = """
191
  .cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
192
- .cwm-options .cwm-option {
193
- background: #fffbf0 !important; border: 1px solid #d8c9ad; border-radius: 12px;
194
  padding: 18px; text-align: left;
195
  }
196
- .cwm-options .cwm-option h3 {
197
- font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a !important;
198
  margin: 0 0 6px;
199
  }
200
- .cwm-options .cwm-option p { color: #7a5a35 !important; font-size: 14px; line-height: 1.45; margin: 0; }
201
  @media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
202
  """
203
 
@@ -220,32 +217,32 @@ class DishOptions(TemplatedHTML):
220
  class StepCard(TemplatedHTML):
221
  css_template = """
222
  .cwm-steps { display: flex; flex-direction: column; gap: 16px; }
223
- .cwm-steps .cwm-step {
224
  display: grid; grid-template-columns: 220px 1fr; gap: 22px;
225
- background: #fffbf0 !important; border-left: 4px solid #a85c2a; border-radius: 10px;
226
  padding: 18px 22px;
227
  }
228
- .cwm-steps .cwm-step img {
229
  width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
230
  background: #efe3c8;
231
  }
232
- .cwm-steps .cwm-step .placeholder {
233
  width: 220px; height: 160px; border-radius: 8px;
234
  background: linear-gradient(135deg,#efe3c8,#dccaa3);
235
  display:flex; align-items:center; justify-content:center;
236
- color: #8a6a3a !important; font-family: 'Lora', serif; font-size: 14px;
237
  }
238
- .cwm-steps .cwm-step h3 {
239
- font-family: 'Lora', serif; color: #6b4a2a !important; margin: 0 0 6px; font-size: 22px;
240
  }
241
- .cwm-steps .cwm-step p { font-size: 16px; line-height: 1.55; color: #4a3722 !important; margin: 0 0 8px; }
242
- .cwm-steps .cwm-step .duration {
243
- display: inline-block; background: #a85c2a !important; color: #fffbf0 !important;
244
  border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
245
  }
246
- .cwm-steps .cwm-step .tip {
247
- margin-top: 10px; padding: 10px 12px; background: #fff3d8 !important;
248
- border-radius: 8px; font-size: 14px; color: #6b4a2a !important;
249
  }
250
  .cwm-step .tip::before { content: "💡 "; }
251
  @media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
@@ -263,14 +260,11 @@ class StepCard(TemplatedHTML):
263
  dur = html.escape(s.get("duration", ""))
264
  tip = s.get("tip")
265
  visual = html.escape(s.get("visual", ""))
266
- img_b64 = s.get("image_b64") or ""
267
- img_path = s.get("image_path") or ""
268
- if img_b64:
269
- img_block = f'<img src="data:image/png;base64,{img_b64}" alt="step {n}"/>'
270
- elif img_path:
271
- img_block = f'<img src="/file={html.escape(img_path)}" alt="step {n}"/>'
272
- else:
273
- img_block = f'<div class="placeholder">{visual[:80] if visual else f"Step {n}"}</div>'
274
  tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
275
  cards.append(f"""
276
  <div class="cwm-step">
@@ -293,22 +287,22 @@ class NutritionGrid(TemplatedHTML):
293
  css_template = """
294
  .cwm-nutri-wrap { margin-top: 10px; }
295
  .cwm-nutri-title {
296
- font-family: 'Lora', serif; color: #6b4a2a !important; font-size: 22px; margin: 0 0 14px;
297
  }
298
  .cwm-nutri {
299
  display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
300
  }
301
- .cwm-nutri .cwm-nutri-cell {
302
- background: #fffbf0 !important; border: 1px solid #d8c9ad; border-radius: 10px;
303
  padding: 14px 10px; text-align: center;
304
  }
305
- .cwm-nutri .cwm-nutri-cell .v {
306
- font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a !important;
307
  display: block;
308
  }
309
- .cwm-nutri .cwm-nutri-cell .l {
310
  font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
311
- color: #8a6a3a !important; margin-top: 4px;
312
  }
313
  @media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
314
  """
@@ -343,7 +337,7 @@ class VerdictBadge(TemplatedHTML):
343
  css_template = """
344
  .cwm-verdict {
345
  display: flex; align-items: center; gap: 18px;
346
- background: #fffbf0 !important; border-radius: 12px; padding: 18px 22px;
347
  border: 1px solid #d8c9ad;
348
  }
349
  .cwm-verdict.go { border-left: 6px solid #4f8b4a; }
@@ -357,8 +351,8 @@ class VerdictBadge(TemplatedHTML):
357
  .cwm-verdict.go .cwm-verdict-pill { background: #4f8b4a; }
358
  .cwm-verdict.wait .cwm-verdict-pill { background: #d4a23c; }
359
  .cwm-verdict.fix .cwm-verdict-pill { background: #b94a3a; }
360
- .cwm-verdict-text { font-size: 16px; color: #4a3722 !important; line-height: 1.5; }
361
- .cwm-verdict-text small { color: #8a6a3a !important; display: block; margin-top: 4px; }
362
  .cwm-verdict-empty {
363
  color: #b39870; font-style: italic; padding: 14px 0;
364
  }
 
80
  class RecipeHero(TemplatedHTML):
81
  css_template = """
82
  .cwm-hero {
83
+ background: #fffbf0;
84
  border: 1px solid #d8c9ad;
85
  border-radius: 16px;
86
  padding: 32px;
 
94
  background: #efe3c8;
95
  }
96
  .cwm-hero h1 {
97
+ font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a;
98
  margin: 0 0 8px;
99
  }
100
  .cwm-hero .meta {
101
+ color: #8a6a3a; font-size: 14px; letter-spacing: 0.04em;
102
  text-transform: uppercase; margin-bottom: 18px;
103
  }
104
  .cwm-hero .visual {
105
+ font-family: 'Lora', serif; font-style: italic; color: #6b4a2a;
106
  font-size: 17px; line-height: 1.55;
107
  }
108
  @media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
 
115
  servings = state.get("servings") or 0
116
  time = state.get("total_time_minutes") or 0
117
  visual = html.escape(state.get("final_dish_visual") or "")
118
+ img = state.get("final_dish_image_path") or ""
119
+ img_tag = (
120
+ f'<img src="/file={html.escape(img)}" alt="final dish"/>'
121
+ if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
122
+ )
 
123
  return f"""
124
  <div class="cwm-hero">
125
  <div>{img_tag}</div>
 
186
  class DishOptions(TemplatedHTML):
187
  css_template = """
188
  .cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
189
+ .cwm-option {
190
+ background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 12px;
191
  padding: 18px; text-align: left;
192
  }
193
+ .cwm-option h3 {
194
+ font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a;
195
  margin: 0 0 6px;
196
  }
197
+ .cwm-option p { color: #7a5a35; font-size: 14px; line-height: 1.45; margin: 0; }
198
  @media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
199
  """
200
 
 
217
  class StepCard(TemplatedHTML):
218
  css_template = """
219
  .cwm-steps { display: flex; flex-direction: column; gap: 16px; }
220
+ .cwm-step {
221
  display: grid; grid-template-columns: 220px 1fr; gap: 22px;
222
+ background: #fffbf0; border-left: 4px solid #a85c2a; border-radius: 10px;
223
  padding: 18px 22px;
224
  }
225
+ .cwm-step img {
226
  width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
227
  background: #efe3c8;
228
  }
229
+ .cwm-step .placeholder {
230
  width: 220px; height: 160px; border-radius: 8px;
231
  background: linear-gradient(135deg,#efe3c8,#dccaa3);
232
  display:flex; align-items:center; justify-content:center;
233
+ color: #8a6a3a; font-family: 'Lora', serif; font-size: 14px;
234
  }
235
+ .cwm-step h3 {
236
+ font-family: 'Lora', serif; color: #6b4a2a; margin: 0 0 6px; font-size: 22px;
237
  }
238
+ .cwm-step p { font-size: 16px; line-height: 1.55; color: #4a3722; margin: 0 0 8px; }
239
+ .cwm-step .duration {
240
+ display: inline-block; background: #a85c2a; color: #fffbf0;
241
  border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
242
  }
243
+ .cwm-step .tip {
244
+ margin-top: 10px; padding: 10px 12px; background: #fff3d8;
245
+ border-radius: 8px; font-size: 14px; color: #6b4a2a;
246
  }
247
  .cwm-step .tip::before { content: "💡 "; }
248
  @media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
 
260
  dur = html.escape(s.get("duration", ""))
261
  tip = s.get("tip")
262
  visual = html.escape(s.get("visual", ""))
263
+ img = s.get("image_path")
264
+ img_block = (
265
+ f'<img src="/file={html.escape(img)}" alt="step {n}"/>'
266
+ if img else f'<div class="placeholder">{visual[:80]}</div>'
267
+ )
 
268
  tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
269
  cards.append(f"""
270
  <div class="cwm-step">
 
287
  css_template = """
288
  .cwm-nutri-wrap { margin-top: 10px; }
289
  .cwm-nutri-title {
290
+ font-family: 'Lora', serif; color: #6b4a2a; font-size: 22px; margin: 0 0 14px;
291
  }
292
  .cwm-nutri {
293
  display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
294
  }
295
+ .cwm-nutri-cell {
296
+ background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 10px;
297
  padding: 14px 10px; text-align: center;
298
  }
299
+ .cwm-nutri-cell .v {
300
+ font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a;
301
  display: block;
302
  }
303
+ .cwm-nutri-cell .l {
304
  font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
305
+ color: #8a6a3a; margin-top: 4px;
306
  }
307
  @media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
308
  """
 
337
  css_template = """
338
  .cwm-verdict {
339
  display: flex; align-items: center; gap: 18px;
340
+ background: #fffbf0; border-radius: 12px; padding: 18px 22px;
341
  border: 1px solid #d8c9ad;
342
  }
343
  .cwm-verdict.go { border-left: 6px solid #4f8b4a; }
 
351
  .cwm-verdict.go .cwm-verdict-pill { background: #4f8b4a; }
352
  .cwm-verdict.wait .cwm-verdict-pill { background: #d4a23c; }
353
  .cwm-verdict.fix .cwm-verdict-pill { background: #b94a3a; }
354
+ .cwm-verdict-text { font-size: 16px; color: #4a3722; line-height: 1.5; }
355
+ .cwm-verdict-text small { color: #8a6a3a; display: block; margin-top: 4px; }
356
  .cwm-verdict-empty {
357
  color: #b39870; font-style: italic; padding: 14px 0;
358
  }
src/ui/components.pyi CHANGED
@@ -63,14 +63,11 @@ class RecipeHero(TemplatedHTML):
63
  servings = state.get("servings") or 0
64
  time = state.get("total_time_minutes") or 0
65
  visual = html.escape(state.get("final_dish_visual") or "")
66
- img_b64 = state.get("final_dish_image_b64") or ""
67
- img_path = state.get("final_dish_image_path") or ""
68
- if img_b64:
69
- img_tag = f'<img src="data:image/png;base64,{img_b64}" alt="final dish"/>'
70
- elif img_path:
71
- img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
72
- else:
73
- img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
74
  return f"""
75
  <div class="cwm-hero">
76
  <div>{img_tag}</div>
 
63
  servings = state.get("servings") or 0
64
  time = state.get("total_time_minutes") or 0
65
  visual = html.escape(state.get("final_dish_visual") or "")
66
+ img = state.get("final_dish_image_path") or ""
67
+ img_tag = (
68
+ f'<img src="/file={html.escape(img)}" alt="final dish"/>'
69
+ if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
70
+ )
 
71
  return f"""
72
  <div class="cwm-hero">
73
  <div>{img_tag}</div>
src/ui/theme.py CHANGED
@@ -13,64 +13,10 @@ theme = gr.themes.Soft(
13
 
14
  CSS = """
15
  @import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&display=swap');
16
-
17
- /* ---------------------------------------------------------------------------
18
- Force a warm light palette regardless of the browser/system dark mode.
19
- We pin the parchment background, so we must also pin DARK text colours via
20
- Gradio's CSS variables — otherwise dark-mode users get white text on the
21
- light background and it disappears.
22
- --------------------------------------------------------------------------- */
23
- .gradio-container, .gradio-container.dark {
24
- background: #f5ecd9 !important;
25
- color-scheme: light !important;
26
-
27
- --body-text-color: #4a3722;
28
- --body-text-color-subdued: #7a5a35;
29
- --block-title-text-color: #6b4a2a;
30
- --block-label-text-color: #6b4a2a;
31
- --block-info-text-color: #7a5a35;
32
- --block-background-fill: #fffbf0;
33
- --input-background-fill: #fffbf0;
34
- --border-color-primary: #d8c9ad;
35
- --color-accent-soft: #fbe2d2;
36
- }
37
-
38
- /* Blanket dark text for native Gradio text elements (covers dark mode) */
39
- .gradio-container,
40
- .gradio-container .prose,
41
- .gradio-container label,
42
- .gradio-container .gr-text,
43
- .gradio-container span,
44
- .gradio-container p,
45
- .gradio-container .gr-check-radio label,
46
- .gradio-container .wrap,
47
- .gradio-container .gr-form,
48
- .gradio-container .tab-nav button,
49
- .gradio-container .gr-accordion,
50
- .gradio-container input,
51
- .gradio-container textarea {
52
- color: #4a3722 !important;
53
- }
54
-
55
  .gradio-container .prose h1,
56
  .gradio-container .prose h2,
57
- .gradio-container .prose h3 { font-family: 'Lora', serif !important; color: #6b4a2a !important; }
58
-
59
- /* Tabs: dark labels, terracotta active */
60
- .gradio-container .tab-nav button { color: #6b4a2a !important; }
61
- .gradio-container .tab-nav button.selected {
62
- color: #a85c2a !important; border-bottom-color: #a85c2a !important;
63
- }
64
-
65
- /* Native blocks (inputs, radio, checkbox, number) on warm cards */
66
- .gradio-container .block,
67
- .gradio-container .form,
68
- .gradio-container input[type="text"],
69
- .gradio-container input[type="number"] {
70
- background: #fffbf0 !important;
71
- border-color: #d8c9ad !important;
72
- }
73
-
74
  /* Generic container shared by every HTMLComponent */
75
  .cwm-card {
76
  border: 1px solid #d8c9ad;
@@ -80,7 +26,6 @@ CSS = """
80
  }
81
  button.primary, .gr-button-primary {
82
  background: #a85c2a !important;
83
- color: #fffbf0 !important;
84
  font-weight: 600 !important;
85
  font-size: 16px !important;
86
  padding: 12px 22px !important;
 
13
 
14
  CSS = """
15
  @import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&display=swap');
16
+ .gradio-container { background: #f5ecd9 !important; }
 
17
  .gradio-container .prose h1,
18
  .gradio-container .prose h2,
19
+ .gradio-container .prose h3 { font-family: 'Lora', serif !important; color: #6b4a2a; }
 
20
  /* Generic container shared by every HTMLComponent */
21
  .cwm-card {
22
  border: 1px solid #d8c9ad;
 
26
  }
27
  button.primary, .gr-button-primary {
28
  background: #a85c2a !important;
 
29
  font-weight: 600 !important;
30
  font-size: 16px !important;
31
  padding: 12px 22px !important;