Spaces:
Running on Zero
strategy
#4
by Fred1e4 - opened
- .gitignore +0 -24
- README.md +4 -71
- Strategy/arquitectura.html +0 -668
- Strategy/estrategia.md +0 -496
- Strategy/plan.md +0 -245
- Strategy/plan_implementacion.md +0 -674
- app.py +84 -212
- modal_app/__init__.py +0 -0
- modal_app/flux_endpoint.py +0 -124
- modal_app/planner_endpoint.py +0 -117
- modal_app/serve_app.py +0 -102
- packages.txt +0 -2
- requirements.txt +5 -5
- scripts/build_recipe_dataset.py +0 -281
- scripts/diag_planner.py +0 -73
- scripts/train_planner.py +0 -172
- src/agents/progress_validator.py +0 -84
- src/agents/recipe_planner.py +0 -167
- src/agents/step_illustrator.py +0 -81
- src/config.py +3 -14
- src/data/__init__.py +0 -0
- src/data/nutrition.py +0 -112
- src/models/planner.py +0 -103
- src/pipeline.py +0 -32
- src/prompts/planner_propose.txt +0 -11
- src/prompts/planner_recipe.txt +0 -11
- src/prompts/validator_prompt.txt +0 -14
- src/ui/components.py +42 -48
- src/ui/components.pyi +5 -8
- src/ui/theme.py +2 -57
.gitignore
DELETED
|
@@ -1,24 +0,0 @@
|
|
| 1 |
-
# Python
|
| 2 |
-
__pycache__/
|
| 3 |
-
*.py[cod]
|
| 4 |
-
*.egg-info/
|
| 5 |
-
.venv/
|
| 6 |
-
venv/
|
| 7 |
-
|
| 8 |
-
# Generated data (SFT dataset lives on HF Hub: eldinosaur/cook-with-me-recipes-sft)
|
| 9 |
-
data/*.parquet
|
| 10 |
-
data/*.jsonl
|
| 11 |
-
data/*.png
|
| 12 |
-
data/*.npy
|
| 13 |
-
data/*.csv
|
| 14 |
-
|
| 15 |
-
# Local caches / model weights
|
| 16 |
-
*.gguf
|
| 17 |
-
.cache/
|
| 18 |
-
assets/*.png
|
| 19 |
-
|
| 20 |
-
# OS / editor
|
| 21 |
-
.DS_Store
|
| 22 |
-
Thumbs.db
|
| 23 |
-
.idea/
|
| 24 |
-
.vscode/
|
README.md
CHANGED
|
@@ -1,80 +1,13 @@
|
|
| 1 |
---
|
| 2 |
title: Cook With A LLM
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 6.15.2
|
| 8 |
python_version: '3.12'
|
| 9 |
app_file: app.py
|
| 10 |
pinned: false
|
| 11 |
-
license: apache-2.0
|
| 12 |
---
|
| 13 |
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
|
| 17 |
-
|
| 18 |
-
A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
|
| 19 |
-
|
| 20 |
-
---
|
| 21 |
-
|
| 22 |
-
## How it works
|
| 23 |
-
|
| 24 |
-
```
|
| 25 |
-
📸 Fridge photo ──▶ [Vision Agent] identify ingredients
|
| 26 |
-
│
|
| 27 |
-
▼
|
| 28 |
-
[Recipe Planner] propose 3 dishes → full recipe JSON
|
| 29 |
-
│
|
| 30 |
-
▼
|
| 31 |
-
[Nutrition Engine] per-serving macros (lookup, no hallucination)
|
| 32 |
-
│
|
| 33 |
-
▼
|
| 34 |
-
📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
|
| 35 |
-
```
|
| 36 |
-
|
| 37 |
-
1. **Snap** your fridge or pantry — the fine-tuned vision model identifies every ingredient.
|
| 38 |
-
2. **Pick** one of three AI-suggested dishes tailored to what you have.
|
| 39 |
-
3. **Cook** step by step with a generated recipe and per-serving nutrition info.
|
| 40 |
-
4. **Check** your progress by uploading a photo of your pan — the model tells you *go*, *wait*, or *fix*.
|
| 41 |
-
|
| 42 |
-
---
|
| 43 |
-
|
| 44 |
-
## Models
|
| 45 |
-
|
| 46 |
-
| Role | Model | Params | Runtime |
|
| 47 |
-
|---|---|---|---|
|
| 48 |
-
| Vision + Planner + Validator | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
|
| 49 |
-
|
| 50 |
-
**Total: ~4.6B parameters** (≤ 32B cap ✓ — significant headroom)
|
| 51 |
-
|
| 52 |
-
The ingredient-identification model is **fine-tuned** on fridge/pantry photos for higher precision.
|
| 53 |
-
|
| 54 |
-
---
|
| 55 |
-
|
| 56 |
-
## Badges targeted
|
| 57 |
-
|
| 58 |
-
| Badge | Status | How |
|
| 59 |
-
|---|---|---|
|
| 60 |
-
| 🎯 Well-Tuned | ✓ | Fine-tuned MiniCPM-V-4.6 for ingredient detection, published to Hub |
|
| 61 |
-
| 🎨 Off-Brand | ✓ | Recipe-card UI with custom CSS — Lora serif, warm parchment palette |
|
| 62 |
-
| 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
|
| 63 |
-
| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
|
| 64 |
-
|
| 65 |
-
---
|
| 66 |
-
|
| 67 |
-
## Architecture highlights
|
| 68 |
-
|
| 69 |
-
- **Single model, three roles:** MiniCPM-V-4.6 handles vision (ingredients + progress) *and* text planning (recipe JSON generation) — no redundant model downloads.
|
| 70 |
-
- **Closed-loop visual validation:** Flux generates step targets → user cooks → vision model compares — a real agent loop, not a wrapper.
|
| 71 |
-
- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
|
| 72 |
-
- **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
|
| 73 |
-
|
| 74 |
-
---
|
| 75 |
-
|
| 76 |
-
## Track
|
| 77 |
-
|
| 78 |
-
**Chapter One — Backyard AI** · "Build something for someone you actually know."
|
| 79 |
-
|
| 80 |
-
Submission for the Hugging Face Hackathon · June 5–15, 2026.
|
|
|
|
| 1 |
---
|
| 2 |
title: Cook With A LLM
|
| 3 |
+
emoji: 🐠
|
| 4 |
+
colorFrom: pink
|
| 5 |
+
colorTo: pink
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 6.15.2
|
| 8 |
python_version: '3.12'
|
| 9 |
app_file: app.py
|
| 10 |
pinned: false
|
|
|
|
| 11 |
---
|
| 12 |
|
| 13 |
+
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
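Note on the removed README text: it describes a multi-strategy JSON parser that tolerates markdown fences, single quotes, and trailing commas. That parser's code is not visible in this part of the diff, so the following is only a minimal sketch of the idea, not the project's implementation. The helper name `extract_json`, the two regular expressions, and the order of the strategies are all assumptions made for illustration.

```python
import ast
import json
import re
from typing import Any, Optional

# Hypothetical patterns, chosen for this sketch only.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def extract_json(text: str) -> Optional[Any]:
    """Try progressively more forgiving strategies; return None if all fail."""
    # Strategy 1: prefer the body of a markdown fence when one is present.
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text

    # Strategy 2: keep only the outermost {...} span, dropping chatty prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = candidate[start : end + 1]

    # Strategy 3: drop trailing commas before a closing brace or bracket.
    # Naive on purpose: it would also rewrite ",}" inside a string value.
    cleaned = TRAILING_COMMA_RE.sub(r"\1", candidate)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Strategy 4: single-quoted, Python-style literals.
    try:
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return None


if __name__ == "__main__":
    fenced = '```json\n{"name": "tinga", "steps": ["a", "b",],}\n```'
    print(extract_json(fenced))
    print(extract_json("Here you go: {'name': 'tinga', 'servings': 4}"))
    print(extract_json("no json here"))
```

Returning `None` rather than raising is one way to let generation failures degrade gracefully, as the README puts it: the caller can retry the model or fall back to a default recipe. A stricter variant might also check that the result is a `dict` before accepting it.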
Strategy/arquitectura.html
DELETED
|
@@ -1,668 +0,0 @@
|
|
| 1 |
-
<!DOCTYPE html>
|
| 2 |
-
<html lang="es">
|
| 3 |
-
<head>
|
| 4 |
-
<meta charset="UTF-8" />
|
| 5 |
-
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
| 6 |
-
<title>Cocina Conmigo — Plan visual del proyecto</title>
|
| 7 |
-
<style>
|
| 8 |
-
:root {
|
| 9 |
-
--bg: #f5ecd9;
|
| 10 |
-
--card: #fffbf0;
|
| 11 |
-
--ink: #2b2018;
|
| 12 |
-
--accent: #a85c2a; /* terracotta */
|
| 13 |
-
--accent-soft: #f6dccc;
|
| 14 |
-
--accent2: #6b4a2a;
|
| 15 |
-
--gold: #c9962b;
|
| 16 |
-
--green: #3f7a3a;
|
| 17 |
-
--green-soft: #dbe9d8;
|
| 18 |
-
--red: #b03a2e;
|
| 19 |
-
--red-soft: #f4d6d2;
|
| 20 |
-
--gray: #8a7e6f;
|
| 21 |
-
--line: #d8c9ad;
|
| 22 |
-
}
|
| 23 |
-
* { box-sizing: border-box; }
|
| 24 |
-
body {
|
| 25 |
-
font-family: 'Inter', -apple-system, sans-serif;
|
| 26 |
-
background: var(--bg);
|
| 27 |
-
color: var(--ink);
|
| 28 |
-
margin: 0;
|
| 29 |
-
padding: 32px 16px 80px;
|
| 30 |
-
line-height: 1.55;
|
| 31 |
-
}
|
| 32 |
-
.wrap { max-width: 1240px; margin: 0 auto; }
|
| 33 |
-
|
| 34 |
-
h1 { font-family: 'Lora', Georgia, serif; font-size: 46px; margin: 0 0 4px;
|
| 35 |
-
letter-spacing: -0.5px; font-weight: 700; }
|
| 36 |
-
h1 em { color: var(--accent); font-style: italic; }
|
| 37 |
-
.subtitle { color: var(--accent2); font-style: italic; margin-bottom: 28px; font-size: 17px; }
|
| 38 |
-
|
| 39 |
-
h2 {
|
| 40 |
-
margin-top: 56px; border-top: 1px dashed var(--line); padding-top: 24px;
|
| 41 |
-
font-size: 26px; font-family: 'Lora', Georgia, serif; letter-spacing: 0.3px;
|
| 42 |
-
}
|
| 43 |
-
h2 .num {
|
| 44 |
-
color: var(--accent); font-family: ui-monospace, monospace;
|
| 45 |
-
font-size: 20px; margin-right: 10px;
|
| 46 |
-
}
|
| 47 |
-
h3 { font-size: 18px; margin-top: 28px; color: var(--accent2); font-family: 'Lora', Georgia, serif; }
|
| 48 |
-
|
| 49 |
-
/* Hero */
|
| 50 |
-
.hero {
|
| 51 |
-
background: var(--card); border: 2px solid var(--ink); border-radius: 14px;
|
| 52 |
-
padding: 30px 32px; display: grid; grid-template-columns: 1fr; gap: 18px;
|
| 53 |
-
}
|
| 54 |
-
@media(min-width: 760px){ .hero { grid-template-columns: 2fr 1fr; align-items: center; } }
|
| 55 |
-
.hero h2 { border:0; margin:0 0 6px; padding:0; font-size: 22px; }
|
| 56 |
-
.hero .quote {
|
| 57 |
-
font-style: italic; font-size: 17px; color: var(--accent2);
|
| 58 |
-
border-left: 3px solid var(--accent); padding-left: 14px; margin: 6px 0 0;
|
| 59 |
-
}
|
| 60 |
-
.hero .target {
|
| 61 |
-
background: #fff3cf; border-radius: 12px; padding: 14px 16px;
|
| 62 |
-
font-size: 13px; border: 1px solid var(--line); line-height: 1.55;
|
| 63 |
-
}
|
| 64 |
-
.hero .target strong { color: var(--accent); }
|
| 65 |
-
|
| 66 |
-
/* Pills */
|
| 67 |
-
.pill {
|
| 68 |
-
display: inline-block; padding: 2px 9px; border-radius: 12px;
|
| 69 |
-
color: white; font-size: 12px; margin: 2px 4px 2px 0; font-family: ui-monospace, monospace;
|
| 70 |
-
}
|
| 71 |
-
.pill.user { background: var(--gray); }
|
| 72 |
-
.pill.gradio { background: var(--accent); }
|
| 73 |
-
.pill.hf { background: var(--gold); }
|
| 74 |
-
.pill.modal { background: var(--green); }
|
| 75 |
-
.pill.flux { background: #111; }
|
| 76 |
-
.pill.openbmb { background: #075e54; }
|
| 77 |
-
.pill.cohere { background: #5e3aa3; }
|
| 78 |
-
.pill.openai { background: #2c5e8a; }
|
| 79 |
-
.pill.llama { background: #6a3d8a; }
|
| 80 |
-
|
| 81 |
-
/* Phone/recipe card mockup */
|
| 82 |
-
.phone-row {
|
| 83 |
-
display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px;
|
| 84 |
-
}
|
| 85 |
-
@media(min-width: 760px){ .phone-row { grid-template-columns: repeat(4, 1fr); } }
|
| 86 |
-
.phone {
|
| 87 |
-
background: #111; border-radius: 24px; padding: 8px;
|
| 88 |
-
box-shadow: 0 8px 22px rgba(0,0,0,0.18);
|
| 89 |
-
}
|
| 90 |
-
.phone .screen {
|
| 91 |
-
background: #fffbf0; border-radius: 18px; overflow: hidden;
|
| 92 |
-
height: 380px; display: flex; flex-direction: column;
|
| 93 |
-
}
|
| 94 |
-
.phone .topbar {
|
| 95 |
-
background: var(--accent); color: white; padding: 10px 14px;
|
| 96 |
-
font-size: 13px; font-family: 'Lora', serif;
|
| 97 |
-
}
|
| 98 |
-
.phone .body { padding: 12px; flex: 1; overflow-y: auto; font-size: 12px; }
|
| 99 |
-
.phone .body .illu {
|
| 100 |
-
width: 100%; aspect-ratio: 4/3; border-radius: 8px;
|
| 101 |
-
background: linear-gradient(135deg, #ffd28b 0%, #c97a3e 100%);
|
| 102 |
-
display: flex; align-items: center; justify-content: center;
|
| 103 |
-
font-size: 48px; box-shadow: 0 2px 8px rgba(0,0,0,0.1); margin-bottom: 8px;
|
| 104 |
-
}
|
| 105 |
-
.phone .body p { margin: 6px 0; line-height: 1.5; }
|
| 106 |
-
.phone .body .voice {
|
| 107 |
-
background: var(--green-soft); border-radius: 6px; padding: 6px 10px;
|
| 108 |
-
margin-top: 8px; font-size: 11px; color: var(--green);
|
| 109 |
-
}
|
| 110 |
-
.phone .body .tip {
|
| 111 |
-
background: var(--red-soft); border-radius: 6px; padding: 6px 10px;
|
| 112 |
-
margin-top: 6px; font-size: 11px; color: var(--red);
|
| 113 |
-
}
|
| 114 |
-
.scenario-label {
|
| 115 |
-
text-align: center; font-size: 13px; color: var(--accent2);
|
| 116 |
-
margin-top: 8px; font-style: italic;
|
| 117 |
-
}
|
| 118 |
-
|
| 119 |
-
/* SVG */
|
| 120 |
-
svg { width: 100%; height: auto; display: block; }
|
| 121 |
-
.node-box { fill: var(--card); stroke: var(--ink); stroke-width: 1.5; }
|
| 122 |
-
.node-text { font-family: 'Inter', sans-serif; font-size: 14px; fill: var(--ink); }
|
| 123 |
-
.node-title { font-weight: 700; font-size: 15px; }
|
| 124 |
-
.node-sub { font-size: 11px; fill: var(--accent2); font-style: italic; }
|
| 125 |
-
.arrow { stroke: var(--ink); stroke-width: 1.8; fill: none; }
|
| 126 |
-
.arrow-label { font-size: 11px; fill: var(--accent2); font-family: ui-monospace, monospace; }
|
| 127 |
-
.dashed { stroke-dasharray: 6 4; }
|
| 128 |
-
.arrow-loop { stroke: var(--accent); stroke-width: 2.2; fill: none; }
|
| 129 |
-
|
| 130 |
-
/* Cards */
|
| 131 |
-
.grid-2 { display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px; }
|
| 132 |
-
@media(min-width: 880px){ .grid-2 { grid-template-columns: 1fr 1fr; } }
|
| 133 |
-
.grid-3 { display: grid; grid-template-columns: 1fr; gap: 14px; margin-top: 14px; }
|
| 134 |
-
@media(min-width: 760px){ .grid-3 { grid-template-columns: repeat(3, 1fr); } }
|
| 135 |
-
|
| 136 |
-
.card {
|
| 137 |
-
background: var(--card); border: 1px solid var(--line);
|
| 138 |
-
border-radius: 10px; padding: 18px 20px;
|
| 139 |
-
}
|
| 140 |
-
.card.pick { border: 2px solid var(--accent); }
|
| 141 |
-
.pick-tag {
|
| 142 |
-
display: inline-block; background: var(--accent); color: white;
|
| 143 |
-
font-family: ui-monospace, monospace; font-size: 11px;
|
| 144 |
-
padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
|
| 145 |
-
}
|
| 146 |
-
|
| 147 |
-
table {
|
| 148 |
-
width: 100%; border-collapse: collapse; background: var(--card);
|
| 149 |
-
border: 1px solid var(--line); margin-top: 14px; font-size: 14px;
|
| 150 |
-
}
|
| 151 |
-
th, td { padding: 8px 10px; text-align: left; border-bottom: 1px solid var(--line); vertical-align: top; }
|
| 152 |
-
th { background: #efe4cb; font-size: 13px; letter-spacing: 0.5px; text-transform: uppercase; }
|
| 153 |
-
code {
|
| 154 |
-
background: #efe4cb; border-radius: 3px; padding: 1px 5px; font-size: 13px;
|
| 155 |
-
}
|
| 156 |
-
|
| 157 |
-
/* Forbidden zone */
|
| 158 |
-
.forbidden {
|
| 159 |
-
background: var(--red-soft); border: 1px solid var(--red);
|
| 160 |
-
border-radius: 8px; padding: 14px 18px; margin-top: 14px;
|
| 161 |
-
}
|
| 162 |
-
.forbidden strong { color: var(--red); }
|
| 163 |
-
.forbidden ul {
|
| 164 |
-
columns: 2; column-gap: 28px; margin: 8px 0 0; padding-left: 18px; font-size: 14px;
|
| 165 |
-
}
|
| 166 |
-
|
| 167 |
-
/* Timeline */
|
| 168 |
-
.timeline { position: relative; padding-left: 36px; margin-top: 20px; }
|
| 169 |
-
.timeline::before {
|
| 170 |
-
content: ""; position: absolute; left: 12px; top: 6px; bottom: 6px;
|
| 171 |
-
width: 3px; background: var(--accent); border-radius: 2px;
|
| 172 |
-
}
|
| 173 |
-
.day {
|
| 174 |
-
position: relative; margin-bottom: 14px; background: var(--card);
|
| 175 |
-
border: 1px solid var(--line); border-radius: 8px; padding: 12px 16px;
|
| 176 |
-
}
|
| 177 |
-
.day::before {
|
| 178 |
-
content: ""; position: absolute; left: -29px; top: 16px;
|
| 179 |
-
width: 13px; height: 13px; background: var(--accent);
|
| 180 |
-
border: 2px solid var(--card); border-radius: 50%;
|
| 181 |
-
}
|
| 182 |
-
.day .lbl {
|
| 183 |
-
display: inline-block; background: var(--accent); color: white;
|
| 184 |
-
font-family: ui-monospace, monospace; font-size: 11px;
|
| 185 |
-
padding: 1px 7px; border-radius: 10px; margin-right: 8px;
|
| 186 |
-
}
|
| 187 |
-
.day strong { font-size: 15px; }
|
| 188 |
-
.day .what { font-size: 13px; color: var(--accent2); margin-top: 2px; }
|
| 189 |
-
|
| 190 |
-
/* Award rows */
|
| 191 |
-
.award-row {
|
| 192 |
-
display: flex; justify-content: space-between;
|
| 193 |
-
padding: 8px 12px; border-bottom: 1px solid var(--line); font-size: 14px;
|
| 194 |
-
}
|
| 195 |
-
.award-row:last-child { border-bottom: 0; }
|
| 196 |
-
.prob {
|
| 197 |
-
font-family: ui-monospace, monospace; font-size: 12px;
|
| 198 |
-
padding: 1px 8px; border-radius: 10px; color: white;
|
| 199 |
-
}
|
| 200 |
-
.prob-h { background: #2e7d32; }
|
| 201 |
-
.prob-m { background: #ef9c2c; }
|
| 202 |
-
.prob-l { background: #b03a2e; }
|
| 203 |
-
|
| 204 |
-
/* Badges grid */
|
| 205 |
-
.badges-grid {
|
| 206 |
-
display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
|
| 207 |
-
gap: 12px; margin-top: 14px;
|
| 208 |
-
}
|
| 209 |
-
.badge-card {
|
| 210 |
-
background: var(--card); border: 1px solid var(--line);
|
| 211 |
-
border-radius: 8px; padding: 12px 14px;
|
| 212 |
-
}
|
| 213 |
-
.badge-card.skip { opacity: 0.45; border-style: dashed; }
|
| 214 |
-
.badge-card .tag {
|
| 215 |
-
display: inline-block; background: var(--accent); color: white;
|
| 216 |
-
font-family: ui-monospace, monospace; font-size: 11px;
|
| 217 |
-
padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
|
| 218 |
-
}
|
| 219 |
-
.badge-card.skip .tag { background: var(--gray); }
|
| 220 |
-
.badge-card strong { font-size: 14px; }
|
| 221 |
-
.badge-card p { font-size: 13px; color: var(--accent2); margin: 4px 0 0; }
|
| 222 |
-
|
| 223 |
-
.footnote {
|
| 224 |
-
margin-top: 30px; padding: 14px 18px;
|
| 225 |
-
border-left: 4px solid var(--accent);
|
| 226 |
-
background: var(--card); font-size: 14px; border-radius: 4px;
|
| 227 |
-
}
|
| 228 |
-
</style>
|
| 229 |
-
<link rel="preconnect" href="https://fonts.googleapis.com">
|
| 230 |
-
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
| 231 |
-
<link href="https://fonts.googleapis.com/css2?family=Lora:wght@400;600;700&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
|
| 232 |
-
</head>
|
| 233 |
-
<body>
|
| 234 |
-
<div class="wrap">
|
| 235 |
-
|
| 236 |
-
<h1><em>Cocina Conmigo</em></h1>
|
| 237 |
-
<div class="subtitle">Sous-chef multimodal con visión, voz y Flux.2 — para cocinar con tu mamá sin tener las manos libres</div>
|
| 238 |
-
|
| 239 |
-
<div class="hero">
|
| 240 |
-
<div>
|
| 241 |
-
<h2>La idea en una frase</h2>
|
| 242 |
-
<p>Tu mamá toma foto del refri, la app le propone qué cocinar, le <strong>muestra cómo se debe ver cada paso</strong> con Flux.2, y la <strong>narra por voz</strong> mientras ella cocina con las manos llenas.</p>
|
| 243 |
-
<p class="quote">"Mi mamá me pidió que le enseñara a hacer ramen. Le construí un sous-chef que vive en su tablet."</p>
|
| 244 |
-
<div style="margin-top: 14px;">
|
| 245 |
-
<span class="pill flux">Flux.2 Klein 9B</span>
|
| 246 |
-
<span class="pill openbmb">MiniCPM-V + voice</span>
|
| 247 |
-
<span class="pill cohere">Cohere voice</span>
|
| 248 |
-
<span class="pill gradio">Gradio Workflows</span>
|
| 249 |
-
<span class="pill modal">Modal-powered</span>
|
| 250 |
-
<span class="pill llama">llama.cpp</span>
|
| 251 |
-
</div>
|
| 252 |
-
</div>
|
| 253 |
-
<div class="target">
|
| 254 |
-
<strong>Track:</strong> Backyard AI<br/>
|
| 255 |
-
<strong>Persona:</strong> tu mamá / pareja / vecino<br/>
|
| 256 |
-
<strong>Idioma:</strong> español-mexicano<br/>
|
| 257 |
-
<strong>Total params:</strong> ~17B (≤ 32B ✓)<br/>
|
| 258 |
-
<strong>Cocina:</strong> mexicana tradicional<br/>
|
| 259 |
-
<strong>Storyline:</strong> "Para que mi mamá deje de googlear"
|
| 260 |
-
</div>
|
| 261 |
-
</div>
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
<h2><span class="num">01</span>Por qué esta idea, y no las anteriores</h2>
|
| 265 |
-
<table>
|
| 266 |
-
<thead><tr><th>Iteración</th><th>Idea</th><th>Por qué se descartó</th></tr></thead>
|
| 267 |
-
<tbody>
|
| 268 |
-
<tr><td>v1</td><td>Abuelita (parent phone helper)</td><td>En la lista pre-cocinada de OpenBMB → 5-15 equipos lo harán</td></tr>
|
| 269 |
-
<tr><td>v2</td><td>Cuentacuentos (voice storyteller)</td><td>También en la lista pre-cocinada de OpenBMB</td></tr>
|
| 270 |
-
<tr style="background:#fff3cf;"><td><strong>v3 (ésta)</strong></td><td><strong>Cocina Conmigo</strong></td><td>Refinamiento de tu idea #1 · NO está en ninguna lista pre-cocinada · usa Flux.2 + Workflows + voces · diaria + universal</td></tr>
|
| 271 |
-
</tbody>
|
| 272 |
-
</table>
|
| 273 |
-
|
| 274 |
-
<div class="forbidden">
|
| 275 |
-
<strong>⛔ Las 12 ideas en zona prohibida (clúster OpenBMB):</strong>
|
| 276 |
-
<ul>
|
| 277 |
-
<li>parent phone helper</li>
|
| 278 |
-
<li>receipt / bill explainer</li>
|
| 279 |
-
<li>shop menu / repair manual</li>
|
| 280 |
-
<li>offline personal assistant / voice companion</li>
|
| 281 |
-
<li>voice storyteller</li>
|
| 282 |
-
<li>visual mystery box</li>
|
| 283 |
-
<li>AI museum (≈ tu idea #4)</li>
|
| 284 |
-
<li>doodle creature</li>
|
| 285 |
-
<li>dream postcard gen</li>
|
| 286 |
-
<li>omni-modal adventure</li>
|
| 287 |
-
<li>tiny local NPC / character agent</li>
|
| 288 |
-
<li>cortes de cabello (tu idea #3, ya saturada)</li>
|
| 289 |
-
</ul>
|
| 290 |
-
</div>
|
| 291 |
-
|
| 292 |
-
|
| 293 |
-
<h2><span class="num">02</span>Las 4 historias del demo</h2>
|
| 294 |
-
<div class="phone-row">
|
| 295 |
-
|
| 296 |
-
<div>
|
| 297 |
-
<div class="phone"><div class="screen">
|
| 298 |
-
<div class="topbar">📸 Tengo esto en el refri</div>
|
| 299 |
-
<div class="body">
|
| 300 |
-
<div class="illu">🍅🌶🐔🧅</div>
|
| 301 |
-
<p><strong>Veo:</strong> pollo, jitomate, cebolla, cilantro, tortillas, queso.</p>
|
| 302 |
-
<p style="background:#fff3cf;border-radius:6px;padding:6px 10px;">
|
| 303 |
-
<strong>3 opciones:</strong><br/>
|
| 304 |
-
🌮 Tinga · 🌯 Enchiladas · 🧀 Quesadillas
|
| 305 |
-
</p>
|
| 306 |
-
<div class="voice">🔊 "¿Qué traes ganas?"</div>
|
| 307 |
-
</div>
|
| 308 |
-
</div></div>
|
| 309 |
-
<div class="scenario-label">1. Visión + Planner</div>
|
| 310 |
-
</div>
|
| 311 |
-
|
| 312 |
-
<div>
|
| 313 |
-
<div class="phone"><div class="screen">
|
| 314 |
-
<div class="topbar">👩🍳 Paso 2 de 5</div>
|
| 315 |
-
<div class="body">
|
| 316 |
-
<div class="illu">🍳✨</div>
|
| 317 |
-
<p><strong>Acitrona la cebolla en aceite caliente.</strong></p>
|
| 318 |
-
<p style="font-size:11px;color:var(--gray);">⏱ 4 minutos · hasta que esté transparente</p>
|
| 319 |
-
<div class="voice">🔊 OpenBMB voice narra…</div>
|
| 320 |
-
</div>
|
| 321 |
-
</div></div>
|
| 322 |
-
<div class="scenario-label">2. Voz + imagen objetivo</div>
|
| 323 |
-
</div>
|
| 324 |
-
|
| 325 |
-
<div>
|
| 326 |
-
<div class="phone"><div class="screen">
|
| 327 |
-
<div class="topbar">📸 ¿Voy bien?</div>
|
| 328 |
-
<div class="body">
|
| 329 |
-
<div class="illu">🍳👀</div>
|
| 330 |
-
<p style="background:var(--green-soft);border-radius:6px;padding:6px 10px;color:var(--green);">
|
| 331 |
-
<strong>✅ Va perfecto.</strong> La cebolla ya se ve transparente.
|
| 332 |
-
</p>
|
| 333 |
-
<div class="tip">🔊 Cohere voice: "¡Súbele 1 minuto más, está bien!"</div>
|
| 334 |
-
</div>
|
| 335 |
-
</div></div>
|
| 336 |
-
<div class="scenario-label">3. Closed-loop visual</div>
|
| 337 |
-
</div>
|
| 338 |
-
|
| 339 |
-
<div>
|
| 340 |
-
<div class="phone"><div class="screen">
|
| 341 |
-
<div class="topbar">🔄 Replan</div>
|
| 342 |
-
<div class="body">
|
| 343 |
-
<p>Usuario: <em>"No tengo cilantro."</em></p>
|
| 344 |
-
<div class="illu" style="background: linear-gradient(135deg,#ffd28b,#a85c2a);">🌮</div>
|
| 345 |
-
<p>"No pasa nada. Le ponemos perejil o nada. Sigue siendo tinga."</p>
|
| 346 |
-
<div class="voice">🔊 Receta regenera · plato final actualizado</div>
|
| 347 |
-
</div>
|
| 348 |
-
</div></div>
|
| 349 |
-
<div class="scenario-label">4. Adaptación en vivo</div>
|
| 350 |
-
</div>
|
| 351 |
-
|
| 352 |
-
</div>
|
| 353 |
-
|
| 354 |
-
|
| 355 |
-
<h2><span class="num">03</span>Arquitectura — 5 agentes en un Gradio Workflow</h2>
|
| 356 |
-
|
| 357 |
-
<svg viewBox="0 0 1240 540" xmlns="http://www.w3.org/2000/svg">
|
| 358 |
-
<defs>
|
| 359 |
-
<marker id="ar" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
|
| 360 |
-
<path d="M0,0 L10,5 L0,10 z" fill="#2b2018"/>
|
| 361 |
-
</marker>
|
| 362 |
-
<marker id="aro" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
|
| 363 |
-
<path d="M0,0 L10,5 L0,10 z" fill="#a85c2a"/>
|
| 364 |
-
</marker>
|
| 365 |
-
</defs>
|
| 366 |
-
|
| 367 |
-
<!-- User input area -->
|
| 368 |
-
<rect x="20" y="40" width="200" height="240" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
|
| 369 |
-
<text x="40" y="62" class="node-text node-title" fill="#6b4a2a">USUARIO (cocina)</text>
|
| 370 |
-
|
| 371 |
-
<rect class="node-box" x="40" y="80" width="160" height="50" rx="6" fill="#ddd1bd"/>
|
| 372 |
-
<text x="120" y="102" class="node-text node-title" text-anchor="middle">📸 Foto del refri</text>
|
| 373 |
-
<text x="120" y="118" class="node-text node-sub" text-anchor="middle">trigger inicial</text>
|
| 374 |
-
|
| 375 |
-
<rect class="node-box" x="40" y="140" width="160" height="50" rx="6" fill="#ddd1bd"/>
|
| 376 |
-
<text x="120" y="162" class="node-text node-title" text-anchor="middle">🎙️ Pregunta voz</text>
|
| 377 |
-
<text x="120" y="178" class="node-text node-sub" text-anchor="middle">"¿voy bien?"</text>
|
| 378 |
-
|
| 379 |
-
<rect class="node-box" x="40" y="200" width="160" height="50" rx="6" fill="#ddd1bd"/>
|
| 380 |
-
<text x="120" y="222" class="node-text node-title" text-anchor="middle">📸 Foto progreso</text>
|
| 381 |
-
<text x="120" y="238" class="node-text node-sub" text-anchor="middle">closed-loop</text>
|
| 382 |
-
|
| 383 |
-
<!-- Output area -->
|
| 384 |
-
<rect x="20" y="320" width="200" height="180" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
|
| 385 |
-
<text x="40" y="342" class="node-text node-title" fill="#6b4a2a">SALIDA</text>
|
| 386 |
-
|
| 387 |
-
<rect class="node-box" x="40" y="360" width="160" height="50" rx="6" fill="#dbe9d8"/>
|
| 388 |
-
<text x="120" y="382" class="node-text node-title" text-anchor="middle">🍽️ Plato final + receta</text>
|
| 389 |
-
<text x="120" y="398" class="node-text node-sub" text-anchor="middle">imagen + texto</text>
|
| 390 |
-
|
| 391 |
-
<rect class="node-box" x="40" y="420" width="160" height="50" rx="6" fill="#dbe9d8"/>
|
| 392 |
-
<text x="120" y="442" class="node-text node-title" text-anchor="middle">🔊 Voz por paso</text>
|
| 393 |
-
<text x="120" y="458" class="node-text node-sub" text-anchor="middle">narrador + tips</text>
|
| 394 |
-
|
| 395 |
-
<!-- Pipeline center -->
|
| 396 |
-
<rect x="260" y="40" width="700" height="460" rx="10" fill="#fffaf0" stroke="#d8c9ad" stroke-width="1.5"/>
|
| 397 |
-
<text x="610" y="62" class="node-text node-title" text-anchor="middle" fill="#6b4a2a">HF SPACE — Gradio Workflow (5 agentes)</text>
|
| 398 |
-
|
| 399 |
-
<!-- Vision (Mise en Place) -->
|
| 400 |
-
<rect class="node-box" x="280" y="90" width="200" height="80" rx="6" fill="#e6d5ed"/>
|
| 401 |
-
<text x="380" y="110" class="node-text node-title" text-anchor="middle">👁️ MISE EN PLACE</text>
|
| 402 |
-
<text x="380" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-V (Q4)</text>
|
| 403 |
-
<text x="380" y="142" class="node-text node-sub" text-anchor="middle">~2-4B</text>
|
| 404 |
-
<text x="380" y="160" class="node-text node-sub" text-anchor="middle">identifica ingredientes</text>
|
| 405 |
-
|
| 406 |
-
<!-- Recipe Planner -->
|
| 407 |
-
<rect class="node-box" x="510" y="90" width="200" height="80" rx="6" fill="#fbe4d3"/>
|
| 408 |
-
<text x="610" y="110" class="node-text node-title" text-anchor="middle">🧠 RECIPE PLANNER</text>
|
| 409 |
-
<text x="610" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-4 (LoRA mx)</text>
|
| 410 |
-
<text x="610" y="142" class="node-text node-sub" text-anchor="middle">~4B</text>
|
| 411 |
-
<text x="610" y="160" class="node-text node-sub" text-anchor="middle">arma receta JSON · replan</text>
|
| 412 |
-
|
| 413 |
-
<!-- Step Illustrator -->
|
| 414 |
-
<rect class="node-box" x="740" y="90" width="200" height="80" rx="6" fill="#f6dccc"/>
|
| 415 |
-
<text x="840" y="110" class="node-text node-title" text-anchor="middle">🎨 STEP ILLUSTRATOR</text>
|
| 416 |
-
<text x="840" y="126" class="node-text node-sub" text-anchor="middle">Flux.2 Klein 9B</text>
|
| 417 |
-
<text x="840" y="142" class="node-text node-sub" text-anchor="middle">en Modal GPU L4</text>
|
| 418 |
-
<text x="840" y="160" class="node-text node-sub" text-anchor="middle">imagen-objetivo por paso</text>
|
| 419 |
-
|
| 420 |
-
<!-- Sous-Chef Narrator -->
|
| 421 |
-
<rect class="node-box" x="510" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
|
| 422 |
-
<text x="610" y="222" class="node-text node-title" text-anchor="middle">🔊 SOUS-CHEF NARRATOR</text>
|
| 423 |
-
<text x="610" y="238" class="node-text node-sub" text-anchor="middle">OpenBMB voice (~1B)</text>
|
| 424 |
-
<text x="610" y="254" class="node-text node-sub" text-anchor="middle">tono cálido</text>
|
| 425 |
-
|
| 426 |
-
<!-- Tip Giver -->
|
| 427 |
-
<rect class="node-box" x="740" y="200" width="200" height="70" rx="6" fill="#e9d6f5"/>
|
| 428 |
-
<text x="840" y="222" class="node-text node-title" text-anchor="middle">🎭 TIP GIVER</text>
|
| 429 |
-
<text x="840" y="238" class="node-text node-sub" text-anchor="middle">Cohere voice (~1B)</text>
|
| 430 |
-
<text x="840" y="254" class="node-text node-sub" text-anchor="middle">warnings · enérgico</text>
|
| 431 |
-
|
| 432 |
-
<!-- Progress Validator (closed loop) -->
|
| 433 |
-
<rect class="node-box" x="280" y="290" width="220" height="90" rx="6" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="2"/>
|
| 434 |
-
<text x="390" y="312" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">✅ PROGRESS VALIDATOR</text>
|
| 435 |
-
<text x="390" y="328" class="node-text node-sub" text-anchor="middle">MiniCPM-V (reuso)</text>
|
| 436 |
-
<text x="390" y="344" class="node-text node-sub" text-anchor="middle">compara foto usuario vs</text>
|
| 437 |
-
<text x="390" y="360" class="node-text node-sub" text-anchor="middle">imagen-objetivo</text>
|
| 438 |
-
<text x="390" y="376" class="node-text node-sub" text-anchor="middle">CLOSED LOOP 🔄</text>
|
| 439 |
-
|
| 440 |
-
<!-- STT -->
|
| 441 |
-
<rect class="node-box" x="280" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
|
| 442 |
-
<text x="380" y="222" class="node-text node-title" text-anchor="middle">🎙️ STT (opcional)</text>
|
| 443 |
-
<text x="380" y="238" class="node-text node-sub" text-anchor="middle">Whisper-tiny (~40M)</text>
|
| 444 |
-
<text x="380" y="254" class="node-text node-sub" text-anchor="middle">"¿voy bien?" hands-free</text>
|
| 445 |
-
|
| 446 |
-
<!-- Recipe State -->
|
| 447 |
-
<rect class="node-box" x="510" y="290" width="430" height="90" rx="6" fill="#fff3cf"/>
|
| 448 |
-
<text x="725" y="312" class="node-text node-title" text-anchor="middle" fill="#8a6a18">📖 RECIPE STATE (dataclass)</text>
|
| 449 |
-
<text x="725" y="328" class="node-text node-sub" text-anchor="middle">name · final_dish_image · steps · current_step ·</text>
|
| 450 |
-
<text x="725" y="344" class="node-text node-sub" text-anchor="middle">missing_ingredients · substitutes · user_progress_photos</text>
|
| 451 |
-
<text x="725" y="362" class="node-text node-sub" text-anchor="middle">cada agente lee y escribe sobre este objeto</text>
|
| 452 |
-
|
| 453 |
-
<!-- Page assembler -->
|
| 454 |
-
<rect class="node-box" x="280" y="400" width="660" height="60" rx="6" fill="#f6dccc"/>
|
| 455 |
-
<text x="610" y="422" class="node-text node-title" text-anchor="middle">📖 RECIPE CARD ASSEMBLER</text>
|
| 456 |
-
<text x="610" y="438" class="node-text node-sub" text-anchor="middle">renderiza la tarjeta de receta + cards por paso + audio reproducible</text>
|
| 457 |
-
|
| 458 |
-
<!-- Modal box -->
|
| 459 |
-
<rect x="990" y="40" width="240" height="460" rx="10" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="1.5"/>
|
| 460 |
-
<text x="1110" y="62" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">MODAL</text>
|
| 461 |
-
|
| 462 |
-
<rect class="node-box" x="1010" y="90" width="200" height="80" rx="6" fill="#fff"/>
|
| 463 |
-
<text x="1110" y="112" class="node-text node-title" text-anchor="middle">Flux endpoint</text>
|
| 464 |
-
<text x="1110" y="128" class="node-text node-sub" text-anchor="middle">runtime · @app.cls L4</text>
|
| 465 |
-
<text x="1110" y="144" class="node-text node-sub" text-anchor="middle">scaledown 180s</text>
|
| 466 |
-
<text x="1110" y="160" class="node-text node-sub" text-anchor="middle">~1-3s/imagen</text>
|
| 467 |
-
|
| 468 |
-
<rect class="node-box" x="1010" y="190" width="200" height="80" rx="6" fill="#fff"/>
|
| 469 |
-
<text x="1110" y="212" class="node-text node-title" text-anchor="middle">Dataset cocina mx</text>
|
| 470 |
-
<text x="1110" y="228" class="node-text node-sub" text-anchor="middle">offline · 200 recetas</text>
|
| 471 |
-
<text x="1110" y="244" class="node-text node-sub" text-anchor="middle">generated via Codex API</text>
|
| 472 |
-
<text x="1110" y="260" class="node-text node-sub" text-anchor="middle">~$5</text>
|
| 473 |
-
|
| 474 |
-
<rect class="node-box" x="1010" y="290" width="200" height="80" rx="6" fill="#fff"/>
|
| 475 |
-
<text x="1110" y="312" class="node-text node-title" text-anchor="middle">LoRA Planner</text>
|
| 476 |
-
<text x="1110" y="328" class="node-text node-sub" text-anchor="middle">offline · A10G ~30 min</text>
|
| 477 |
-
<text x="1110" y="344" class="node-text node-sub" text-anchor="middle">push GGUF to HF</text>
|
| 478 |
-
<text x="1110" y="360" class="node-text node-sub" text-anchor="middle">~$1</text>
|
| 479 |
-
|
| 480 |
-
<rect class="node-box" x="1010" y="390" width="200" height="80" rx="6" fill="#fff"/>
|
| 481 |
-
<text x="1110" y="412" class="node-text node-title" text-anchor="middle">Eval pipeline</text>
|
| 482 |
-
<text x="1110" y="428" class="node-text node-sub" text-anchor="middle">visual consistency</text>
|
| 483 |
-
<text x="1110" y="444" class="node-text node-sub" text-anchor="middle">% correct ingredients</text>
|
| 484 |
-
<text x="1110" y="460" class="node-text node-sub" text-anchor="middle">~$1</text>
|
| 485 |
-
|
| 486 |
-
<!-- Arrows: input → vision -->
|
| 487 |
-
<path class="arrow" d="M200 105 L278 130" marker-end="url(#ar)"/>
|
| 488 |
-
<text x="200" y="100" class="arrow-label">fridge</text>
|
| 489 |
-
|
| 490 |
-
<!-- input → STT -->
|
| 491 |
-
<path class="arrow" d="M200 165 L278 235" marker-end="url(#ar)"/>
|
| 492 |
-
<text x="205" y="200" class="arrow-label">audio</text>
|
| 493 |
-
|
| 494 |
-
<!-- input progress → validator -->
|
| 495 |
-
<path class="arrow arrow-loop" d="M200 225 L278 330" marker-end="url(#aro)"/>
|
| 496 |
-
<text x="200" y="270" class="arrow-label" style="fill:#a85c2a;">progress</text>
|
| 497 |
-
|
| 498 |
-
<!-- Vision → Planner -->
|
| 499 |
-
<path class="arrow" d="M480 130 L508 130" marker-end="url(#ar)"/>
|
| 500 |
-
<text x="482" y="120" class="arrow-label">ingredients</text>
|
| 501 |
-
|
| 502 |
-
<!-- Planner → Illustrator -->
|
| 503 |
-
<path class="arrow" d="M710 130 L738 130" marker-end="url(#ar)"/>
|
| 504 |
-
<text x="712" y="120" class="arrow-label">visual prompt</text>
|
| 505 |
-
|
| 506 |
-
<!-- Illustrator → Modal -->
|
| 507 |
-
<path class="arrow dashed" d="M940 130 L1008 130" marker-end="url(#ar)"/>
|
| 508 |
-
<text x="945" y="120" class="arrow-label">.remote()</text>
|
| 509 |
-
|
| 510 |
-
<!-- Planner → narrator -->
|
| 511 |
-
<path class="arrow" d="M610 170 L610 198" marker-end="url(#ar)"/>
|
| 512 |
-
<!-- Planner → tip giver -->
|
| 513 |
-
<path class="arrow" d="M710 145 C 760 170, 800 180, 800 198" marker-end="url(#ar)"/>
|
| 514 |
-
|
| 515 |
-
<!-- Validator → Planner (loop) -->
|
| 516 |
-
<path class="arrow arrow-loop" d="M390 290 C 390 240, 470 190, 510 145" marker-end="url(#aro)"/>
|
| 517 |
-
<text x="395" y="240" class="arrow-label" style="fill:#a85c2a;">verdict · feedback</text>
|
| 518 |
-
|
| 519 |
-
<!-- STT → Validator -->
|
| 520 |
-
<path class="arrow dashed" d="M380 270 L380 288" marker-end="url(#ar)"/>
|
| 521 |
-
|
| 522 |
-
<!-- Recipe state ↔ all agents -->
|
| 523 |
-
<path class="arrow dashed" d="M725 290 L725 270" marker-end="url(#ar)"/>
|
| 524 |
-
<path class="arrow dashed" d="M610 290 L610 270" marker-end="url(#ar)"/>
|
| 525 |
-
|
| 526 |
-
<!-- All → Assembler -->
|
| 527 |
-
<path class="arrow" d="M610 380 L610 398" marker-end="url(#ar)"/>
|
| 528 |
-
|
| 529 |
-
<!-- Assembler → output -->
|
| 530 |
-
<path class="arrow" d="M280 425 C 240 425, 220 410, 200 385" marker-end="url(#ar)"/>
|
| 531 |
-
<path class="arrow" d="M280 440 C 240 440, 220 445, 200 445" marker-end="url(#ar)"/>
|
| 532 |
-
|
| 533 |
-
<!-- Modal → Planner (offline LoRA weights) -->
|
| 534 |
-
<path class="arrow dashed" d="M1010 330 C 870 330, 750 280, 710 165" marker-end="url(#ar)"/>
|
| 535 |
-
<text x="900" y="280" class="arrow-label">LoRA weights</text>
|
| 536 |
-
</svg>
|
| 537 |
-
<p style="font-size: 13px; color: var(--accent2); margin-top: 10px;">
|
| 538 |
-
<strong>Orange arrow</strong> = visual closed loop (the innovation). The user takes a photo of their progress, MiniCPM-V validates it against the target image, and the Planner adjusts or advances. No recipe app on the market does this.
|
| 539 |
-
</p>
|
| 540 |
-
|
| 541 |
-
|
| 542 |
-
<h2><span class="num">04</span>The innovative trick: visual closed-loop cooking</h2>
|
| 543 |
-
<div class="grid-3">
|
| 544 |
-
<div class="card">
|
| 545 |
-
<h3>1. Target image per step</h3>
|
| 546 |
-
<p style="font-size:13px;">Flux.2 generates "this is what the pan/plate/pot should look like at step N". It is not text and not a stock photo: it is context-aware generation of the desired state.</p>
|
| 547 |
-
</div>
|
| 548 |
-
<div class="card">
|
| 549 |
-
<h3>2. Validation with the user's photo</h3>
|
| 550 |
-
<p style="font-size:13px;">The user uploads a photo of how it is going. MiniCPM-V compares it against the target image and returns a verdict: <code>go</code> · <code>wait</code> · <code>fix</code>.</p>
|
| 551 |
-
</div>
|
| 552 |
-
<div class="card">
|
| 553 |
-
<h3>3. Adaptive replan</h3>
|
| 554 |
-
<p style="font-size:13px;">"I have no cilantro." → the Planner regenerates the recipe and Flux regenerates the final image. The plan is not static; it evolves with the real state.</p>
|
| 555 |
-
</div>
|
| 556 |
-
</div>
|
| 557 |
-
<p style="margin-top:14px; font-size:14px;">
|
| 558 |
-
<strong>This is the featured section of the README</strong> and the blog post for the Field Notes badge: <em>"How visual closed-loop cooking guidance works."</em>
|
| 559 |
-
</p>
|
| 560 |
-
|
| 561 |
-
|
| 562 |
-
<h2><span class="num">05</span>Target badges (5/6)</h2>
|
| 563 |
-
<div class="badges-grid">
|
| 564 |
-
<div class="badge-card"><span class="tag">LLAMA.CPP</span><br/><strong>Llama Champion</strong><p>Vision + Planner via <code>llama-cpp-python</code> with GGUF Q4.</p></div>
|
| 565 |
-
<div class="badge-card"><span class="tag">FINE-TUNED</span><br/><strong>Well-Tuned</strong><p>LoRA on Mexican cuisine · published on HF.</p></div>
|
| 566 |
-
<div class="badge-card"><span class="tag">CUSTOM UI</span><br/><strong>Off-Brand</strong><p>Recipe-card UI · serif · warm palette · XL kitchen mode.</p></div>
|
| 567 |
-
<div class="badge-card"><span class="tag">OPEN TRACE</span><br/><strong>Sharing is Caring</strong><p>Dataset of 150 Mexican recipes + traces + generated recipes pushed to the Hub.</p></div>
|
| 568 |
-
<div class="badge-card"><span class="tag">TENTATIVE</span><br/><strong>Field Notes</strong><p>Blog: "I built my mom a sous-chef".</p></div>
|
| 569 |
-
<div class="badge-card skip"><span class="tag">LOCAL-FIRST</span><br/><strong>Off the Grid</strong><p>Sacrificed: Flux.2 runs on Modal for quality.</p></div>
|
| 570 |
-
</div>
|
| 571 |
-
|
| 572 |
-
|
| 573 |
-
<h2><span class="num">06</span>Target awards</h2>
|
| 574 |
-
<div class="card">
|
| 575 |
-
<div class="award-row"><span><strong>Backyard AI Track</strong> · $1K–$4K</span><span class="prob prob-h">HIGH</span></div>
|
| 576 |
-
<div class="award-row"><span><strong>Modal Awards</strong> · $3K–$10K credits</span><span class="prob prob-h">HIGH</span></div>
|
| 577 |
-
<div class="award-row"><span><strong>OpenBMB Award</strong> · $1K–$2.5K</span><span class="prob prob-h">HIGH</span></div>
|
| 578 |
-
<div class="award-row"><span><strong>Best Demo</strong> · $1K</span><span class="prob prob-h">HIGH</span></div>
|
| 579 |
-
<div class="award-row"><span><strong>Community Choice</strong> · $2K</span><span class="prob prob-h">HIGH</span></div>
|
| 580 |
-
<div class="award-row"><span><strong>Best Agent</strong> · $1K</span><span class="prob prob-h">HIGH · real multi-agent closed loop</span></div>
|
| 581 |
-
<div class="award-row"><span><strong>Bonus Quest Champion</strong> · $2K</span><span class="prob prob-m">MEDIUM-HIGH · 5/6 badges</span></div>
|
| 582 |
-
<div class="award-row"><span><strong>Off-Brand</strong> · $1.5K</span><span class="prob prob-m">MEDIUM</span></div>
|
| 583 |
-
<div class="award-row"><span><strong>Tiny Titan</strong> · $1.5K</span><span class="prob prob-l">LOW · Flux 9B puts it out of range</span></div>
|
| 584 |
-
</div>
|
| 585 |
-
<p style="font-size: 14px; margin-top: 8px;"><strong>Reasonable cumulative estimate: $5K–$12K cash + $3K–$10K Modal credits.</strong></p>
|
| 586 |
-
|
| 587 |
-
|
| 588 |
-
<h2><span class="num">07</span>10-day timeline</h2>
|
| 589 |
-
<div class="timeline">
|
| 590 |
-
<div class="day"><span class="lbl">D1</span><strong>Setup + Modal Flux endpoint</strong><div class="what">"Hello Flux": prompt → image of a dish. Empty Space deployed.</div></div>
|
| 591 |
-
<div class="day"><span class="lbl">D2</span><strong>Vision: ingredient identification</strong><div class="what">MiniCPM-V Q4 · test with 5 real fridge photos.</div></div>
|
| 592 |
-
<div class="day"><span class="lbl">D3</span><strong>Recipe Planner LLM</strong><div class="what">MiniCPM-4 · structured JSON · 3 options based on the ingredients.</div></div>
|
| 593 |
-
<div class="day"><span class="lbl">D4</span><strong>Step Illustrator (Flux + consistency)</strong><div class="what">Final dish image + 5 per-step target images · soft i2i.</div></div>
|
| 594 |
-
<div class="day"><span class="lbl">D5</span><strong>Voice: narrator + tip-giver</strong><div class="what">OpenBMB voice + Cohere voice · audio pre-rendered per step.</div></div>
|
| 595 |
-
<div class="day"><span class="lbl">D6</span><strong>Off-Brand UI: recipe card</strong><div class="what">gr.Blocks + earthy serif CSS · hands-free XL kitchen mode.</div></div>
|
| 596 |
-
<div class="day"><span class="lbl">D7</span><strong>Gradio Workflows showcase</strong><div class="what">Pipeline rewritten as a visible Workflow · separate tab.</div></div>
|
| 597 |
-
<div class="day"><span class="lbl">D8</span><strong>Fine-tune the Planner on Mexican cuisine</strong><div class="what">200 synthetic recipes · LoRA · GGUF · push to HF.</div></div>
|
| 598 |
-
<div class="day"><span class="lbl">D9</span><strong>STT + Progress Validator + eval</strong><div class="what">Whisper · closed loop active · Sharing is Caring badge.</div></div>
|
| 599 |
-
<div class="day"><span class="lbl">D10</span><strong>Demo + README + blog + submit</strong><div class="what">A real mom cooking · 60-90s · EN subtitles · Field Notes blog.</div></div>
|
| 600 |
-
</div>
|
| 601 |
-
|
| 602 |
-
|
| 603 |
-
<h2><span class="num">08</span>Plan B (scope cuts)</h2>
|
| 604 |
-
<table>
|
| 605 |
-
<thead><tr><th>#</th><th>Cut</th><th>You lose</th><th>You keep</th></tr></thead>
|
| 606 |
-
<tbody>
|
| 607 |
-
<tr><td>1</td><td>STT (voice questions)</td><td>demo convenience</td><td>text + photo</td></tr>
|
| 608 |
-
<tr><td>2</td><td>2nd voice (Cohere tip-giver)</td><td>1 voice sponsor</td><td>single narrator</td></tr>
|
| 609 |
-
<tr><td>3</td><td>Progress Validator (closed-loop)</td><td><strong>Best Agent</strong> + main innovation</td><td>linear demo</td></tr>
|
| 610 |
-
<tr><td>4</td><td>Fine-tune of the Planner</td><td><strong>Well-Tuned</strong></td><td>remaining badges</td></tr>
|
| 611 |
-
<tr><td>5</td><td>Gradio Workflows showcase</td><td>"fresh" differentiator</td><td>Python pipeline</td></tr>
|
| 612 |
-
<tr><td>6</td><td>Fully custom UI</td><td><strong>Off-Brand</strong></td><td>default UI</td></tr>
|
| 613 |
-
<tr style="background:#fff3cf;"><td>—</td><td><strong>NEVER</strong></td><td colspan="2">Vision + Planner + Illustrator + Narrator + minimal UI + video of a real person cooking</td></tr>
|
| 614 |
-
</tbody>
|
| 615 |
-
</table>
|
| 616 |
-
|
| 617 |
-
|
| 618 |
-
<h2><span class="num">09</span>Key risks</h2>
|
| 619 |
-
<table>
|
| 620 |
-
<thead><tr><th>Risk</th><th>Mitigation</th></tr></thead>
|
| 621 |
-
<tbody>
|
| 622 |
-
<tr><td>Flux.2 Klein has no public API/weights when you need it</td><td>Plan B: Flux.1-schnell or SDXL-Lightning. You lose the sponsor positioning but the idea survives.</td></tr>
|
| 623 |
-
<tr><td>MiniCPM-V fails to identify Mexican ingredients (chile poblano, nopales)</td><td>Few-shot examples in the prompt; eventually a light fine-tune of the vision model on 50 labeled photos</td></tr>
|
| 624 |
-
<tr><td>Flux.2 generates unappetizing food</td><td>Iterate on prompts ("recipe magazine, warm light, top-down"); use the final image as ref for the steps</td></tr>
|
| 625 |
-
<tr><td>Progress Validator gives false positives</td><td>Conservative: only says "on track" if similarity is high; the default is "keep going" without a strong judgment</td></tr>
|
| 626 |
-
<tr><td>Recipe latency &gt; 30s</td><td>Progressive streaming; parallelize Flux + TTS</td></tr>
|
| 627 |
-
<tr><td>Modal cold start of ~30-60s on Flux</td><td>Pre-warm 30s before filming · <code>keep_warm=1</code> on demo day</td></tr>
|
| 628 |
-
<tr><td>The person in the demo gets burned or cooks badly</td><td>Practice the recipe once beforehand · 2-3 candidate recipes ready</td></tr>
|
| 629 |
-
<tr><td>Another team presents an "AI recipe app"</td><td>Differentiate with: visual closed loop + Spanish + Mexican cuisine + published dataset + real person</td></tr>
|
| 630 |
-
</tbody>
|
| 631 |
-
</table>
|
| 632 |
-
|
| 633 |
-
|
| 634 |
-
<h2><span class="num">10</span>How to spend the credits</h2>
|
| 635 |
-
<div class="grid-2">
|
| 636 |
-
<div class="card">
|
| 637 |
-
<h3>Modal · $250</h3>
|
| 638 |
-
<table>
|
| 639 |
-
<tr><td>Flux dev (days 1-9)</td><td>$5-15</td></tr>
|
| 640 |
-
<tr><td>Mexican cuisine dataset</td><td>$3-8</td></tr>
|
| 641 |
-
<tr><td>LoRA + sweeps</td><td>$4-5</td></tr>
|
| 642 |
-
<tr><td>Eval</td><td>$1</td></tr>
|
| 643 |
-
<tr><td>Inference during judges' grading</td><td>$10-25</td></tr>
|
| 644 |
-
<tr><th>Subtotal</th><th>$25-65</th></tr>
|
| 645 |
-
<tr><th>+ Buffer</th><th>$30</th></tr>
|
| 646 |
-
<tr><th>Projected</th><th><strong>~$55-95 / $250</strong></th></tr>
|
| 647 |
-
</table>
|
| 648 |
-
</div>
|
| 649 |
-
<div class="card">
|
| 650 |
-
<h3>OpenAI Codex · $100</h3>
|
| 651 |
-
<table>
|
| 652 |
-
<tr><td>Codex CLI pair-programmer</td><td>$20-40</td></tr>
|
| 653 |
-
<tr><td>200 synthetic Mexican recipes</td><td>$10-25</td></tr>
|
| 654 |
-
<tr><td>Flux prompts per step</td><td>$5-10</td></tr>
|
| 655 |
-
<tr><td>Reserve</td><td>$30</td></tr>
|
| 656 |
-
<tr><th>Projected</th><th><strong>~$65-105 / $100</strong></th></tr>
|
| 657 |
-
</table>
|
| 658 |
-
</div>
|
| 659 |
-
</div>
|
| 660 |
-
|
| 661 |
-
|
| 662 |
-
<div class="footnote">
|
| 663 |
-
<strong>Project mantra:</strong> "A mom cooking in front of the camera. A dish that looks appetizing. A voice that keeps her company without judging. One step at a time."
|
| 664 |
-
</div>
|
| 665 |
-
|
| 666 |
-
</div>
|
| 667 |
-
</body>
|
| 668 |
-
</html>
|
Strategy/estrategia.md
DELETED
|
@@ -1,496 +0,0 @@
|
|
| 1 |
-
# Detailed strategy: "Cocina Conmigo"
|
| 2 |
-
|
| 3 |
-
> Execution document. Read `plan.md` first for the "what" and the "why".
|
| 4 |
-
> This file is the "how": mental model, multi-agent design, timeline, credit spending, risks, snippets.
|
| 5 |
-
|
| 6 |
-
---
|
| 7 |
-
|
| 8 |
-
## 1. Mental model: the "recipe" as a state object
|
| 9 |
-
|
| 10 |
-
The app is not a chatbot. It is a **state machine** built around a `Recipe` object that evolves over time and is updated on every turn.
|
| 11 |
-
|
| 12 |
-
```python
|
| 13 |
-
from __future__ import annotations  # lets Recipe refer to Step before Step is defined
from dataclasses import dataclass

@dataclass
|
| 14 |
-
class Recipe:
|
| 15 |
-
    name: str                          # "Tinga de Pollo"
|
| 16 |
-
    final_dish_image: bytes            # Flux image of the final dish
|
| 17 |
-
    available_ingredients: list[str]   # what the camera saw in the fridge
|
| 18 |
-
    missing_ingredients: list[str]     # what is missing + its substitutes
|
| 19 |
-
    steps: list[Step]                  # 5-7 steps
|
| 20 |
-
    current_step: int                  # which step is in progress
|
| 21 |
-
    user_progress_photos: list[bytes]  # photos taken by the user
|
| 22 |
-
|
| 23 |
-
@dataclass
|
| 24 |
-
class Step:
|
| 25 |
-
    n: int
|
| 26 |
-
    instruction_text: str              # "Pica la cebolla en cubos chicos" (dice the onion finely)
|
| 27 |
-
    visual_target: bytes               # Flux image: "this is how the pan should look"
|
| 28 |
-
    duration_estimate: str             # "4 minutos"
|
| 29 |
-
    audio_narration: bytes             # pre-rendered narration
|
| 30 |
-
    tip: str | None                    # "no la quemes" (do not burn it)
|
| 31 |
-
    tip_audio: bytes | None            # Cohere voice
|
| 32 |
-
```
|
| 33 |
-
|
| 34 |
-
Advantages of thinking about it this way:
|
| 35 |
-
- Each Workflow node takes a `Recipe` and returns a modified `Recipe`. Composable and observable.
|
| 36 |
-
- The "replan" ("I have no cilantro") is a single function: `recipe.replan(missing="cilantro") → Recipe`.
|
| 37 |
-
- The "validator" takes a `Recipe` plus a `progress_photo` and returns `feedback`.
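
A minimal sketch of the "replan as a single function" idea, assuming a trimmed-down, hypothetical `MiniRecipe` stand-in rather than the full `Recipe` dataclass above. It only moves an ingredient from available to missing; in the real app the Planner model would presumably also regenerate the steps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniRecipe:
    """Hypothetical, reduced stand-in for the Recipe state object."""
    name: str
    available_ingredients: tuple[str, ...]
    missing_ingredients: tuple[str, ...] = ()
    current_step: int = 0


def replan(recipe: MiniRecipe, missing: str) -> MiniRecipe:
    """Return a new recipe state with `missing` moved from available to missing."""
    available = tuple(i for i in recipe.available_ingredients if i != missing)
    if missing in recipe.missing_ingredients:
        still_missing = recipe.missing_ingredients
    else:
        still_missing = recipe.missing_ingredients + (missing,)
    # replace() builds a new object, so the previous state stays untouched.
    return replace(
        recipe,
        available_ingredients=available,
        missing_ingredients=still_missing,
    )


tinga = MiniRecipe("Tinga de Pollo", ("pollo", "cebolla", "cilantro"))
updated = replan(tinga, missing="cilantro")
```

Because each call returns a new object, every node can log the state it received and the state it produced, which is what makes the pipeline observable.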
|
| 38 |
-
|
| 39 |
-
---
|
| 40 |
-
|
| 41 |
-
## 2. The 5 agents (real multi-agent, not simulated)
|
| 42 |
-
|
| 43 |
-
| Agent | Responsibility | Trigger | Output |
|
| 44 |
-
|---|---|---|---|
|
| 45 |
-
| **Mise en Place** | Identify ingredients in the fridge photo | fridge photo | `available_ingredients` |
|
| 46 |
-
| **Recipe Planner** | Propose 3 feasible recipes · build the chosen one | user picks an idea | `Recipe` with steps |
|
| 47 |
-
| **Step Illustrator** | Generate the target image for each step plus the final dish | new recipe | `Step.visual_target` for each step |
|
| 48 |
-
| **Sous-Chef Narrator** | Narrate the instructions by voice | active step | `Step.audio_narration` |
|
| 49 |
-
| **Progress Validator** | Compare the user's photo against the target image | user uploads a mid-cooking photo | `feedback` (text + voice tip) |
|
| 50 |
-
|
| 51 |
-
This is a **real multi-agent system**: each agent has its own function and its own model, and they communicate through shared state (`Recipe`). It is not a single agent with tools; it is 5 agents in a pipeline plus a closed loop.
|
| 52 |
-
|
| 53 |
-
> **Best Agent badge candidate.** Document this in the README with an explicit diagram.
|
| 54 |
-
|
| 55 |
-
---
|
| 56 |
-
|
| 57 |
-
## 3. The innovative trick: visual closed loop
|
| 58 |
-
|
| 59 |
-
```
|
| 60 |
-
┌─────────────────────────────────────┐
|
| 61 |
-
│ │
|
| 62 |
-
▼ │
|
| 63 |
-
[Step Illustrator]──▶ visual_target ──▶ [User cooks]
|
| 64 |
-
│
|
| 65 |
-
▼
|
| 66 |
-
📸 progress_photo
|
| 67 |
-
│
|
| 68 |
-
▼
|
| 69 |
-
[Progress Validator]
|
| 70 |
-
(MiniCPM-V)
|
| 71 |
-
│
|
| 72 |
-
┌───────────────────┤
|
| 73 |
-
│ │
|
| 74 |
-
✅ on track ❌ adjust
|
| 75 |
-
│ │
|
| 76 |
-
next step [Recipe Planner]
|
| 77 |
-
replan/tip
|
| 78 |
-
│
|
| 79 |
-
└──────▶ back into the loop
|
| 80 |
-
```
|
| 81 |
-
|
| 82 |
-
This is **the technical innovation** of the project. Most "recipe apps" are static lists. Cocina Conmigo:
|
| 83 |
-
|
| 84 |
-
1. *Visually* generates what each step should look like (not just text).
|
| 85 |
-
2. Accepts a photo from the user and *compares* it with the target.
|
| 86 |
-
3. Adapts the plan live if something goes off track.
|
| 87 |
-
|
| 88 |
-
Dedicated section in the README: *"How visual closed-loop cooking guidance works"*. It is also the Field Notes blog post.
|
| 89 |
-
|
| 90 |
-
---
|
| 91 |
-
|
| 92 |
-
## 4. Schedule: 10 days
|
| 93 |
-
|
| 94 |
-
> ~50-70 hours of work + 1 human + Codex CLI as pair.
|
| 95 |
-
|
| 96 |
-
### Day 1: Setup + Modal Flux endpoint
|
| 97 |
-
- `pip install gradio modal openai huggingface-hub diffusers llama-cpp-python`
|
| 98 |
-
- Run `modal setup` and deploy the Flux endpoint that returns an image for a given prompt.
|
| 99 |
-
- Create an empty Space on HF and do the initial push.
|
| 100 |
-
- **Deliverable:** a Space that shows a Flux image for a given text.
|
| 101 |
-
|
| 102 |
-
### Day 2: Vision, ingredient identification
|
| 103 |
-
- Load MiniCPM-V Q4 GGUF locally.
|
| 104 |
-
- Function: `identify_ingredients(fridge_photo) → list[str]`.
|
| 105 |
-
- Test with 5 real fridge photos (yours, your mom's).
|
| 106 |
-
- **Deliverable:** given a fridge photo, returns a list that correctly covers 80%+ of the visible ingredients.
|
| 107 |
-
|
| 108 |
-
### Day 3: Recipe Planner LLM
|
| 109 |
-
- Load MiniCPM-4 Q4 GGUF.
|
| 110 |
-
- Structured prompt template that returns JSON:
|
| 111 |
-
```json
|
| 112 |
-
{
|
| 113 |
-
"name": "Tinga de Pollo",
|
| 114 |
-
"options": [{"name": "...", "why": "..."}, ...],
|
| 115 |
-
"steps": [{"n": 1, "instruction": "...", "duration": "...", "visual": "..."}],
|
| 116 |
-
"missing": ["cilantro"],
|
| 117 |
-
"substitutes": {"cilantro": ["perejil", "nada"]}
|
| 118 |
-
}
|
| 119 |
-
```
|
| 120 |
-
- Connect Vision + Planner: fridge photo → 3 recipe options.
|
| 121 |
-
- **Deliverable:** given a photo plus a selection, returns a complete structured recipe.
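
Since the Planner is an LLM, its JSON reply may be malformed or incomplete. The sketch below shows one way to validate it before it reaches the rest of the pipeline. The key names come from the JSON example above; the function name `parse_planner_output` and the exact checks are assumptions for illustration, not the project's actual code.

```python
import json

REQUIRED_KEYS = {"name", "options", "steps", "missing", "substitutes"}
STEP_KEYS = {"n", "instruction", "duration", "visual"}


def parse_planner_output(raw: str) -> dict:
    """Parse the planner's JSON reply and fail loudly on missing fields."""
    data = json.loads(raw)
    missing_keys = REQUIRED_KEYS - set(data)
    if missing_keys:
        raise ValueError(f"planner output lacks keys: {sorted(missing_keys)}")
    for step in data["steps"]:
        lacking = STEP_KEYS - set(step)
        if lacking:
            raise ValueError(f"step {step.get('n')} lacks keys: {sorted(lacking)}")
    # Steps are expected to be numbered 1..N with no gaps.
    numbers = [step["n"] for step in data["steps"]]
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"steps are not numbered 1..N: {numbers}")
    return data


sample = json.dumps({
    "name": "Tinga de Pollo",
    "options": [{"name": "Tinga de Pollo", "why": "uses the chicken and onion"}],
    "steps": [{"n": 1, "instruction": "Pica la cebolla", "duration": "4 minutos",
               "visual": "diced onion on a cutting board"}],
    "missing": ["cilantro"],
    "substitutes": {"cilantro": ["perejil", "nada"]},
})
recipe = parse_planner_output(sample)
```

Raising `ValueError` early gives the caller a single place to retry the Planner instead of letting a half-formed recipe reach the Illustrator.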
|
| 122 |
-
|
| 123 |
-
### Day 4: Step Illustrator (Flux.2 with consistency)
|
| 124 |
-
- For each `Step.visual` in the JSON, call the Flux.2 endpoint with the prompt:
|
| 125 |
-
> *"Top-down view of a kitchen pan with [step.visual]. Mexican cooking style. Warm lighting. Natural ingredients. Photorealistic, recipe magazine style."*
|
| 126 |
-
- To keep the style consistent between steps: use the previous step's image as `ref` with `strength=0.6` (looser than for stories, because the content changes a lot).
|
| 127 |
-
- Also generate the image of the final dish (without `ref`).
|
| 128 |
-
- **Deliverable:** a 5-step recipe, each step with a target image, plus a photo of the final dish.
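
The chaining described above (each step references the previous step's image) can be sketched as a request builder. The prompt template is the one quoted for Day 4; the payload field names `ref_step` and `strength` are hypothetical, since the actual Flux endpoint's parameters are not shown in this document.

```python
PROMPT_TEMPLATE = (
    "Top-down view of a kitchen pan with {visual}. Mexican cooking style. "
    "Warm lighting. Natural ingredients. Photorealistic, recipe magazine style."
)


def build_flux_requests(visuals: list[str], strength: float = 0.6) -> list[dict]:
    """Build one request per step; every step after the first references the previous one."""
    requests = []
    for index, visual in enumerate(visuals):
        request = {
            "step": index + 1,
            "prompt": PROMPT_TEMPLATE.format(visual=visual),
        }
        if index > 0:
            # Hypothetical fields: the real endpoint may name these differently.
            request["ref_step"] = index  # 1-based number of the previous step
            request["strength"] = strength
        requests.append(request)
    return requests


reqs = build_flux_requests(["diced onion", "shredded chicken in tomato sauce"])
```

Building the payloads as plain dictionaries keeps the chaining logic testable without calling the GPU endpoint.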
|
| 129 |
-
|
| 130 |
-
### Day 5: Voice, narrator + tip-giver
|
| 131 |
-
- **OpenBMB voice** for `Step.audio_narration`: instructions in a warm, clear tone.
|
| 132 |
-
- **Cohere Labs voice** for `Step.tip_audio`: a more energetic tone ("careful!").
|
| 133 |
-
- Generate the audio for all 5 steps ahead of time (not streamed; this avoids annoying cold starts).
|
| 134 |
-
- **Deliverable:** complete recipe with audible narration.
|
| 135 |
-
|
| 136 |
-
### Day 6: Off-Brand UI, recipe card
|
| 137 |
-
- `gr.Blocks` + custom CSS.
|
| 138 |
-
- Layout: hero with the final dish image and a large title; below it, a carousel of steps, each with `target image + text + "done" button`; hands-free kitchen mode with huge text.
|
| 139 |
-
- Style: elegant serif (`Lora`), warm earth/gold palette.
|
| 140 |
-
- **Deliverable:** a Space that looks like a cooking magazine card, not like Gradio.
|
| 141 |
-
|
| 142 |
-
### Day 7: Gradio Workflows showcase
|
| 143 |
-
- Rewrite the pipeline as a **Gradio Workflow** with visible nodes.
|
| 144 |
-
- Nodes: `📸 Fridge → 👁️ Vision → 🧠 Planner → 🎨 Illustrator → 🔊 Narrator → 📖 Recipe Card`.
|
| 145 |
-
- For the `Progress Validator`, add a branch: `📸 Progress Photo → 👁️ Validator → 💬 Feedback`.
|
| 146 |
-
- A separate tab in the Space that shows the Workflow graph running live.
|
| 147 |
-
- **Deliverable:** a visually impressive Workflow on screen. A differentiator for the Gradio judges.
|
| 148 |
-
|
| 149 |
-
### Day 8: Fine-tune the Planner on Mexican cuisine
|
| 150 |
-
- **Synthetic dataset on Modal:** the Codex API generates 200 Mexican recipes in structured JSON format (tinga, mole, chiles rellenos, sopes, pozole, etc.). You manually filter the 150 best.
|
| 151 |
-
- **LoRA on a Modal A10G:** ~30-60 min of fine-tuning on MiniCPM-4 4B.
|
| 152 |
-
- **GGUF + push to HF:** convert to Q4_K_M and upload to the HF Hub.
|
| 153 |
-
- Replace the Planner with the fine-tuned version.
|
| 154 |
-
- **Deliverable:** model `tu-usuario/cocinaconmigo-4b-mx-Q4_K_M-gguf` published.
|
| 155 |
-
|
| 156 |
-
### Day 9: STT + Progress Validator + eval
|
| 157 |
-
- `faster-whisper tiny` in Spanish: the user asks questions hands-free.
|
| 158 |
-
- Implement the **Progress Validator**: user photo → MiniCPM-V compares it against `Step.visual_target` → generates feedback.
|
| 159 |
-
- Eval: 10 generated recipes, measuring:
|
| 160 |
-
- % of ingredients correctly identified.
|
| 161 |
-
- % of steps with a coherent target image.
|
| 162 |
-
- Subjective validation quality (5 progress photos).
|
| 163 |
-
- Upload traces to the Hub (**Sharing is Caring** badge).
|
| 164 |
-
- **Deliverable:** complete app with voice input, validator, and published traces.
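
One way to keep the Progress Validator conservative is to map a similarity score to a verdict with wide margins. The verdict names `go`, `wait`, and `fix` come from the architecture page of this project; the idea of a numeric similarity in `[0, 1]` and both threshold values are assumptions made for this sketch, not measured settings.

```python
def verdict_from_similarity(
    similarity: float,
    go_threshold: float = 0.85,   # hypothetical: only very close matches advance
    fix_threshold: float = 0.40,  # hypothetical: only clear mismatches trigger a fix
) -> str:
    """Map a similarity score in [0, 1] to 'go', 'wait', or 'fix'."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must be in [0, 1], got {similarity}")
    if similarity >= go_threshold:
        return "go"
    if similarity < fix_threshold:
        return "fix"
    # Default: no strong judgment, the user simply keeps going.
    return "wait"
```

The wide middle band reflects the mitigation listed in the risk table: when the model is unsure, it should avoid telling the user that everything is fine.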
|
| 165 |
-
|
| 166 |
-
### Day 10: Demo video + README + blog + submit
|
| 167 |
-
- **Film a real person cooking** a recipe suggested by the app, from start to finish.
|
| 168 |
-
- 60-90 seconds: fridge photo → 3 options → pick one → cook with voice guidance → take a mid-cooking photo → app validates → final dish → the person eats.
|
| 169 |
-
- README: declared badges, diagram, link to the video, and a section titled "How closed-loop visual cooking guidance works".
|
| 170 |
-
- Blog post (**Field Notes** badge): "I built my mom a sous-chef".
|
| 171 |
-
- Submit + social post.
|
| 172 |
-
|
| 173 |
-
---
|
| 174 |
-
|
| 175 |
-
## 5. Explicit technical decisions
|
| 176 |
-
|
| 177 |
-
### 5.1 Why Modal at runtime (breaking Off the Grid)
|
| 178 |
-
As in previous plans: running Flux.2 9B on the CPU of a free Space is not viable (GBs of RAM and minutes per image). Modal is the forced choice when the core of the app is visual generation.
|
| 179 |
-
|
| 180 |
-
### 5.2 Why Mexican cuisine specifically
|
| 181 |
-
- A bounded but rich dataset. Coverable in 200 recipes.
|
| 182 |
-
- Automatic cultural differentiator.
|
| 183 |
-
- Aligns with the "for my mom" audience (if your mom is Latina).
|
| 184 |
-
- If the judges on Discord/Slack are Mexican, +1.
|
| 185 |
-
|
| 186 |
-
### 5.3 Why a Flux.2 visual_target instead of a stock image
|
| 187 |
-
- Stock photos have an American/European bias. Flux generates a Mexican style if you prompt for it.
|
| 188 |
-
- Stock does not adapt to the exact ingredient you have (Flux does).
|
| 189 |
-
- This is what makes the app unique; it is the wow factor.
|
| 190 |
-
|
| 191 |
-
### 5.4 Why pre-render audio instead of streaming
|
| 192 |
-
- Latency: streaming TTS is slow and looks bad in a demo.
|
| 193 |
-
- Cooking is sequential: you know the 5 steps when the recipe starts. Pre-render everything in parallel.
|
| 194 |
-
- If the user replans, you regenerate only the affected steps.
|
| 195 |
-
|
| 196 |
-
### 5.5 LoRA rather than a full fine-tune
|
| 197 |
-
Same argument as in previous plans: with 150-200 examples, LoRA r=16 is enough. ~30 min on an A10G ≈ $1.
|
| 198 |
-
|
| 199 |
-
### 5.6 How to spend the $250 of Modal credits
|
| 200 |
-
| Item | Estimate |
|
| 201 |
-
|---|---|
|
| 202 |
-
| Flux.2 dev inference (days 1-9, ~5h GPU L4) | $5-15 |
|
| 203 |
-
| Synthetic Mexican cuisine dataset generation (~2h) | $3-8 |
|
| 204 |
-
| LoRA fine-tune + sweeps (~3h A10G) | $4-5 |
|
| 205 |
-
| Eval pipeline | $1 |
|
| 206 |
-
| Inference during judges' grading (~10h) | $10-25 |
|
| 207 |
-
| **Subtotal** | **$25-65** |
|
| 208 |
-
| Buffer | $30 |
|
| 209 |
-
| **Projected total** | **~$55-95 / $250** |
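
As a quick sanity check, the line items in the table can be summed directly. The figures below are copied from the table; the short item labels are paraphrased. The raw sums come out slightly below the stated subtotal and total, which suggests the table's figures were rounded up.

```python
MODAL_ITEMS = {
    "Flux.2 dev inference": (5, 15),
    "Synthetic dataset generation": (3, 8),
    "LoRA fine-tune + sweeps": (4, 5),
    "Eval pipeline": (1, 1),
    "Inference during judging": (10, 25),
}
BUFFER = 30
BUDGET = 250

subtotal_low = sum(low for low, _ in MODAL_ITEMS.values())
subtotal_high = sum(high for _, high in MODAL_ITEMS.values())
total_low = subtotal_low + BUFFER
total_high = subtotal_high + BUFFER

print(f"subtotal: ${subtotal_low}-{subtotal_high}")             # subtotal: $23-54
print(f"with buffer: ${total_low}-{total_high} of ${BUDGET}")   # with buffer: $53-84 of $250
```

Either way, the projected spend stays well under the $250 of credits.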
|
| 210 |
-
|
| 211 |
-
### 5.7 How to spend the $100 of OpenAI Codex credits
|
| 212 |
-
- Codex CLI as a pair programmer for 10 days: $20-40.
|
| 213 |
-
- Generating 200 structured Mexican recipes (Day 8): $10-25.
|
| 214 |
-
- Generating Flux prompts for the steps (Day 4): $5-10.
|
| 215 |
-
- Reserve: $30.
|
| 216 |
-
|
| 217 |
-
---
|
| 218 |
-
|
| 219 |
-
## 6. Risks and mitigations
|
| 220 |
-
|
| 221 |
-
| Risk | Impact | Mitigation |
|
| 222 |
-
|---|---|---|
|
| 223 |
-
| Flux.2 Klein has no public API/weights when you need it | Blocks the idea | Plan B: Flux.1-schnell or SDXL-Lightning. You lose the sponsor tag but the idea survives. |
|
| 224 |
-
| MiniCPM-V fails to identify Mexican ingredients (chile poblano, chayote, nopales) | Recipe Planner fails | Add few-shot examples to the prompt; eventually fine-tune the vision model on 50 labeled photos |
|
| 225 |
-
| Flux.2 generates unappetizing/uncanny food | Kills the demo | Iterate on prompts (style="recipe magazine, warm light, top-down"); use the final dish image as ref for the steps |
|
| 226 |
-
| Latency: the complete recipe takes more than 30s to generate | Boring demo | Progressive streaming (show the option + final dish first, steps afterwards); parallelize Flux + TTS |
|
| 227 |
-
| Modal cold start of ~30-60s on Flux | Slow first demo | Pre-warm 30s before filming; `keep_warm=1` on demo day |
|
| 228 |
-
| Progress validator gives false positives ("on track" when it is not) | Confuses the user | Conservative: only says "on track" if similarity is very high; the default is "keep going" without a strong judgment |
|
| 229 |
-
| TTS español sin acento mexicano | Suena raro | Si OpenBMB no tiene es-MX, usa Cohere o Kokoro con voz neutra; pre-graba para video |
|
| 230 |
-
| Usuario del demo cocina mal/se quema | Mata el video | Practica la receta una vez antes de filmar; ten 2-3 candidatos de receta listos |
|
| 231 |
-
| Otro equipo presenta "recipe app con AI" | Compite por premios | Diferénciate con: closed-loop visual + español + cocina mexicana específica + dataset publicado + persona real cocinando + Workflow visible |
|
| 232 |
-
| Workflows de Gradio inestable (lanzado ayer) | Rompe app | Versión sin Workflows como backup. Workflows es decoración. |
|
| 233 |
-
|
| 234 |
-
---
|
| 235 |
-
|
| 236 |
-
## 7. Plan B — corte de scope
|
| 237 |
-
|
| 238 |
-
Si en Día 7 ves que no llegas, recorta features en este orden:
|
| 239 |
-
|
| 240 |
-
| # | Cortar | Pierdes | Conservas |
|
| 241 |
-
|---|---|---|---|
|
| 242 |
-
| 1 | STT (preguntas hands-free por voz) | comodidad demo | input por texto + foto |
|
| 243 |
-
| 2 | 2da voz (Cohere tip-giver) | un sponsor de voz | narrador único |
|
| 244 |
-
| 3 | Progress Validator (closed-loop) | **Best Agent badge** + innovación principal | demo lineal sin loop |
|
| 245 |
-
| 4 | Fine-tune del Planner | **Well-Tuned badge** | base model + prompting |
|
| 246 |
-
| 5 | Gradio Workflows showcase | diferenciador "fresh" | pipeline Python |
|
| 247 |
-
| 6 | UI super-custom | **Off-Brand badge** | UI default |
|
| 248 |
-
|
| 249 |
-
**NUNCA cortar:**
|
| 250 |
-
- Vision + Planner + Step Illustrator + Narrator + UI mínima + video con persona real cocinando.
|
| 251 |
-
|
| 252 |
-
Eso solo ya entra fuerte a Backyard AI track.
|
| 253 |
-
|
| 254 |
-
---
|
| 255 |
-
|
| 256 |
-
## 8. Métricas de éxito (auto-evaluación pre-submit)
|
| 257 |
-
|
| 258 |
-
Antes de mandar:
|
| 259 |
-
|
| 260 |
-
- [ ] Una persona real cocinó una receta entera con la app y se la comió.
|
| 261 |
-
- [ ] El video tiene una cara humana y un plato terminado en al menos 30s de los 90s.
|
| 262 |
-
- [ ] La app identifica correctamente ≥4 de 5 ingredientes en una foto típica de refri.
|
| 263 |
-
- [ ] Las imágenes de Flux para los pasos se ven *apetitosas* (test: si las muestras a alguien sin contexto, dice "se ve rico").
|
| 264 |
-
- [ ] Una receta completa se genera en menos de 30s (texto + 5 imágenes + audio).
|
| 265 |
-
- [ ] El Progress Validator funciona en al menos 5 de 10 fotos de progreso reales.
|
| 266 |
-
- [ ] El README tiene un diagrama y la sección "How closed-loop cooking works".
|
| 267 |
-
- [ ] Hay 3 recetas pre-renderizadas listas para que jueces las vean sin esperar.
|
| 268 |
-
- [ ] Total params declarado y verificado ≤ 32B.
|
| 269 |
-
- [ ] Sin secrets hardcoded.
|
| 270 |
-
|
| 271 |
-
Si fallas más de 2, no envíes todavía; arréglalo primero.
|
| 272 |
-
|
| 273 |
-
---
|
| 274 |
-
|
| 275 |
-
## 9. Lo que NO debes hacer
|
| 276 |
-
|
| 277 |
-
- **No** intentes generar video del platillo. Imagen estática se ve mejor que video AI mediocre.
|
| 278 |
-
- **No** hagas más de 7 pasos por receta. Atención del juez = 60-90s.
|
| 279 |
-
- **No** soportes 100 recetas. Soporta 20 recetas mexicanas excelentes y di "más recetas pronto".
|
| 280 |
-
- **No** subas fotos del refri real con productos identificables (marcas, info personal). Borra labels.
|
| 281 |
-
- **No** persigas Off the Grid. Decisión ya tomada.
|
| 282 |
-
- **No** dejes el video de demo para el último día sin practicar la receta antes.
|
| 283 |
-
- **No** publiques tokens en el repo.
|
| 284 |
-
- **No** generes recetas con ingredientes raros que la mayoría no tenga (cocina accesible > cocina chef).
|
| 285 |
-
|
| 286 |
-
---
|
| 287 |
-
|
| 288 |
-
## 10. Pitch del README (esqueleto)
|
| 289 |
-
|
| 290 |
-
```markdown
|
| 291 |
-
# Cocina Conmigo
|
| 292 |
-
> A visual sous-chef that sees what's in your fridge,
|
| 293 |
-
> shows you what each step should look like, and walks you through it
|
| 294 |
-
> with voice — hands-free.
|
| 295 |
-
|
| 296 |
-
[60-second demo video embed: tu mamá cocinando tinga]
|
| 297 |
-
|
| 298 |
-
## Why it shouldn't exist (but does)
|
| 299 |
-
Every recipe app is a list of steps. Cocina Conmigo is a closed-loop assistant:
|
| 300 |
-
it generates the *target image* of each cooking step with Flux.2, listens
|
| 301 |
-
when you ask "¿voy bien?", and adapts when you say "no tengo cilantro."
|
| 302 |
-
|
| 303 |
-
## Tech
|
| 304 |
-
- 👁️ MiniCPM-V — sees your fridge + validates your progress
|
| 305 |
-
- 🧠 MiniCPM-4 4B (LoRA fine-tuned on Mexican cuisine) — recipe planner
|
| 306 |
-
- 🎨 Flux.2 Klein 9B (Modal endpoint) — generates target images per step
|
| 307 |
-
- 🔊 OpenBMB voice — sous-chef narrator
|
| 308 |
-
- 🎭 Cohere voice — tip-giver second voice
|
| 309 |
-
- 🎙️ Whisper-tiny — voice input
|
| 310 |
-
- ⚙️ Gradio Workflows — visible pipeline of nodes
|
| 311 |
-
|
| 312 |
-
Total params: ~17B (≤ 32B ✓)
|
| 313 |
-
|
| 314 |
-
## Badges
|
| 315 |
-
✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
|
| 316 |
-
|
| 317 |
-
## Built for
|
| 318 |
-
My mom. She makes great mole. She can never remember tinga.
|
| 319 |
-
|
| 320 |
-
## Try it
|
| 321 |
-
[HF Space link]
|
| 322 |
-
```
|
| 323 |
-
|
| 324 |
-
---
|
| 325 |
-
|
| 326 |
-
## 11. Apéndice: snippets clave
|
| 327 |
-
|
| 328 |
-
### 11.1 Mise en Place agent (vision)
|
| 329 |
-
```python
|
| 330 |
-
def identify_ingredients(image: PIL.Image.Image) -> list[str]:
|
| 331 |
-
    prompt = """Veo esta foto de un refrigerador o despensa.
|
| 332 |
-
Lista TODOS los ingredientes que se ven, en español, en JSON:
|
| 333 |
-
{"ingredients": ["pollo", "cebolla", "cilantro", ...]}
|
| 334 |
-
Solo ingredientes alimentarios, no contenedores."""
|
| 335 |
-
    out = mini_cpm_v.create_chat_completion(messages=[
|
| 336 |
-
        {"role": "user", "content": [
|
| 337 |
-
            {"type": "image_url", "image_url": pil_to_data_url(image)},
|
| 338 |
-
            {"type": "text", "text": prompt}
|
| 339 |
-
        ]}
|
| 340 |
-
    ])
|
| 341 |
-
    return json.loads(out["choices"][0]["message"]["content"])["ingredients"]
|
| 342 |
-
```
|
| 343 |
-
|
| 344 |
-
### 11.2 Recipe Planner agent (LLM)
|
| 345 |
-
```python
|
| 346 |
-
SYS = """Eres un chef mexicano. Generas recetas a partir de ingredientes
|
| 347 |
-
disponibles. Prefiere cocina mexicana tradicional, accesible.
|
| 348 |
-
|
| 349 |
-
Salida JSON estricta:
|
| 350 |
-
{
|
| 351 |
-
  "name": "...",
|
| 352 |
-
  "options": [{"name": "...", "why": "..."}],
|
| 353 |
-
  "steps": [
|
| 354 |
-
    {"n": 1, "instruction": "...", "duration": "4 min",
|
| 355 |
-
     "visual": "english visual description for image gen",
|
| 356 |
-
     "tip": "optional warning or tip"}
|
| 357 |
-
  ],
|
| 358 |
-
  "missing": ["cilantro"],
|
| 359 |
-
  "substitutes": {"cilantro": ["perejil", "nada"]},
|
| 360 |
-
  "final_dish_visual": "english visual description of the final plated dish"
|
| 361 |
-
}
|
| 362 |
-
"""
|
| 363 |
-
|
| 364 |
-
def plan_recipe(ingredients, choice=None):
|
| 365 |
-
    msgs = [{"role": "system", "content": SYS}]
|
| 366 |
-
    msgs.append({"role": "user", "content":
|
| 367 |
-
        f"Tengo: {', '.join(ingredients)}.\n"
|
| 368 |
-
        + (f"Quiero hacer: {choice}." if choice else "Propón 3 opciones.")})
|
| 369 |
-
    raw = llm.create_chat_completion(messages=msgs, temperature=0.7)
|
| 370 |
-
    return json.loads(raw["choices"][0]["message"]["content"])
|
| 371 |
-
```
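A hedged aside on the planner snippet above: calling `json.loads` directly on model output fails when the reply is wrapped in a Markdown code fence or surrounded by extra text. The sketch below shows one defensive way to extract and check the object. `parse_recipe_json`, `FENCE` and `REQUIRED_KEYS` are hypothetical names that do not appear in the original plan, and the choice of required keys is an assumption.

```python
import json
import re

FENCE = "`" * 3                      # Markdown code-fence marker
REQUIRED_KEYS = {"name", "steps"}    # assumption: minimal subset of the planner schema


def parse_recipe_json(text: str) -> dict:
    """Extract the JSON object from a model reply and check its required keys."""
    # Remove fence markers (optionally tagged "json") that some models add.
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", text)
    # Keep only the span from the first "{" to the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(cleaned[start:end + 1])  # JSONDecodeError is a ValueError
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"recipe is missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    reply = "Claro:\n" + FENCE + 'json\n{"name": "Tinga", "steps": [{"n": 1}]}\n' + FENCE
    print(parse_recipe_json(reply)["name"])
```

Because every failure surfaces as `ValueError`, a caller could retry the planner once with a lower temperature before showing an error; that retry policy is a design suggestion, not something the plan specifies.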
|
| 372 |
-
|
| 373 |
-
### 11.3 Step Illustrator (Flux endpoint)
|
| 374 |
-
```python
|
| 375 |
-
import modal
|
| 376 |
-
app = modal.App("cocina-flux")
|
| 377 |
-
image = modal.Image.debian_slim().pip_install("torch","diffusers","transformers","accelerate","Pillow")
|
| 378 |
-
|
| 379 |
-
@app.cls(image=image, gpu="L4", scaledown_window=180, min_containers=0)
|
| 380 |
-
class FluxKlein:
|
| 381 |
-
    @modal.enter()
|
| 382 |
-
    def load(self):
|
| 383 |
-
        import torch; from diffusers import FluxPipeline
|
| 384 |
-
        self.pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.2-klein",
|
| 385 |
-
                                                 torch_dtype=torch.bfloat16).to("cuda")
|
| 386 |
-
|
| 387 |
-
    @modal.method()
|
| 388 |
-
    def render_step(self, visual: str, ref_png: bytes | None = None) -> bytes:
|
| 389 |
-
        from PIL import Image; import io
|
| 390 |
-
        prompt = (f"Top-down photo of a kitchen pan or plate showing {visual}. "
|
| 391 |
-
                  f"Mexican home cooking, warm natural lighting, recipe magazine "
|
| 392 |
-
                  f"style, photorealistic, appetizing.")
|
| 393 |
-
        if ref_png:
|
| 394 |
-
            ref = Image.open(io.BytesIO(ref_png)).convert("RGB")
|
| 395 |
-
            img = self.pipe(prompt=prompt, image=ref, strength=0.6,  # needs an img2img-capable pipeline; FluxPipeline is text-to-image only
|
| 396 |
-
                            num_inference_steps=4).images[0]
|
| 397 |
-
        else:
|
| 398 |
-
            img = self.pipe(prompt=prompt, num_inference_steps=4).images[0]
|
| 399 |
-
        buf = io.BytesIO(); img.save(buf, "PNG"); return buf.getvalue()
|
| 400 |
-
```
|
| 401 |
-
|
| 402 |
-
### 11.4 Progress Validator (closed-loop)
|
| 403 |
-
```python
|
| 404 |
-
def validate_progress(target_image: PIL.Image.Image, user_photo: PIL.Image.Image,
|
| 405 |
-
                      step_instruction: str) -> dict:
|
| 406 |
-
    prompt = f"""Compara estas dos fotos de cocina.
|
| 407 |
-
Foto 1 (objetivo): cómo debe verse después del paso "{step_instruction}".
|
| 408 |
-
Foto 2 (usuario): cómo va el usuario.
|
| 409 |
-
|
| 410 |
-
Responde en JSON:
|
| 411 |
-
{{"verdict": "go|wait|fix", "feedback_es": "...", "tip": "..." | null}}
|
| 412 |
-
- "go": va bien, siguiente paso
|
| 413 |
-
- "wait": le falta tiempo
|
| 414 |
-
- "fix": algo se ve mal, sugiere ajuste
|
| 415 |
-
"""
|
| 416 |
-
    out = mini_cpm_v.create_chat_completion(messages=[
|
| 417 |
-
        {"role": "user", "content": [
|
| 418 |
-
            {"type": "image_url", "image_url": pil_to_data_url(target_image)},
|
| 419 |
-
            {"type": "image_url", "image_url": pil_to_data_url(user_photo)},
|
| 420 |
-
            {"type": "text", "text": prompt}
|
| 421 |
-
        ]}
|
| 422 |
-
    ])
|
| 423 |
-
    return json.loads(out["choices"][0]["message"]["content"])
|
| 424 |
-
```
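The risk table asks for a conservative validator that avoids confident false positives. One way to sketch that on top of the JSON contract above is to accept a verdict only when it is well-formed and to fall back to a neutral answer otherwise. `normalize_verdict`, `ALLOWED_VERDICTS` and the `NEUTRAL` fallback (including its wording and the choice of `"wait"`) are assumptions made for illustration, not part of the original plan.

```python
ALLOWED_VERDICTS = {"go", "wait", "fix"}

# Assumption: the neutral fallback keeps the user cooking without a strong judgment.
NEUTRAL = {
    "verdict": "wait",
    "feedback_es": "Sigue con el paso y toma otra foto en un momento.",
    "tip": None,
}


def normalize_verdict(raw: object) -> dict:
    """Return a well-formed verdict dict, or the neutral fallback if raw is malformed."""
    if not isinstance(raw, dict):
        return dict(NEUTRAL)
    verdict = str(raw.get("verdict", "")).strip().lower()
    feedback = raw.get("feedback_es")
    # Reject unknown verdicts (e.g. the literal "go|wait|fix") and empty feedback.
    if verdict not in ALLOWED_VERDICTS or not isinstance(feedback, str) or not feedback.strip():
        return dict(NEUTRAL)
    tip = raw.get("tip")
    return {
        "verdict": verdict,
        "feedback_es": feedback.strip(),
        "tip": tip.strip() if isinstance(tip, str) and tip.strip() else None,
    }


if __name__ == "__main__":
    print(normalize_verdict({"verdict": "GO ", "feedback_es": "Va bien", "tip": ""}))
```

In this sketch the UI would call `normalize_verdict(validate_progress(...))`, so a malformed model reply never reaches the user as a confident "go".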
|
| 425 |
-
|
| 426 |
-
### 11.5 UI Off-Brand (recipe card)
|
| 427 |
-
```python
|
| 428 |
-
import gradio as gr
|
| 429 |
-
|
| 430 |
-
CSS = """
|
| 431 |
-
@import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&family=Inter:wght@400;600&display=swap');
|
| 432 |
-
.gradio-container {background: #f5ecd9 !important; font-family: 'Inter', sans-serif !important;}
|
| 433 |
-
.recipe-hero {background: #fffbf0; border-radius: 14px; padding: 28px;
|
| 434 |
-
box-shadow: 0 8px 24px rgba(0,0,0,0.12); border: 1px solid #d8c9ad;}
|
| 435 |
-
.recipe-hero h1 {font-family: 'Lora', serif !important; font-size: 36px !important;
|
| 436 |
-
margin: 0 0 6px !important; color: #6b4a2a !important;}
|
| 437 |
-
.step-card {background: #fffbf0; border-left: 4px solid #a85c2a;
|
| 438 |
-
border-radius: 8px; padding: 18px 22px; margin: 12px 0;}
|
| 439 |
-
.step-card h3 {font-family: 'Lora', serif !important; margin: 0 !important;}
|
| 440 |
-
.step-card p {font-size: 17px !important; line-height: 1.6;}
|
| 441 |
-
button.primary {background: #a85c2a !important; font-family: 'Inter', sans-serif !important;
|
| 442 |
-
font-weight: 600 !important; font-size: 16px !important; padding: 14px 22px !important;}
|
| 443 |
-
"""
|
| 444 |
-
|
| 445 |
-
with gr.Blocks(css=CSS, title="Cocina Conmigo") as demo:
|
| 446 |
-
    gr.Markdown("# 👩‍🍳 Cocina Conmigo")
|
| 447 |
-
    fridge = gr.Image(label="📸 Foto de tu refri o despensa", type="pil")
|
| 448 |
-
    btn = gr.Button("¿Qué cocino?", variant="primary")
|
| 449 |
-
    with gr.Column(elem_classes=["recipe-hero"]):
|
| 450 |
-
        title = gr.Markdown()
|
| 451 |
-
        final_img = gr.Image(show_label=False)
|
| 452 |
-
    steps_box = gr.Column()
|
| 453 |
-
    progress = gr.Image(label="📸 Tómame foto de tu progreso", type="pil")
|
| 454 |
-
    feedback = gr.Markdown()
|
| 455 |
-
    # callbacks omitidos
|
| 456 |
-
```
|
| 457 |
-
|
| 458 |
-
### 11.6 LoRA fine-tune del Planner en Modal
|
| 459 |
-
```python
|
| 460 |
-
@app.function(image=image_train, gpu="A10G", timeout=60*60*2,
|
| 461 |
-
              volumes={"/cache": modal.Volume.from_name("hf-cache", create_if_missing=True)})
|
| 462 |
-
def train_planner():
|
| 463 |
-
    import os; os.environ["HF_HOME"] = "/cache"
|
| 464 |
-
    from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 465 |
-
    from peft import LoraConfig, get_peft_model
|
| 466 |
-
    from trl import SFTTrainer, SFTConfig
|
| 467 |
-
    from datasets import load_dataset
|
| 468 |
-
|
| 469 |
-
    base = "openbmb/MiniCPM-4-Base"
|
| 470 |
-
    tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
|
| 471 |
-
    model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True,
|
| 472 |
-
                                                 device_map="cuda", torch_dtype="bfloat16")
|
| 473 |
-
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
|
| 474 |
-
                                             target_modules="all-linear"))
|
| 475 |
-
    ds = load_dataset("tu-usuario/recetas-mexicanas-sft", split="train")
|
| 476 |
-
    SFTTrainer(model=model, tokenizer=tok, train_dataset=ds,
|
| 477 |
-
               args=SFTConfig(output_dir="/cache/out", num_train_epochs=2,
|
| 478 |
-
                              per_device_train_batch_size=4, learning_rate=2e-4,
|
| 479 |
-
                              push_to_hub=True,
|
| 480 |
-
                              hub_model_id="tu-usuario/cocinaconmigo-4b-mx")
|
| 481 |
-
    ).train()
|
| 482 |
-
```
|
| 483 |
-
|
| 484 |
-
---
|
| 485 |
-
|
| 486 |
-
## 12. Lectura recomendada antes del Día 1
|
| 487 |
-
|
| 488 |
-
- `Context/guia-tecnologias.md` (sección 3 Modal, sección 4 llama.cpp).
|
| 489 |
-
- HF Black Forest Labs: <https://huggingface.co/black-forest-labs> — confirma versión Flux.2 Klein.
|
| 490 |
-
- HF MiniCPM-V: <https://huggingface.co/openbmb> — versión vision con GGUF.
|
| 491 |
-
- Modal stable-diffusion example: <https://github.com/modal-labs/modal-examples/tree/main/06_gpu_and_ml/stable_diffusion>.
|
| 492 |
-
- Diffusers img2img: <https://huggingface.co/docs/diffusers/using-diffusers/img2img>.
|
| 493 |
-
- Gradio Workflows: <https://www.gradio.app/guides> (busca el guide más reciente).
|
| 494 |
-
- Cohere Labs voice: confirma con sponsor el modelo exacto disponible.
|
| 495 |
-
|
| 496 |
-
> Cocina con tu mamá una vez antes de empezar a programar. Te va a aclarar más sobre qué necesita tu app que cualquier brainstorm. Suerte.
|
Strategy/plan.md
DELETED
|
@@ -1,245 +0,0 @@
|
|
| 1 |
-
# Plan ganador — "Cocina Conmigo"
|
| 2 |
-
|
| 3 |
-
> Un sous-chef multimodal que ve lo que tienes en el refri, te dice qué cocinar, te muestra cómo debe verse cada paso con Flux.2, y te narra todo por voz mientras cocinas con las manos llenas.
|
| 4 |
-
>
|
| 5 |
-
> Hackathon "Small models / Big adventures" — junio 2026.
|
| 6 |
-
|
| 7 |
-
---
|
| 8 |
-
|
| 9 |
-
## TL;DR
|
| 10 |
-
|
| 11 |
-
**Idea elegida:** **Cocina Conmigo** — un copiloto de cocina hands-free que combina visión, razonamiento, generación de imagen en tiempo real, y voz, para acompañarte de principio a fin: desde *"¿qué cocino con esto?"* hasta *"¿voy bien?"*.
|
| 12 |
-
|
| 13 |
-
**Por qué esta y no otra:** es la única idea que **(1) está fuera de las 11 ideas pre-cocinadas por OpenBMB**, **(2) usa Flux.2 + voces + Workflows como núcleo**, y **(3) tiene utilidad real, diaria y universal**. Nadie cocina como hobby; todos cocinan por necesidad.
|
| 14 |
-
|
| 15 |
-
---
|
| 16 |
-
|
| 17 |
-
## Por qué cambió el plan respecto a iteraciones anteriores
|
| 18 |
-
|
| 19 |
-
| Iteración | Idea | Por qué se descartó |
|
| 20 |
-
|---|---|---|
|
| 21 |
-
| v1 | Abuelita (parent phone helper) | **Está en la lista pre-cocinada de OpenBMB para Backyard AI.** 5-15 equipos van a hacer la misma cosa. |
|
| 22 |
-
| v2 | Cuentacuentos (storyteller ilustrado) | **Está en la lista pre-cocinada de OpenBMB para Thousand Token Wood ("voice storyteller").** Mismo problema de saturación. |
|
| 23 |
-
| v3 (ésta) | **Cocina Conmigo** | Refinamiento de **tu propia idea #1**, ahora viable de verdad gracias a Flux.2. **No está en ninguna lista pre-cocinada.** |
|
| 24 |
-
|
| 25 |
-
La regla estratégica: **usar los modelos de los sponsors, no copiar sus templates de proyecto.**
|
| 26 |
-
|
| 27 |
-
---
|
| 28 |
-
|
| 29 |
-
## Las 11 ideas en zona prohibida (clúster OpenBMB)
|
| 30 |
-
|
| 31 |
-
| Backyard AI | Thousand Token Wood |
|
| 32 |
-
|---|---|
|
| 33 |
-
| Parent phone helper | Voice storyteller |
|
| 34 |
-
| Receipt / bill explainer | Visual mystery box |
|
| 35 |
-
| Shop menu / repair manual | AI museum |
|
| 36 |
-
| Offline personal assistant / voice companion | Doodle creature |
|
| 37 |
-
| | Dream postcard gen |
|
| 38 |
-
| | Omni-modal adventure |
|
| 39 |
-
| | Tiny local NPC / character agent |
|
| 40 |
-
|
| 41 |
-
Y de tus 5 ideas originales, también caen:
|
| 42 |
-
- #3 cortes de cabello (tú mismo dijiste "ya está muy trabajado")
|
| 43 |
-
- #4 museum Q&A (choca con "AI museum")
|
| 44 |
-
|
| 45 |
-
**Quedan vivas, fuera de zona prohibida:**
|
| 46 |
-
- #1 Recetas (→ **Cocina Conmigo**, esta propuesta)
|
| 47 |
-
- #2 Detector de intenciones (no usa Flux.2, demo aburrida)
|
| 48 |
-
- #5 Outfits con armario (alternativa B, ver final del documento)
|
| 49 |
-
|
| 50 |
-
---
|
| 51 |
-
|
| 52 |
-
## El producto en una frase
|
| 53 |
-
|
| 54 |
-
> *"Mi mamá me pidió que le enseñara a hacer ramen. Le construí un sous-chef que vive en su tablet."*
|
| 55 |
-
|
| 56 |
-
---
|
| 57 |
-
|
| 58 |
-
## Las 4 historias del demo
|
| 59 |
-
|
| 60 |
-
### 1. *"Tengo esto en el refri"*
|
| 61 |
-
```
|
| 62 |
-
👩 Mamá toma foto del refri abierto.
|
| 63 |
-
🤖 [MiniCPM-V] "Veo: pollo, cebolla, jitomate, cilantro, tortillas, queso."
|
| 64 |
-
🤖 [LLM] "Te puedo proponer: tinga de pollo, enchiladas, o quesadillas. ¿Qué traes ganas?"
|
| 65 |
-
👩 "Tinga."
|
| 66 |
-
🤖 [Flux.2] genera foto del platillo final, hermosa, mexicana.
|
| 67 |
-
🤖 "Perfecto. Te tomará 35 minutos. ¿Empezamos?"
|
| 68 |
-
```
|
| 69 |
-
|
| 70 |
-
### 2. *"Cocina paso a paso"* (hands-free)
|
| 71 |
-
```
|
| 72 |
-
🤖 [Flux.2] muestra: olla con cebolla acitronándose
|
| 73 |
-
🤖 [Voz OpenBMB] "Pica la cebolla en cubitos chicos y ponla en aceite caliente."
|
| 74 |
-
👩 (cocinando, manos sucias)
|
| 75 |
-
👩 "¿Cuánto tiempo?"
|
| 76 |
-
🤖 [Voz] "Hasta que se vea transparente. Como 4 minutos."
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
### 3. *"¿Voy bien?"* (visión en loop)
|
| 80 |
-
```
|
| 81 |
-
👩 (toma foto del sartén con cebolla)
|
| 82 |
-
🤖 [MiniCPM-V] compara contra imagen objetivo.
|
| 83 |
-
🤖 [Voz Cohere — el "tip-giver"] "Le falta un poquito. Súbele 1 minuto más, está bien."
|
| 84 |
-
```
|
| 85 |
-
|
| 86 |
-
### 4. *"No tengo cilantro"* (replan adaptativo)
|
| 87 |
-
```
|
| 88 |
-
👩 "No tengo cilantro."
|
| 89 |
-
🤖 [LLM] re-planea sobre la marcha.
|
| 90 |
-
🤖 [Voz] "No pasa nada. Le ponemos perejil o nada. Sigue siendo tinga."
|
| 91 |
-
🤖 [Flux.2] regenera la foto del plato final, ahora sin cilantro.
|
| 92 |
-
```
|
| 93 |
-
|
| 94 |
-
Las 4 historias usan los **mismos 5 modelos**. Una sola pipeline.
|
| 95 |
-
|
| 96 |
-
---
|
| 97 |
-
|
| 98 |
-
## Por qué este plan **gana** este hackathon
|
| 99 |
-
|
| 100 |
-
### 1. "Build for someone you actually know" → Backyard AI track
|
| 101 |
-
La descripción literal del track dice: *"Solve a real problem for someone you actually know. Pick a person — a neighbor, a parent, a small-business owner..."*. Tu mamá. Tu hermana. Tu hermano que vive solo. **Todos** cocinan. Pocas apps de hackathon van a tener un usuario tan cercano y tan recurrente.
|
| 102 |
-
|
| 103 |
-
### 2. Aprovecha **todos** los assets sponsor sin copiar templates
|
| 104 |
-
| Asset | Cómo se usa |
|
| 105 |
-
|---|---|
|
| 106 |
-
| **Flux.2 Klein 9B** (sponsor) | Genera la imagen-objetivo del platillo + "esto debes ver" en cada paso · i2i para ajustes |
|
| 107 |
-
| **MiniCPM-V** (OpenBMB) | Visión: identifica ingredientes + valida progreso ("¿voy bien?") |
|
| 108 |
-
| **MiniCPM razonamiento** (OpenBMB) | Recipe Planner: arma receta + replan adaptativo |
|
| 109 |
-
| **OpenBMB voice / TTS** | Voz principal del sous-chef (cálida, paciente) |
|
| 110 |
-
| **Cohere Labs voice** (sponsor) | Segunda voz: tips, advertencias ("¡cuidado, se quema!") |
|
| 111 |
-
| **Whisper-tiny** | STT: preguntas hands-free mientras cocinas |
|
| 112 |
-
| **Gradio Workflows** | UI de nodos visible: Vision → Planner → Illustrator → Narrator → Validator |
|
| 113 |
-
| **Modal $250** | Hostea Flux.2 en GPU + dataset sintético + LoRA fine-tune |
|
| 114 |
-
| **OpenAI Codex $100** | Pair-programmer y generador de dataset de recetas |
|
| 115 |
-
|
| 116 |
-
Todos los sponsors tocados. Cero ideas copiadas.
|
| 117 |
-
|
| 118 |
-
### 3. **Innovación técnica concreta**: el bucle visual cerrado
|
| 119 |
-
La mayoría de "recipe apps" del mundo son listas de pasos. Cocina Conmigo introduce un **closed-loop visual**:
|
| 120 |
-
|
| 121 |
-
```
|
| 122 |
-
[Flux.2 muestra paso ideal] ──▶ [Usuario cocina]
|
| 123 |
-
         ▲                              │
|
| 124 |
-
         │                              ▼
|
| 125 |
-
[LLM ajusta plan] ◀── [MiniCPM-V valida foto del usuario]
|
| 126 |
-
```
|
| 127 |
-
|
| 128 |
-
Esto es un agente real, no un wrapper. Best Agent badge en juego.
|
| 129 |
-
|
| 130 |
-
### 4. Demo apetitoso = video viral
|
| 131 |
-
Persona real cocinando + voz cálida + ilustraciones live + "¡me quedó igual!" + plato final que se come frente a la cámara. Best Demo + Community Choice por inercia. **Nadie va a recordar la submission #14 de "voice storyteller"; van a recordar el video donde tu mamá hace tinga con AI.**
|
| 132 |
-
|
| 133 |
-
### 5. Diferenciación cultural sostenible
|
| 134 |
-
- **Español-mexicano-first** — diferenciador en hackathon US-céntrico.
|
| 135 |
-
- **Cocina mexicana** como dataset de fine-tune — territorio que pocos van a tocar.
|
| 136 |
-
- "Para mi mamá" como historia: emocional + universal.
|
| 137 |
-
|
| 138 |
-
---
|
| 139 |
-
|
| 140 |
-
## Arquitectura (resumen — ver `arquitectura.html`)
|
| 141 |
-
|
| 142 |
-
5 nodos en un Gradio Workflow visible:
|
| 143 |
-
|
| 144 |
-
```
|
| 145 |
-
[📸/🎙️ Input] ──▶ [👁️ Vision MiniCPM-V] ──▶ [🧠 Recipe Planner] ──▶ [🎨 Step Illustrator Flux.2]
|
| 146 |
-
                                                                          │
|
| 147 |
-
                                                                          ▼
|
| 148 |
-
                                  [🔊 Sous-Chef Narrator OpenBMB] + [🎭 Tip-Giver Cohere]
|
| 149 |
-
                                                                          │
|
| 150 |
-
                                                                          ▼
|
| 151 |
-
                                          [✅ Progress Validator] ──▶ loop al usuario
|
| 152 |
-
```
|
| 153 |
-
|
| 154 |
-
| Nodo | Modelo | Tamaño | Rol |
|
| 155 |
-
|---|---|---|---|
|
| 156 |
-
| Vision In | MiniCPM-V 2.6 / 4 (Q4 GGUF) | ~2-4B | Identifica ingredientes + valida progreso |
|
| 157 |
-
| Planner | MiniCPM-4 4B (LoRA en cocina mexicana) | ~4B | Genera receta JSON estructurado · replan |
|
| 158 |
-
| Illustrator | Flux.2 Klein 9B (Modal GPU) | 9B | Imagen final + paso-a-paso, i2i para consistencia |
|
| 159 |
-
| Narrator | OpenBMB voice / Kokoro | ~1B | Voz principal: instrucciones |
|
| 160 |
-
| Tip-Giver | Cohere Labs voice | ~1B | Segunda voz: warnings, encouragement |
|
| 161 |
-
| STT (opcional) | Whisper-tiny | ~40M | "¿voy bien?" "¿cuánto tiempo?" |
|
| 162 |
-
|
| 163 |
-
**Total: ~17B parámetros** (cap 32B ✓)
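As a quick check of the total above, the table's approximate sizes can be summed directly. This is a minimal sketch: the figures are the rounded values from the table, the vision model is given as a 2-4B range, and `PARAMS_B`, `CAP_B` and `total_range` are hypothetical names introduced here. With these inputs the low end lands near the stated ~17B and the high end near 19B, both under the 32B cap.

```python
# Approximate parameter counts in billions, as (low, high), taken from the table above.
PARAMS_B = {
    "vision (MiniCPM-V)": (2.0, 4.0),
    "planner (MiniCPM-4)": (4.0, 4.0),
    "illustrator (Flux.2 Klein)": (9.0, 9.0),
    "narrator": (1.0, 1.0),
    "tip-giver": (1.0, 1.0),
    "stt (Whisper-tiny)": (0.04, 0.04),
}
CAP_B = 32.0  # hackathon cap on total parameters


def total_range(params: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high estimates across all models."""
    low = sum(lo for lo, _ in params.values())
    high = sum(hi for _, hi in params.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(PARAMS_B)
    status = "within cap" if high <= CAP_B else "over cap"
    print(f"total: {low:.2f}B to {high:.2f}B ({status})")
```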
|
| 164 |
-
|
| 165 |
-
**Donde corre:**
|
| 166 |
-
- Vision, Planner, voces, STT → CPU del HF Space (llama.cpp + bindings ligeros)
|
| 167 |
-
- **Flux.2 → endpoint Modal con GPU L4** (no aguanta CPU del Space)
|
| 168 |
-
|
| 169 |
-
> Mismo tradeoff que los planes anteriores: **rompemos Off the Grid** intencionalmente para preservar calidad de imagen y latencia. A cambio calificamos para Modal Awards.
|
| 170 |
-
|
| 171 |
-
---
|
| 172 |
-
|
| 173 |
-
## Badges objetivo (5/6)
|
| 174 |
-
|
| 175 |
-
| Badge | Cómo |
|
| 176 |
-
|---|---|
|
| 177 |
-
| ✓ **Llama Champion** | Vision + Planner via `llama-cpp-python` con GGUF Q4 |
|
| 178 |
-
| ✓ **Well-Tuned** | LoRA del Planner en dataset de cocina mexicana, publicado en HF |
|
| 179 |
-
| ✓ **Off-Brand** | UI estilo "tarjeta de receta" + modo cocina hands-free, no parece Gradio default |
|
| 180 |
-
| ✓ **Sharing is Caring** | Dataset de recetas mexicanas + agent traces + recetas generadas, todo al Hub |
|
| 181 |
-
| ✓ **Field Notes** | Blog: "Le construí un sous-chef a mi mamá" |
|
| 182 |
-
| ✗ **Off the Grid** | Sacrificio consciente: Flux.2 corre en Modal |
|
| 183 |
-
|
| 184 |
-
5 badges + Modal-powered fuerte = competitivo para **Bonus Quest Champion ($2K)**.
|
| 185 |
-
|
| 186 |
-
---
|
| 187 |
-
|
| 188 |
-
## Premios objetivo (proyección)
|
| 189 |
-
|
| 190 |
-
| Premio | Probabilidad | Por qué |
|
| 191 |
-
|---|---|---|
|
| 192 |
-
| **Backyard AI Track** ($1K–$4K) | **Alta** | Idea es texto literal del track. Demo emocional. |
|
| 193 |
-
| **Modal Awards** ($3K–$10K credits) | **Alta** | Flux en Modal en runtime + entrenamiento offline. Modal-powered de manual. |
|
| 194 |
-
| **OpenBMB Award** ($1K–$2.5K) | **Alta** | Usa modelos OpenBMB en 3 roles (vision, planner, voice) sin copiar template |
|
| 195 |
-
| **Best Demo** ($1K) | **Alta** | Persona cocinando + comida final + voz = video apetitoso |
|
| 196 |
-
| **Community Choice** ($2K) | **Alta** | Apela a memoria emocional universal (tu mamá cocinando) |
|
| 197 |
-
| **Bonus Quest Champion** ($2K) | Media-alta | 5/6 badges es competitivo |
|
| 198 |
-
| **Best Agent** ($1K) | Media-alta | Closed-loop multi-agent real (5 agentes) |
|
| 199 |
-
| **Off-Brand** ($1.5K) | Media | UI tarjeta-de-receta tiene buenas chances |
|
| 200 |
-
| **Tiny Titan** ($1.5K) | Baja | Flux.2 9B nos saca del rango ≤4B |
|
| 201 |
-
|
| 202 |
-
**Cota razonable acumulada:** $5K–$12K cash + $3K–$10K Modal credits.
|
| 203 |
-
|
| 204 |
-
---
|
| 205 |
-
|
| 206 |
-
## Las 3 condiciones que pone Idea.md
|
| 207 |
-
|
| 208 |
-
| Condición | Cómo se cumple |
|
| 209 |
-
|---|---|
|
| 210 |
-
| **Innovador** | Closed-loop visual (Flux genera ideal → usuario cocina → vision valida → planner ajusta) — no existe en apps de receta |
|
| 211 |
-
| **Fresco** | Combina Flux.2 (nuevo) + Workflows (lanzado ayer) + voces multi-sponsor + cocina hands-free. Ninguna submission tendrá esa combinación. |
|
| 212 |
-
| **Útil** | Cocinar es diario, universal, recurrente. La app reemplaza Google + YouTube + adivinar. |
|
| 213 |
-
|
| 214 |
-
---
|
| 215 |
-
|
| 216 |
-
## Decisiones que tienes que tomar tú
|
| 217 |
-
|
| 218 |
-
| Decisión | Recomendación |
|
| 219 |
-
|---|---|
|
| 220 |
-
| ¿Cocina Conmigo o Mi Espejo (outfits)? | **Cocina.** Menor riesgo técnico (Flux generando platos > generando personas reales con ropa). Más universal. |
|
| 221 |
-
| ¿Cocina mexicana o cocina general? | **Mexicana.** Diferenciador + fine-tune en dataset acotado y rico. |
|
| 222 |
-
| ¿Persona real para el demo? | **Sí, no negociable.** Tu mamá, tu pareja, tu vecina. Que coma frente a la cámara al final. |
|
| 223 |
-
| ¿Empiezas con texto o con voz/foto? | **Empieza con foto del refri + texto.** Voz se agrega en Día 7-9. |
|
| 224 |
-
| ¿Cuántos pasos por receta? | 5-7 pasos. Más es muy largo para el demo, menos no es una receta. |
|
| 225 |
-
|
| 226 |
-
---
|
| 227 |
-
|
| 228 |
-
## Plan B — alternativa "Mi Espejo"
|
| 229 |
-
|
| 230 |
-
Si por cualquier razón Cocina Conmigo no avanza (ej. Flux.2 genera platillos feos consistentemente), pivota a **"Mi Espejo"** (refinamiento de tu idea #5):
|
| 231 |
-
|
| 232 |
-
- 📸 Subes foto tuya + fotos del armario.
|
| 233 |
-
- 🧠 Stylist LLM combina outfits según ocasión + tendencia.
|
| 234 |
-
- 🎨 **Flux.2 i2i te genera vistiendo cada combinación.**
|
| 235 |
-
- 🔊 Voz comenta el look.
|
| 236 |
-
|
| 237 |
-
Mismas badges, mismo track (Backyard), pero más alto wow visual y más alto riesgo (uncanny valley con personas reales). **Es plan B**, no plan A.
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
## Siguiente paso
|
| 242 |
-
|
| 243 |
-
Lee **`estrategia.md`** (timeline 10 días, gasto Modal/Codex, riesgos+mitigaciones, snippets) y **`arquitectura.html`** (diagrama del sistema + las 4 historias del demo + Workflow visual). Luego abre Codex CLI y haz el "hola mundo" del Día 1: un endpoint Modal que devuelve una imagen Flux.2 de un platillo dado un nombre de receta.
|
| 244 |
-
|
| 245 |
-
> *"Cocinar es la última cosa que la IA debería poder ayudarte a hacer bien. Y por eso es la mejor cosa que puedes ganar haciendo."*
|
Strategy/plan_implementacion.md
DELETED
|
@@ -1,674 +0,0 @@
|
|
| 1 |
-
# Implementation Plan — "Cook With Me"
|
| 2 |
-
|
| 3 |
-
> Step-by-step implementation guide for developers building the multimodal cooking sous-chef Gradio app for Hugging Face Spaces.
|
| 4 |
-
>
|
| 5 |
-
> **Hackathon:** Small models / Big adventures — June 2026
|
| 6 |
-
> **Read first:** `plan.md` (the *what* and *why*) and `estrategia.md` (the *how* at a strategic level). This document is the *how* at a tactical level — turn this into code.
|
| 7 |
-
|
| 8 |
-
---
|
| 9 |
-
|
| 10 |
-
## 0. Locked decisions (do not re-discuss)
|
| 11 |
-
|
| 12 |
-
| Decision | Value | Reason |
|
| 13 |
-
|---|---|---|
|
| 14 |
-
| UI framework | **Gradio** | Hackathon requirement |
|
| 15 |
-
| Hosting | **Hugging Face Space** | Hackathon requirement |
|
| 16 |
-
| Inference runtime (text + vision) | **llama.cpp** via `llama-cpp-python` | Runs inside the Space CPU, no external APIs needed for now. Future: migrate to Modal |
|
| 17 |
-
| Image generation | **FLUX.2 Klein 9B** (`black-forest-labs/FLUX.2-klein-9B`) | Sponsor model; runs in the Space if a GPU Space is rented (or via `enable_model_cpu_offload()` as fallback). Plan to migrate this specific component to Modal post-hackathon |
|
| 18 |
-
| Recipe planner / reasoning | **`openbmb/MiniCPM-V-4`** (GGUF) | Provided requirement |
|
| 19 |
-
| Vision (ingredient ID + progress validator) | **`openbmb/MiniCPM-V-4.6`** (GGUF) | Provided requirement |
|
| 20 |
-
| Text-to-speech | **OpenBMB VoxCPM2** | Provided requirement |
|
| 21 |
-
| Recipe dataset | **`thedevastator/better-recipes-for-a-better-life`** (Kaggle) — international cuisine | Provided requirement; not limited to Mexican food |
|
| 22 |
-
| App language | **English only** | Provided requirement |
|
| 23 |
-
| Final output | **Recipe + step images + voice + nutritional values** | Provided requirement |
|
| 24 |
-
| External API calls at runtime | **None** | "llama.cpp inside the Space" mandate |
|
| 25 |
-
|
| 26 |
-
---
|
| 27 |
-
|
| 28 |
-
## 1. Architecture (final, English-only, llama.cpp-first)
|
| 29 |
-
|
| 30 |
-
```
|
| 31 |
-
┌──────────────────────────────────────┐
|
| 32 |
-
│ Hugging Face Space (Gradio) │
|
| 33 |
-
│ (CPU + optional GPU upgrade) │
|
| 34 |
-
├──────────────────────────────────────┤
|
| 35 |
-
📸 Fridge photo ─────▶│ [Vision Agent] │
|
| 36 |
-
│ MiniCPM-V-4.6 GGUF (llama.cpp) │
|
| 37 |
-
│ → list[ingredient] │
|
| 38 |
-
│ │ │
|
| 39 |
-
│ ▼ │
|
| 40 |
-
🥘 User picks dish ───▶│ [Recipe Planner] │
|
| 41 |
-
│ MiniCPM-V-4 GGUF (llama.cpp) │
|
| 42 |
-
│ + retrieval over Kaggle dataset │
|
| 43 |
-
│ → Recipe JSON (steps, nutrition) │
|
| 44 |
-
│ │ │
|
| 45 |
-
│ ▼ │
|
| 46 |
-
│ [Step Illustrator] │
|
| 47 |
-
│ FLUX.2 Klein 9B (diffusers) │
|
| 48 |
-
│ → PNG per step + final dish │
|
| 49 |
-
│ │ │
|
| 50 |
-
│ ▼ │
|
| 51 |
-
│ [Narrator] │
|
| 52 |
-
│ VoxCPM2 → MP3 per step │
|
| 53 |
-
│ │ │
|
| 54 |
-
│ ▼ │
|
| 55 |
-
📸 Progress photo ────▶│ [Progress Validator] │
|
| 56 |
-
│ MiniCPM-V-4.6 (vision compare) │
|
| 57 |
-
│ → "go / wait / fix" + tip │
|
| 58 |
-
└──────────────────────────────────────┘
|
| 59 |
-
```
|
| 60 |
-
|
| 61 |
-
**Total parameter count (≤ 32B requirement):**
|
| 62 |
-
- MiniCPM-V-4 (reasoning) ≈ 4B
|
| 63 |
-
- MiniCPM-V-4.6 (vision) ≈ 4.6B
|
| 64 |
-
- FLUX.2 Klein ≈ 9B
|
| 65 |
-
- VoxCPM2 ≈ 1B (estimate)
|
| 66 |
-
- **Total ≈ 18.6B ✓**
|
| 67 |
-
|
| 68 |
-
---
|
| 69 |
-
|
| 70 |
-
## 2. Repository layout
|
| 71 |
-
|
| 72 |
-
```
|
| 73 |
-
cook-with-me/
|
| 74 |
-
├── app.py # Gradio entrypoint (Space looks for this)
|
| 75 |
-
├── requirements.txt
|
| 76 |
-
├── packages.txt # apt packages (ffmpeg, libsndfile1)
|
| 77 |
-
├── README.md # Space card (HF requires YAML frontmatter)
|
| 78 |
-
├── .gitignore
|
| 79 |
-
├── src/
|
| 80 |
-
│ ├── __init__.py
|
| 81 |
-
│ ├── config.py # paths, model IDs, constants
|
| 82 |
-
│ ├── models/
|
| 83 |
-
│ │ ├── __init__.py
|
| 84 |
-
│ │ ├── vision.py # MiniCPM-V-4.6 wrapper (llama-cpp)
|
| 85 |
-
│ │ ├── planner.py # MiniCPM-V-4 wrapper (llama-cpp)
|
| 86 |
-
│ │ ├── illustrator.py # FLUX.2 Klein wrapper (diffusers)
|
| 87 |
-
│ │ ├── narrator.py # VoxCPM2 wrapper
|
| 88 |
-
│ │ └── loader.py # lazy singletons + GGUF download
|
| 89 |
-
│ ├── agents/
|
| 90 |
-
│ │ ├── mise_en_place.py # ingredient identification
|
| 91 |
-
│ │ ├── recipe_planner.py # builds Recipe object
|
| 92 |
-
│ │ ├── step_illustrator.py # per-step image gen
|
| 93 |
-
│ │ ├── narrator.py # per-step TTS
|
| 94 |
-
│ │ └── progress_validator.py
|
| 95 |
-
│ ├── data/
|
| 96 |
-
│ │ ├── recipe_index.py # loads Kaggle dataset, builds retrieval
|
| 97 |
-
│ │ └── nutrition.py # USDA-style nutrition computation
|
| 98 |
-
│ ├── pipeline.py # Recipe state machine, orchestration
|
| 99 |
-
│ ├── prompts/
|
| 100 |
-
│ │ ├── vision_prompt.txt
|
| 101 |
-
│ │ ├── planner_system.txt
|
| 102 |
-
│ │ └── validator_prompt.txt
|
| 103 |
-
│ └── ui/
|
| 104 |
-
│ ├── theme.py # custom CSS (Off-Brand badge)
|
| 105 |
-
│ └── components.py # reusable Gradio Blocks pieces
|
| 106 |
-
├── scripts/
|
| 107 |
-
│ ├── download_models.py # pre-warms GGUF + Flux weights at build time
|
| 108 |
-
│ ├── build_recipe_index.py # caches Kaggle dataset locally
|
| 109 |
-
│ └── smoke_test.py # end-to-end validation before push
|
| 110 |
-
└── assets/
|
| 111 |
-
├── sample_fridge_1.jpg
|
| 112 |
-
└── sample_progress_1.jpg
|
| 113 |
-
```
|
| 114 |
-
|
| 115 |
-
---
|
| 116 |
-
|
| 117 |
-
## 3. Phase-by-phase plan (10 days)
|
| 118 |
-
|
| 119 |
-
> Each phase has: **goal**, **tasks**, **deliverable**, **verification check**. Do not move to the next phase if verification fails.
|
| 120 |
-
|
| 121 |
-
---
|
| 122 |
-
|
| 123 |
-
### Phase 0 — Day 0 (½ day): Account + tooling setup
|
| 124 |
-
|
| 125 |
-
**Goal:** every credential and CLI is ready before writing code.
|
| 126 |
-
|
| 127 |
-
**Tasks**
|
| 128 |
-
1. Create or confirm Hugging Face account; generate a **write token** (Settings → Access Tokens). Store as `HF_TOKEN` env var locally.
|
| 129 |
-
2. Install Hugging Face CLI: `pip install -U huggingface_hub` then `huggingface-cli login`.
|
| 130 |
-
3. Install Kaggle CLI: `pip install kaggle`. Place `kaggle.json` (Account → API → Create New Token) in `~/.kaggle/kaggle.json` with `chmod 600`.
|
| 131 |
-
4. Install OpenAI Codex CLI (pair-programmer) and verify your $100 credit is active.
|
| 132 |
-
5. Create a local Python 3.11 venv: `python -m venv .venv && source .venv/bin/activate`.
|
| 133 |
-
6. Create the repo locally: `git init cook-with-me && cd cook-with-me`.
|
| 134 |
-
7. Create an empty Hugging Face Space: huggingface.co → New Space → SDK = **Gradio**, Hardware = **CPU basic** (upgrade later if you need GPU for FLUX). Clone it and copy your repo skeleton into it.
|
| 135 |
-
8. Verify model availability: open in a browser and confirm pages exist:
|
| 136 |
-
- `huggingface.co/openbmb/MiniCPM-V-4`
|
| 137 |
-
- `huggingface.co/openbmb/MiniCPM-V-4-6`
|
| 138 |
-
- `huggingface.co/openbmb/VoxCPM2` (or whatever the exact repo name is — search "VoxCPM" on HF)
|
| 139 |
-
- `huggingface.co/black-forest-labs/FLUX.2-klein-9B`
|
| 140 |
-
|
| 141 |
-
**Deliverable:** empty Space deployed showing "Hello World" Gradio.
|
| 142 |
-
|
| 143 |
-
**Verify:** `https://huggingface.co/spaces/<you>/cook-with-me` loads.
|
| 144 |
-
|
| 145 |
-
---
|
| 146 |
-
|
| 147 |
-
### Phase 1 — Day 1: Project skeleton + recipe dataset ingestion
|
| 148 |
-
|
| 149 |
-
**Goal:** the Kaggle dataset is downloaded, parsed, and cached as a local artifact ready for retrieval.
|
| 150 |
-
|
| 151 |
-
**Tasks**
|
| 152 |
-
1. Write `requirements.txt` (initial version — packages will be added as phases progress):
|
| 153 |
-
```text
|
| 154 |
-
gradio>=4.44
|
| 155 |
-
huggingface_hub>=0.24
|
| 156 |
-
llama-cpp-python>=0.3.2
|
| 157 |
-
numpy
|
| 158 |
-
pandas
|
| 159 |
-
Pillow
|
| 160 |
-
pydantic>=2
|
| 161 |
-
sentence-transformers
|
| 162 |
-
```
|
| 163 |
-
2. Write `packages.txt`:
|
| 164 |
-
```text
|
| 165 |
-
ffmpeg
|
| 166 |
-
libsndfile1
|
| 167 |
-
```
|
| 168 |
-
3. Write `scripts/build_recipe_index.py`:
|
| 169 |
-
- Use `kagglehub.load_dataset(KaggleDatasetAdapter.PANDAS, "thedevastator/better-recipes-for-a-better-life", file_path)` — discover `file_path` by listing the dataset files first via `kagglehub.dataset_download`.
|
| 170 |
-
- Normalize columns: `name`, `ingredients` (list[str]), `instructions` (list[str]), `cuisine` (str if present, else "international"), `prep_time`, `servings`.
|
| 171 |
-
- Drop rows missing critical fields. Lowercase + strip ingredient strings.
|
| 172 |
-
- Save to `data/recipes.parquet` (~5–50MB depending on dataset size).
|
| 173 |
-
- Build sentence embeddings of the recipe **name + first 3 ingredients** using `sentence-transformers/all-MiniLM-L6-v2` and save to `data/recipes_emb.npy`.
|
| 174 |
-
- This script runs **once locally**; commit the parquet + npy files to the repo (or to a private HF Dataset, then download in `app.py`). If files exceed 100MB, push to a HF Dataset repo: `<you>/cook-with-me-recipes`.
|
| 175 |
-
4. Write `src/data/recipe_index.py`:
|
| 176 |
-
- `class RecipeIndex` with `.search(ingredients: list[str], top_k=5) -> list[RecipeRow]`.
|
| 177 |
-
- Build a query string from ingredients, embed it, cosine-similarity against the cached embeddings, return top-k.
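
The retrieval step described above can be sketched without any dependencies. This is a minimal illustration, not the project's `RecipeIndex`: the helper names `cosine` and `top_k` are hypothetical, and the 2-D vectors stand in for the cached `all-MiniLM-L6-v2` embeddings, which a real implementation would more likely handle with NumPy.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], rows: Sequence[Sequence[float]], k: int = 5) -> list[int]:
    """Indices of the k rows most similar to the query, best match first."""
    ranked = sorted(range(len(rows)), key=lambda i: cosine(query, rows[i]), reverse=True)
    return ranked[:k]


# Toy 2-D "embeddings" standing in for the cached recipe vectors (hypothetical data).
toy_rows = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k([1.0, 0.1], toy_rows, k=2)
```

The returned indices would then be used to look up rows in `data/recipes.parquet`.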
|
| 178 |
-
|
| 179 |
-
**Deliverable:** `python -c "from src.data.recipe_index import RecipeIndex; r=RecipeIndex(); print(r.search(['chicken','onion','tomato']))"` prints 5 sensible recipes.
|
| 180 |
-
|
| 181 |
-
**Verify:** at least 3 of the top-5 results contain ≥2 of the input ingredients.
|
| 182 |
-
|
| 183 |
-
---
|
| 184 |
-
|
| 185 |
-
### Phase 2 — Day 2: Vision agent (Mise en Place) — MiniCPM-V-4.6 via llama.cpp
|
| 186 |
-
|
| 187 |
-
**Goal:** given a fridge photo, return a clean list of English ingredient names.
|
| 188 |
-
|
| 189 |
-
**Background:** llama.cpp supports multimodal models through a vision projector (`mmproj-*.gguf`) plus the language model GGUF. MiniCPM-V family ships both files on the Hub.
|
| 190 |
-
|
| 191 |
-
**Tasks**
|
| 192 |
-
1. Find the GGUF release of MiniCPM-V-4.6. Search HF for `MiniCPM-V-4_6-gguf` or `openbmb/MiniCPM-V-4_6-gguf`. You need **two** files:
|
| 193 |
-
- `Model-Q4_K_M.gguf` (or similar quant)
|
| 194 |
-
- `mmproj-model-f16.gguf` (the vision projector)
|
| 195 |
-
2. Write `src/models/loader.py`:
|
| 196 |
-
```python
|
| 197 |
-
from huggingface_hub import hf_hub_download
|
| 198 |
-
from llama_cpp import Llama
|
| 199 |
-
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler # or matching handler
|
| 200 |
-
|
| 201 |
-
_vision = None
|
| 202 |
-
|
| 203 |
-
def get_vision_model():
|
| 204 |
-
global _vision
|
| 205 |
-
if _vision is None:
|
| 206 |
-
model_path = hf_hub_download(
|
| 207 |
-
repo_id="openbmb/MiniCPM-V-4_6-gguf", # confirm exact repo
|
| 208 |
-
filename="Model-Q4_K_M.gguf",
|
| 209 |
-
)
|
| 210 |
-
mmproj_path = hf_hub_download(
|
| 211 |
-
repo_id="openbmb/MiniCPM-V-4_6-gguf",
|
| 212 |
-
filename="mmproj-model-f16.gguf",
|
| 213 |
-
)
|
| 214 |
-
handler = MiniCPMv26ChatHandler(clip_model_path=mmproj_path)
|
| 215 |
-
_vision = Llama(
|
| 216 |
-
model_path=model_path,
|
| 217 |
-
chat_handler=handler,
|
| 218 |
-
n_ctx=4096,
|
| 219 |
-
n_threads=4,
|
| 220 |
-
verbose=False,
|
| 221 |
-
)
|
| 222 |
-
return _vision
|
| 223 |
-
```
|
| 224 |
-
3. Write `src/agents/mise_en_place.py`:
|
| 225 |
-
```python
|
| 226 |
-
import base64, io, json
|
| 227 |
-
from PIL import Image
|
| 228 |
-
from src.models.loader import get_vision_model
|
| 229 |
-
|
| 230 |
-
PROMPT = (
|
| 231 |
-
"You are an ingredient detector. Look at the fridge/pantry photo and "
|
| 232 |
-
"list every edible ingredient you can identify. Return strict JSON: "
|
| 233 |
-
'{"ingredients": ["chicken", "onion", "tomato", ...]} '
|
| 234 |
-
"Lowercase, English, no brand names, no containers."
|
| 235 |
-
)
|
| 236 |
-
|
| 237 |
-
def _img_to_data_url(img: Image.Image) -> str:
|
| 238 |
-
buf = io.BytesIO(); img.convert("RGB").save(buf, "JPEG", quality=85)  # JPEG cannot store alpha, so force RGB first
|
| 239 |
-
b64 = base64.b64encode(buf.getvalue()).decode()
|
| 240 |
-
return f"data:image/jpeg;base64,{b64}"
|
| 241 |
-
|
| 242 |
-
def identify_ingredients(image: Image.Image) -> list[str]:
|
| 243 |
-
llm = get_vision_model()
|
| 244 |
-
out = llm.create_chat_completion(messages=[
|
| 245 |
-
{"role": "user", "content": [
|
| 246 |
-
{"type": "image_url", "image_url": {"url": _img_to_data_url(image)}},
|
| 247 |
-
{"type": "text", "text": PROMPT},
|
| 248 |
-
]}
|
| 249 |
-
], temperature=0.2, response_format={"type": "json_object"})
|
| 250 |
-
data = json.loads(out["choices"][0]["message"]["content"])
|
| 251 |
-
return [s.lower().strip() for s in data["ingredients"]]
|
| 252 |
-
```
|
| 253 |
-
4. Test locally with 5 sample fridge photos.
|
| 254 |
-
|
| 255 |
-
**Deliverable:** the function returns a non-empty English list with ≥80% precision on a clean fridge photo.
|
| 256 |
-
|
| 257 |
-
**Verify:** stash these 5 results in `tests/vision_smoke.json` for regression checks.
|
| 258 |
-
|
| 259 |
-
---
|
| 260 |
-
|
| 261 |
-
### Phase 3 — Day 3: Recipe Planner — MiniCPM-V-4 via llama.cpp + retrieval
|
| 262 |
-
|
| 263 |
-
**Goal:** given a list of ingredients (and optionally a chosen dish), return a fully structured `Recipe` JSON including steps, durations, visual descriptions, and nutritional values.
|
| 264 |
-
|
| 265 |
-
**Tasks**
|
| 266 |
-
1. Find or convert MiniCPM-V-4 to GGUF. Likely repo: `openbmb/MiniCPM-V-4-gguf` or community quants. Pick `Q4_K_M`.
|
| 267 |
-
2. Add to `src/models/loader.py` a `get_planner_model()` (same pattern as vision but without `chat_handler`).
|
| 268 |
-
3. Write `src/agents/recipe_planner.py`:
|
| 269 |
-
- **Step A — propose:** call planner with `I have: [ingredients]. Propose 3 dish options that fit. Reply JSON.`
|
| 270 |
-
- **Step B — retrieve:** for the chosen dish name, call `RecipeIndex.search(...)` and pick the closest match. Use it as a *grounded reference*.
|
| 271 |
-
- **Step C — restructure:** prompt the planner with both the user's available ingredients and the retrieved reference recipe, asking it to output the canonical `Recipe` JSON schema below. The retrieval grounds the model and prevents hallucinated steps.
|
| 272 |
-
- **Step D — nutrition:** from the recipe ingredients, compute approximate nutritional values per serving. See Phase 3.5.
|
| 273 |
-
4. Define the canonical schema in `src/pipeline.py` using Pydantic:
|
| 274 |
-
```python
|
| 275 |
-
from pydantic import BaseModel
|
| 276 |
-
from typing import Optional
|
| 277 |
-
|
| 278 |
-
class Step(BaseModel):
|
| 279 |
-
n: int
|
| 280 |
-
instruction: str # English, imperative
|
| 281 |
-
duration: str # "4 minutes"
|
| 282 |
-
visual: str # English visual description for FLUX prompt
|
| 283 |
-
tip: Optional[str] = None
|
| 284 |
-
|
| 285 |
-
class Nutrition(BaseModel):
|
| 286 |
-
calories: int # per serving
|
| 287 |
-
protein_g: float
|
| 288 |
-
carbs_g: float
|
| 289 |
-
fat_g: float
|
| 290 |
-
fiber_g: float
|
| 291 |
-
|
| 292 |
-
class Recipe(BaseModel):
|
| 293 |
-
name: str
|
| 294 |
-
cuisine: str
|
| 295 |
-
servings: int
|
| 296 |
-
total_time_minutes: int
|
| 297 |
-
options: list[dict] # only populated on "propose" call
|
| 298 |
-
ingredients_have: list[str]
|
| 299 |
-
ingredients_missing: list[str]
|
| 300 |
-
substitutes: dict[str, list[str]]
|
| 301 |
-
steps: list[Step]
|
| 302 |
-
final_dish_visual: str
|
| 303 |
-
nutrition_per_serving: Nutrition
|
| 304 |
-
```
|
| 305 |
-
5. Write the system prompt (`src/prompts/planner_system.txt`):
|
| 306 |
-
- Persona: international chef
|
| 307 |
-
- Hard rule: output JSON only, matching schema
|
| 308 |
-
- Hard rule: prefer dishes feasible with available ingredients
|
| 309 |
-
- Hard rule: 5–7 steps, each ≤ 25 words, each with a concrete `visual` field for image generation
|
| 310 |
-
- Hard rule: include `nutrition_per_serving` (model is allowed to estimate; you'll override with `data/nutrition.py` for accuracy)
|
| 311 |
-
6. Use `response_format={"type": "json_object"}` in the chat completion call. Set `temperature=0.7, top_p=0.95, enable_thinking=True` for the propose step (creative); `temperature=0.4` for the structured-output step (more deterministic).
|
| 312 |
-
|
| 313 |
-
**Deliverable:** for `["chicken","onion","tomato","tortilla","cheese"]` and chosen dish "chicken tinga", the function returns a valid `Recipe` Pydantic object with 5–7 steps.
|
| 314 |
-
|
| 315 |
-
**Verify:** the JSON parses, each step has all required fields, and total inference time on Space CPU < 60 seconds.
|
| 316 |
-
|
| 317 |
-
---
|
| 318 |
-
|
| 319 |
-
### Phase 3.5 — Day 3 (afternoon): Nutritional values
|
| 320 |
-
|
| 321 |
-
**Goal:** the recipe ends with reliable per-serving nutrition (not hallucinated by the LLM).
|
| 322 |
-
|
| 323 |
-
**Approach:** small, embedded reference table beats LLM math.
|
| 324 |
-
|
| 325 |
-
**Tasks**
|
| 326 |
-
1. Bundle `data/nutrition_table.csv` — a 200-row CSV mapping common English ingredient names to per-100g macros (kcal, protein, carbs, fat, fiber). Source: USDA FoodData Central CSV download (free, public domain). Trim columns; commit to repo.
|
| 327 |
-
2. Write `src/data/nutrition.py`:
|
| 328 |
-
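- `parse_quantity(line: str) -> (grams, ingredient_name)`: handle "2 cups flour", "200 g chicken", "1 tbsp olive oil". Use a small regex + a unit-to-grams table (cup=240, tbsp=15, tsp=5, oz=28.35). The cup/tbsp/tsp values are millilitres, which equal grams only at water-like density, so light dry goods such as flour will be overestimated unless per-ingredient densities are added.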
|
| 329 |
-
- `compute_nutrition(ingredient_lines: list[str], servings: int) -> Nutrition` — sum per-100g values weighted by grams, divide by servings.
|
| 330 |
-
- If a line cannot be parsed, skip it and log; don't crash.
|
| 331 |
-
3. After the planner returns a recipe, **overwrite** `recipe.nutrition_per_serving` with the computed value. Keep the LLM's value only as a fallback when the parser yields zero.
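
A possible shape for the quantity parser described in task 2, as a sketch only. The regex, the `UNIT_TO_GRAMS` table layout, and the choice to return `None` for unparseable lines are assumptions made for illustration; the plan only specifies the function name, the example inputs, and the four unit factors (`g` and `kg` are added here for convenience).

```python
import re

# Rough conversions; volume units assume water-like density (an approximation).
UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0, "cup": 240.0, "tbsp": 15.0, "tsp": 5.0, "oz": 28.35}

# "<number> <unit> <ingredient name>", e.g. "2 cups flour".
_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s+(.+?)\s*$")


def parse_quantity(line: str):
    """Return (grams, ingredient_name), or None if the line cannot be parsed."""
    match = _LINE.match(line)
    if not match:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # "cups" -> "cup"
    if unit not in UNIT_TO_GRAMS:
        return None
    return amount * UNIT_TO_GRAMS[unit], match.group(3).lower()
```

Returning `None` instead of raising lets `compute_nutrition` skip and log bad lines, matching the "don't crash" rule above.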
|
| 332 |
-
|
| 333 |
-
**Deliverable:** for a known recipe (e.g., spaghetti with tomato sauce, 4 servings), the computed calories per serving are within ±25% of online references.
|
| 334 |
-
|
| 335 |
-
---
|
| 336 |
-
|
| 337 |
-
### Phase 4 — Day 4: Step Illustrator — FLUX.2 Klein 9B
|
| 338 |
-
|
| 339 |
-
**Goal:** generate an appetizing image for the final dish + one image per step.
|
| 340 |
-
|
| 341 |
-
**Constraint:** FLUX.2 Klein on CPU is impractical; on a free Space CPU it would take ~10 minutes per image. Two paths:
|
| 342 |
-
- **Path A (recommended for the hackathon):** upgrade the Space to a GPU instance (T4 or A10G — paid, but $20 HF credits cover it for a week of development). Code stays unchanged.
|
| 343 |
-
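- **Path B (fallback):** run FLUX in `enable_model_cpu_offload()` mode with `num_inference_steps=4` and accept ~3 min/image; only feasible for pre-rendered demo recipes, not live runs. Note that `enable_model_cpu_offload()` still runs each submodule on an accelerator and mainly reduces peak VRAM, so confirm this path on the actual Space hardware.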
|
| 344 |
-
|
| 345 |
-
**Tasks**
|
| 346 |
-
1. Add to `requirements.txt`:
|
| 347 |
-
```text
|
| 348 |
-
diffusers>=0.31
|
| 349 |
-
transformers>=4.45
|
| 350 |
-
accelerate
|
| 351 |
-
torch
|
| 352 |
-
safetensors
|
| 353 |
-
```
|
| 354 |
-
2. Write `src/models/illustrator.py`:
|
| 355 |
-
```python
|
| 356 |
-
import torch
|
| 357 |
-
from diffusers import Flux2KleinPipeline
|
| 358 |
-
|
| 359 |
-
_pipe = None
|
| 360 |
-
|
| 361 |
-
def get_flux():
|
| 362 |
-
global _pipe
|
| 363 |
-
if _pipe is None:
|
| 364 |
-
dtype = torch.bfloat16
|
| 365 |
-
_pipe = Flux2KleinPipeline.from_pretrained(
|
| 366 |
-
"black-forest-labs/FLUX.2-klein-9B",
|
| 367 |
-
torch_dtype=dtype,
|
| 368 |
-
)
|
| 369 |
-
_pipe.enable_model_cpu_offload()
|
| 370 |
-
return _pipe
|
| 371 |
-
|
| 372 |
-
def render(prompt: str, seed: int = 0) -> "PIL.Image.Image":
|
| 373 |
-
pipe = get_flux()
|
| 374 |
-
device = "cuda" if torch.cuda.is_available() else "cpu"
|
| 375 |
-
img = pipe(
|
| 376 |
-
prompt=prompt,
|
| 377 |
-
height=1024, width=1024,
|
| 378 |
-
guidance_scale=1.0,
|
| 379 |
-
num_inference_steps=4,
|
| 380 |
-
generator=torch.Generator(device=device).manual_seed(seed),
|
| 381 |
-
).images[0]
|
| 382 |
-
return img
|
| 383 |
-
```
|
| 384 |
-
3. Write `src/agents/step_illustrator.py`:
|
| 385 |
-
- For each `Step.visual`, build a prompt like:
|
| 386 |
-
> `f"Top-down photo of a kitchen pan or plate showing {visual}. {cuisine} home cooking, warm natural lighting, recipe magazine style, photorealistic, appetizing."`
|
| 387 |
-
- Generate the **final dish image first**, then the per-step images, all in **one Python loop** (no parallelism — FLUX holds the GPU).
|
| 388 |
-
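- Cache results on disk keyed by a SHA-256 digest of the prompt (not Python's built-in `hash()`, which is salted per process for strings and so changes between runs) to avoid re-renders on re-runs.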
|
| 389 |
-
- Emit Gradio progress updates so the UI doesn't appear frozen.
|
| 390 |
-
4. **Critical tuning:** keep `num_inference_steps=4` (Klein is distilled). Higher counts blow latency and offer minimal quality gain at this scale.
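
The on-disk cache from task 3 could look roughly like this. It is a sketch under assumptions: `cache_path` and `render_cached` are hypothetical names, and `render_fn` stands in for whatever wraps the FLUX pipeline and returns encoded PNG bytes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path(prompt: str, cache_dir: Path) -> Path:
    """Stable file path for a prompt: SHA-256 hex digest plus .png."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.png"


def render_cached(prompt: str, cache_dir: Path, render_fn: Callable[[str], bytes]) -> bytes:
    """Return cached bytes if present; otherwise call render_fn and store the result."""
    path = cache_path(prompt, cache_dir)
    if path.exists():
        return path.read_bytes()
    data = render_fn(prompt)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Because the key is a content digest, the same prompt maps to the same file across processes and restarts.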
|
| 391 |
-
|
| 392 |
-
**Deliverable:** for a 5-step recipe, all 6 images (final + 5 steps) render in:
|
| 393 |
-
- < 1 minute on T4 GPU Space
|
| 394 |
-
- < 8 minutes on CPU offload (acceptable only for pre-cached demos)
|
| 395 |
-
|
| 396 |
-
**Verify:** show the 6 images to an unprompted human; ≥4 should be described as "appetizing".
|
| 397 |
-
|
| 398 |
-
---
|
| 399 |
-
|
| 400 |
-
### Phase 5 — Day 5: Narrator — VoxCPM2
|
| 401 |
-
|
| 402 |
-
**Goal:** every step's instruction is rendered to an MP3 in a warm, clear English voice.
|
| 403 |
-
|
| 404 |
-
**Tasks**
|
| 405 |
-
1. Confirm the exact VoxCPM2 repo name on HF (`openbmb/VoxCPM2` or similar). Read its README for the inference snippet — TTS APIs vary widely between models.
|
| 406 |
-
2. Add to `requirements.txt`: `soundfile`, `torchaudio`, `numpy`. If VoxCPM2 ships GGUF, use it via `llama-cpp-python` audio extension (if available); otherwise load via `transformers` directly.
|
| 407 |
-
3. Write `src/models/narrator.py`:
|
| 408 |
-
```python
|
| 409 |
-
_tts = None
|
| 410 |
-
|
| 411 |
-
def get_tts():
|
| 412 |
-
global _tts
|
| 413 |
-
if _tts is None:
|
| 414 |
-
# placeholder — replace with the exact VoxCPM2 loading code from its README
|
| 415 |
-
from transformers import AutoModel, AutoProcessor
|
| 416 |
-
_tts = ... # load on CPU; VoxCPM2 is small (~1B)
|
| 417 |
-
return _tts
|
| 418 |
-
|
| 419 |
-
def synthesize(text: str, voice: str = "warm_female_en") -> bytes:
|
| 420 |
-
"""Returns MP3 bytes."""
|
| 421 |
-
tts = get_tts()
|
| 422 |
-
wav = tts.generate(text, voice=voice) # API depends on VoxCPM2
|
| 423 |
-
# encode wav -> mp3 with soundfile + ffmpeg-python or pydub
|
| 424 |
-
return mp3_bytes
|
| 425 |
-
```
|
| 426 |
-
4. Write `src/agents/narrator.py`:
|
| 427 |
-
- For each step, synthesize `step.instruction`. If `step.tip` is set, synthesize a separate "tip" clip.
|
| 428 |
-
- Save MP3 files in a per-recipe temp directory; return file paths to Gradio.
|
| 429 |
-
5. Pre-render all step audio when the recipe is finalized — never stream per-step in the demo (too much UI lag).
|
| 430 |
-
|
| 431 |
-
**Deliverable:** clicking "Play" on step 1 in the UI plays clear English narration.
|
| 432 |
-
|
| 433 |
-
**Verify:** on a 5-step recipe, total TTS rendering time < 30 seconds on CPU.
|
| 434 |
-
|
| 435 |
-
---
|
| 436 |
-
|
| 437 |
-
### Phase 6 — Day 6: Gradio UI (Off-Brand)
|
| 438 |
-
|
| 439 |
-
**Goal:** the Space looks like a recipe magazine, not stock Gradio.
|
| 440 |
-
|
| 441 |
-
**Tasks**
|
| 442 |
-
1. Write `src/ui/theme.py`:
|
| 443 |
-
```python
|
| 444 |
-
import gradio as gr
|
| 445 |
-
|
| 446 |
-
theme = gr.themes.Soft(
|
| 447 |
-
primary_hue="orange",
|
| 448 |
-
neutral_hue="stone",
|
| 449 |
-
font=[gr.themes.GoogleFont("Inter"), "sans-serif"],
|
| 450 |
-
font_mono=[gr.themes.GoogleFont("JetBrains Mono"), "monospace"],
|
| 451 |
-
)
|
| 452 |
-
|
| 453 |
-
CSS = """
|
| 454 |
-
.gradio-container { background: #f5ecd9 !important; }
|
| 455 |
-
.recipe-hero { background:#fffbf0; border-radius:14px; padding:28px; }
|
| 456 |
-
.recipe-hero h1 { font-family:'Lora',serif!important; font-size:36px!important; color:#6b4a2a!important; }
|
| 457 |
-
.step-card { background:#fffbf0; border-left:4px solid #a85c2a; border-radius:8px; padding:18px 22px; margin:12px 0; }
|
| 458 |
-
.nutri-grid { display:grid; grid-template-columns:repeat(5,1fr); gap:12px; margin-top:24px; }
|
| 459 |
-
.nutri-cell { background:#fffbf0; border:1px solid #d8c9ad; border-radius:10px; padding:12px; text-align:center; }
|
| 460 |
-
"""
|
| 461 |
-
```
|
| 462 |
-
2. Write `app.py` with three tabs:
|
| 463 |
-
- **Tab 1 — Cook**: fridge photo input → ingredient chips → 3 dish options → selected recipe card with hero image, steps (image + text + audio play button each), nutrition grid at the bottom.
|
| 464 |
-
- **Tab 2 — Check Progress**: upload a progress photo + select active step → validator returns badge (`go/wait/fix`) + tip + audio.
|
| 465 |
-
- **Tab 3 — About / Tech**: README-style explanation, badges, model list.
|
| 466 |
-
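3. Use `gr.Blocks` with `gr.State` to hold the current `Recipe` across UI events. Store it as a plain `dict` rather than a Pydantic object, since Pydantic objects may not round-trip cleanly through Gradio state: have each callback return `recipe.model_dump()` as its `gr.State` output and rebuild the object with `Recipe.model_validate(...)` when reading it back. Assigning `state.value` inside a callback does not update the per-session state.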
|
| 467 |
-
4. Wire callbacks:
|
| 468 |
-
- `btn_propose.click(fn=on_propose, inputs=[fridge_photo], outputs=[ingredient_chips, dish_options, state])`
|
| 469 |
-
- `dish_options.select(fn=on_pick_dish, inputs=[state, picked_dish], outputs=[recipe_card, hero_img, steps_column, nutrition_grid, state])`
|
| 470 |
-
- `progress_image.upload(fn=on_validate, inputs=[state, current_step_idx, progress_image], outputs=[verdict_md, tip_audio])`
|
| 471 |
-
|
| 472 |
-
**Deliverable:** end-to-end run from a sample fridge photo to a fully rendered recipe card with audio and nutrition. No Gradio default look anywhere.
|
| 473 |
-
|
| 474 |
-
---
|
| 475 |
-
|
| 476 |
-
### Phase 7 — Day 7: Progress Validator (closed loop)
|
| 477 |
-
|
| 478 |
-
**Goal:** user uploads a progress photo, app says "go / wait / fix" with a voiced tip.
|
| 479 |
-
|
| 480 |
-
**Tasks**
|
| 481 |
-
1. Write `src/agents/progress_validator.py`:
|
| 482 |
-
```python
|
| 483 |
-
PROMPT = """Compare these two cooking photos.
|
| 484 |
-
Photo 1 (target): how it should look after the step "{instruction}".
|
| 485 |
-
Photo 2 (user's pan/plate): the user's current progress.
|
| 486 |
-
Reply strict JSON: {"verdict": "go|wait|fix", "feedback": "...", "tip": "..."}
|
| 487 |
-
- "go": looks right, move to next step
|
| 488 |
-
- "wait": needs more time, do not change anything yet
|
| 489 |
-
- "fix": something is off; suggest a concrete adjustment in one sentence
|
| 490 |
-
"""
|
| 491 |
-
def validate(target_img, user_img, step_instruction): ...
|
| 492 |
-
```
|
| 493 |
-
2. Use the same vision model singleton as Phase 2 — both calls share weights.
|
| 494 |
-
3. Render the verdict as a colored badge (green/amber/red) and play the tip via VoxCPM2.
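
One detail worth sketching for the validator: the prompt mixes a `{instruction}` placeholder with literal JSON braces, so `str.format` would fail on it. The snippet below is an illustration only; `PROMPT_TEMPLATE` is a shortened stand-in for the prompt above, `build_prompt` and `parse_verdict` are hypothetical helpers, and falling back to `"wait"` on malformed output is an assumed policy, not something the plan specifies.

```python
import json

# Shortened stand-in for the validator prompt shown above.
PROMPT_TEMPLATE = (
    'Photo 1 (target): how it should look after the step "{instruction}".\n'
    'Reply strict JSON: {"verdict": "go|wait|fix", "feedback": "...", "tip": "..."}'
)

ALLOWED = {"go", "wait", "fix"}
FALLBACK = {"verdict": "wait", "feedback": "", "tip": ""}


def build_prompt(instruction: str) -> str:
    # str.replace avoids str.format, which would treat the JSON braces as fields.
    return PROMPT_TEMPLATE.replace("{instruction}", instruction)


def parse_verdict(raw: str) -> dict:
    """Parse the model reply; fall back to 'wait' when it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    verdict = data.get("verdict") if isinstance(data, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED:
        return dict(FALLBACK)
    return {
        "verdict": verdict,
        "feedback": str(data.get("feedback", "")),
        "tip": str(data.get("tip", "")),
    }
```

The normalized dict can then drive the badge color and the text passed to the narrator.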
|
| 495 |
-
|
| 496 |
-
**Deliverable:** running the validator on 5 real progress photos returns the correct verdict on ≥3.
|
| 497 |
-
|
| 498 |
-
---
|
| 499 |
-
|
| 500 |
-
### Phase 8 — Day 8: Fine-tune the Planner on the Kaggle dataset (Well-Tuned badge)
|
| 501 |
-
|
| 502 |
-
> **Important caveat:** The user instruction says "for now keep inference on llama.cpp inside HF Space, future migration to Modal." Fine-tuning still **requires GPU**, so training itself happens on Modal (one-shot, offline) or on a rented Colab/Lambda GPU. Inference of the resulting model stays on llama.cpp inside the Space (as GGUF). This does **not** violate the runtime constraint — only the build pipeline touches a GPU.
|
| 503 |
-
|
| 504 |
-
**Goal:** publish a fine-tuned Planner GGUF to the Hub and load it from the Space.
|
| 505 |
-
|
| 506 |
-
**Tasks**
|
| 507 |
-
1. **Build SFT dataset** (`scripts/build_sft_dataset.py`):
|
| 508 |
-
- Load Kaggle `better-recipes` dataset.
|
| 509 |
-
- For each recipe, build a `(prompt, completion)` pair where `prompt` is `"Available ingredients: X, Y, Z. Propose recipe."` and `completion` is the full canonical `Recipe` JSON.
|
| 510 |
-
- Generate ~1000 pairs, push to `<you>/cook-with-me-sft` HF Dataset.
|
| 511 |
-
2. **LoRA training** (`scripts/train_planner.py` — to be run on a GPU machine, not the Space):
|
| 512 |
-
```python
|
| 513 |
-
# peft + trl SFTTrainer, base = openbmb/MiniCPM-V-4
|
| 514 |
-
# r=16, alpha=32, lr=2e-4, epochs=2, batch=4
|
| 515 |
-
# push_to_hub=True, hub_model_id="<you>/cook-with-me-planner-4b"
|
| 516 |
-
```
|
| 517 |
-
3. **Convert to GGUF** (Day 8 evening):
|
| 518 |
-
- Use `llama.cpp/convert_hf_to_gguf.py`, then the `llama-quantize` tool (named `quantize` in older llama.cpp builds) to produce `Q4_K_M`.
|
| 519 |
-
- Push GGUF to `<you>/cook-with-me-planner-4b-gguf`.
|
| 520 |
-
4. Update `src/models/loader.py` to point at your GGUF instead of the base model.
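
The `(prompt, completion)` pairs from task 1 could be assembled along these lines. A minimal sketch: `build_sft_pair` is a hypothetical helper, and the recipe dict is assumed to already follow the canonical `Recipe` schema.

```python
import json


def build_sft_pair(ingredients: list[str], recipe: dict) -> dict:
    """Build one training example in the prompt format described above."""
    prompt = f"Available ingredients: {', '.join(ingredients)}. Propose recipe."
    # sort_keys gives every completion the same key order, which keeps the target format uniform.
    completion = json.dumps(recipe, ensure_ascii=False, sort_keys=True)
    return {"prompt": prompt, "completion": completion}


# Hypothetical, heavily truncated recipe used only to show the output shape.
pair = build_sft_pair(["chicken", "onion"], {"name": "chicken tinga", "servings": 2})
```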
|
| 521 |
-
|
| 522 |
-
**Deliverable:** the Space loads your fine-tuned Planner GGUF and produces JSON recipes that are noticeably better-formatted than the base model on a held-out test set.
|
| 523 |
-
|
| 524 |
-
---
|
| 525 |
-
|
| 526 |
-
### Phase 9 — Day 9: End-to-end test, performance pass, pre-warm cache
|
| 527 |
-
|
| 528 |
-
**Goal:** the Space loads in <60s and a full recipe (text + 5 images + 5 audio clips + nutrition) renders in <2 minutes on the chosen hardware.
|
| 529 |
-
|
| 530 |
-
**Tasks**
|
| 531 |
-
1. Write `scripts/smoke_test.py` that runs the full pipeline on 3 sample fridge photos and asserts:
|
| 532 |
-
- Each ingredient list is non-empty
|
| 533 |
-
- Each recipe has 5–7 steps
|
| 534 |
-
- Each step has a non-empty image and audio path
|
| 535 |
-
- Nutrition has all 5 macros set
|
| 536 |
-
2. Implement **on-disk caching** for FLUX outputs (key = SHA256 of prompt) so re-runs of the same recipe are instant. Save to `~/.cache/cook-with-me/flux/`.
|
| 537 |
-
3. Pre-render and commit **3 fully-prepared demo recipes** (chicken tinga, pasta carbonara, chicken tikka) so judges see results in <5s on first click.
|
| 538 |
-
4. Add error handling at every UI boundary: a model failure should display a friendly message, not a stack trace.
|
| 539 |
-
5. Add a "Loading models..." progress bar on first request — first cold start can take 90s.
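
The smoke-test assertions from task 1 can be collected in one checker. This is a sketch: `check_recipe` is a hypothetical helper, and the per-step keys `image_path` and `audio_path` are assumed names, since the `Step` schema above does not define where rendered assets are stored.

```python
def check_recipe(recipe: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the recipe passes."""
    problems = []
    steps = recipe.get("steps", [])
    if not 5 <= len(steps) <= 7:
        problems.append(f"expected 5-7 steps, got {len(steps)}")
    for step in steps:
        for key in ("image_path", "audio_path"):  # assumed field names
            if not step.get(key):
                problems.append(f"step {step.get('n')} missing {key}")
    nutrition = recipe.get("nutrition_per_serving", {})
    for macro in ("calories", "protein_g", "carbs_g", "fat_g", "fiber_g"):
        if nutrition.get(macro) is None:
            problems.append(f"nutrition missing {macro}")
    return problems
```

Collecting problems instead of asserting on the first failure makes one smoke run report everything that is wrong with a recipe.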
|
| 540 |
-
|
| 541 |
-
**Deliverable:** smoke test passes on the live Space.
|
| 542 |
-
|
| 543 |
-
---
|
| 544 |
-
|
| 545 |
-
### Phase 10 — Day 10: README, demo video, social post, submit
|
| 546 |
-
|
| 547 |
-
**Tasks**
|
| 548 |
-
1. Write `README.md` with the required HF Space frontmatter:
|
| 549 |
-
```yaml
|
| 550 |
-
---
|
| 551 |
-
title: Cook With Me
|
| 552 |
-
emoji: 🍲
|
| 553 |
-
colorFrom: orange
|
| 554 |
-
colorTo: yellow
|
| 555 |
-
sdk: gradio
|
| 556 |
-
sdk_version: 4.44.0
|
| 557 |
-
app_file: app.py
|
| 558 |
-
pinned: false
|
| 559 |
-
license: apache-2.0
|
| 560 |
-
---
|
| 561 |
-
```
|
| 562 |
-
Followed by:
|
| 563 |
-
- One-paragraph pitch
|
| 564 |
-
- 60-second demo video embed
|
| 565 |
-
- Architecture diagram (export from `arquitectura.html` as PNG)
|
| 566 |
-
- Section: "How closed-loop visual cooking guidance works"
|
| 567 |
-
- Models used (with HF links + total parameter count)
|
| 568 |
-
- Badges declared
|
| 569 |
-
- Build / run instructions
|
| 570 |
-
2. Record a 60–90 second demo video: real person cooks a recipe end-to-end with the app guiding via voice, ending with the cooked plate on camera.
|
| 571 |
-
3. Write the Field Notes blog post: one of the engineering surprises (e.g., "FLUX.2 step images at 4 steps look better than 8 — here's why" or "Closed-loop validation needs the same vision model on both sides").
|
| 572 |
-
4. Social post on X / LinkedIn with the demo video.
|
| 573 |
-
5. Submit on the hackathon platform.
|
| 574 |
-
|
| 575 |
-
---
|
| 576 |
-
|
| 577 |
-
## 4. Tools usage matrix (when to reach for what)
|
| 578 |
-
|
| 579 |
-
| Phase | Primary tools | Why |
|
| 580 |
-
|---|---|---|
|
| 581 |
-
| 0 — setup | HF CLI, Kaggle CLI, OpenAI Codex CLI | one-shot config |
|
| 582 |
-
| 1 — data | `kagglehub`, `pandas`, `sentence-transformers` | offline dataset prep |
|
| 583 |
-
| 2 — vision | `llama-cpp-python` + `MiniCPMv26ChatHandler` | runs inside Space, badge: Llama Champion |
|
| 584 |
-
| 3 — planner | `llama-cpp-python` + retrieval over local parquet | grounded JSON output |
|
| 585 |
-
| 3.5 — nutrition | local CSV + regex parser | reliable, no LLM math |
|
| 586 |
-
| 4 — illustrator | `diffusers` + `Flux2KleinPipeline` | sponsor model showcase |
|
| 587 |
-
| 5 — narrator | VoxCPM2 via `transformers` (or its native API) | local TTS |
|
| 588 |
-
| 6 — UI | `gradio` + custom CSS theme | Off-Brand badge |
|
| 589 |
-
| 7 — validator | same vision singleton as phase 2 | closed-loop innovation, Best Agent |
|
| 590 |
-
| 8 — fine-tune | `peft`, `trl`, `llama.cpp` convert/quantize, on a GPU machine | Well-Tuned badge |
|
| 591 |
-
| 9 — test/cache | `pytest`, `hashlib`, on-disk FLUX cache | demo reliability |
|
| 592 |
-
| 10 — submit | HF Spaces, video tool, social | shipping |
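The "local CSV + regex parser" entry for phase 3.5 can be illustrated with a small quantity parser. This is a sketch only: the unit list, the `Quantity` tuple, and `parse_ingredient` are assumptions for illustration, not the contents of the deleted `src/data/nutrition.py`.

```python
import re
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    amount: float
    unit: str
    name: str


# Illustrative unit list only; a real table would cover more units.
_UNITS = r"kg|g|ml|l|tbsp|tsp|cups?"
_PATTERN = re.compile(
    rf"^\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>{_UNITS})\b\s+(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def parse_ingredient(line: str) -> Optional[Quantity]:
    """Split '200 g chicken breast' into amount, unit, and name.

    Returns None when the line has no recognised amount and unit, so the
    caller can skip and log the ingredient instead of guessing.
    """
    m = _PATTERN.match(line)
    if m is None:
        return None
    return Quantity(float(m["amount"]), m["unit"].lower(), m["name"].lower())
```

Returning `None` for unparseable lines keeps the arithmetic deterministic: anything the pattern does not understand is left out of the totals rather than estimated.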
|
| 593 |
-
|
| 594 |
-
---
|
| 595 |
-
|
| 596 |
-
## 5. Performance budget on the HF Space
|
| 597 |
-
|
| 598 |
-
| Operation | Target latency | Hardware needed |
|
| 599 |
-
|---|---|---|
|
| 600 |
-
| Vision: ingredient ID | < 8 s | CPU 4-thread |
|
| 601 |
-
| Planner: propose 3 dishes | < 12 s | CPU 4-thread |
|
| 602 |
-
| Planner: build full recipe JSON | < 20 s | CPU 4-thread |
|
| 603 |
-
| Nutrition computation | < 0.1 s | CPU |
|
| 604 |
-
| FLUX: 1 image (4 steps) | < 12 s on T4 / < 90 s on CPU offload | GPU strongly recommended |
|
| 605 |
-
| FLUX: 6 images (final + 5 steps) | < 80 s on T4 | GPU |
|
| 606 |
-
| VoxCPM2: 1 step narration | < 5 s | CPU |
|
| 607 |
-
| Validator: 1 progress check | < 8 s | CPU |
|
| 608 |
-
| **Full recipe end-to-end** | **< 2 min on T4 Space** | — |
|
| 609 |
-
|
| 610 |
-
**Hardware decision:** rent a T4 Space (~$0.40/hr) for the demo week. The $20 HF credits cover ~50 hours.
|
| 611 |
-
|
| 612 |
-
---
|
| 613 |
-
|
| 614 |
-
## 6. Risks and mitigations (delta from `estrategia.md`)
|
| 615 |
-
|
| 616 |
-
| Risk | Mitigation |
|
| 617 |
-
|---|---|
|
| 618 |
-
| MiniCPM-V-4 has no public GGUF | Convert yourself with `llama.cpp/convert_hf_to_gguf.py`. Allow a half-day buffer in Phase 2. |
|
| 619 |
-
| llama-cpp-python's MiniCPM-V chat handler version mismatch | Pin `llama-cpp-python>=0.3.2`; test the handler import on Day 2. If it fails, fall back to MiniCPM-V-2.6 GGUF (well-supported) for vision and document the swap. |
|
| 620 |
-
| FLUX.2 Klein 9B too slow on free CPU Space | Upgrade to a paid GPU Space (~$10 for the demo week). Document this in the README so judges expect it. |
|
| 621 |
-
| VoxCPM2 docs sparse | Drop to Kokoro-82M or Piper TTS as a backup. Lose the OpenBMB voice angle but keep the audio. |
|
| 622 |
-
| Kaggle dataset has format quirks (HTML in instructions, missing fields) | The Phase 1 normalization step handles this; budget 2 hours. |
|
| 623 |
-
| Nutrition CSV missing exotic ingredients | Skip-and-log strategy already designed; demo-day recipes use common ingredients only. |
|
| 624 |
-
| Total params >32B if VoxCPM2 turns out to be 7B | Check size in Phase 0; if too large, drop to a smaller TTS. |
|
| 625 |
-
|
| 626 |
-
---
|
| 627 |
-
|
| 628 |
-
## 7. "Day-1 hello world" checklist
|
| 629 |
-
|
| 630 |
-
Before writing any agent code, get this minimal end-to-end loop working — it proves your stack:
|
| 631 |
-
|
| 632 |
-
1. ☐ Empty Gradio Space deployed, shows "Hello"
|
| 633 |
-
2. ☐ `huggingface-cli login` works locally
|
| 634 |
-
3. ☐ `kaggle datasets download thedevastator/better-recipes-for-a-better-life` succeeds
|
| 635 |
-
4. ☐ `from llama_cpp import Llama` runs in your venv
|
| 636 |
-
5. ☐ Download one tiny GGUF (e.g., TinyLlama Q4) and call it from a Gradio textbox round-trip
|
| 637 |
-
6. ☐ Push the round-trip to the Space; confirm it answers in the cloud
|
| 638 |
-
|
| 639 |
-
**Only after all 6 are checked, start Phase 1.**
|
| 640 |
-
|
| 641 |
-
---
|
| 642 |
-
|
| 643 |
-
## 8. Where this plan differs from `estrategia.md` (deltas to communicate)
|
| 644 |
-
|
| 645 |
-
| Topic | `estrategia.md` (Spanish, Mexican-cuisine focus) | This document (current requirements) |
|
| 646 |
-
|---|---|---|
|
| 647 |
-
| Language | Spanish-first | **English only** |
|
| 648 |
-
| Cuisine | Mexican | **International** (Kaggle dataset) |
|
| 649 |
-
| Voice models | OpenBMB voice + Cohere Labs | **VoxCPM2** only (single voice) |
|
| 650 |
-
| Vision model | MiniCPM-V 2.6 / 4 | **MiniCPM-V-4.6** |
|
| 651 |
-
| Reasoning model | MiniCPM-4 4B | **MiniCPM-V-4** |
|
| 652 |
-
| FLUX runtime | Modal endpoint | **Inside Space (llama.cpp principle)**; Modal kept as a future migration target only |
|
| 653 |
-
| External APIs at runtime | Allowed (Modal, OpenAI optional) | **None** — full local inference inside Space |
|
| 654 |
-
| Nutritional info | Not specified | **Required** at end of recipe |
|
| 655 |
-
| Fine-tune dataset | 200 synthetic Mexican recipes | **Kaggle better-recipes (international)** |
|
| 656 |
-
|
| 657 |
-
If anything in `plan.md` or `estrategia.md` conflicts with this document, **this document wins** — it reflects the latest user requirements.
|
| 658 |
-
|
| 659 |
-
---
|
| 660 |
-
|
| 661 |
-
## 9. Definition of done
|
| 662 |
-
|
| 663 |
-
The implementation is complete when **all** of these are true:
|
| 664 |
-
|
| 665 |
-
- [ ] Public HF Space `https://huggingface.co/spaces/<you>/cook-with-me` loads
|
| 666 |
-
- [ ] App is fully in English
|
| 667 |
-
- [ ] Fridge photo → ingredient list → 3 dish options → full recipe with images, audio, and nutrition works end-to-end
|
| 668 |
-
- [ ] Progress validator returns sensible verdicts on 3+ test photos
|
| 669 |
-
- [ ] All inference (vision, planner, TTS) runs through llama.cpp / local diffusers — **no external API calls at runtime**
|
| 670 |
-
- [ ] Total parameters declared in README ≤ 32B
|
| 671 |
-
- [ ] Fine-tuned Planner GGUF published to HF Hub (Well-Tuned badge)
|
| 672 |
-
- [ ] Demo video (60–90s) recorded with a real person cooking
|
| 673 |
-
- [ ] Field Notes blog post published
|
| 674 |
-
- [ ] Submitted on the hackathon platform before deadline
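The "≤ 32B" item can be checked mechanically. The sketch below uses the approximate sizes quoted in the About tab of `app.py` in this same diff; the dictionary keys and the helper names are hypothetical.

```python
# Approximate sizes in billions of parameters, as listed in the About tab of app.py.
MODEL_PARAMS_B = {
    "vision (MiniCPM-V-4.6)": 4.6,
    "planner (MiniCPM-V-4)": 4.0,
    "illustrator (FLUX.2-klein-9B)": 9.0,
    "narrator (VoxCPM2)": 1.0,
    "retrieval (all-MiniLM-L6-v2)": 0.022,
}

# Limit stated in the definition of done.
PARAM_LIMIT_B = 32.0


def total_params_b(models: dict[str, float]) -> float:
    """Sum the declared parameter counts (in billions)."""
    return sum(models.values())


def within_limit(models: dict[str, float], limit_b: float = PARAM_LIMIT_B) -> bool:
    """True when the declared total does not exceed the limit."""
    return total_params_b(models) <= limit_b


if __name__ == "__main__":
    print(f"total = {total_params_b(MODEL_PARAMS_B):.1f}B, ok = {within_limit(MODEL_PARAMS_B)}")
```

Keeping the sizes in one table makes it easy to re-run the check whenever a model is swapped, for example if the TTS model turns out to be larger than expected.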
|
app.py
CHANGED
|
@@ -1,5 +1,4 @@
|
|
| 1 |
import logging
|
| 2 |
-
logging.basicConfig(level=logging.INFO)
|
| 3 |
log = logging.getLogger(__name__)
|
| 4 |
|
| 5 |
from typing import Any
|
|
@@ -7,11 +6,12 @@ from typing import Any
|
|
| 7 |
import gradio as gr
|
| 8 |
from PIL import Image
|
| 9 |
|
|
|
|
| 10 |
from src.agents.mise_en_place import identify_ingredients
|
| 11 |
-
from src.agents.progress_validator import validate
|
| 12 |
-
from src.agents.recipe_planner import plan_recipe, propose_dishes
|
| 13 |
-
from src.
|
| 14 |
-
from src.
|
| 15 |
from src.ui.components import (
|
| 16 |
DishOptions,
|
| 17 |
IngredientChips,
|
|
@@ -19,265 +19,135 @@ from src.ui.components import (
|
|
| 19 |
RecipeHero,
|
| 20 |
StepCard,
|
| 21 |
VerdictBadge,
|
|
|
|
| 22 |
)
|
| 23 |
from src.ui.theme import CSS, theme
|
| 24 |
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
# Callbacks
|
| 28 |
-
# ---------------------------------------------------------------------------
|
| 29 |
-
|
| 30 |
-
def _clean_ingredients(items: list | None) -> list[str]:
|
| 31 |
-
"""Normalize a raw ingredient list (dedup, lowercase, strip empties)."""
|
| 32 |
-
out, seen = [], set()
|
| 33 |
-
for it in (items or []):
|
| 34 |
-
name = str(it).strip().lower()
|
| 35 |
-
if name and name not in seen:
|
| 36 |
-
seen.add(name)
|
| 37 |
-
out.append(name)
|
| 38 |
-
return out
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
def on_propose(fridge_image: Image.Image | None, state: dict | None):
|
| 42 |
-
"""Photo → ingredients → 3 dish options (and fill the editable list)."""
|
| 43 |
state = state or {}
|
| 44 |
-
if fridge_image is None:
|
| 45 |
-
return (
|
| 46 |
-
IngredientChips.render({}),
|
| 47 |
-
DishOptions.render({}),
|
| 48 |
-
gr.update(choices=[], value=None),
|
| 49 |
-
state,
|
| 50 |
-
gr.update(choices=[], value=[]),
|
| 51 |
-
)
|
| 52 |
-
|
| 53 |
ingredients = identify_ingredients(fridge_image)
|
| 54 |
-
options = propose_dishes(ingredients)
|
| 55 |
|
| 56 |
-
state.update({
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
|
| 61 |
-
radio_choices = [o.name for o in options]
|
| 62 |
-
return (
|
| 63 |
-
IngredientChips.render({"have": ingredients, "missing": []}),
|
| 64 |
-
DishOptions.render({"options": state["options"]}),
|
| 65 |
-
gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
|
| 66 |
-
state,
|
| 67 |
-
gr.update(choices=ingredients, value=ingredients),
|
| 68 |
-
)
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
def on_update_ingredients(state: dict | None, ingredients: list | None):
|
| 72 |
-
"""Manual edit of the ingredient list → refresh chips + re-propose dishes."""
|
| 73 |
-
state = state or {}
|
| 74 |
-
ingredients = _clean_ingredients(ingredients)
|
| 75 |
-
state["ingredients_have"] = ingredients
|
| 76 |
-
|
| 77 |
-
if not ingredients:
|
| 78 |
-
state["options"] = []
|
| 79 |
-
return (
|
| 80 |
-
IngredientChips.render({}),
|
| 81 |
-
DishOptions.render({}),
|
| 82 |
-
gr.update(choices=[], value=None),
|
| 83 |
-
state,
|
| 84 |
-
)
|
| 85 |
|
| 86 |
-
options = propose_dishes(ingredients)
|
| 87 |
-
state["options"] = [o.model_dump() for o in options]
|
| 88 |
-
radio_choices = [o.name for o in options]
|
| 89 |
-
return (
|
| 90 |
-
IngredientChips.render({"have": ingredients, "missing": []}),
|
| 91 |
-
DishOptions.render({"options": state["options"]}),
|
| 92 |
-
gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
|
| 93 |
-
state,
|
| 94 |
-
)
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
def on_cook(state: dict | None, dish_name: str | None, illustrate: bool, ingredients: list | None):
|
| 98 |
-
"""Chosen dish → full recipe + nutrition (+ FLUX images if requested)."""
|
| 99 |
-
state = state or {}
|
| 100 |
-
if not dish_name:
|
| 101 |
-
return (
|
| 102 |
-
RecipeHero.render({}),
|
| 103 |
-
StepCard.render({}),
|
| 104 |
-
NutritionGrid.render({"nutrition": {}}),
|
| 105 |
-
state,
|
| 106 |
-
)
|
| 107 |
-
|
| 108 |
-
# Prefer the (possibly hand-edited) ingredient list from the editor.
|
| 109 |
-
ingredients = _clean_ingredients(ingredients) or state.get("ingredients_have", [])
|
| 110 |
-
state["ingredients_have"] = ingredients
|
| 111 |
-
recipe = plan_recipe(dish_name, ingredients)
|
| 112 |
-
|
| 113 |
-
nutrition = compute_nutrition(ingredients, recipe.servings)
|
| 114 |
-
recipe.nutrition = nutrition
|
| 115 |
-
state["recipe"] = recipe.model_dump()
|
| 116 |
-
|
| 117 |
-
if illustrate:
|
| 118 |
-
log.info("Generating FLUX step images via Modal...")
|
| 119 |
-
recipe = illustrate_recipe(recipe)
|
| 120 |
-
state["recipe"] = recipe.model_dump()
|
| 121 |
-
|
| 122 |
-
return (
|
| 123 |
-
RecipeHero.render(recipe.model_dump()),
|
| 124 |
-
StepCard.render({"steps": [s.model_dump() for s in recipe.steps]}),
|
| 125 |
-
NutritionGrid.render({"nutrition": nutrition}),
|
| 126 |
-
state,
|
| 127 |
-
)
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
def on_validate(state: dict | None, step_idx: float, progress_image: Image.Image | None):
|
| 131 |
-
"""Progress photo + step number → verdict badge."""
|
| 132 |
-
state = state or {}
|
| 133 |
-
recipe = state.get("recipe", {})
|
| 134 |
-
steps = recipe.get("steps", [])
|
| 135 |
-
idx = max(0, int(step_idx) - 1)
|
| 136 |
-
instruction = steps[idx]["instruction"] if idx < len(steps) else "Cook the dish properly."
|
| 137 |
-
result = validate(progress_image, instruction)
|
| 138 |
-
return VerdictBadge.render(result)
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
# ---------------------------------------------------------------------------
|
| 142 |
-
# UI
|
| 143 |
-
# ---------------------------------------------------------------------------
|
| 144 |
|
|
|
|
|
|
|
|
|
|
| 145 |
def build_ui() -> gr.Blocks:
|
| 146 |
initial_state: dict[str, Any] = {}
|
| 147 |
|
| 148 |
-
with gr.Blocks(title="Cook With Me"
|
| 149 |
gr.Markdown(
|
| 150 |
"# 🍲 Cook With Me\n"
|
| 151 |
-
"
|
| 152 |
)
|
| 153 |
|
| 154 |
state = gr.State(initial_state)
|
| 155 |
|
| 156 |
with gr.Tabs():
|
| 157 |
-
# ---------------------------------------------------
|
| 158 |
-
|
| 159 |
-
# ----------------------------------------------------------------
|
| 160 |
-
with gr.Tab("🍳 Cook"):
|
| 161 |
with gr.Row():
|
| 162 |
-
# Left — inputs
|
| 163 |
with gr.Column(scale=1):
|
| 164 |
fridge_input = gr.Image(
|
| 165 |
label="📸 Photo of your fridge or pantry",
|
| 166 |
type="pil",
|
| 167 |
-
height=
|
| 168 |
)
|
| 169 |
-
propose_btn = gr.Button("
|
| 170 |
|
| 171 |
gr.Markdown("### Ingredients I see")
|
| 172 |
chips = gr.HTML(IngredientChips.render({}))
|
| 173 |
|
| 174 |
-
ingredient_editor = gr.Dropdown(
|
| 175 |
-
choices=[],
|
| 176 |
-
value=[],
|
| 177 |
-
multiselect=True,
|
| 178 |
-
allow_custom_value=True,
|
| 179 |
-
label="✏️ Add or remove ingredients (type + Enter to add, ✕ to remove)",
|
| 180 |
-
interactive=True,
|
| 181 |
-
)
|
| 182 |
-
update_btn = gr.Button("🔄 Update ingredients & dishes")
|
| 183 |
-
|
| 184 |
gr.Markdown("### Pick a dish")
|
| 185 |
-
|
| 186 |
-
dish_radio = gr.Radio(
|
| 187 |
-
choices=[],
|
| 188 |
-
label="Choose one",
|
| 189 |
-
interactive=True,
|
| 190 |
-
)
|
| 191 |
|
| 192 |
-
with gr.Accordion("
|
| 193 |
-
illustrate_chk = gr.Checkbox(
|
| 194 |
-
|
| 195 |
-
label="🎨 Generate step images with FLUX.2 (requires Modal deployment)",
|
| 196 |
-
)
|
| 197 |
|
| 198 |
-
cook_btn = gr.Button("
|
| 199 |
|
| 200 |
-
# Right — recipe output
|
| 201 |
with gr.Column(scale=2):
|
| 202 |
hero = gr.HTML(RecipeHero.render({}))
|
| 203 |
steps_panel = gr.HTML(StepCard.render({}))
|
| 204 |
nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
|
| 205 |
|
| 206 |
-
# ----------------------------------------
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
with gr.Tab("📷 Check Progress"):
|
| 210 |
-
gr.Markdown(
|
| 211 |
-
"Upload a photo of your pan or plate. The vision model compares it "
|
| 212 |
-
"against the current recipe step and tells you if you can move on."
|
| 213 |
-
)
|
| 214 |
with gr.Row():
|
| 215 |
with gr.Column():
|
| 216 |
step_idx = gr.Number(value=1, precision=0, label="Active step #")
|
| 217 |
-
progress_input = gr.Image(
|
| 218 |
-
|
| 219 |
-
type="pil",
|
| 220 |
-
height=300,
|
| 221 |
-
)
|
| 222 |
-
validate_btn = gr.Button("✅ How am I doing?", variant="primary")
|
| 223 |
with gr.Column():
|
| 224 |
verdict_panel = gr.HTML(VerdictBadge.render({}))
|
|
|
|
| 225 |
|
| 226 |
-
# -------------------------------------------------
|
| 227 |
-
|
| 228 |
-
# ----------------------------------------------------------------
|
| 229 |
-
with gr.Tab("ℹ️ About"):
|
| 230 |
gr.Markdown(
|
| 231 |
"""
|
| 232 |
-
###
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 252 |
"""
|
| 253 |
)
|
| 254 |
|
| 255 |
-
# ----------------------------------------------------
|
| 256 |
-
# Wire callbacks
|
| 257 |
-
# --------------------------------------------------------------------
|
| 258 |
propose_btn.click(
|
| 259 |
fn=on_propose,
|
| 260 |
inputs=[fridge_input, state],
|
| 261 |
-
outputs=[chips,
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
update_btn.click(
|
| 265 |
-
fn=on_update_ingredients,
|
| 266 |
-
inputs=[state, ingredient_editor],
|
| 267 |
-
outputs=[chips, dish_options_html, dish_radio, state],
|
| 268 |
-
)
|
| 269 |
-
|
| 270 |
-
cook_btn.click(
|
| 271 |
-
fn=on_cook,
|
| 272 |
-
inputs=[state, dish_radio, illustrate_chk, ingredient_editor],
|
| 273 |
-
outputs=[hero, steps_panel, nutrition_panel, state],
|
| 274 |
-
)
|
| 275 |
-
|
| 276 |
-
validate_btn.click(
|
| 277 |
-
fn=on_validate,
|
| 278 |
-
inputs=[state, step_idx, progress_input],
|
| 279 |
-
outputs=[verdict_panel],
|
| 280 |
)
|
| 281 |
|
| 282 |
return demo
|
| 283 |
|
|
@@ -289,4 +159,6 @@ if __name__ == "__main__":
|
|
| 289 |
server_port=int(__import__("os").environ.get("PORT", 7860)),
|
| 290 |
show_error=True,
|
| 291 |
inbrowser=True,
|
| 292 |
-
|
|
|
|
|
|
|
|
|
| 1 |
import logging
|
|
|
|
| 2 |
log = logging.getLogger(__name__)
|
| 3 |
|
| 4 |
from typing import Any
|
|
|
|
| 6 |
import gradio as gr
|
| 7 |
from PIL import Image
|
| 8 |
|
| 9 |
+
# from src import config
|
| 10 |
from src.agents.mise_en_place import identify_ingredients
|
| 11 |
+
# from src.agents.progress_validator import validate
|
| 12 |
+
# from src.agents.recipe_planner import plan_recipe, propose_dishes
|
| 13 |
+
# from src.data.nutrition import compute_nutrition
|
| 14 |
+
# from src.pipeline import Recipe
|
| 15 |
from src.ui.components import (
|
| 16 |
DishOptions,
|
| 17 |
IngredientChips,
|
|
|
|
| 19 |
RecipeHero,
|
| 20 |
StepCard,
|
| 21 |
VerdictBadge,
|
| 22 |
+
recipe_to_state,
|
| 23 |
)
|
| 24 |
from src.ui.theme import CSS, theme
|
| 25 |
|
| 26 |
+
def on_propose(fridge_image: Image.Image | None, state: dict | None) -> str:
|
| 27 |
+
"""Photo → ingredients → 3 dish options."""
|
| 28 |
state = state or {}
|
| 29 |
ingredients = identify_ingredients(fridge_image)
|
| 30 |
+
# options = propose_dishes(ingredients)
|
| 31 |
|
| 32 |
+
# state.update({
|
| 33 |
+
# "ingredients_have": ingredients,
|
| 34 |
+
# "ingredients_missing": [],
|
| 35 |
+
# "options": [o.model_dump() for o in options],
|
| 36 |
+
# })
|
| 37 |
+
chips_html = IngredientChips.render({"have": ingredients, "missing": []})
|
| 38 |
+
log.info(ingredients)
|
| 39 |
+
# options_html = DishOptions.render({"options": state["options"]})
|
| 40 |
+
# radio_choices = [o.name for o in options]
|
| 41 |
+
# return chips_html, options_html, gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None), state
|
| 42 |
+
return chips_html
|
| 43 |
|
| 44 |
|
| 45 |
|
| 46 |
+
# ----------------
|
| 47 |
+
# UI definition
|
| 48 |
+
# ----------------
|
| 49 |
def build_ui() -> gr.Blocks:
|
| 50 |
initial_state: dict[str, Any] = {}
|
| 51 |
|
| 52 |
+
with gr.Blocks(title="Cook With Me") as demo:
|
| 53 |
gr.Markdown(
|
| 54 |
"# 🍲 Cook With Me\n"
|
| 55 |
+
"_A multimodal sous-chef. See it. Plan it. Show it. Cook it._"
|
| 56 |
)
|
| 57 |
|
| 58 |
state = gr.State(initial_state)
|
| 59 |
|
| 60 |
with gr.Tabs():
|
| 61 |
+
# --- Tab 1: Cook ------------------------------------------------
|
| 62 |
+
with gr.Tab("Cook"):
|
|
|
|
|
|
|
| 63 |
with gr.Row():
|
|
|
|
| 64 |
with gr.Column(scale=1):
|
| 65 |
fridge_input = gr.Image(
|
| 66 |
label="📸 Photo of your fridge or pantry",
|
| 67 |
type="pil",
|
| 68 |
+
height=320,
|
| 69 |
)
|
| 70 |
+
propose_btn = gr.Button("What can I cook?", variant="primary")
|
| 71 |
|
| 72 |
gr.Markdown("### Ingredients I see")
|
| 73 |
chips = gr.HTML(IngredientChips.render({}))
|
| 74 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
gr.Markdown("### Pick a dish")
|
| 76 |
+
options = gr.HTML(DishOptions.render({}))
|
| 77 |
+
dish_radio = gr.Radio(choices=[], label="Choose one", interactive=True)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 78 |
|
| 79 |
+
with gr.Accordion("Generation options", open=False):
|
| 80 |
+
illustrate_chk = gr.Checkbox(value=False, label="Render step images (FLUX, slow on CPU)")
|
| 81 |
+
narrate_chk = gr.Checkbox(value=False, label="Generate voice narration (VoxCPM2)")
|
|
|
|
|
|
|
| 82 |
|
| 83 |
+
cook_btn = gr.Button("Build recipe", variant="primary")
|
| 84 |
|
|
|
|
| 85 |
with gr.Column(scale=2):
|
| 86 |
hero = gr.HTML(RecipeHero.render({}))
|
| 87 |
steps_panel = gr.HTML(StepCard.render({}))
|
| 88 |
nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
|
| 89 |
|
| 90 |
+
# --- Tab 2: Check Progress -------------------------------------
|
| 91 |
+
with gr.Tab("Check Progress"):
|
| 92 |
+
gr.Markdown("Upload a photo of your pan or plate; the same vision model that identified your ingredients will compare it against the target step.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 93 |
with gr.Row():
|
| 94 |
with gr.Column():
|
| 95 |
step_idx = gr.Number(value=1, precision=0, label="Active step #")
|
| 96 |
+
progress_input = gr.Image(label="📸 Your pan / plate", type="pil", height=320)
|
| 97 |
+
validate_btn = gr.Button("How am I doing?", variant="primary")
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
with gr.Column():
|
| 99 |
verdict_panel = gr.HTML(VerdictBadge.render({}))
|
| 100 |
+
verdict_audio = gr.Audio(label="Tip (voice)", autoplay=False)
|
| 101 |
|
| 102 |
+
# --- Tab 3: About ----------------------------------------------
|
| 103 |
+
with gr.Tab("About"):
|
|
|
|
|
|
|
| 104 |
gr.Markdown(
|
| 105 |
"""
|
| 106 |
+
### Models
|
| 107 |
+
- **Vision** — `openbmb/MiniCPM-V-4_6-gguf` via `llama-cpp-python` (~4.6B)
|
| 108 |
+
- **Planner** — `openbmb/MiniCPM-V-4-gguf` via `llama-cpp-python` (~4B)
|
| 109 |
+
- **Illustrator** — `black-forest-labs/FLUX.2-klein-9B` via `diffusers` (9B)
|
| 110 |
+
- **Narrator** — `openbmb/VoxCPM2` via `transformers` (~1B)
|
| 111 |
+
- **Retrieval** — `sentence-transformers/all-MiniLM-L6-v2` (22M)
|
| 112 |
+
**Total ≈ 18.6B params** (≤ 32B requirement ✓).
|
| 113 |
+
### Pipeline
|
| 114 |
+
```
|
| 115 |
+
Fridge photo → Vision → ingredients
|
| 116 |
+
│
|
| 117 |
+
▼
|
| 118 |
+
Planner (+ Kaggle retrieval) → Recipe JSON
|
| 119 |
+
│
|
| 120 |
+
▼
|
| 121 |
+
Illustrator (FLUX) → hero + per-step images
|
| 122 |
+
│
|
| 123 |
+
▼
|
| 124 |
+
Narrator (VoxCPM2) → MP3 per step
|
| 125 |
+
│
|
| 126 |
+
▼
|
| 127 |
+
Progress photo → Validator (same vision model) → go|wait|fix
|
| 128 |
+
```
|
| 129 |
+
### Badges targeted
|
| 130 |
+
✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
|
| 131 |
"""
|
| 132 |
)
|
| 133 |
|
| 134 |
+
# Wire callbacks ----------------------------------------------------
|
|
|
|
|
|
|
| 135 |
propose_btn.click(
|
| 136 |
fn=on_propose,
|
| 137 |
inputs=[fridge_input, state],
|
| 138 |
+
# outputs=[chips, options, dish_radio, state],
|
| 139 |
+
outputs=[chips],
|
| 140 |
)
|
| 141 |
+
# cook_btn.click(
|
| 142 |
+
# fn=on_pick_dish,
|
| 143 |
+
# inputs=[state, dish_radio, illustrate_chk, narrate_chk],
|
| 144 |
+
# outputs=[hero, steps_panel, nutrition_panel, chips, state],
|
| 145 |
+
# )
|
| 146 |
+
# validate_btn.click(
|
| 147 |
+
# fn=on_validate,
|
| 148 |
+
# inputs=[state, step_idx, progress_input],
|
| 149 |
+
# outputs=[verdict_panel, verdict_audio],
|
| 150 |
+
# )
|
| 151 |
|
| 152 |
return demo
|
| 153 |
|
|
|
|
| 159 |
server_port=int(__import__("os").environ.get("PORT", 7860)),
|
| 160 |
show_error=True,
|
| 161 |
inbrowser=True,
|
| 162 |
+
theme=theme,
|
| 163 |
+
css=CSS
|
| 164 |
+
)
|
modal_app/__init__.py
DELETED
|
File without changes
|
modal_app/flux_endpoint.py
DELETED
|
@@ -1,124 +0,0 @@
|
|
| 1 |
-
"""Modal FLUX.2 Klein endpoint.
|
| 2 |
-
|
| 3 |
-
Deploy once with:
|
| 4 |
-
modal deploy modal_app/flux_endpoint.py
|
| 5 |
-
|
| 6 |
-
Then the HF Space calls it via modal.Function.lookup().
|
| 7 |
-
"""
|
| 8 |
-
import io
|
| 9 |
-
import modal
|
| 10 |
-
|
| 11 |
-
# ---------------------------------------------------------------------------
|
| 12 |
-
# App & image
|
| 13 |
-
# ---------------------------------------------------------------------------
|
| 14 |
-
|
| 15 |
-
app = modal.App("cook-with-me-flux")
|
| 16 |
-
|
| 17 |
-
image = (
|
| 18 |
-
modal.Image.debian_slim(python_version="3.12")
|
| 19 |
-
.pip_install(
|
| 20 |
-
"torch==2.7.0", # >=2.5 needed: diffusers custom-op schema uses PEP604 unions
|
| 21 |
-
"torchvision==0.22.0", # matches torch 2.7.0; silences diffusers image-processor fallback
|
| 22 |
-
"diffusers>=0.38", # FLUX.2 support
|
| 23 |
-
"transformers>=4.45",
|
| 24 |
-
"accelerate",
|
| 25 |
-
"safetensors",
|
| 26 |
-
"Pillow",
|
| 27 |
-
"huggingface_hub>=1.17",
|
| 28 |
-
"sentencepiece",
|
| 29 |
-
)
|
| 30 |
-
)
|
| 31 |
-
|
| 32 |
-
# HF token secret so Modal can pull gated/private model weights
|
| 33 |
-
hf_secret = modal.Secret.from_name("huggingface-secret")
|
| 34 |
-
|
| 35 |
-
# Tried in order. FLUX models are gated (need license acceptance on HF);
|
| 36 |
-
# SDXL-Turbo is public and always works, so it's the guaranteed fallback.
|
| 37 |
-
FLUX_MODEL = "black-forest-labs/FLUX.2-klein-9B"
|
| 38 |
-
FLUX_FALLBACK = "black-forest-labs/FLUX.1-schnell"
|
| 39 |
-
SDXL_TURBO = "stabilityai/sdxl-turbo" # non-gated, fast (1-2 steps)
|
| 40 |
-
|
| 41 |
-
# ---------------------------------------------------------------------------
|
| 42 |
-
# GPU class
|
| 43 |
-
# ---------------------------------------------------------------------------
|
| 44 |
-
|
| 45 |
-
@app.cls(
|
| 46 |
-
image=image,
|
| 47 |
-
gpu="L4",
|
| 48 |
-
scaledown_window=180, # keep warm 3 min after last request
|
| 49 |
-
secrets=[hf_secret],
|
| 50 |
-
)
|
| 51 |
-
class FluxKlein:
|
| 52 |
-
@modal.enter()
|
| 53 |
-
def load(self):
|
| 54 |
-
import torch
|
| 55 |
-
|
| 56 |
-
dtype = torch.bfloat16
|
| 57 |
-
self.steps = 4
|
| 58 |
-
|
| 59 |
-
# 1) FLUX.2-klein (gated) ------------------------------------------------
|
| 60 |
-
try:
|
| 61 |
-
from diffusers import FluxPipeline
|
| 62 |
-
self.pipe = FluxPipeline.from_pretrained(FLUX_MODEL, torch_dtype=dtype).to("cuda")
|
| 63 |
-
self.guidance, self.steps, self.backend = 1.0, 4, "FLUX.2-klein-9B"
|
| 64 |
-
print(f"Loaded {self.backend}")
|
| 65 |
-
return
|
| 66 |
-
except Exception as e:
|
| 67 |
-
print(f"FLUX.2-klein unavailable ({type(e).__name__}); trying FLUX.1-schnell...")
|
| 68 |
-
|
| 69 |
-
# 2) FLUX.1-schnell (gated) ---------------------------------------------
|
| 70 |
-
try:
|
| 71 |
-
from diffusers import FluxPipeline
|
| 72 |
-
self.pipe = FluxPipeline.from_pretrained(FLUX_FALLBACK, torch_dtype=dtype).to("cuda")
|
| 73 |
-
self.guidance, self.steps, self.backend = 0.0, 4, "FLUX.1-schnell"
|
| 74 |
-
print(f"Loaded {self.backend}")
|
| 75 |
-
return
|
| 76 |
-
except Exception as e:
|
| 77 |
-
print(f"FLUX.1-schnell unavailable ({type(e).__name__}); falling back to SDXL-Turbo...")
|
| 78 |
-
|
| 79 |
-
# 3) SDXL-Turbo (public, always works) ----------------------------------
|
| 80 |
-
from diffusers import AutoPipelineForText2Image
|
| 81 |
-
self.pipe = AutoPipelineForText2Image.from_pretrained(
|
| 82 |
-
SDXL_TURBO, torch_dtype=torch.float16, variant="fp16"
|
| 83 |
-
).to("cuda")
|
| 84 |
-
self.guidance, self.steps, self.backend = 0.0, 2, "SDXL-Turbo"
|
| 85 |
-
print(f"Loaded {self.backend}")
|
| 86 |
-
|
| 87 |
-
@modal.method()
|
| 88 |
-
def render_step(self, prompt: str, seed: int = 42) -> bytes:
|
| 89 |
-
"""Generate a 512×512 PNG and return its raw bytes."""
|
| 90 |
-
import torch
|
| 91 |
-
|
| 92 |
-
img = self.pipe(
|
| 93 |
-
prompt=prompt,
|
| 94 |
-
height=512,
|
| 95 |
-
width=512,
|
| 96 |
-
guidance_scale=self.guidance,
|
| 97 |
-
num_inference_steps=self.steps,
|
| 98 |
-
generator=torch.Generator(device="cuda").manual_seed(seed),
|
| 99 |
-
).images[0]
|
| 100 |
-
|
| 101 |
-
buf = io.BytesIO()
|
| 102 |
-
img.save(buf, format="PNG")
|
| 103 |
-
return buf.getvalue()
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
# ---------------------------------------------------------------------------
|
| 107 |
-
# Local test entrypoint
|
| 108 |
-
# ---------------------------------------------------------------------------
|
| 109 |
-
|
| 110 |
-
@app.local_entrypoint()
|
| 111 |
-
def test():
|
| 112 |
-
import os
|
| 113 |
-
flux = FluxKlein()
|
| 114 |
-
png = flux.render_step.remote(
|
| 115 |
-
"Top-down photo of a kitchen pan with sautéed onions. "
|
| 116 |
-
"Mexican cooking. Warm lighting. Photorealistic.",
|
| 117 |
-
seed=0,
|
| 118 |
-
)
|
| 119 |
-
out = os.path.join(os.path.dirname(__file__), "..", "data", "test_flux.png")
|
| 120 |
-
out = os.path.abspath(out)
|
| 121 |
-
os.makedirs(os.path.dirname(out), exist_ok=True)
|
| 122 |
-
with open(out, "wb") as f:
|
| 123 |
-
f.write(png)
|
| 124 |
-
print(f"Saved {out} ({len(png)} bytes)")
modal_app/planner_endpoint.py
DELETED
|
@@ -1,117 +0,0 @@
|
|
| 1 |
-
"""Modal endpoint for the fine-tuned MiniCPM4.1-8B recipe planner.
|
| 2 |
-
|
| 3 |
-
Runs in its OWN container because MiniCPM4.1's custom code requires
|
| 4 |
-
transformers 4.x (CacheLayerMixin + is_torch_fx_available), which conflicts
|
| 5 |
-
with the MiniCPM-V-4.6 vision model in the main app (needs transformers 5.x).
|
| 6 |
-
|
| 7 |
-
Deploy:
|
| 8 |
-
modal deploy modal_app/planner_endpoint.py
|
| 9 |
-
|
| 10 |
-
The Gradio app calls it via modal.Cls.from_name("cook-with-me-planner",
|
| 11 |
-
"Planner").infer.remote(prompt, ...).
|
| 12 |
-
"""
|
| 13 |
-
from __future__ import annotations
|
| 14 |
-
|
| 15 |
-
import os
|
| 16 |
-
|
| 17 |
-
import modal
|
| 18 |
-
|
| 19 |
-
app = modal.App("cook-with-me-planner")
|
| 20 |
-
|
| 21 |
-
# 8B bf16 weights cached on a volume so cold starts don't re-download ~16GB.
|
| 22 |
-
hf_cache = modal.Volume.from_name("cook-with-me-planner-cache", create_if_missing=True)
|
| 23 |
-
hf_secret = modal.Secret.from_name("huggingface-secret")
|
| 24 |
-
|
| 25 |
-
image = (
|
| 26 |
-
modal.Image.debian_slim(python_version="3.12")
|
| 27 |
-
.pip_install(
|
| 28 |
-
"torch==2.4.0",
|
| 29 |
-
# MiniCPM4.1 custom code needs BOTH CacheLayerMixin (>=4.54) and
|
| 30 |
-
# is_torch_fx_available (removed in 5.0) — only 4.54..4.x has both.
|
| 31 |
-
"transformers>=4.54,<5.0",
|
| 32 |
-
"huggingface_hub>=0.26,<1.0",
|
| 33 |
-
"accelerate",
|
| 34 |
-
"sentencepiece",
|
| 35 |
-
"safetensors",
|
| 36 |
-
)
|
| 37 |
-
.env({"HF_HOME": "/cache/hf"})
|
| 38 |
-
)
|
| 39 |
-
|
| 40 |
-
# Fine-tuned weights; tokenizer pulled from base (FT tokenizer_config was saved
|
| 41 |
-
# by transformers 5.x and is not readable by 4.x).
|
| 42 |
-
PLANNER_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "eldinosaur/cook-with-me-planner-8b")
|
| 43 |
-
BASE_REPO = "openbmb/MiniCPM4.1-8B"
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
@app.cls(
|
| 47 |
-
image=image,
|
| 48 |
-
gpu="L4",
|
| 49 |
-
volumes={"/cache": hf_cache},
|
| 50 |
-
secrets=[hf_secret],
|
| 51 |
-
scaledown_window=240,
|
| 52 |
-
timeout=600,
|
| 53 |
-
)
|
| 54 |
-
class Planner:
|
| 55 |
-
@modal.enter()
|
| 56 |
-
def load(self):
|
| 57 |
-
import torch
|
| 58 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 59 |
-
|
| 60 |
-
print(f"Loading planner weights from {PLANNER_REPO}...")
|
| 61 |
-
self.tokenizer = AutoTokenizer.from_pretrained(BASE_REPO, trust_remote_code=True)
|
| 62 |
-
if self.tokenizer.pad_token is None:
|
| 63 |
-
self.tokenizer.pad_token = self.tokenizer.eos_token
|
| 64 |
-
self.model = AutoModelForCausalLM.from_pretrained(
|
| 65 |
-
PLANNER_REPO,
|
| 66 |
-
torch_dtype=torch.bfloat16,
|
| 67 |
-
trust_remote_code=True,
|
| 68 |
-
device_map="cuda",
|
| 69 |
-
).eval()
|
| 70 |
-
print("Planner ready.")
|
| 71 |
-
|
| 72 |
-
@modal.method()
|
| 73 |
-
def infer(self, prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
|
| 74 |
-
import torch
|
| 75 |
-
|
| 76 |
-
messages = [{"role": "user", "content": prompt}]
|
| 77 |
-
# enable_thinking=False -> direct JSON, no <think> reasoning preamble
|
| 78 |
-
try:
|
| 79 |
-
enc = self.tokenizer.apply_chat_template(
|
| 80 |
-
messages,
|
| 81 |
-
add_generation_prompt=True,
|
| 82 |
-
tokenize=True,
|
| 83 |
-
return_tensors="pt",
|
| 84 |
-
return_dict=True,
|
| 85 |
-
enable_thinking=False,
|
| 86 |
-
)
|
| 87 |
-
except TypeError:
|
| 88 |
-
enc = self.tokenizer.apply_chat_template(
|
| 89 |
-
messages, add_generation_prompt=True, tokenize=True,
|
| 90 |
-
return_tensors="pt", return_dict=True,
|
| 91 |
-
)
|
| 92 |
-
|
| 93 |
-
input_ids = enc["input_ids"].to(self.model.device)
|
| 94 |
-
input_len = input_ids.shape[1]
|
| 95 |
-
gen_inputs = {"input_ids": input_ids}
|
| 96 |
-
if enc.get("attention_mask") is not None:
|
| 97 |
-
gen_inputs["attention_mask"] = enc["attention_mask"].to(self.model.device)
|
| 98 |
-
|
| 99 |
-
gen_kwargs = dict(max_new_tokens=max_new_tokens, repetition_penalty=1.05)
|
| 100 |
-
if temperature and temperature > 0:
|
| 101 |
-
gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.9)
|
| 102 |
-
else:
|
| 103 |
-
gen_kwargs.update(do_sample=False)
|
| 104 |
-
|
| 105 |
-
with torch.no_grad():
|
| 106 |
-
out = self.model.generate(**gen_inputs, **gen_kwargs)
|
| 107 |
-
return self.tokenizer.decode(out[0][input_len:], skip_special_tokens=True)
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
@app.local_entrypoint()
|
| 111 |
-
def test():
|
| 112 |
-
prompt = (
|
| 113 |
-
"You are a creative chef. Available ingredients: tomato, onion, garlic, pasta, olive oil.\n"
|
| 114 |
-
'Respond ONLY with JSON: {"options": [{"name": "...", "why": "..."}, {"name": "...", "why": "..."}, {"name": "...", "why": "..."}]}'
|
| 115 |
-
)
|
| 116 |
-
out = Planner().infer.remote(prompt, max_new_tokens=400)
|
| 117 |
-
print("OUTPUT:\n", out)
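Note on the `transformers>=4.54,<5.0` pin above: the comments in the file say only that window has both `CacheLayerMixin` and `is_torch_fx_available`. A minimal sketch of a startup check for that window is shown below. The helper name `in_supported_window` is an assumption, and the parsing only looks at the major and minor components.

```python
def in_supported_window(version: str) -> bool:
    """True when `version` satisfies >=4.54,<5.0 (the pin used in the image)."""
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Tuple comparison is lexicographic: major first, then minor.
    return (4, 54) <= (major, minor) < (5, 0)


for v in ("4.53.2", "4.54.0", "4.57.1", "5.0.0"):
    print(v, in_supported_window(v))
```

In the container, one could pass `transformers.__version__` to such a check before loading the model, so that a mismatched install fails fast with a clear message. That is a design suggestion, not something the original file does.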
modal_app/serve_app.py
DELETED
|
@@ -1,102 +0,0 @@
|
|
| 1 |
-
"""Serve the full Cook With Me Gradio app on Modal GPU.
|
| 2 |
-
|
| 3 |
-
This gives a permanent public URL (*.modal.run) that runs the real models:
|
| 4 |
-
- MiniCPM-V-4.6 (vision: ingredients + progress validation)
|
| 5 |
-
- MiniCPM4.1-8B (planner: dish proposals + recipes)
|
| 6 |
-
- FLUX.2-klein (step images, via the separate cook-with-me-flux endpoint)
|
| 7 |
-
|
| 8 |
-
Deploy with:
|
| 9 |
-
modal deploy modal_app/serve_app.py
|
| 10 |
-
Or run a temporary dev session (auto-stops on Ctrl-C):
|
| 11 |
-
modal serve modal_app/serve_app.py
|
| 12 |
-
|
| 13 |
-
Both models live in one L40S container (~25GB VRAM total).
|
| 14 |
-
Set the fine-tuned planner repo via the COOK_WITH_ME_PLANNER_FT_REPO env
|
| 15 |
-
on the Modal function once training finishes.
|
| 16 |
-
"""
|
| 17 |
-
from __future__ import annotations
|
| 18 |
-
|
| 19 |
-
from pathlib import Path
|
| 20 |
-
|
| 21 |
-
import modal
|
| 22 |
-
|
| 23 |
-
LOCAL_ROOT = Path(__file__).resolve().parent.parent
|
| 24 |
-
REMOTE_ROOT = "/root/cook"
|
| 25 |
-
|
| 26 |
-
app = modal.App("cook-with-me-app")
|
| 27 |
-
|
| 28 |
-
# HF model cache persisted across restarts (avoids re-downloading ~25GB)
|
| 29 |
-
hf_cache = modal.Volume.from_name("cook-with-me-hf-cache", create_if_missing=True)
|
| 30 |
-
hf_secret = modal.Secret.from_name("huggingface-secret")
|
| 31 |
-
|
| 32 |
-
image = (
|
| 33 |
-
modal.Image.debian_slim(python_version="3.12")
|
| 34 |
-
.pip_install(
|
| 35 |
-
"torch==2.4.0",
|
| 36 |
-
"torchvision==0.19.0",
|
| 37 |
-
"transformers>=5.0",
|
| 38 |
-
"accelerate",
|
| 39 |
-
"safetensors",
|
| 40 |
-
"sentencepiece",
|
| 41 |
-
"Pillow",
|
| 42 |
-
"av",
|
| 43 |
-
"pydantic>=2",
|
| 44 |
-
"gradio==6.15.2",
|
| 45 |
-
"huggingface_hub>=1.17",
|
| 46 |
-
"modal",
|
| 47 |
-
)
|
| 48 |
-
.env({
|
| 49 |
-
"COOK_WITH_ME_CACHE": "/cache/cook",
|
| 50 |
-
# Use the fine-tuned planner pushed by scripts/train_planner.py
|
| 51 |
-
"COOK_WITH_ME_PLANNER_FT_REPO": "eldinosaur/cook-with-me-planner-8b",
|
| 52 |
-
})
|
| 53 |
-
.add_local_dir(
|
| 54 |
-
str(LOCAL_ROOT),
|
| 55 |
-
REMOTE_ROOT,
|
| 56 |
-
ignore=[
|
| 57 |
-
"data/*", ".git/*", "**/__pycache__", "**/*.pyc",
|
| 58 |
-
"assets/*", ".venv/*", "venv/*",
|
| 59 |
-
],
|
| 60 |
-
)
|
| 61 |
-
)
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
@app.function(
|
| 65 |
-
image=image,
|
| 66 |
-
gpu="L40S",
|
| 67 |
-
secrets=[hf_secret],
|
| 68 |
-
volumes={"/cache": hf_cache},
|
| 69 |
-
timeout=3600,
|
| 70 |
-
scaledown_window=300, # stay warm 5 min after last request
|
| 71 |
-
max_containers=1,
|
| 72 |
-
)
|
| 73 |
-
@modal.concurrent(max_inputs=20)
|
| 74 |
-
@modal.asgi_app()
|
| 75 |
-
def serve():
|
| 76 |
-
import os
|
| 77 |
-
import sys
|
| 78 |
-
import types
|
| 79 |
-
|
| 80 |
-
# --- env: cache model downloads on the volume, before any HF import ---
|
| 81 |
-
os.environ["HF_HOME"] = "/cache/hf"
|
| 82 |
-
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "0")
|
| 83 |
-
|
| 84 |
-
# --- mock `spaces` so @spaces.GPU becomes a no-op (we're already on GPU) ---
|
| 85 |
-
spaces_mock = types.ModuleType("spaces")
|
| 86 |
-
spaces_mock.GPU = lambda *a, **k: (lambda fn: fn)
|
| 87 |
-
sys.modules["spaces"] = spaces_mock
|
| 88 |
-
|
| 89 |
-
# --- make the mounted project importable ---
|
| 90 |
-
sys.path.insert(0, REMOTE_ROOT)
|
| 91 |
-
|
| 92 |
-
import gradio as gr
|
| 93 |
-
from fastapi import FastAPI
|
| 94 |
-
|
| 95 |
-
# Importing app triggers the vision model load (module-level singleton).
|
| 96 |
-
from app import build_ui
|
| 97 |
-
|
| 98 |
-
demo = build_ui()
|
| 99 |
-
demo.queue(max_size=20)
|
| 100 |
-
|
| 101 |
-
fastapi_app = FastAPI()
|
| 102 |
-
return gr.mount_gradio_app(app=fastapi_app, blocks=demo, path="/")
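Note on the `spaces` mock above: the lambda returns a decorator factory, which matches the `@spaces.GPU(...)` call form. Whether `app.py` also uses the bare `@spaces.GPU` form is not shown in this diff. A minimal sketch of a mock that tolerates both forms, under that assumption, is below. The helper name `_gpu` is hypothetical.

```python
import types


def _gpu(*args, **kwargs):
    """No-op replacement for a GPU decorator.

    Supports `@GPU` (called with the function itself) and
    `@GPU(duration=60)` (called with options, returns a decorator).
    """
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]      # bare form: hand the function back unchanged
    return lambda fn: fn    # parameterized form: identity decorator


spaces_mock = types.ModuleType("spaces")
spaces_mock.GPU = _gpu


@spaces_mock.GPU
def bare(x):
    return x + 1


@spaces_mock.GPU(duration=60)
def with_options(x):
    return x * 2


print(bare(1), with_options(2))
```

With the one-line lambda from the file, a function decorated in the bare form would be replaced by the identity decorator itself, so calling it would return its argument instead of running the function body.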
packages.txt
DELETED
|
@@ -1,2 +0,0 @@
|
|
| 1 |
-
ffmpeg
|
| 2 |
-
libsndfile1
requirements.txt
CHANGED
|
@@ -1,7 +1,10 @@
|
|
|
|
|
|
|
|
| 1 |
gradio==6.15.2
|
| 2 |
huggingface_hub>=1.17
|
| 3 |
|
| 4 |
-
|
|
|
|
| 5 |
torch
|
| 6 |
torchvision
|
| 7 |
spaces
|
|
@@ -9,7 +12,4 @@ Pillow
|
|
| 9 |
transformers>=4.45
|
| 10 |
accelerate
|
| 11 |
safetensors
|
| 12 |
-
av
|
| 13 |
-
|
| 14 |
-
# Pipeline & data
|
| 15 |
-
pydantic>=2
|
|
|
|
| 1 |
+
# --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
|
| 2 |
+
# llama-cpp-python
|
| 3 |
gradio==6.15.2
|
| 4 |
huggingface_hub>=1.17
|
| 5 |
|
| 6 |
+
|
| 7 |
+
# --- Libraries added and unlocked for MiniCPM-V-4.6 ---
|
| 8 |
torch
|
| 9 |
torchvision
|
| 10 |
spaces
|
|
|
|
| 12 |
transformers>=4.45
|
| 13 |
accelerate
|
| 14 |
safetensors
|
| 15 |
+
av
scripts/build_recipe_dataset.py
DELETED
|
@@ -1,281 +0,0 @@
|
|
| 1 |
-
"""Build the SFT dataset for the MiniCPM4.1-8B recipe planner.
|
| 2 |
-
|
| 3 |
-
Reads the Kaggle "better-recipes-for-a-better-life" dataset and produces
|
| 4 |
-
supervised fine-tuning pairs for BOTH planner tasks, matching the exact
|
| 5 |
-
prompt formats the app uses (src/prompts/planner_propose.txt and
|
| 6 |
-
planner_recipe.txt):
|
| 7 |
-
|
| 8 |
-
1. propose : ingredients -> {"options": [{name, why} x3]}
|
| 9 |
-
2. recipe : dish + ingredients -> {"name", "cuisine", "servings",
|
| 10 |
-
"total_time_minutes", "final_dish_visual", "steps":[...]}
|
| 11 |
-
|
| 12 |
-
Run locally (once) before fine-tuning:
|
| 13 |
-
python scripts/build_recipe_dataset.py
|
| 14 |
-
|
| 15 |
-
Requires:
|
| 16 |
-
pip install kagglehub pandas pyarrow datasets huggingface_hub tqdm
|
| 17 |
-
~/.kaggle/kaggle.json with your credentials
|
| 18 |
-
"""
|
| 19 |
-
from __future__ import annotations
|
| 20 |
-
|
| 21 |
-
import json
|
| 22 |
-
import random
|
| 23 |
-
import re
|
| 24 |
-
import sys
|
| 25 |
-
from pathlib import Path
|
| 26 |
-
|
| 27 |
-
ROOT = Path(__file__).resolve().parent.parent
|
| 28 |
-
sys.path.insert(0, str(ROOT))
|
| 29 |
-
|
| 30 |
-
import pandas as pd
|
| 31 |
-
from tqdm import tqdm
|
| 32 |
-
|
| 33 |
-
from src import config
|
| 34 |
-
|
| 35 |
-
random.seed(42)
|
| 36 |
-
|
| 37 |
-
HF_DATASET_REPO = "eldinosaur/cook-with-me-recipes-sft"
|
| 38 |
-
|
| 39 |
-
# ---------------------------------------------------------------------------
|
| 40 |
-
# 1. Download (use ONLY recipes.csv — test_recipes.csv has a different schema
|
| 41 |
-
# whose capitalized columns shadowed the real data in the old version)
|
| 42 |
-
# ---------------------------------------------------------------------------
|
| 43 |
-
|
| 44 |
-
print("Pulling Kaggle dataset…")
|
| 45 |
-
import kagglehub
|
| 46 |
-
|
| 47 |
-
raw_path = Path(kagglehub.dataset_download(config.KAGGLE_DATASET))
|
| 48 |
-
main_csv = raw_path / "recipes.csv"
|
| 49 |
-
print(f"Reading {main_csv}")
|
| 50 |
-
|
| 51 |
-
# cp1252 decodes the fraction/symbol bytes that show up as � under utf-8
|
| 52 |
-
try:
|
| 53 |
-
raw_df = pd.read_csv(main_csv, encoding="cp1252", on_bad_lines="skip")
|
| 54 |
-
except Exception:
|
| 55 |
-
raw_df = pd.read_csv(main_csv, encoding="utf-8", on_bad_lines="skip")
|
| 56 |
-
|
| 57 |
-
print(f"Rows: {len(raw_df)} columns: {list(raw_df.columns)}")
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
# ---------------------------------------------------------------------------
|
| 61 |
-
# 2. Cleaning helpers
|
| 62 |
-
# ---------------------------------------------------------------------------
|
| 63 |
-
|
| 64 |
-
_UNIT = (
|
| 65 |
-
r"(cups?|tablespoons?|tbsps?|teaspoons?|tsps?|pounds?|lbs?|ounces?|ozs?|"
|
| 66 |
-
r"grams?|kgs?|mls?|liters?|pinch(?:es)?|dash(?:es)?|cloves?|cans?|"
|
| 67 |
-
r"packages?|pkgs?|sheets?|slices?|sticks?|quarts?|pints?|jars?|bunch(?:es)?|"
|
| 68 |
-
r"heads?|stalks?|sprigs?|pieces?|fillets?)"
|
| 69 |
-
)
|
| 70 |
-
_PREP_WORDS = {
|
| 71 |
-
"peeled", "chopped", "diced", "sliced", "minced", "cored", "thawed",
|
| 72 |
-
"drained", "rinsed", "softened", "melted", "beaten", "divided", "cubed",
|
| 73 |
-
"to taste", "optional", "or more", "plus more", "for garnish", "for serving",
|
| 74 |
-
"lightly beaten", "room temperature", "at room temperature", "finely chopped",
|
| 75 |
-
"thinly sliced", "cut into", "more", "and", "or other", "such as",
|
| 76 |
-
}
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
def _clean_text(val: str) -> str:
|
| 80 |
-
if not isinstance(val, str):
|
| 81 |
-
return ""
|
| 82 |
-
# drop any remaining replacement chars and collapse whitespace
|
| 83 |
-
val = val.replace("�", " ")
|
| 84 |
-
return re.sub(r"[ \t]+", " ", val).strip()
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
def _simplify_ingredient(raw: str) -> str:
|
| 88 |
-
s = re.sub(r"\([^)]*\)", "", raw) # remove parentheticals
|
| 89 |
-
s = _clean_text(s).lower()
|
| 90 |
-
s = re.sub(r"^[\d\s./¼½¾⅓⅔⅛+-]+", "", s) # leading quantities
|
| 91 |
-
s = re.sub(rf"^{_UNIT}\b\.?\s*", "", s) # leading unit word
|
| 92 |
-
s = re.sub(r"^(of|the|a|an)\s+", "", s)
|
| 93 |
-
s = s.split(",")[0] # drop trailing prep clause
|
| 94 |
-
s = re.sub(r"[^a-z\s-]", "", s) # keep letters only
|
| 95 |
-
s = re.sub(r"\s+", " ", s).strip()
|
| 96 |
-
return s
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
def _ingredient_list(raw: str) -> list[str]:
|
| 100 |
-
if not isinstance(raw, str):
|
| 101 |
-
return []
|
| 102 |
-
out, seen = [], set()
|
| 103 |
-
for part in raw.split(","):
|
| 104 |
-
name = _simplify_ingredient(part)
|
| 105 |
-
if not name or len(name) < 3 or len(name.split()) > 4:
|
| 106 |
-
continue
|
| 107 |
-
if name in _PREP_WORDS or name in seen:
|
| 108 |
-
continue
|
| 109 |
-
seen.add(name)
|
| 110 |
-
out.append(name)
|
| 111 |
-
return out
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
def _steps_from_directions(raw: str) -> list[str]:
|
| 115 |
-
if not isinstance(raw, str):
|
| 116 |
-
return []
|
| 117 |
-
raw = _clean_text(raw.replace("\r", "\n"))
|
| 118 |
-
# Prefer explicit newlines; otherwise split into sentences.
|
| 119 |
-
parts = [p.strip() for p in raw.split("\n") if p.strip()]
|
| 120 |
-
if len(parts) < 2:
|
| 121 |
-
parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+(?=[A-Z])", raw) if p.strip()]
|
| 122 |
-
# merge very short fragments into the previous step
|
| 123 |
-
steps: list[str] = []
|
| 124 |
-
for p in parts:
|
| 125 |
-
if steps and len(p) < 25:
|
| 126 |
-
steps[-1] = steps[-1] + " " + p
|
| 127 |
-
else:
|
| 128 |
-
steps.append(p)
|
| 129 |
-
return [s for s in steps if len(s) > 15]
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
def _minutes(row) -> int:
|
| 133 |
-
for col in ("total_time", "cook_time", "prep_time"):
|
| 134 |
-
v = row.get(col)
|
| 135 |
-
if isinstance(v, str):
|
| 136 |
-
h = re.search(r"(\d+)\s*hr", v)
|
| 137 |
-
m = re.search(r"(\d+)\s*min", v)
|
| 138 |
-
total = (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)
|
| 139 |
-
if total:
|
| 140 |
-
return total
|
| 141 |
-
return 0
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
def _cuisine(row) -> str:
|
| 145 |
-
cp = row.get("cuisine_path")
|
| 146 |
-
if isinstance(cp, str):
|
| 147 |
-
segs = [s for s in cp.split("/") if s]
|
| 148 |
-
if segs:
|
| 149 |
-
return segs[0].replace("-", " ").strip().title()
|
| 150 |
-
return "International"
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
def _distribute(total: int, n: int) -> list[int]:
|
| 154 |
-
if n <= 0:
|
| 155 |
-
return []
|
| 156 |
-
if total <= 0:
|
| 157 |
-
total = n * 6
|
| 158 |
-
base = max(2, total // n)
|
| 159 |
-
durs = [base] * n
|
| 160 |
-
durs[-1] = max(2, total - base * (n - 1))
|
| 161 |
-
return durs
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
# ---------------------------------------------------------------------------
|
| 165 |
-
# 3. Normalize into clean recipe records
|
| 166 |
-
# ---------------------------------------------------------------------------
|
| 167 |
-
|
| 168 |
-
recipes: list[dict] = []
|
| 169 |
-
for _, r in tqdm(raw_df.iterrows(), total=len(raw_df), desc="Normalizing"):
|
| 170 |
-
name = _clean_text(r.get("recipe_name", ""))
|
| 171 |
-
ings = _ingredient_list(r.get("ingredients", ""))
|
| 172 |
-
steps = _steps_from_directions(r.get("directions", ""))
|
| 173 |
-
if not name or len(ings) < 3 or len(steps) < 2:
|
| 174 |
-
continue
|
| 175 |
-
steps = steps[:7]
|
| 176 |
-
if len(steps) < 4 and len(steps) >= 2:
|
| 177 |
-
pass # keep short recipes too, 2-3 steps is fine
|
| 178 |
-
minutes = _minutes(r) or len(steps) * 6
|
| 179 |
-
try:
|
| 180 |
-
servings = int(float(str(r.get("servings", "2")).split()[0]))
|
| 181 |
-
except Exception:
|
| 182 |
-
servings = 2
|
| 183 |
-
servings = min(max(servings, 1), 12)
|
| 184 |
-
recipes.append({
|
| 185 |
-
"name": name,
|
| 186 |
-
"ingredients": ings[:14],
|
| 187 |
-
"steps": steps,
|
| 188 |
-
"cuisine": _cuisine(r),
|
| 189 |
-
"minutes": int(minutes),
|
| 190 |
-
"servings": servings,
|
| 191 |
-
})
|
| 192 |
-
|
| 193 |
-
print(f"\nClean recipes: {len(recipes)}")
|
| 194 |
-
|
| 195 |
-
config.DATA_DIR.mkdir(parents=True, exist_ok=True)
|
| 196 |
-
pd.DataFrame(recipes).to_parquet(config.RECIPES_PARQUET, index=False)
|
| 197 |
-
print(f"Saved -> {config.RECIPES_PARQUET}")
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
# ---------------------------------------------------------------------------
|
| 201 |
-
# 4. Build SFT pairs matching the app's exact prompt formats
|
| 202 |
-
# ---------------------------------------------------------------------------
|
| 203 |
-
|
| 204 |
-
PROPOSE_TMPL = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
|
| 205 |
-
RECIPE_TMPL = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
|
| 206 |
-
|
| 207 |
-
_WHY = [
|
| 208 |
-
"Uses your {a} and {b} for a quick, satisfying result.",
|
| 209 |
-
"A fresh way to combine {a} with {b}.",
|
| 210 |
-
"Turns {a} and {b} into a comforting classic.",
|
| 211 |
-
"Light and flavorful, built around {a} and {b}.",
|
| 212 |
-
"Makes the most of {a}, {b} and a few pantry staples.",
|
| 213 |
-
]
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
def _recipe_json(rec: dict) -> str:
|
| 217 |
-
durs = _distribute(rec["minutes"], len(rec["steps"]))
|
| 218 |
-
steps = [
|
| 219 |
-
{"n": i + 1, "instruction": s, "duration": f"{d} min", "tip": None}
|
| 220 |
-
for i, (s, d) in enumerate(zip(rec["steps"], durs))
|
| 221 |
-
]
|
| 222 |
-
obj = {
|
| 223 |
-
"name": rec["name"],
|
| 224 |
-
"cuisine": rec["cuisine"],
|
| 225 |
-
"servings": rec["servings"],
|
| 226 |
-
"total_time_minutes": rec["minutes"],
|
| 227 |
-
"final_dish_visual": f"A beautifully plated {rec['name'].lower()}, ready to serve.",
|
| 228 |
-
"steps": steps,
|
| 229 |
-
}
|
| 230 |
-
return json.dumps(obj, ensure_ascii=False)
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
def _propose_json(rec: dict, others: list[dict]) -> str:
|
| 234 |
-
a = rec["ingredients"][0] if rec["ingredients"] else "your ingredients"
|
| 235 |
-
b = rec["ingredients"][1] if len(rec["ingredients"]) > 1 else "pantry staples"
|
| 236 |
-
options = [{"name": rec["name"], "why": random.choice(_WHY).format(a=a, b=b)}]
|
| 237 |
-
for o in others:
|
| 238 |
-
oa = o["ingredients"][0] if o["ingredients"] else a
|
| 239 |
-
ob = o["ingredients"][1] if len(o["ingredients"]) > 1 else b
|
| 240 |
-
options.append({"name": o["name"], "why": random.choice(_WHY).format(a=oa, b=ob)})
|
| 241 |
-
return json.dumps({"options": options}, ensure_ascii=False)
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
sft_path = config.DATA_DIR / "recipes_sft.jsonl"
|
| 245 |
-
n_recipe = n_propose = 0
|
| 246 |
-
with open(sft_path, "w", encoding="utf-8") as f:
|
| 247 |
-
for idx, rec in enumerate(tqdm(recipes, desc="Building SFT")):
|
| 248 |
-
ing_str = ", ".join(rec["ingredients"])
|
| 249 |
-
|
| 250 |
-
# --- recipe task ---
|
| 251 |
-
user_recipe = RECIPE_TMPL.replace("{dish_name}", rec["name"]).replace("{ingredients}", ing_str)
|
| 252 |
-
f.write(json.dumps({"messages": [
|
| 253 |
-
{"role": "user", "content": user_recipe},
|
| 254 |
-
{"role": "assistant", "content": _recipe_json(rec)},
|
| 255 |
-
]}, ensure_ascii=False) + "\n")
|
| 256 |
-
n_recipe += 1
|
| 257 |
-
|
| 258 |
-
# --- propose task (use two other recipes as alternative options) ---
|
| 259 |
-
others = [recipes[(idx + 7) % len(recipes)], recipes[(idx + 53) % len(recipes)]]
|
| 260 |
-
user_propose = PROPOSE_TMPL.replace("{ingredients}", ing_str)
|
| 261 |
-
f.write(json.dumps({"messages": [
|
| 262 |
-
{"role": "user", "content": user_propose},
|
| 263 |
-
{"role": "assistant", "content": _propose_json(rec, others)},
|
| 264 |
-
]}, ensure_ascii=False) + "\n")
|
| 265 |
-
n_propose += 1
|
| 266 |
-
|
| 267 |
-
print(f"\nSFT pairs: {n_recipe} recipe + {n_propose} propose = {n_recipe + n_propose} -> {sft_path}")
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
# ---------------------------------------------------------------------------
|
| 271 |
-
# 5. Push to HF Hub
|
| 272 |
-
# ---------------------------------------------------------------------------
|
| 273 |
-
|
| 274 |
-
if HF_DATASET_REPO:
|
| 275 |
-
from datasets import load_dataset
|
| 276 |
-
|
| 277 |
-
ds = load_dataset("json", data_files=str(sft_path), split="train")
|
| 278 |
-
ds.push_to_hub(HF_DATASET_REPO)
|
| 279 |
-
print(f"Pushed {len(ds)} rows to {HF_DATASET_REPO}")
|
| 280 |
-
|
| 281 |
-
print("\nDone.")
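Note on the timing helpers in the script above: `_minutes` parses an hours/minutes string and `_distribute` spreads the total across steps with a 2-minute floor per step. The sketch below copies that logic into standalone functions to show the arithmetic. The names `minutes_from_text` and `distribute` and the sample input string are assumptions, since the dataset's actual time format is not shown in this diff.

```python
import re


def minutes_from_text(value: str) -> int:
    """Parse strings such as '1 hrs 30 mins' into total minutes."""
    h = re.search(r"(\d+)\s*hr", value)
    m = re.search(r"(\d+)\s*min", value)
    return (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)


def distribute(total: int, n: int) -> list[int]:
    """Split `total` minutes over `n` steps, at least 2 minutes each."""
    if n <= 0:
        return []
    if total <= 0:
        total = n * 6                     # default: 6 minutes per step
    base = max(2, total // n)
    durs = [base] * n
    durs[-1] = max(2, total - base * (n - 1))  # last step absorbs the remainder
    return durs


print(minutes_from_text("1 hrs 30 mins"))  # 90
print(distribute(30, 4))                   # [7, 7, 7, 9]
print(distribute(5, 4))                    # [2, 2, 2, 2]
```

The last call shows a side effect of the 2-minute floor: for very short recipes the per-step durations can sum to more than `total_time_minutes` (8 versus 5 here).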
scripts/diag_planner.py
DELETED
|
@@ -1,73 +0,0 @@
|
|
| 1 |
-
"""Diagnose why the fine-tuned planner produces empty generations.
|
| 2 |
-
|
| 3 |
-
modal run scripts/diag_planner.py
|
| 4 |
-
"""
|
| 5 |
-
import modal
|
| 6 |
-
|
| 7 |
-
app = modal.App("cook-with-me-diag")
|
| 8 |
-
|
| 9 |
-
image = (
|
| 10 |
-
modal.Image.debian_slim(python_version="3.12")
|
| 11 |
-
.pip_install(
|
| 12 |
-
"torch==2.4.0",
|
| 13 |
-
"transformers>=4.54,<5.0", # window with BOTH CacheLayerMixin and is_torch_fx_available
|
| 14 |
-
"huggingface_hub>=0.26,<1.0",
|
| 15 |
-
"accelerate",
|
| 16 |
-
"sentencepiece",
|
| 17 |
-
)
|
| 18 |
-
)
|
| 19 |
-
hf_secret = modal.Secret.from_name("huggingface-secret")
|
| 20 |
-
|
| 21 |
-
MODEL_ID = "eldinosaur/cook-with-me-planner-8b" # fine-tuned model under transformers 4.x
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
@app.function(image=image, gpu="L4", secrets=[hf_secret], timeout=900)
|
| 25 |
-
def diag():
|
| 26 |
-
import torch
|
| 27 |
-
import transformers
|
| 28 |
-
print("transformers version:", transformers.__version__)
|
| 29 |
-
|
| 30 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 31 |
-
|
| 32 |
-
print("Loading tokenizer (from base) + model (from FT)...")
|
| 33 |
-
tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM4.1-8B", trust_remote_code=True)
|
| 34 |
-
model = AutoModelForCausalLM.from_pretrained(
|
| 35 |
-
MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cuda"
|
| 36 |
-
).eval()
|
| 37 |
-
print("has generate:", hasattr(model, "generate"))
|
| 38 |
-
print("class mro:", [c.__name__ for c in type(model).__mro__])
|
| 39 |
-
|
| 40 |
-
prompt = (
|
| 41 |
-
"You are a chef. Given ingredients: tomato, onion, garlic, pasta, olive oil.\n"
|
| 42 |
-
'Return ONLY JSON: {"options": [{"name": "...", "why": "..."}, ...]} with 3 dish ideas.'
|
| 43 |
-
)
|
| 44 |
-
messages = [{"role": "user", "content": prompt}]
|
| 45 |
-
|
| 46 |
-
# Mirror the fixed planner.py path
|
| 47 |
-
try:
|
| 48 |
-
enc = tok.apply_chat_template(
|
| 49 |
-
messages, add_generation_prompt=True, tokenize=True,
|
| 50 |
-
return_tensors="pt", return_dict=True,
|
| 51 |
-
)
|
| 52 |
-
input_ids = enc["input_ids"].to("cuda")
|
| 53 |
-
input_len = input_ids.shape[1]
|
| 54 |
-
gen_inputs = {"input_ids": input_ids}
|
| 55 |
-
if enc.get("attention_mask") is not None:
|
| 56 |
-
gen_inputs["attention_mask"] = enc["attention_mask"].to("cuda")
|
| 57 |
-
print("input length:", input_len)
|
| 58 |
-
with torch.no_grad():
|
| 59 |
-
out = model.generate(**gen_inputs, max_new_tokens=400, do_sample=False)
|
| 60 |
-
text = tok.decode(out[0][input_len:], skip_special_tokens=True)
|
| 61 |
-
print("=== GENERATION OK (transformers 4.x, cache on) ===")
|
| 62 |
-
print("OUTPUT:", repr(text[:1000]))
|
| 63 |
-
except Exception as e:
|
| 64 |
-
import traceback
|
| 65 |
-
print("=== GENERATION FAILED ===")
|
| 66 |
-
print("Exception type:", type(e).__name__)
|
| 67 |
-
print("Exception repr:", repr(e))
|
| 68 |
-
traceback.print_exc()
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
@app.local_entrypoint()
|
| 72 |
-
def main():
|
| 73 |
-
diag.remote()
scripts/train_planner.py
DELETED
|
@@ -1,172 +0,0 @@
|
|
| 1 |
-
"""Fine-tune MiniCPM4.1-8B on the recipe SFT dataset via Modal (A10G GPU).
|
| 2 |
-
|
| 3 |
-
Usage:
|
| 4 |
-
modal run scripts/train_planner.py
|
| 5 |
-
|
| 6 |
-
After training, the adapter is merged and the full model is pushed to HF Hub
|
| 7 |
-
as <HF_USERNAME>/cook-with-me-planner-8b
|
| 8 |
-
|
| 9 |
-
Set HF_USERNAME below (or export HF_TOKEN env var before running).
|
| 10 |
-
"""
|
| 11 |
-
from __future__ import annotations
|
| 12 |
-
|
| 13 |
-
import modal
|
| 14 |
-
|
| 15 |
-
# ---------------------------------------------------------------------------
|
| 16 |
-
# Config — change these two values
|
| 17 |
-
# ---------------------------------------------------------------------------
|
| 18 |
-
HF_USERNAME = "eldinosaur"
|
| 19 |
-
SFT_DATASET_REPO = f"{HF_USERNAME}/cook-with-me-recipes-sft"
|
| 20 |
-
OUTPUT_REPO = f"{HF_USERNAME}/cook-with-me-planner-8b"
|
| 21 |
-
BASE_MODEL = "openbmb/MiniCPM4.1-8B"
|
| 22 |
-
# ---------------------------------------------------------------------------
|
| 23 |
-
|
| 24 |
-
app = modal.App("cook-with-me-train")
|
| 25 |
-
|
| 26 |
-
volume = modal.Volume.from_name("cook-with-me-train-vol", create_if_missing=True)
|
| 27 |
-
|
| 28 |
-
train_image = (
|
| 29 |
-
modal.Image.debian_slim(python_version="3.12")
|
| 30 |
-
.pip_install(
|
| 31 |
-
"torch==2.4.0",
|
| 32 |
-
"transformers>=5.0",
|
| 33 |
-
"peft>=0.12",
|
| 34 |
-
"trl>=0.10",
|
| 35 |
-
"accelerate",
|
| 36 |
-
"datasets",
|
| 37 |
-
"huggingface_hub>=1.17",
|
| 38 |
-
"bitsandbytes",
|
| 39 |
-
"sentencepiece",
|
| 40 |
-
"safetensors",
|
| 41 |
-
)
|
| 42 |
-
)
|
| 43 |
-
|
| 44 |
-
hf_secret = modal.Secret.from_name("huggingface-secret")
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
@app.function(
|
| 48 |
-
image=train_image,
|
| 49 |
-
gpu="A10G",
|
| 50 |
-
timeout=60 * 60 * 3, # 3-hour hard cap
|
| 51 |
-
secrets=[hf_secret],
|
| 52 |
-
volumes={"/vol": volume},
|
| 53 |
-
)
|
| 54 |
-
def train():
|
| 55 |
-
import os
|
| 56 |
-
import torch
|
| 57 |
-
from datasets import load_dataset
|
| 58 |
-
from peft import LoraConfig, get_peft_model, TaskType
|
| 59 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
|
| 60 |
-
from trl import SFTTrainer, SFTConfig
|
| 61 |
-
|
| 62 |
-
os.environ.setdefault("HF_HOME", "/vol/hf_cache")
|
| 63 |
-
|
| 64 |
-
# MiniCPM4.1-8B custom code references is_torch_fx_available which was
|
| 65 |
-
# removed in transformers 5.x. Patch it back before loading the model.
|
| 66 |
-
import transformers.utils.import_utils as _iutils
|
| 67 |
-
if not hasattr(_iutils, "is_torch_fx_available"):
|
| 68 |
-
def _is_torch_fx_available():
|
| 69 |
-
try:
|
| 70 |
-
import torch.fx # noqa: F401
|
| 71 |
-
return True
|
| 72 |
-
except ImportError:
|
| 73 |
-
return False
|
| 74 |
-
_iutils.is_torch_fx_available = _is_torch_fx_available
|
| 75 |
-
|
| 76 |
-
# ---- Load tokenizer & model ----
|
| 77 |
-
print(f"Loading {BASE_MODEL}…")
|
| 78 |
-
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
|
| 79 |
-
if tokenizer.pad_token is None:
|
| 80 |
-
tokenizer.pad_token = tokenizer.eos_token
|
| 81 |
-
|
| 82 |
-
model = AutoModelForCausalLM.from_pretrained(
|
| 83 |
-
BASE_MODEL,
|
| 84 |
-
torch_dtype=torch.bfloat16,
|
| 85 |
-
trust_remote_code=True,
|
| 86 |
-
device_map="cuda",
|
| 87 |
-
)
|
| 88 |
-
|
| 89 |
-
# ---- LoRA config ----
|
| 90 |
-
lora_cfg = LoraConfig(
|
| 91 |
-
task_type=TaskType.CAUSAL_LM,
|
| 92 |
-
r=16,
|
| 93 |
-
lora_alpha=32,
|
| 94 |
-
lora_dropout=0.05,
|
| 95 |
-
target_modules="all-linear",
|
| 96 |
-
bias="none",
|
| 97 |
-
)
|
| 98 |
-
model = get_peft_model(model, lora_cfg)
|
| 99 |
-
model.print_trainable_parameters()
|
| 100 |
-
|
| 101 |
-
# ---- Dataset ----
|
| 102 |
-
print(f"Loading dataset {SFT_DATASET_REPO}…")
|
| 103 |
-
ds = load_dataset(SFT_DATASET_REPO, split="train")
|
| 104 |
-
|
| 105 |
-
def _format(example):
|
| 106 |
-
return {"text": tokenizer.apply_chat_template(
|
| 107 |
-
example["messages"], tokenize=False, add_generation_prompt=False
|
| 108 |
-
)}
|
| 109 |
-
|
| 110 |
-
ds = ds.map(_format, remove_columns=ds.column_names)
|
| 111 |
-
|
| 112 |
-
# ---- Training ----
|
| 113 |
-
output_dir = "/vol/planner_out"
|
| 114 |
-
trainer = SFTTrainer(
|
| 115 |
-
model=model,
|
| 116 |
-
processing_class=tokenizer,
|
| 117 |
-
train_dataset=ds,
|
| 118 |
-
args=SFTConfig(
|
| 119 |
-
output_dir=output_dir,
|
| 120 |
-
num_train_epochs=3, # 2046 examples — 3 epochs converges without overfitting
|
| 121 |
-
per_device_train_batch_size=2,
|
| 122 |
-
gradient_accumulation_steps=4,
|
| 123 |
-
learning_rate=2e-4,
|
| 124 |
-
lr_scheduler_type="cosine",
|
| 125 |
-
warmup_ratio=0.05,
|
| 126 |
-
bf16=True,
|
| 127 |
-
logging_steps=20,
|
| 128 |
-
save_steps=200,
|
| 129 |
-
max_length=2048,
|
| 130 |
-
dataset_text_field="text",
|
| 131 |
-
),
|
| 132 |
-
)
|
| 133 |
-
trainer.train()
|
| 134 |
-
trainer.save_model(output_dir)
|
| 135 |
-
|
| 136 |
-
# ---- Merge LoRA + push ----
|
| 137 |
-
print("Merging LoRA adapter…")
|
| 138 |
-
from peft import PeftModel
|
| 139 |
-
|
| 140 |
-
base = AutoModelForCausalLM.from_pretrained(
|
| 141 |
-
BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cpu"
|
| 142 |
-
)
|
| 143 |
-
merged = PeftModel.from_pretrained(base, output_dir)
|
| 144 |
-
merged = merged.merge_and_unload()
|
| 145 |
-
|
| 146 |
-
# MiniCPM custom code declares `_tied_weights_keys` as a list, but
|
| 147 |
-
# transformers 5.x's save path calls `.keys()` on it. Patch the walker
|
| 148 |
-
# to tolerate both list and dict formats before saving/pushing.
|
| 149 |
-
import transformers.modeling_utils as _mu
|
| 150 |
-
|
| 151 |
-
def _safe_get_tied_weight_keys(model, *args, **kwargs):
|
| 152 |
-
keys = []
|
| 153 |
-
for module_name, module in model.named_modules():
|
| 154 |
-
tied = getattr(module, "_tied_weights_keys", None)
|
| 155 |
-
if not tied:
|
| 156 |
-
continue
|
| 157 |
-
names = tied.keys() if isinstance(tied, dict) else tied
|
| 158 |
-
for k in names:
|
| 159 |
-
keys.append(f"{module_name}.{k}" if module_name else k)
|
| 160 |
-
return keys
|
| 161 |
-
|
| 162 |
-
_mu._get_tied_weight_keys = _safe_get_tied_weight_keys
|
| 163 |
-
|
| 164 |
-
print(f"Pushing merged model to {OUTPUT_REPO}…")
|
| 165 |
-
merged.push_to_hub(OUTPUT_REPO, private=False)
|
| 166 |
-
tokenizer.push_to_hub(OUTPUT_REPO, private=False)
|
| 167 |
-
print("Done.")
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
@app.local_entrypoint()
|
| 171 |
-
def main():
|
| 172 |
-
train.remote()
|
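The `SFTConfig` values in the removed `scripts/train_planner.py` imply a fairly short training schedule. The following is a back-of-the-envelope sketch only: the 2046 figure is taken from the inline comment in the diff, and the rounding rules are assumptions, since the exact step count depends on how the installed trainer version treats the final partial accumulation window.

```python
import math

# Values copied from the removed SFTConfig; NUM_EXAMPLES comes from its inline comment.
NUM_EXAMPLES = 2046
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
EPOCHS = 3
WARMUP_RATIO = 0.05
SAVE_STEPS = 200

# Sequences consumed per optimizer step on a single GPU.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumption: a trailing partial accumulation window still counts as one step.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
# Assumption: the warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
# Steps at which an intermediate checkpoint would be written.
checkpoints = list(range(SAVE_STEPS, total_steps + 1, SAVE_STEPS))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, checkpoints)
```

Under these assumptions the run is on the order of a few hundred optimizer steps, so `save_steps=200` yields only a handful of intermediate checkpoints on the Modal volume.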
src/agents/progress_validator.py
DELETED
|
@@ -1,84 +0,0 @@
|
|
| 1 |
-
"""Progress validation agent: compare cooking photo against target step."""
|
| 2 |
-
from __future__ import annotations
|
| 3 |
-
|
| 4 |
-
import logging
|
| 5 |
-
from typing import Optional
|
| 6 |
-
|
| 7 |
-
import spaces
|
| 8 |
-
import torch
|
| 9 |
-
from PIL import Image
|
| 10 |
-
|
| 11 |
-
from src import config
|
| 12 |
-
from src.agents.mise_en_place import model, processor
|
| 13 |
-
from src.agents.recipe_planner import _extract_json
|
| 14 |
-
|
| 15 |
-
log = logging.getLogger(__name__)
|
| 16 |
-
|
| 17 |
-
_VALIDATOR_PROMPT = (config.PROMPTS_DIR / "validator_prompt.txt").read_text(encoding="utf-8")
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
@spaces.GPU(duration=45)
|
| 21 |
-
def validate(image: Optional[Image.Image], step_instruction: str) -> dict:
|
| 22 |
-
"""Compare a cooking-progress photo to the target step description.
|
| 23 |
-
|
| 24 |
-
Returns a dict with keys: verdict ('go'|'wait'|'fix'), feedback, tip.
|
| 25 |
-
"""
|
| 26 |
-
if image is None:
|
| 27 |
-
return {
|
| 28 |
-
"verdict": "wait",
|
| 29 |
-
"feedback": "No image provided.",
|
| 30 |
-
"tip": "Upload a photo of your cooking progress to get feedback.",
|
| 31 |
-
}
|
| 32 |
-
try:
|
| 33 |
-
img = image.convert("RGB")
|
| 34 |
-
prompt = _VALIDATOR_PROMPT.replace("{step_instruction}", step_instruction)
|
| 35 |
-
|
| 36 |
-
messages = [{"role": "user", "content": [
|
| 37 |
-
{"type": "image", "image": img},
|
| 38 |
-
{"type": "text", "text": prompt},
|
| 39 |
-
]}]
|
| 40 |
-
|
| 41 |
-
inputs = processor.apply_chat_template(
|
| 42 |
-
messages,
|
| 43 |
-
add_generation_prompt=True,
|
| 44 |
-
tokenize=True,
|
| 45 |
-
return_dict=True,
|
| 46 |
-
return_tensors="pt",
|
| 47 |
-
enable_thinking=False,
|
| 48 |
-
processor_kwargs={"downsample_mode": "16x", "max_slice_nums": 9, "use_image_id": True},
|
| 49 |
-
)
|
| 50 |
-
device = model.device
|
| 51 |
-
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
|
| 52 |
-
for k, v in inputs.items():
|
| 53 |
-
if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
|
| 54 |
-
inputs[k] = v.to(dtype=torch.bfloat16)
|
| 55 |
-
|
| 56 |
-
with torch.no_grad():
|
| 57 |
-
generated_ids = model.generate(
|
| 58 |
-
**inputs,
|
| 59 |
-
max_new_tokens=256,
|
| 60 |
-
do_sample=False,
|
| 61 |
-
downsample_mode="16x",
|
| 62 |
-
)
|
| 63 |
-
|
| 64 |
-
trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
|
| 65 |
-
raw = processor.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
|
| 66 |
-
log.info("validate raw: %s", raw[:400])
|
| 67 |
-
|
| 68 |
-
data = _extract_json(raw)
|
| 69 |
-
verdict = str(data.get("verdict", "wait"))
|
| 70 |
-
if verdict not in ("go", "wait", "fix"):
|
| 71 |
-
verdict = "wait"
|
| 72 |
-
|
| 73 |
-
return {
|
| 74 |
-
"verdict": verdict,
|
| 75 |
-
"feedback": str(data.get("feedback", "")),
|
| 76 |
-
"tip": str(data.get("tip", "")),
|
| 77 |
-
}
|
| 78 |
-
except Exception as exc:
|
| 79 |
-
log.warning("validate failed: %s", exc)
|
| 80 |
-
return {
|
| 81 |
-
"verdict": "wait",
|
| 82 |
-
"feedback": "Could not analyse the photo.",
|
| 83 |
-
"tip": "Make sure the image is well-lit and in focus.",
|
| 84 |
-
}
|
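The verdict handling at the end of the removed `validate` function can be isolated as a small pure function, which makes its fallback behaviour easier to see. This is a sketch, not code from the PR: `normalize_verdict` and `ALLOWED_VERDICTS` are hypothetical names, and the input is assumed to be whatever dict the JSON extraction step returned.

```python
ALLOWED_VERDICTS = ("go", "wait", "fix")


def normalize_verdict(data: dict) -> dict:
    """Coerce parsed model output into the {verdict, feedback, tip} shape."""
    verdict = str(data.get("verdict", "wait"))
    if verdict not in ALLOWED_VERDICTS:
        # Anything unexpected, including different casing, degrades to "wait".
        verdict = "wait"
    return {
        "verdict": verdict,
        "feedback": str(data.get("feedback", "")),
        "tip": str(data.get("tip", "")),
    }


print(normalize_verdict({"verdict": "fix", "feedback": "Too pale"}))
print(normalize_verdict({"verdict": "Go"}))
print(normalize_verdict({}))
```

Note that the membership check is case-sensitive, so a model answer of `"Go"` is treated the same as a missing verdict.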
src/agents/recipe_planner.py
DELETED
|
@@ -1,167 +0,0 @@
|
|
| 1 |
-
"""Recipe planner agent: propose dishes + generate step-by-step recipe.
|
| 2 |
-
|
| 3 |
-
Uses openbmb/MiniCPM4.1-8B (text-only) as the primary planner.
|
| 4 |
-
Falls back to the shared vision model (MiniCPM-V-4.6) when the planner
|
| 5 |
-
model is unavailable (e.g. insufficient RAM on the Space).
|
| 6 |
-
"""
|
| 7 |
-
from __future__ import annotations
|
| 8 |
-
|
| 9 |
-
import json
|
| 10 |
-
import logging
|
| 11 |
-
import re
|
| 12 |
-
|
| 13 |
-
import spaces
|
| 14 |
-
import torch
|
| 15 |
-
|
| 16 |
-
from src import config
|
| 17 |
-
from src.pipeline import DishOption, Recipe, RecipeStep
|
| 18 |
-
|
| 19 |
-
log = logging.getLogger(__name__)
|
| 20 |
-
|
| 21 |
-
_PROPOSE_PROMPT = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
|
| 22 |
-
_RECIPE_PROMPT = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
# ---------------------------------------------------------------------------
|
| 26 |
-
# JSON extraction helpers
|
| 27 |
-
# ---------------------------------------------------------------------------
|
| 28 |
-
|
| 29 |
-
def _extract_json(text: str) -> dict:
|
| 30 |
-
"""Robustly extract the first JSON object from raw model output."""
|
| 31 |
-
text = text.strip()
|
| 32 |
-
try:
|
| 33 |
-
return json.loads(text)
|
| 34 |
-
except Exception:
|
| 35 |
-
pass
|
| 36 |
-
# Markdown code-block
|
| 37 |
-
m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
|
| 38 |
-
if m:
|
| 39 |
-
try:
|
| 40 |
-
return json.loads(m.group(1))
|
| 41 |
-
except Exception:
|
| 42 |
-
pass
|
| 43 |
-
# First {...} block with minor auto-fixes
|
| 44 |
-
m = re.search(r"\{.*\}", text, re.DOTALL)
|
| 45 |
-
if m:
|
| 46 |
-
candidate = m.group(0)
|
| 47 |
-
candidate = candidate.replace("'", '"')
|
| 48 |
-
candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
|
| 49 |
-
try:
|
| 50 |
-
return json.loads(candidate)
|
| 51 |
-
except Exception:
|
| 52 |
-
pass
|
| 53 |
-
log.warning("Could not extract JSON from output (first 300 chars): %.300s", text)
|
| 54 |
-
return {}
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
# ---------------------------------------------------------------------------
|
| 58 |
-
# Inference dispatcher
|
| 59 |
-
# ---------------------------------------------------------------------------
|
| 60 |
-
|
| 61 |
-
def _infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
|
| 62 |
-
"""Run text inference.
|
| 63 |
-
|
| 64 |
-
Primary: the dedicated MiniCPM4.1-8B planner Modal endpoint (transformers
|
| 65 |
-
4.x). Falls back to the local vision model (text-only) if the endpoint is
|
| 66 |
-
unavailable or returns nothing.
|
| 67 |
-
"""
|
| 68 |
-
try:
|
| 69 |
-
import modal
|
| 70 |
-
cls = modal.Cls.from_name(config.PLANNER_MODAL_APP, config.PLANNER_MODAL_CLS)
|
| 71 |
-
out = cls().infer.remote(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
|
| 72 |
-
if out and out.strip():
|
| 73 |
-
return out
|
| 74 |
-
log.warning("Planner endpoint returned empty — falling back to vision model.")
|
| 75 |
-
except Exception as exc:
|
| 76 |
-
log.warning("Planner endpoint call failed: %s — falling back to vision model.", exc)
|
| 77 |
-
|
| 78 |
-
# Fallback: use the vision model in text-only mode
|
| 79 |
-
log.warning("Using vision model as text fallback.")
|
| 80 |
-
from src.agents.mise_en_place import model as vis_model, processor as vis_proc
|
| 81 |
-
|
| 82 |
-
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
|
| 83 |
-
inputs = vis_proc.apply_chat_template(
|
| 84 |
-
messages,
|
| 85 |
-
add_generation_prompt=True,
|
| 86 |
-
tokenize=True,
|
| 87 |
-
return_dict=True,
|
| 88 |
-
return_tensors="pt",
|
| 89 |
-
enable_thinking=False,
|
| 90 |
-
)
|
| 91 |
-
device = vis_model.device
|
| 92 |
-
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
|
| 93 |
-
for k, v in inputs.items():
|
| 94 |
-
if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
|
| 95 |
-
inputs[k] = v.to(dtype=torch.bfloat16)
|
| 96 |
-
|
| 97 |
-
with torch.no_grad():
|
| 98 |
-
generated_ids = vis_model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
|
| 99 |
-
|
| 100 |
-
trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
|
| 101 |
-
return vis_proc.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
# ---------------------------------------------------------------------------
|
| 105 |
-
# Public agent functions
|
| 106 |
-
# ---------------------------------------------------------------------------
|
| 107 |
-
|
| 108 |
-
@spaces.GPU(duration=90)
|
| 109 |
-
def propose_dishes(ingredients: list[str]) -> list[DishOption]:
|
| 110 |
-
"""Given detected ingredients, return up to 3 dish proposals."""
|
| 111 |
-
try:
|
| 112 |
-
prompt = _PROPOSE_PROMPT.replace("{ingredients}", ", ".join(ingredients))
|
| 113 |
-
raw = _infer(prompt, max_new_tokens=512, temperature=0.7)
|
| 114 |
-
log.info("propose_dishes raw: %.500s", raw)
|
| 115 |
-
data = _extract_json(raw)
|
| 116 |
-
options = data.get("options", [])
|
| 117 |
-
return [
|
| 118 |
-
DishOption(name=str(o.get("name", "Dish")), why=str(o.get("why", "")))
|
| 119 |
-
for o in options[:3]
|
| 120 |
-
if o.get("name")
|
| 121 |
-
] or [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
|
| 122 |
-
except Exception as exc:
|
| 123 |
-
log.warning("propose_dishes failed: %s", exc)
|
| 124 |
-
return [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
@spaces.GPU(duration=120)
|
| 128 |
-
def plan_recipe(dish_name: str, ingredients: list[str]) -> Recipe:
|
| 129 |
-
"""Generate a full step-by-step recipe for the chosen dish."""
|
| 130 |
-
try:
|
| 131 |
-
prompt = (
|
| 132 |
-
_RECIPE_PROMPT
|
| 133 |
-
.replace("{dish_name}", dish_name)
|
| 134 |
-
.replace("{ingredients}", ", ".join(ingredients))
|
| 135 |
-
)
|
| 136 |
-
raw = _infer(prompt, max_new_tokens=1024, temperature=0.0)
|
| 137 |
-
log.info("plan_recipe raw: %.800s", raw)
|
| 138 |
-
data = _extract_json(raw)
|
| 139 |
-
|
| 140 |
-
raw_steps = data.get("steps", [])
|
| 141 |
-
steps = []
|
| 142 |
-
for i, s in enumerate(raw_steps, start=1):
|
| 143 |
-
if not s.get("instruction"):
|
| 144 |
-
continue
|
| 145 |
-
tip_val = s.get("tip")
|
| 146 |
-
steps.append(RecipeStep(
|
| 147 |
-
n=int(s.get("n", i)),
|
| 148 |
-
instruction=str(s["instruction"]),
|
| 149 |
-
duration=str(s.get("duration", "5 min")),
|
| 150 |
-
tip=str(tip_val) if tip_val and str(tip_val).lower() not in ("null", "none") else None,
|
| 151 |
-
visual=str(s.get("visual", "")),
|
| 152 |
-
))
|
| 153 |
-
|
| 154 |
-
return Recipe(
|
| 155 |
-
name=str(data.get("name", dish_name)),
|
| 156 |
-
cuisine=str(data.get("cuisine", "International")),
|
| 157 |
-
servings=int(data.get("servings", 2)),
|
| 158 |
-
total_time_minutes=int(data.get("total_time_minutes", 30)),
|
| 159 |
-
final_dish_visual=str(data.get("final_dish_visual", "")),
|
| 160 |
-
steps=steps or [RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
|
| 161 |
-
)
|
| 162 |
-
except Exception as exc:
|
| 163 |
-
log.warning("plan_recipe failed: %s", exc)
|
| 164 |
-
return Recipe(
|
| 165 |
-
name=dish_name,
|
| 166 |
-
steps=[RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
|
| 167 |
-
)
|
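The three-stage fallback in the removed `_extract_json` helper can be exercised on its own. The sketch below restates the same stages in a standalone function (named `extract_json` here, without the logging call) so that the behaviour on typical model outputs is visible; the sample strings are made up for illustration.

```python
import json
import re

# Built indirectly so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3


def extract_json(text: str) -> dict:
    """Return the first JSON object found in raw model output, or {}."""
    text = text.strip()
    # Stage 1: the whole output is already valid JSON.
    try:
        return json.loads(text)
    except Exception:
        pass
    # Stage 2: an object wrapped in a Markdown code fence.
    m = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except Exception:
            pass
    # Stage 3: widest {...} span, with single quotes and trailing commas patched.
    m = re.search(r"\{.*\}", text, re.DOTALL)
    if m:
        candidate = m.group(0).replace("'", '"')
        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            return json.loads(candidate)
        except Exception:
            pass
    return {}


fenced = "Here you go:\n" + FENCE + 'json\n{"name": "Soup"}\n' + FENCE
print(extract_json(fenced))
print(extract_json("Sure! {'verdict': 'go', 'tip': 'stir',}"))
print(extract_json('Note: {"tip": "don\'t stir",}'))
```

One consequence of the blanket quote replacement in stage 3 is that an apostrophe inside a string value breaks the candidate, so the last call falls through to the empty dict.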
src/agents/step_illustrator.py
DELETED
|
@@ -1,81 +0,0 @@
|
|
| 1 |
-
"""Step image generator — delegates to the deployed Modal FLUX.2 endpoint."""
|
| 2 |
-
from __future__ import annotations
|
| 3 |
-
|
| 4 |
-
import base64
|
| 5 |
-
import logging
|
| 6 |
-
from typing import Optional
|
| 7 |
-
|
| 8 |
-
from src import config
|
| 9 |
-
from src.pipeline import Recipe, RecipeStep
|
| 10 |
-
|
| 11 |
-
log = logging.getLogger(__name__)
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
# ---------------------------------------------------------------------------
|
| 15 |
-
# Helpers
|
| 16 |
-
# ---------------------------------------------------------------------------
|
| 17 |
-
|
| 18 |
-
def _b64(png_bytes: bytes) -> str:
|
| 19 |
-
return base64.b64encode(png_bytes).decode()
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
def _step_prompt(visual: str, cuisine: str, n: int) -> str:
|
| 23 |
-
desc = visual.strip() or f"cooking step {n}"
|
| 24 |
-
return (
|
| 25 |
-
f"Top-down photo of a kitchen pan or plate showing {desc}. "
|
| 26 |
-
f"{cuisine} home cooking. Warm natural lighting. "
|
| 27 |
-
"Recipe magazine style. Photorealistic. Appetizing."
|
| 28 |
-
)
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
def _dish_prompt(visual: str, cuisine: str) -> str:
|
| 32 |
-
desc = visual.strip() or "finished plated dish, garnished and beautifully presented"
|
| 33 |
-
return (
|
| 34 |
-
f"Top-down photo of a {desc} on a rustic wooden table. "
|
| 35 |
-
f"{cuisine} home cooking. Warm natural lighting. "
|
| 36 |
-
"Recipe magazine style. Photorealistic. Appetizing."
|
| 37 |
-
)
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
# ---------------------------------------------------------------------------
|
| 41 |
-
# Modal call
|
| 42 |
-
# ---------------------------------------------------------------------------
|
| 43 |
-
|
| 44 |
-
def _call_modal(prompt: str, seed: int = 42) -> Optional[bytes]:
|
| 45 |
-
"""Call the deployed Modal FLUX endpoint. Returns PNG bytes or None."""
|
| 46 |
-
try:
|
| 47 |
-
import modal
|
| 48 |
-
cls = modal.Cls.from_name(config.MODAL_APP_NAME, config.MODAL_CLS_NAME)
|
| 49 |
-
return cls().render_step.remote(prompt, seed=seed)
|
| 50 |
-
except Exception as exc:
|
| 51 |
-
log.warning("Modal FLUX call failed: %s", exc)
|
| 52 |
-
return None
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
# ---------------------------------------------------------------------------
|
| 56 |
-
# Public function
|
| 57 |
-
# ---------------------------------------------------------------------------
|
| 58 |
-
|
| 59 |
-
def illustrate_recipe(recipe: Recipe) -> Recipe:
|
| 60 |
-
"""Generate FLUX images for every step + final dish.
|
| 61 |
-
|
| 62 |
-
Mutates and returns the same Recipe with image_b64 fields populated
|
| 63 |
-
(or left as None when Modal is unavailable).
|
| 64 |
-
"""
|
| 65 |
-
cuisine = recipe.cuisine or "International"
|
| 66 |
-
|
| 67 |
-
# Final dish hero image
|
| 68 |
-
final_bytes = _call_modal(_dish_prompt(recipe.final_dish_visual, cuisine), seed=0)
|
| 69 |
-
if final_bytes:
|
| 70 |
-
recipe.final_dish_image_b64 = _b64(final_bytes)
|
| 71 |
-
log.info("Generated final dish image.")
|
| 72 |
-
|
| 73 |
-
# Per-step images (sequential to respect GPU limits on Modal)
|
| 74 |
-
for step in recipe.steps:
|
| 75 |
-
prompt = _step_prompt(step.visual, cuisine, step.n)
|
| 76 |
-
step_bytes = _call_modal(prompt, seed=step.n)
|
| 77 |
-
if step_bytes:
|
| 78 |
-
step.image_b64 = _b64(step_bytes)
|
| 79 |
-
log.info("Generated image for step %d.", step.n)
|
| 80 |
-
|
| 81 |
-
return recipe
|
src/config.py
CHANGED
|
@@ -21,21 +21,10 @@ VISION_REPO = "openbmb/MiniCPM-V-4_6-GGUF"
|
|
| 21 |
VISION_MODEL_FILE = "MiniCPM-V-4_6-Q4_K_M.gguf"
|
| 22 |
VISION_MMPROJ_FILE = "mmproj-model-f16.gguf"
|
| 23 |
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
PLANNER_FINETUNED_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "") # set after fine-tune
|
| 27 |
|
| 28 |
-
|
| 29 |
-
MODAL_APP_NAME = "cook-with-me-flux"
|
| 30 |
-
MODAL_CLS_NAME = "FluxKlein"
|
| 31 |
-
|
| 32 |
-
# Planner runs in its own Modal app (transformers 4.x, conflicts with the
|
| 33 |
-
# vision model's transformers 5.x — so it can't live in the same container).
|
| 34 |
-
PLANNER_MODAL_APP = "cook-with-me-planner"
|
| 35 |
-
PLANNER_MODAL_CLS = "Planner"
|
| 36 |
-
|
| 37 |
-
FLUX_REPO = os.environ.get("COOK_WITH_ME_FLUX_REPO", "black-forest-labs/FLUX.2-klein-9B")
|
| 38 |
-
FLUX_FALLBACK_REPO = "black-forest-labs/FLUX.1-schnell"
|
| 39 |
NARRATOR_REPO = "openbmb/VoxCPM2"
|
| 40 |
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
|
| 41 |
|
|
|
|
| 21 |
VISION_MODEL_FILE = "MiniCPM-V-4_6-Q4_K_M.gguf"
|
| 22 |
VISION_MMPROJ_FILE = "mmproj-model-f16.gguf"
|
| 23 |
|
| 24 |
+
PLANNER_REPO = "openbmb/MiniCPM-V-4-gguf"
|
| 25 |
+
PLANNER_MODEL_FILE = "Model-Q4_K_M.gguf"
|
|
|
|
| 26 |
|
| 27 |
+
FLUX_REPO = "black-forest-labs/FLUX.2-klein-9B"
|
| 28 |
NARRATOR_REPO = "openbmb/VoxCPM2"
|
| 29 |
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
|
| 30 |
|
src/data/__init__.py
DELETED
|
File without changes
|
src/data/nutrition.py
DELETED
|
@@ -1,112 +0,0 @@
|
|
| 1 |
-
"""Per-serving macro estimator — ingredient lookup, no extra model call needed."""
|
| 2 |
-
from __future__ import annotations
|
| 3 |
-
|
| 4 |
-
# (calories kcal, protein g, carbs g, fat g, fiber g) per 100 g
|
| 5 |
-
_MACROS: dict[str, tuple[float, float, float, float, float]] = {
|
| 6 |
-
# proteins
|
| 7 |
-
"chicken": (165, 31, 0, 3.6, 0),
|
| 8 |
-
"beef": (250, 26, 0, 16, 0),
|
| 9 |
-
"pork": (242, 27, 0, 14, 0),
|
| 10 |
-
"fish": (130, 20, 0, 5, 0),
|
| 11 |
-
"salmon": (208, 20, 0, 13, 0),
|
| 12 |
-
"tuna": (130, 29, 0, 0.5, 0),
|
| 13 |
-
"shrimp": (99, 24, 0, 0.3, 0),
|
| 14 |
-
"egg": (155, 13, 1.1, 11, 0),
|
| 15 |
-
"eggs": (155, 13, 1.1, 11, 0),
|
| 16 |
-
"tofu": (76, 8, 1.9, 4.8, 0.3),
|
| 17 |
-
# dairy
|
| 18 |
-
"milk": (61, 3.2, 4.8, 3.3, 0),
|
| 19 |
-
"cheese": (402, 25, 1.3, 33, 0),
|
| 20 |
-
"butter": (717, 0.9, 0.1, 81, 0),
|
| 21 |
-
"yogurt": (59, 3.5, 4.7, 3.3, 0),
|
| 22 |
-
"cream": (340, 2.1, 2.8, 36, 0),
|
| 23 |
-
# starches
|
| 24 |
-
"rice": (130, 2.7, 28, 0.3, 0.4),
|
| 25 |
-
"pasta": (158, 5.8, 31, 0.9, 1.8),
|
| 26 |
-
"bread": (265, 9, 49, 3.2, 2.7),
|
| 27 |
-
"potato": (77, 2, 17, 0.1, 2.2),
|
| 28 |
-
"potatoes": (77, 2, 17, 0.1, 2.2),
|
| 29 |
-
"flour": (364, 10, 76, 1, 2.7),
|
| 30 |
-
"oats": (389, 17, 66, 7, 10.6),
|
| 31 |
-
"quinoa": (120, 4.1, 21, 1.9, 2.8),
|
| 32 |
-
"lentils": (116, 9, 20, 0.4, 7.9),
|
| 33 |
-
"beans": (347, 21, 60, 1.2, 15),
|
| 34 |
-
"chickpeas": (164, 8.9, 27, 2.6, 7.6),
|
| 35 |
-
# vegetables
|
| 36 |
-
"tomato": (18, 0.9, 3.9, 0.2, 1.2),
|
| 37 |
-
"tomatoes": (18, 0.9, 3.9, 0.2, 1.2),
|
| 38 |
-
"onion": (40, 1.1, 9.3, 0.1, 1.7),
|
| 39 |
-
"onions": (40, 1.1, 9.3, 0.1, 1.7),
|
| 40 |
-
"garlic": (149, 6.4, 33, 0.5, 2.1),
|
| 41 |
-
"carrot": (41, 0.9, 10, 0.2, 2.8),
|
| 42 |
-
"carrots": (41, 0.9, 10, 0.2, 2.8),
|
| 43 |
-
"broccoli": (34, 2.8, 7, 0.4, 2.6),
|
| 44 |
-
"spinach": (23, 2.9, 3.6, 0.4, 2.2),
|
| 45 |
-
"pepper": (31, 1, 6, 0.3, 2.1),
|
| 46 |
-
"peppers": (31, 1, 6, 0.3, 2.1),
|
| 47 |
-
"mushroom": (22, 3.1, 3.3, 0.3, 1),
|
| 48 |
-
"mushrooms": (22, 3.1, 3.3, 0.3, 1),
|
| 49 |
-
"zucchini": (17, 1.2, 3.1, 0.3, 1),
|
| 50 |
-
"corn": (86, 3.3, 19, 1.4, 2.7),
|
| 51 |
-
"lettuce": (15, 1.4, 2.9, 0.2, 1.3),
|
| 52 |
-
"cucumber": (16, 0.7, 3.6, 0.1, 0.5),
|
| 53 |
-
"eggplant": (25, 1, 5.9, 0.2, 3),
|
| 54 |
-
"cabbage": (25, 1.3, 5.8, 0.1, 2.5),
|
| 55 |
-
"celery": (16, 0.7, 3, 0.2, 1.6),
|
| 56 |
-
"leek": (61, 1.5, 14, 0.3, 1.8),
|
| 57 |
-
# fruits
|
| 58 |
-
"apple": (52, 0.3, 14, 0.2, 2.4),
|
| 59 |
-
"banana": (89, 1.1, 23, 0.3, 2.6),
|
| 60 |
-
"lemon": (29, 1.1, 9.3, 0.3, 2.8),
|
| 61 |
-
"lime": (30, 0.7, 10.5, 0.2, 2.8),
|
| 62 |
-
"orange": (47, 0.9, 12, 0.1, 2.4),
|
| 63 |
-
# fats & condiments
|
| 64 |
-
"olive oil": (884, 0, 0, 100, 0),
|
| 65 |
-
"oil": (884, 0, 0, 100, 0),
|
| 66 |
-
"soy sauce": (53, 8.1, 4.9, 0.1, 0.8),
|
| 67 |
-
"honey": (304, 0.3, 82, 0, 0.2),
|
| 68 |
-
"sugar": (387, 0, 100, 0, 0),
|
| 69 |
-
"salt": (0, 0, 0, 0, 0),
|
| 70 |
-
"vinegar": (18, 0, 0.9, 0, 0),
|
| 71 |
-
}
|
| 72 |
-
|
| 73 |
-
# Typical portion weight per ingredient (grams)
|
| 74 |
-
_GRAMS: dict[str, int] = {
|
| 75 |
-
"egg": 50, "eggs": 100,
|
| 76 |
-
"butter": 15,
|
| 77 |
-
"olive oil": 14, "oil": 14,
|
| 78 |
-
"soy sauce": 15,
|
| 79 |
-
"salt": 3,
|
| 80 |
-
"garlic": 10,
|
| 81 |
-
"honey": 21,
|
| 82 |
-
"sugar": 12,
|
| 83 |
-
"lemon": 30, "lime": 30,
|
| 84 |
-
}
|
| 85 |
-
_DEFAULT_GRAMS = 80
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def compute_nutrition(ingredients: list[str], servings: int = 2) -> dict[str, float]:
|
| 89 |
-
"""Return per-serving macro estimates keyed to the NutritionGrid format."""
|
| 90 |
-
cal = prot = carb = fat = fib = 0.0
|
| 91 |
-
for ing in ingredients:
|
| 92 |
-
key = ing.lower().strip()
|
| 93 |
-
row = _MACROS.get(key) or _MACROS.get(key.split()[0]) if key else None
|
| 94 |
-
if row is None:
|
| 95 |
-
continue
|
| 96 |
-
grams = _GRAMS.get(key, _DEFAULT_GRAMS)
|
| 97 |
-
f = grams / 100
|
| 98 |
-
c, p, cb, ft, fb = row
|
| 99 |
-
cal += c * f
|
| 100 |
-
prot += p * f
|
| 101 |
-
carb += cb * f
|
| 102 |
-
fat += ft * f
|
| 103 |
-
fib += fb * f
|
| 104 |
-
|
| 105 |
-
sv = max(servings, 1)
|
| 106 |
-
return {
|
| 107 |
-
"calories": round(cal / sv),
|
| 108 |
-
"protein_g": round(prot / sv, 1),
|
| 109 |
-
"carbs_g": round(carb / sv, 1),
|
| 110 |
-
"fat_g": round(fat / sv, 1),
|
| 111 |
-
"fiber_g": round(fib / sv, 1),
|
| 112 |
-
}
|
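A worked example helps to check the arithmetic of the removed `compute_nutrition` estimator. The sketch below uses a three-row excerpt of the per-100 g table with values copied from the diff; the ingredient list is invented, and the function body is a condensed restatement rather than the original code.

```python
# Excerpt of the per-100 g table: (kcal, protein g, carbs g, fat g, fiber g).
MACROS = {
    "chicken": (165, 31, 0, 3.6, 0),
    "rice": (130, 2.7, 28, 0.3, 0.4),
    "olive oil": (884, 0, 0, 100, 0),
}
GRAMS = {"olive oil": 14}  # typical portion overrides, in grams
DEFAULT_GRAMS = 80


def compute_nutrition(ingredients, servings=2):
    """Return per-serving macro estimates from a simple ingredient lookup."""
    totals = [0.0] * 5
    for ing in ingredients:
        key = ing.lower().strip()
        if not key:
            continue
        # Exact match first, then fall back to the first word
        # ("chicken breast" -> "chicken").
        row = MACROS.get(key) or MACROS.get(key.split()[0])
        if row is None:
            continue  # unknown ingredients contribute nothing
        factor = GRAMS.get(key, DEFAULT_GRAMS) / 100
        totals = [t + v * factor for t, v in zip(totals, row)]
    sv = max(servings, 1)
    cal, prot, carb, fat, fib = (t / sv for t in totals)
    return {
        "calories": round(cal),
        "protein_g": round(prot, 1),
        "carbs_g": round(carb, 1),
        "fat_g": round(fat, 1),
        "fiber_g": round(fib, 1),
    }


result = compute_nutrition(["Chicken breast", "rice", "olive oil", "saffron"], servings=2)
print(result)
```

Here 80 g of chicken (132 kcal), 80 g of rice (104 kcal) and 14 g of olive oil (about 124 kcal) add up to roughly 360 kcal, or about 180 kcal per serving. The portion lookup uses the full ingredient string, so "chicken breast" gets the 80 g default, and "saffron" is skipped because it has no table entry.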
src/models/planner.py
DELETED
|
@@ -1,103 +0,0 @@
|
|
| 1 |
-
"""MiniCPM4.1-8B text-only planner — lazy singleton."""
|
| 2 |
-
from __future__ import annotations
|
| 3 |
-
|
| 4 |
-
import logging
|
| 5 |
-
import os
|
| 6 |
-
from typing import Any, Optional, Tuple
|
| 7 |
-
|
| 8 |
-
import torch
|
| 9 |
-
|
| 10 |
-
from src import config
-
-log = logging.getLogger(__name__)
-
-_model: Any = None
-_tokenizer: Any = None
-
-
-def get_planner() -> Tuple[Optional[Any], Optional[Any]]:
-    """Return (model, tokenizer). Loads once; returns (None, None) on failure."""
-    global _model, _tokenizer
-    if _model is not None:
-        return _model, _tokenizer
-
-    # Prefer fine-tuned repo when available
-    model_id = config.PLANNER_FINETUNED_REPO or config.PLANNER_REPO
-    try:
-        # MiniCPM4.1 custom code imports is_torch_fx_available, which was
-        # removed in transformers 5.x. Patch it back before loading.
-        import transformers.utils.import_utils as _iutils
-        if not hasattr(_iutils, "is_torch_fx_available"):
-            def _is_torch_fx_available():
-                try:
-                    import torch.fx  # noqa: F401
-                    return True
-                except ImportError:
-                    return False
-            _iutils.is_torch_fx_available = _is_torch_fx_available
-
-        from transformers import AutoModelForCausalLM, AutoTokenizer
-
-        device_map = "auto" if os.environ.get("SPACE_ID") else (
-            "cuda" if torch.cuda.is_available() else "cpu"
-        )
-        log.info("Loading planner model %s (device_map=%s)...", model_id, device_map)
-        _tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-        _model = AutoModelForCausalLM.from_pretrained(
-            model_id,
-            torch_dtype=torch.bfloat16,
-            trust_remote_code=True,
-            device_map=device_map,
-        ).eval()
-        log.info("Planner model ready.")
-    except Exception as exc:
-        log.error("Could not load planner model '%s': %s", model_id, exc)
-        _model = None
-        _tokenizer = None
-
-    return _model, _tokenizer
-
-
-def infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
-    """Run text inference with the planner model.
-
-    Returns empty string if the model is unavailable.
-    """
-    model, tokenizer = get_planner()
-    if model is None or tokenizer is None:
-        return ""
-
-    try:
-        messages = [{"role": "user", "content": prompt}]
-
-        # return_dict=True yields a BatchEncoding (dict-like) with input_ids +
-        # attention_mask. NOTE: BatchEncoding is NOT a `dict` instance, so we
-        # must access it via mapping keys, never via tensor attrs like .shape.
-        enc = tokenizer.apply_chat_template(
-            messages,
-            add_generation_prompt=True,
-            tokenize=True,
-            return_tensors="pt",
-            return_dict=True,
-        )
-        input_ids = enc["input_ids"].to(model.device)
-        input_len = input_ids.shape[1]
-
-        gen_inputs = {"input_ids": input_ids}
-        attn = enc.get("attention_mask")
-        if attn is not None:
-            gen_inputs["attention_mask"] = attn.to(model.device)
-
-        gen_kwargs: dict = dict(max_new_tokens=max_new_tokens, do_sample=False)
-        if temperature > 0:
-            gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.95)
-
-        with torch.no_grad():
-            output = model.generate(**gen_inputs, **gen_kwargs)
-
-        token_ids = output[0][input_len:]
-        return tokenizer.decode(token_ids, skip_special_tokens=True)
-
-    except Exception as exc:
-        log.error("Planner inference error: %r", exc, exc_info=True)
-        return ""
|
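The decoding switch in the removed `infer()` can be looked at in isolation. The sketch below mirrors that logic as a pure function: greedy decoding by default, and sampling with `top_p=0.95` only when `temperature > 0`. The helper name `build_gen_kwargs` is hypothetical and does not exist in the repository.

```python
from typing import Any, Dict


def build_gen_kwargs(max_new_tokens: int = 1024, temperature: float = 0.0) -> Dict[str, Any]:
    """Return keyword arguments for a Hugging Face style generate() call."""
    # Greedy decoding is the default, as in the removed infer().
    kwargs: Dict[str, Any] = {"max_new_tokens": max_new_tokens, "do_sample": False}
    if temperature > 0:
        # Sampling is enabled only for a strictly positive temperature.
        kwargs.update(do_sample=True, temperature=temperature, top_p=0.95)
    return kwargs
```

Keeping this as a separate function would let the decoding policy be unit-tested without loading a model.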
src/pipeline.py
DELETED
|
@@ -1,32 +0,0 @@
-"""Shared data models for the Cook-with-Me pipeline."""
-from __future__ import annotations
-
-from typing import Optional
-from pydantic import BaseModel, Field
-
-
-class DishOption(BaseModel):
-    name: str
-    why: str = ""
-
-
-class RecipeStep(BaseModel):
-    n: int = 1
-    instruction: str
-    duration: str = "5 min"
-    tip: Optional[str] = None
-    visual: str = ""
-    image_path: Optional[str] = None
-    image_b64: Optional[str] = None  # base64 PNG from FLUX
-
-
-class Recipe(BaseModel):
-    name: str
-    cuisine: str = "International"
-    servings: int = 2
-    total_time_minutes: int = 30
-    steps: list[RecipeStep] = Field(default_factory=list)
-    nutrition: dict = Field(default_factory=dict)
-    final_dish_visual: str = ""
-    final_dish_image_path: Optional[str] = None
-    final_dish_image_b64: Optional[str] = None  # base64 PNG from FLUX
src/prompts/planner_propose.txt
DELETED
|
@@ -1,11 +0,0 @@
-You are a creative chef assistant. Given a list of available ingredients, suggest exactly 3 diverse and delicious dishes.
-
-Available ingredients: {ingredients}
-
-Rules:
-- Each dish must be realistic to make with the listed ingredients
-- Vary the style: aim for different cuisines or preparations
-- Be specific with dish names (e.g., "Garlic Butter Shrimp Pasta" not "Pasta")
-
-Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
-{"options": [{"name": "Dish Name 1", "why": "One sentence on why this works with the ingredients"}, {"name": "Dish Name 2", "why": "..."}, {"name": "Dish Name 3", "why": "..."}]}
|
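This diff does not show how the repository filled the template above. Note that it mixes a `{ingredients}` placeholder with literal JSON braces, so calling `str.format` on it would raise on the JSON example. A minimal sketch, assuming plain string replacement is acceptable, is shown here. The helper `fill_prompt` and the shortened `TEMPLATE` are hypothetical.

```python
def fill_prompt(template: str, **values: str) -> str:
    """Substitute {name} placeholders without touching any other braces."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


# Shortened stand-in for planner_propose.txt (illustrative only).
TEMPLATE = (
    "Available ingredients: {ingredients}\n"
    'Respond ONLY with valid JSON: {"options": [{"name": "...", "why": "..."}]}'
)

prompt = fill_prompt(TEMPLATE, ingredients="eggs, rice, scallions")
```

The alternative would be to double every literal brace in the prompt file, which makes the JSON example harder to read and edit.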
src/prompts/planner_recipe.txt
DELETED
|
@@ -1,11 +0,0 @@
-You are a professional chef writing a clear, detailed recipe.
-
-Dish to prepare: {dish_name}
-Available ingredients: {ingredients}
-
-Create a complete recipe with 4 to 7 steps. Each step must be specific and actionable.
-
-Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
-{"name": "Full Recipe Title", "cuisine": "Cuisine type", "servings": 2, "total_time_minutes": 30, "final_dish_visual": "One evocative sentence describing how the finished dish looks and smells", "steps": [{"n": 1, "instruction": "Detailed step description.", "duration": "5 min", "tip": "Optional chef tip or null"}, {"n": 2, "instruction": "...", "duration": "3 min", "tip": null}]}
-
-Important: tip must be a string or null, never omit it.
|
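Models do not always follow the "JSON and nothing else" instruction, so a caller of this prompt would probably need a tolerant parser. The following is a sketch under that assumption, not the repository's parser: it takes the outermost `{...}` span of the reply and normalises each step's `n` and `tip`. The function name `parse_recipe_reply` is hypothetical.

```python
import json
from typing import Any, Dict


def parse_recipe_reply(raw: str) -> Dict[str, Any]:
    """Extract the outermost JSON object from a reply and normalise its steps."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start : end + 1])
    for i, step in enumerate(data.get("steps", []), start=1):
        # Fill in a step number when the model left it out.
        step.setdefault("n", i)
        # The prompt requires tip to be a string or null; treat a missing,
        # empty, or literal "null" value as None.
        tip = step.get("tip")
        step["tip"] = None if tip in (None, "", "null") else str(tip)
    return data
```

A reply wrapped in extra prose or markdown fences still parses, as long as the braces of the JSON object are the outermost braces in the text.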
src/prompts/validator_prompt.txt
DELETED
|
@@ -1,14 +0,0 @@
-You are a supportive cooking coach reviewing a student's progress photo.
-
-The step they are working on:
-"{step_instruction}"
-
-Look carefully at the photo and decide:
-- "go" → the step is correctly completed, they can move on
-- "wait" → it's progressing but needs more time (undercooked, still mixing, etc.)
-- "fix" → there is a clear mistake that needs correction right now
-
-Respond ONLY with valid JSON and nothing else:
-{"verdict": "go", "feedback": "One sentence describing exactly what you see in the photo.", "tip": "One specific, actionable piece of advice for the cook."}
-
-verdict must be exactly one of: go, wait, fix.
|
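The three-value verdict contract in this prompt can be enforced on the caller's side. The sketch below assumes that an unparseable or unknown verdict should fall back to `"wait"`. That fallback, and the name `parse_verdict`, are assumptions for illustration and are not taken from the removed `progress_validator.py`.

```python
import json
from typing import Dict

ALLOWED_VERDICTS = ("go", "wait", "fix")


def parse_verdict(raw: str, default: str = "wait") -> Dict[str, str]:
    """Parse a validator reply; fall back to `default` on any problem."""
    try:
        data = json.loads(raw[raw.index("{") : raw.rindex("}") + 1])
    except ValueError:  # missing braces, or json.JSONDecodeError (a subclass)
        data = {}
    verdict = str(data.get("verdict", "")).strip().lower()
    if verdict not in ALLOWED_VERDICTS:
        verdict = default
    return {
        "verdict": verdict,
        "feedback": str(data.get("feedback") or ""),
        "tip": str(data.get("tip") or ""),
    }
```

Returning a fixed set of keys means the UI layer never has to deal with a missing field.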
src/ui/components.py
CHANGED
|
@@ -80,7 +80,7 @@ class TemplatedHTML(gr.HTML):
|
|
| 80 |
class RecipeHero(TemplatedHTML):
|
| 81 |
css_template = """
|
| 82 |
.cwm-hero {
|
| 83 |
-
background: #fffbf0
|
| 84 |
border: 1px solid #d8c9ad;
|
| 85 |
border-radius: 16px;
|
| 86 |
padding: 32px;
|
|
@@ -94,15 +94,15 @@ class RecipeHero(TemplatedHTML):
|
|
| 94 |
background: #efe3c8;
|
| 95 |
}
|
| 96 |
.cwm-hero h1 {
|
| 97 |
-
font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a
|
| 98 |
margin: 0 0 8px;
|
| 99 |
}
|
| 100 |
.cwm-hero .meta {
|
| 101 |
-
color: #8a6a3a
|
| 102 |
text-transform: uppercase; margin-bottom: 18px;
|
| 103 |
}
|
| 104 |
.cwm-hero .visual {
|
| 105 |
-
font-family: 'Lora', serif; font-style: italic; color: #6b4a2a
|
| 106 |
font-size: 17px; line-height: 1.55;
|
| 107 |
}
|
| 108 |
@media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
|
|
@@ -115,14 +115,11 @@ class RecipeHero(TemplatedHTML):
|
|
| 115 |
servings = state.get("servings") or 0
|
| 116 |
time = state.get("total_time_minutes") or 0
|
| 117 |
visual = html.escape(state.get("final_dish_visual") or "")
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
|
| 124 |
-
else:
|
| 125 |
-
img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
|
| 126 |
return f"""
|
| 127 |
<div class="cwm-hero">
|
| 128 |
<div>{img_tag}</div>
|
|
@@ -189,15 +186,15 @@ class IngredientChips(TemplatedHTML):
|
|
| 189 |
class DishOptions(TemplatedHTML):
|
| 190 |
css_template = """
|
| 191 |
.cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
|
| 192 |
-
.cwm-
|
| 193 |
-
background: #fffbf0
|
| 194 |
padding: 18px; text-align: left;
|
| 195 |
}
|
| 196 |
-
.cwm-
|
| 197 |
-
font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a
|
| 198 |
margin: 0 0 6px;
|
| 199 |
}
|
| 200 |
-
.cwm-
|
| 201 |
@media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
|
| 202 |
"""
|
| 203 |
|
|
@@ -220,32 +217,32 @@ class DishOptions(TemplatedHTML):
|
|
| 220 |
class StepCard(TemplatedHTML):
|
| 221 |
css_template = """
|
| 222 |
.cwm-steps { display: flex; flex-direction: column; gap: 16px; }
|
| 223 |
-
.cwm-
|
| 224 |
display: grid; grid-template-columns: 220px 1fr; gap: 22px;
|
| 225 |
-
background: #fffbf0
|
| 226 |
padding: 18px 22px;
|
| 227 |
}
|
| 228 |
-
.cwm-
|
| 229 |
width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
|
| 230 |
background: #efe3c8;
|
| 231 |
}
|
| 232 |
-
.cwm-
|
| 233 |
width: 220px; height: 160px; border-radius: 8px;
|
| 234 |
background: linear-gradient(135deg,#efe3c8,#dccaa3);
|
| 235 |
display:flex; align-items:center; justify-content:center;
|
| 236 |
-
color: #8a6a3a
|
| 237 |
}
|
| 238 |
-
.cwm-
|
| 239 |
-
font-family: 'Lora', serif; color: #6b4a2a
|
| 240 |
}
|
| 241 |
-
.cwm-
|
| 242 |
-
.cwm-
|
| 243 |
-
display: inline-block; background: #a85c2a
|
| 244 |
border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
|
| 245 |
}
|
| 246 |
-
.cwm-
|
| 247 |
-
margin-top: 10px; padding: 10px 12px; background: #fff3d8
|
| 248 |
-
border-radius: 8px; font-size: 14px; color: #6b4a2a
|
| 249 |
}
|
| 250 |
.cwm-step .tip::before { content: "💡 "; }
|
| 251 |
@media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
|
|
@@ -263,14 +260,11 @@ class StepCard(TemplatedHTML):
|
|
| 263 |
dur = html.escape(s.get("duration", ""))
|
| 264 |
tip = s.get("tip")
|
| 265 |
visual = html.escape(s.get("visual", ""))
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
img_block = f'<img src="/file={html.escape(img_path)}" alt="step {n}"/>'
|
| 272 |
-
else:
|
| 273 |
-
img_block = f'<div class="placeholder">{visual[:80] if visual else f"Step {n}"}</div>'
|
| 274 |
tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
|
| 275 |
cards.append(f"""
|
| 276 |
<div class="cwm-step">
|
|
@@ -293,22 +287,22 @@ class NutritionGrid(TemplatedHTML):
|
|
| 293 |
css_template = """
|
| 294 |
.cwm-nutri-wrap { margin-top: 10px; }
|
| 295 |
.cwm-nutri-title {
|
| 296 |
-
font-family: 'Lora', serif; color: #6b4a2a
|
| 297 |
}
|
| 298 |
.cwm-nutri {
|
| 299 |
display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
|
| 300 |
}
|
| 301 |
-
.cwm-nutri
|
| 302 |
-
background: #fffbf0
|
| 303 |
padding: 14px 10px; text-align: center;
|
| 304 |
}
|
| 305 |
-
.cwm-nutri
|
| 306 |
-
font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a
|
| 307 |
display: block;
|
| 308 |
}
|
| 309 |
-
.cwm-nutri
|
| 310 |
font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
|
| 311 |
-
color: #8a6a3a
|
| 312 |
}
|
| 313 |
@media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
|
| 314 |
"""
|
|
@@ -343,7 +337,7 @@ class VerdictBadge(TemplatedHTML):
|
|
| 343 |
css_template = """
|
| 344 |
.cwm-verdict {
|
| 345 |
display: flex; align-items: center; gap: 18px;
|
| 346 |
-
background: #fffbf0
|
| 347 |
border: 1px solid #d8c9ad;
|
| 348 |
}
|
| 349 |
.cwm-verdict.go { border-left: 6px solid #4f8b4a; }
|
|
@@ -357,8 +351,8 @@ class VerdictBadge(TemplatedHTML):
|
|
| 357 |
.cwm-verdict.go .cwm-verdict-pill { background: #4f8b4a; }
|
| 358 |
.cwm-verdict.wait .cwm-verdict-pill { background: #d4a23c; }
|
| 359 |
.cwm-verdict.fix .cwm-verdict-pill { background: #b94a3a; }
|
| 360 |
-
.cwm-verdict-text { font-size: 16px; color: #4a3722
|
| 361 |
-
.cwm-verdict-text small { color: #8a6a3a
|
| 362 |
.cwm-verdict-empty {
|
| 363 |
color: #b39870; font-style: italic; padding: 14px 0;
|
| 364 |
}
|
|
|
|
| 80 |
class RecipeHero(TemplatedHTML):
|
| 81 |
css_template = """
|
| 82 |
.cwm-hero {
|
| 83 |
+
background: #fffbf0;
|
| 84 |
border: 1px solid #d8c9ad;
|
| 85 |
border-radius: 16px;
|
| 86 |
padding: 32px;
|
|
|
|
| 94 |
background: #efe3c8;
|
| 95 |
}
|
| 96 |
.cwm-hero h1 {
|
| 97 |
+
font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a;
|
| 98 |
margin: 0 0 8px;
|
| 99 |
}
|
| 100 |
.cwm-hero .meta {
|
| 101 |
+
color: #8a6a3a; font-size: 14px; letter-spacing: 0.04em;
|
| 102 |
text-transform: uppercase; margin-bottom: 18px;
|
| 103 |
}
|
| 104 |
.cwm-hero .visual {
|
| 105 |
+
font-family: 'Lora', serif; font-style: italic; color: #6b4a2a;
|
| 106 |
font-size: 17px; line-height: 1.55;
|
| 107 |
}
|
| 108 |
@media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
|
|
|
|
| 115 |
servings = state.get("servings") or 0
|
| 116 |
time = state.get("total_time_minutes") or 0
|
| 117 |
visual = html.escape(state.get("final_dish_visual") or "")
|
| 118 |
+
img = state.get("final_dish_image_path") or ""
|
| 119 |
+
img_tag = (
|
| 120 |
+
f'<img src="/file={html.escape(img)}" alt="final dish"/>'
|
| 121 |
+
if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
|
| 122 |
+
)
|
| 123 |
return f"""
|
| 124 |
<div class="cwm-hero">
|
| 125 |
<div>{img_tag}</div>
|
|
|
|
| 186 |
class DishOptions(TemplatedHTML):
|
| 187 |
css_template = """
|
| 188 |
.cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
|
| 189 |
+
.cwm-option {
|
| 190 |
+
background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 12px;
|
| 191 |
padding: 18px; text-align: left;
|
| 192 |
}
|
| 193 |
+
.cwm-option h3 {
|
| 194 |
+
font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a;
|
| 195 |
margin: 0 0 6px;
|
| 196 |
}
|
| 197 |
+
.cwm-option p { color: #7a5a35; font-size: 14px; line-height: 1.45; margin: 0; }
|
| 198 |
@media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
|
| 199 |
"""
|
| 200 |
|
|
|
|
| 217 |
class StepCard(TemplatedHTML):
|
| 218 |
css_template = """
|
| 219 |
.cwm-steps { display: flex; flex-direction: column; gap: 16px; }
|
| 220 |
+
.cwm-step {
|
| 221 |
display: grid; grid-template-columns: 220px 1fr; gap: 22px;
|
| 222 |
+
background: #fffbf0; border-left: 4px solid #a85c2a; border-radius: 10px;
|
| 223 |
padding: 18px 22px;
|
| 224 |
}
|
| 225 |
+
.cwm-step img {
|
| 226 |
width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
|
| 227 |
background: #efe3c8;
|
| 228 |
}
|
| 229 |
+
.cwm-step .placeholder {
|
| 230 |
width: 220px; height: 160px; border-radius: 8px;
|
| 231 |
background: linear-gradient(135deg,#efe3c8,#dccaa3);
|
| 232 |
display:flex; align-items:center; justify-content:center;
|
| 233 |
+
color: #8a6a3a; font-family: 'Lora', serif; font-size: 14px;
|
| 234 |
}
|
| 235 |
+
.cwm-step h3 {
|
| 236 |
+
font-family: 'Lora', serif; color: #6b4a2a; margin: 0 0 6px; font-size: 22px;
|
| 237 |
}
|
| 238 |
+
.cwm-step p { font-size: 16px; line-height: 1.55; color: #4a3722; margin: 0 0 8px; }
|
| 239 |
+
.cwm-step .duration {
|
| 240 |
+
display: inline-block; background: #a85c2a; color: #fffbf0;
|
| 241 |
border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
|
| 242 |
}
|
| 243 |
+
.cwm-step .tip {
|
| 244 |
+
margin-top: 10px; padding: 10px 12px; background: #fff3d8;
|
| 245 |
+
border-radius: 8px; font-size: 14px; color: #6b4a2a;
|
| 246 |
}
|
| 247 |
.cwm-step .tip::before { content: "💡 "; }
|
| 248 |
@media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
|
|
|
|
| 260 |
dur = html.escape(s.get("duration", ""))
|
| 261 |
tip = s.get("tip")
|
| 262 |
visual = html.escape(s.get("visual", ""))
|
| 263 |
+
img = s.get("image_path")
|
| 264 |
+
img_block = (
|
| 265 |
+
f'<img src="/file={html.escape(img)}" alt="step {n}"/>'
|
| 266 |
+
if img else f'<div class="placeholder">{visual[:80]}</div>'
|
| 267 |
+
)
|
| 268 |
tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
|
| 269 |
cards.append(f"""
|
| 270 |
<div class="cwm-step">
|
|
|
|
| 287 |
css_template = """
|
| 288 |
.cwm-nutri-wrap { margin-top: 10px; }
|
| 289 |
.cwm-nutri-title {
|
| 290 |
+
font-family: 'Lora', serif; color: #6b4a2a; font-size: 22px; margin: 0 0 14px;
|
| 291 |
}
|
| 292 |
.cwm-nutri {
|
| 293 |
display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
|
| 294 |
}
|
| 295 |
+
.cwm-nutri-cell {
|
| 296 |
+
background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 10px;
|
| 297 |
padding: 14px 10px; text-align: center;
|
| 298 |
}
|
| 299 |
+
.cwm-nutri-cell .v {
|
| 300 |
+
font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a;
|
| 301 |
display: block;
|
| 302 |
}
|
| 303 |
+
.cwm-nutri-cell .l {
|
| 304 |
font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
|
| 305 |
+
color: #8a6a3a; margin-top: 4px;
|
| 306 |
}
|
| 307 |
@media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
|
| 308 |
"""
|
|
|
|
| 337 |
css_template = """
|
| 338 |
.cwm-verdict {
|
| 339 |
display: flex; align-items: center; gap: 18px;
|
| 340 |
+
background: #fffbf0; border-radius: 12px; padding: 18px 22px;
|
| 341 |
border: 1px solid #d8c9ad;
|
| 342 |
}
|
| 343 |
.cwm-verdict.go { border-left: 6px solid #4f8b4a; }
|
|
|
|
| 351 |
.cwm-verdict.go .cwm-verdict-pill { background: #4f8b4a; }
|
| 352 |
.cwm-verdict.wait .cwm-verdict-pill { background: #d4a23c; }
|
| 353 |
.cwm-verdict.fix .cwm-verdict-pill { background: #b94a3a; }
|
| 354 |
+
.cwm-verdict-text { font-size: 16px; color: #4a3722; line-height: 1.5; }
|
| 355 |
+
.cwm-verdict-text small { color: #8a6a3a; display: block; margin-top: 4px; }
|
| 356 |
.cwm-verdict-empty {
|
| 357 |
color: #b39870; font-style: italic; padding: 14px 0;
|
| 358 |
}
|
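Both `RecipeHero` and `StepCard` in this diff choose between an `<img>` tag and a placeholder `<div>`. A sketch of that choice as one helper follows. The name `image_block` is hypothetical. Unlike the diffed code, which slices the already-escaped `visual` string, this version truncates the fallback text before escaping, so a cut cannot land in the middle of an HTML entity.

```python
import html
from typing import Optional


def image_block(path: Optional[str], alt: str, fallback: str = "") -> str:
    """Return an <img> tag when a path is set, else a text placeholder."""
    if path:
        # html.escape also escapes quotes by default, which keeps the
        # attribute values well-formed.
        return f'<img src="/file={html.escape(path)}" alt="{html.escape(alt)}"/>'
    return f'<div class="placeholder">{html.escape(fallback[:80])}</div>'
```

For example, `image_block(None, "step 1", "Stir until glossy")` yields a placeholder `<div>` that contains the fallback text.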
src/ui/components.pyi
CHANGED
|
@@ -63,14 +63,11 @@ class RecipeHero(TemplatedHTML):
|
|
| 63 |
servings = state.get("servings") or 0
|
| 64 |
time = state.get("total_time_minutes") or 0
|
| 65 |
visual = html.escape(state.get("final_dish_visual") or "")
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
|
| 72 |
-
else:
|
| 73 |
-
img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
|
| 74 |
return f"""
|
| 75 |
<div class="cwm-hero">
|
| 76 |
<div>{img_tag}</div>
|
|
|
|
| 63 |
servings = state.get("servings") or 0
|
| 64 |
time = state.get("total_time_minutes") or 0
|
| 65 |
visual = html.escape(state.get("final_dish_visual") or "")
|
| 66 |
+
img = state.get("final_dish_image_path") or ""
|
| 67 |
+
img_tag = (
|
| 68 |
+
f'<img src="/file={html.escape(img)}" alt="final dish"/>'
|
| 69 |
+
if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
|
| 70 |
+
)
|
| 71 |
return f"""
|
| 72 |
<div class="cwm-hero">
|
| 73 |
<div>{img_tag}</div>
|
src/ui/theme.py
CHANGED
|
@@ -13,64 +13,10 @@ theme = gr.themes.Soft(
|
|
| 13 |
|
| 14 |
CSS = """
|
| 15 |
@import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&display=swap');
|
| 16 |
-
|
| 17 |
-
/* ---------------------------------------------------------------------------
|
| 18 |
-
Force a warm light palette regardless of the browser/system dark mode.
|
| 19 |
-
We pin the parchment background, so we must also pin DARK text colours via
|
| 20 |
-
Gradio's CSS variables — otherwise dark-mode users get white text on the
|
| 21 |
-
light background and it disappears.
|
| 22 |
-
--------------------------------------------------------------------------- */
|
| 23 |
-
.gradio-container, .gradio-container.dark {
|
| 24 |
-
background: #f5ecd9 !important;
|
| 25 |
-
color-scheme: light !important;
|
| 26 |
-
|
| 27 |
-
--body-text-color: #4a3722;
|
| 28 |
-
--body-text-color-subdued: #7a5a35;
|
| 29 |
-
--block-title-text-color: #6b4a2a;
|
| 30 |
-
--block-label-text-color: #6b4a2a;
|
| 31 |
-
--block-info-text-color: #7a5a35;
|
| 32 |
-
--block-background-fill: #fffbf0;
|
| 33 |
-
--input-background-fill: #fffbf0;
|
| 34 |
-
--border-color-primary: #d8c9ad;
|
| 35 |
-
--color-accent-soft: #fbe2d2;
|
| 36 |
-
}
|
| 37 |
-
|
| 38 |
-
/* Blanket dark text for native Gradio text elements (covers dark mode) */
|
| 39 |
-
.gradio-container,
|
| 40 |
-
.gradio-container .prose,
|
| 41 |
-
.gradio-container label,
|
| 42 |
-
.gradio-container .gr-text,
|
| 43 |
-
.gradio-container span,
|
| 44 |
-
.gradio-container p,
|
| 45 |
-
.gradio-container .gr-check-radio label,
|
| 46 |
-
.gradio-container .wrap,
|
| 47 |
-
.gradio-container .gr-form,
|
| 48 |
-
.gradio-container .tab-nav button,
|
| 49 |
-
.gradio-container .gr-accordion,
|
| 50 |
-
.gradio-container input,
|
| 51 |
-
.gradio-container textarea {
|
| 52 |
-
color: #4a3722 !important;
|
| 53 |
-
}
|
| 54 |
-
|
| 55 |
.gradio-container .prose h1,
|
| 56 |
.gradio-container .prose h2,
|
| 57 |
-
.gradio-container .prose h3 { font-family: 'Lora', serif !important; color: #6b4a2a
|
| 58 |
-
|
| 59 |
-
/* Tabs: dark labels, terracotta active */
|
| 60 |
-
.gradio-container .tab-nav button { color: #6b4a2a !important; }
|
| 61 |
-
.gradio-container .tab-nav button.selected {
|
| 62 |
-
color: #a85c2a !important; border-bottom-color: #a85c2a !important;
|
| 63 |
-
}
|
| 64 |
-
|
| 65 |
-
/* Native blocks (inputs, radio, checkbox, number) on warm cards */
|
| 66 |
-
.gradio-container .block,
|
| 67 |
-
.gradio-container .form,
|
| 68 |
-
.gradio-container input[type="text"],
|
| 69 |
-
.gradio-container input[type="number"] {
|
| 70 |
-
background: #fffbf0 !important;
|
| 71 |
-
border-color: #d8c9ad !important;
|
| 72 |
-
}
|
| 73 |
-
|
| 74 |
/* Generic container shared by every HTMLComponent */
|
| 75 |
.cwm-card {
|
| 76 |
border: 1px solid #d8c9ad;
|
|
@@ -80,7 +26,6 @@ CSS = """
|
|
| 80 |
}
|
| 81 |
button.primary, .gr-button-primary {
|
| 82 |
background: #a85c2a !important;
|
| 83 |
-
color: #fffbf0 !important;
|
| 84 |
font-weight: 600 !important;
|
| 85 |
font-size: 16px !important;
|
| 86 |
padding: 12px 22px !important;
|
|
|
|
| 13 |
|
| 14 |
CSS = """
|
| 15 |
@import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&display=swap');
|
| 16 |
+
.gradio-container { background: #f5ecd9 !important; }
|
| 17 |
.gradio-container .prose h1,
|
| 18 |
.gradio-container .prose h2,
|
| 19 |
+
.gradio-container .prose h3 { font-family: 'Lora', serif !important; color: #6b4a2a; }
|
| 20 |
/* Generic container shared by every HTMLComponent */
|
| 21 |
.cwm-card {
|
| 22 |
border: 1px solid #d8c9ad;
|
|
|
|
| 26 |
}
|
| 27 |
button.primary, .gr-button-primary {
|
| 28 |
background: #a85c2a !important;
|
|
|
|
| 29 |
font-weight: 600 !important;
|
| 30 |
font-size: 16px !important;
|
| 31 |
padding: 12px 22px !important;
|