build-small-hackathon/Cook_with_a_LLM

strategy

by Fred1e4 - opened Jun 12

base: refs/heads/main ← from: refs/pr/4

+182 -4130

This PR is in draft mode

Files changed (30)

.gitignore +0 -24
README.md +13 -108
Strategy/arquitectura.html +0 -668
Strategy/estrategia.md +0 -496
Strategy/plan.md +0 -245
Strategy/plan_implementacion.md +0 -674
app.py +84 -212
modal_app/__init__.py +0 -0
modal_app/flux_endpoint.py +0 -124
modal_app/planner_endpoint.py +0 -117
modal_app/serve_app.py +0 -102
packages.txt +0 -2
requirements.txt +8 -15
scripts/build_recipe_dataset.py +0 -281
scripts/diag_planner.py +0 -73
scripts/train_planner.py +0 -172
src/agents/progress_validator.py +0 -84
src/agents/recipe_planner.py +0 -167
src/agents/step_illustrator.py +0 -81
src/config.py +3 -14
src/data/__init__.py +0 -0
src/data/nutrition.py +0 -112
src/models/planner.py +0 -103
src/pipeline.py +0 -32
src/prompts/planner_propose.txt +0 -11
src/prompts/planner_recipe.txt +0 -11
src/prompts/validator_prompt.txt +0 -14
src/ui/components.py +42 -48
src/ui/components.pyi +5 -8
src/ui/theme.py +27 -132

.gitignore DELETED

@@ -1,24 +0,0 @@
-# Python
-__pycache__/
-*.py[cod]
-*.egg-info/
-.venv/
-venv/
-# Generated data (SFT dataset lives on HF Hub: eldinosaur/cook-with-me-recipes-sft)
-data/*.parquet
-data/*.jsonl
-data/*.png
-data/*.npy
-data/*.csv
-# Local caches / model weights
-*.gguf
-.cache/
-assets/*.png
-# OS / editor
-.DS_Store
-Thumbs.db
-.idea/
-.vscode/

README.md CHANGED

@@ -1,108 +1,13 @@
----
-title: Cook With A LLM
-emoji: 🍲
-colorFrom: red
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.15.2
-python_version: '3.12'
-app_file: app.py
-pinned: false
-license: apache-2.0
-tags:
-  - backyard-ai
-  - well-tuned
-  - off-brand
-  - sharing-is-caring
-  - field-notes
----
-# 🍲 Cook With Me — Multimodal Sous-Chef
-> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
-A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
----
-# Contributors
-1. **eldinosaur** - Carlos Castañeda Mora
-1. **Fred1e4** - Fredin Vazquez
----
-## 🔗 Links
-- 🎥 **Demo video:** https://youtube.com/shorts/c3PikNvKAjQ
-- 📱 **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
-- 🤗 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
-- 🧠 **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
-- 📊 **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft
----
-## How it works
-```
-📸 Fridge photo  ──▶  [Vision Agent]          identify ingredients
-                            │
-                            ▼
-                      [Recipe Planner]         propose 3 dishes → full recipe JSON
-                            │
-                            ▼
-                      [Nutrition Engine]       per-serving macros (lookup, no hallucination)
-                            │
-                            ▼
-📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict
-```
-1. **Snap** your fridge or pantry — the fine-tuned vision model identifies every ingredient.
-2. **Pick** one of three AI-suggested dishes tailored to what you have.
-3. **Cook** step by step with a generated recipe and per-serving nutrition info.
-4. **Check** your progress by uploading a photo of your pan — the model tells you *go*, *wait*, or *fix*.
----
-## Models
-| Role | Model | Params | Runtime |
-|---|---|---|---|
-| Vision — ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
-| Recipe planner — dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
-| Step illustrator — per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |
-**Total: ~21.6B parameters** (≤ 32B cap ✓)
-**Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
-detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
-The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
-`transformers` 4.x while the vision model needs 5.x, so they live in separate containers).
----
-## Badges targeted
-| Badge | Status | How |
-|---|---|---|
-| 🎯 Well-Tuned | ✓ | **Two** fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
-| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
-| 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
-| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
----
-## Architecture highlights
-- **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images — each on the runtime it needs (ZeroGPU + Modal endpoints).
-- **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → user cooks → the vision model compares the pan photo and returns *go / wait / fix* — a real agent loop, not a wrapper.
-- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
-- **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
----
-## Track
-**Chapter One — Backyard AI** · "Build something for someone you actually know."
-Submission for the Hugging Face Hackathon · June 5–15, 2026.

+---
+title: Cook With A LLM
+emoji: 🐠
+colorFrom: pink
+colorTo: pink
+sdk: gradio
+sdk_version: 6.15.2
+python_version: '3.12'
+app_file: app.py
+pinned: false
+---
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
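
The removed README mentions a "multi-strategy parser" that handles markdown fences, single quotes, and trailing commas. The planner code itself is not shown in this part of the diff, so the following is only a minimal sketch of how such a fallback chain could look. The function name `extract_json` and the order of the strategies are assumptions for illustration, not the project's implementation.

```python
import ast
import json
import re

# Build the fence marker programmatically so this example can live inside a fenced block.
_TICKS = "`" * 3
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def extract_json(text):
    """Hypothetical helper: return the first JSON object found in `text`, or None."""
    # Strategy 1: prefer the contents of a markdown fence when one is present.
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text

    # Narrow the candidate to the outermost {...} span.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = candidate[start:end + 1]

    # Strategy 2: strict JSON first, then JSON with trailing commas removed.
    for attempt in (candidate, _TRAILING_COMMA.sub(r"\1", candidate)):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError:
            pass

    # Strategy 3: single-quoted objects parse as Python literals.
    try:
        value = ast.literal_eval(candidate)
    except (ValueError, SyntaxError, TypeError):
        return None
    return value if isinstance(value, dict) else None
```

Returning `None` instead of raising lets the caller degrade gracefully, for example by retrying generation. One known limitation of this sketch: the trailing-comma pattern is applied to the whole string, so it can also alter a comma that sits inside a string value directly before a closing bracket.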

Strategy/arquitectura.html DELETED

@@ -1,668 +0,0 @@
-<!DOCTYPE html>
-<html lang="es">
-<head>
-<meta charset="UTF-8" />
-<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>Cocina Conmigo: visual project plan</title>
-<style>
-  :root {
-    --bg: #f5ecd9;
-    --card: #fffbf0;
-    --ink: #2b2018;
-    --accent: #a85c2a;       /* terracotta */
-    --accent-soft: #f6dccc;
-    --accent2: #6b4a2a;
-    --gold: #c9962b;
-    --green: #3f7a3a;
-    --green-soft: #dbe9d8;
-    --red: #b03a2e;
-    --red-soft: #f4d6d2;
-    --gray: #8a7e6f;
-    --line: #d8c9ad;
-  }
-  * { box-sizing: border-box; }
-  body {
-    font-family: 'Inter', -apple-system, sans-serif;
-    background: var(--bg);
-    color: var(--ink);
-    margin: 0;
-    padding: 32px 16px 80px;
-    line-height: 1.55;
-  }
-  .wrap { max-width: 1240px; margin: 0 auto; }
-  h1 { font-family: 'Lora', Georgia, serif; font-size: 46px; margin: 0 0 4px;
-       letter-spacing: -0.5px; font-weight: 700; }
-  h1 em { color: var(--accent); font-style: italic; }
-  .subtitle { color: var(--accent2); font-style: italic; margin-bottom: 28px; font-size: 17px; }
-  h2 {
-    margin-top: 56px; border-top: 1px dashed var(--line); padding-top: 24px;
-    font-size: 26px; font-family: 'Lora', Georgia, serif; letter-spacing: 0.3px;
-  }
-  h2 .num {
-    color: var(--accent); font-family: ui-monospace, monospace;
-    font-size: 20px; margin-right: 10px;
-  }
-  h3 { font-size: 18px; margin-top: 28px; color: var(--accent2); font-family: 'Lora', Georgia, serif; }
-  /* Hero */
-  .hero {
-    background: var(--card); border: 2px solid var(--ink); border-radius: 14px;
-    padding: 30px 32px; display: grid; grid-template-columns: 1fr; gap: 18px;
-  }
-  @media(min-width: 760px){ .hero { grid-template-columns: 2fr 1fr; align-items: center; } }
-  .hero h2 { border:0; margin:0 0 6px; padding:0; font-size: 22px; }
-  .hero .quote {
-    font-style: italic; font-size: 17px; color: var(--accent2);
-    border-left: 3px solid var(--accent); padding-left: 14px; margin: 6px 0 0;
-  }
-  .hero .target {
-    background: #fff3cf; border-radius: 12px; padding: 14px 16px;
-    font-size: 13px; border: 1px solid var(--line); line-height: 1.55;
-  }
-  .hero .target strong { color: var(--accent); }
-  /* Pills */
-  .pill {
-    display: inline-block; padding: 2px 9px; border-radius: 12px;
-    color: white; font-size: 12px; margin: 2px 4px 2px 0; font-family: ui-monospace, monospace;
-  }
-  .pill.user    { background: var(--gray); }
-  .pill.gradio  { background: var(--accent); }
-  .pill.hf      { background: var(--gold); }
-  .pill.modal   { background: var(--green); }
-  .pill.flux    { background: #111; }
-  .pill.openbmb { background: #075e54; }
-  .pill.cohere  { background: #5e3aa3; }
-  .pill.openai  { background: #2c5e8a; }
-  .pill.llama   { background: #6a3d8a; }
-  /* Phone/recipe card mockup */
-  .phone-row {
-    display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px;
-  }
-  @media(min-width: 760px){ .phone-row { grid-template-columns: repeat(4, 1fr); } }
-  .phone {
-    background: #111; border-radius: 24px; padding: 8px;
-    box-shadow: 0 8px 22px rgba(0,0,0,0.18);
-  }
-  .phone .screen {
-    background: #fffbf0; border-radius: 18px; overflow: hidden;
-    height: 380px; display: flex; flex-direction: column;
-  }
-  .phone .topbar {
-    background: var(--accent); color: white; padding: 10px 14px;
-    font-size: 13px; font-family: 'Lora', serif;
-  }
-  .phone .body { padding: 12px; flex: 1; overflow-y: auto; font-size: 12px; }
-  .phone .body .illu {
-    width: 100%; aspect-ratio: 4/3; border-radius: 8px;
-    background: linear-gradient(135deg, #ffd28b 0%, #c97a3e 100%);
-    display: flex; align-items: center; justify-content: center;
-    font-size: 48px; box-shadow: 0 2px 8px rgba(0,0,0,0.1); margin-bottom: 8px;
-  }
-  .phone .body p { margin: 6px 0; line-height: 1.5; }
-  .phone .body .voice {
-    background: var(--green-soft); border-radius: 6px; padding: 6px 10px;
-    margin-top: 8px; font-size: 11px; color: var(--green);
-  }
-  .phone .body .tip {
-    background: var(--red-soft); border-radius: 6px; padding: 6px 10px;
-    margin-top: 6px; font-size: 11px; color: var(--red);
-  }
-  .scenario-label {
-    text-align: center; font-size: 13px; color: var(--accent2);
-    margin-top: 8px; font-style: italic;
-  }
-  /* SVG */
-  svg { width: 100%; height: auto; display: block; }
-  .node-box { fill: var(--card); stroke: var(--ink); stroke-width: 1.5; }
-  .node-text { font-family: 'Inter', sans-serif; font-size: 14px; fill: var(--ink); }
-  .node-title { font-weight: 700; font-size: 15px; }
-  .node-sub { font-size: 11px; fill: var(--accent2); font-style: italic; }
-  .arrow { stroke: var(--ink); stroke-width: 1.8; fill: none; }
-  .arrow-label { font-size: 11px; fill: var(--accent2); font-family: ui-monospace, monospace; }
-  .dashed { stroke-dasharray: 6 4; }
-  .arrow-loop { stroke: var(--accent); stroke-width: 2.2; fill: none; }
-  /* Cards */
-  .grid-2 { display: grid; grid-template-columns: 1fr; gap: 18px; margin-top: 16px; }
-  @media(min-width: 880px){ .grid-2 { grid-template-columns: 1fr 1fr; } }
-  .grid-3 { display: grid; grid-template-columns: 1fr; gap: 14px; margin-top: 14px; }
-  @media(min-width: 760px){ .grid-3 { grid-template-columns: repeat(3, 1fr); } }
-  .card {
-    background: var(--card); border: 1px solid var(--line);
-    border-radius: 10px; padding: 18px 20px;
-  }
-  .card.pick { border: 2px solid var(--accent); }
-  .pick-tag {
-    display: inline-block; background: var(--accent); color: white;
-    font-family: ui-monospace, monospace; font-size: 11px;
-    padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
-  }
-  table {
-    width: 100%; border-collapse: collapse; background: var(--card);
-    border: 1px solid var(--line); margin-top: 14px; font-size: 14px;
-  }
-  th, td { padding: 8px 10px; text-align: left; border-bottom: 1px solid var(--line); vertical-align: top; }
-  th { background: #efe4cb; font-size: 13px; letter-spacing: 0.5px; text-transform: uppercase; }
-  code {
-    background: #efe4cb; border-radius: 3px; padding: 1px 5px; font-size: 13px;
-  }
-  /* Forbidden zone */
-  .forbidden {
-    background: var(--red-soft); border: 1px solid var(--red);
-    border-radius: 8px; padding: 14px 18px; margin-top: 14px;
-  }
-  .forbidden strong { color: var(--red); }
-  .forbidden ul {
-    columns: 2; column-gap: 28px; margin: 8px 0 0; padding-left: 18px; font-size: 14px;
-  }
-  /* Timeline */
-  .timeline { position: relative; padding-left: 36px; margin-top: 20px; }
-  .timeline::before {
-    content: ""; position: absolute; left: 12px; top: 6px; bottom: 6px;
-    width: 3px; background: var(--accent); border-radius: 2px;
-  }
-  .day {
-    position: relative; margin-bottom: 14px; background: var(--card);
-    border: 1px solid var(--line); border-radius: 8px; padding: 12px 16px;
-  }
-  .day::before {
-    content: ""; position: absolute; left: -29px; top: 16px;
-    width: 13px; height: 13px; background: var(--accent);
-    border: 2px solid var(--card); border-radius: 50%;
-  }
-  .day .lbl {
-    display: inline-block; background: var(--accent); color: white;
-    font-family: ui-monospace, monospace; font-size: 11px;
-    padding: 1px 7px; border-radius: 10px; margin-right: 8px;
-  }
-  .day strong { font-size: 15px; }
-  .day .what { font-size: 13px; color: var(--accent2); margin-top: 2px; }
-  /* Award rows */
-  .award-row {
-    display: flex; justify-content: space-between;
-    padding: 8px 12px; border-bottom: 1px solid var(--line); font-size: 14px;
-  }
-  .award-row:last-child { border-bottom: 0; }
-  .prob {
-    font-family: ui-monospace, monospace; font-size: 12px;
-    padding: 1px 8px; border-radius: 10px; color: white;
-  }
-  .prob-h { background: #2e7d32; }
-  .prob-m { background: #ef9c2c; }
-  .prob-l { background: #b03a2e; }
-  /* Badges grid */
-  .badges-grid {
-    display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
-    gap: 12px; margin-top: 14px;
-  }
-  .badge-card {
-    background: var(--card); border: 1px solid var(--line);
-    border-radius: 8px; padding: 12px 14px;
-  }
-  .badge-card.skip { opacity: 0.45; border-style: dashed; }
-  .badge-card .tag {
-    display: inline-block; background: var(--accent); color: white;
-    font-family: ui-monospace, monospace; font-size: 11px;
-    padding: 1px 7px; border-radius: 10px; margin-bottom: 6px;
-  }
-  .badge-card.skip .tag { background: var(--gray); }
-  .badge-card strong { font-size: 14px; }
-  .badge-card p { font-size: 13px; color: var(--accent2); margin: 4px 0 0; }
-  .footnote {
-    margin-top: 30px; padding: 14px 18px;
-    border-left: 4px solid var(--accent);
-    background: var(--card); font-size: 14px; border-radius: 4px;
-  }
-</style>
-<link rel="preconnect" href="https://fonts.googleapis.com">
-<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-<link href="https://fonts.googleapis.com/css2?family=Lora:wght@400;600;700&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
-</head>
-<body>
-<div class="wrap">
-  <h1><em>Cocina Conmigo</em></h1>
-  <div class="subtitle">Multimodal sous-chef with vision, voice, and Flux.2, for cooking with your mom when your hands aren't free</div>
-  <div class="hero">
-    <div>
-      <h2>The idea in one sentence</h2>
-      <p>Your mom takes a photo of the fridge, the app suggests what to cook, <strong>shows her how each step should look</strong> with Flux.2, and <strong>narrates it aloud</strong> while she cooks with her hands full.</p>
-      <p class="quote">"My mom asked me to teach her how to make ramen. I built her a sous-chef that lives in her tablet."</p>
-      <div style="margin-top: 14px;">
-        <span class="pill flux">Flux.2 Klein 9B</span>
-        <span class="pill openbmb">MiniCPM-V + voice</span>
-        <span class="pill cohere">Cohere voice</span>
-        <span class="pill gradio">Gradio Workflows</span>
-        <span class="pill modal">Modal-powered</span>
-        <span class="pill llama">llama.cpp</span>
-      </div>
-    </div>
-    <div class="target">
-      <strong>Track:</strong> Backyard AI<br/>
-      <strong>Persona:</strong> your mom / partner / neighbor<br/>
-      <strong>Language:</strong> Mexican Spanish<br/>
-      <strong>Total params:</strong> ~17B (≤ 32B ✓)<br/>
-      <strong>Cuisine:</strong> traditional Mexican<br/>
-      <strong>Storyline:</strong> "So my mom can stop googling"
-    </div>
-  </div>
-  <h2><span class="num">01</span>Why this idea, and not the earlier ones</h2>
-  <table>
-    <thead><tr><th>Iteration</th><th>Idea</th><th>Why it was dropped</th></tr></thead>
-    <tbody>
-      <tr><td>v1</td><td>Abuelita (parent phone helper)</td><td>On OpenBMB's pre-baked idea list → 5-15 teams will build it</td></tr>
-      <tr><td>v2</td><td>Cuentacuentos (voice storyteller)</td><td>Also on OpenBMB's pre-baked idea list</td></tr>
-      <tr style="background:#fff3cf;"><td><strong>v3 (this one)</strong></td><td><strong>Cocina Conmigo</strong></td><td>Refinement of your idea #1 · NOT on any pre-baked list · uses Flux.2 + Workflows + voices · everyday + universal</td></tr>
-    </tbody>
-  </table>
-  <div class="forbidden">
-    <strong>⛔ The 12 ideas in the forbidden zone (OpenBMB cluster):</strong>
-    <ul>
-      <li>parent phone helper</li>
-      <li>receipt / bill explainer</li>
-      <li>shop menu / repair manual</li>
-      <li>offline personal assistant / voice companion</li>
-      <li>voice storyteller</li>
-      <li>visual mystery box</li>
-      <li>AI museum (≈ your idea #4)</li>
-      <li>doodle creature</li>
-      <li>dream postcard gen</li>
-      <li>omni-modal adventure</li>
-      <li>tiny local NPC / character agent</li>
-      <li>haircuts (your idea #3, already saturated)</li>
-    </ul>
-  </div>
-  <h2><span class="num">02</span>The 4 demo stories</h2>
-  <div class="phone-row">
-    <div>
-      <div class="phone"><div class="screen">
-        <div class="topbar">📸 This is what's in my fridge</div>
-        <div class="body">
-          <div class="illu">🍅🌶🐔🧅</div>
-          <p><strong>I see:</strong> chicken, tomato, onion, cilantro, tortillas, cheese.</p>
-          <p style="background:#fff3cf;border-radius:6px;padding:6px 10px;">
-            <strong>3 options:</strong><br/>
-            🌮 Tinga · 🌯 Enchiladas · 🧀 Quesadillas
-          </p>
-          <div class="voice">🔊 "What are you in the mood for?"</div>
-        </div>
-      </div></div>
-      <div class="scenario-label">1. Vision + Planner</div>
-    </div>
-    <div>
-      <div class="phone"><div class="screen">
-        <div class="topbar">👩‍🍳 Step 2 of 5</div>
-        <div class="body">
-          <div class="illu">🍳✨</div>
-          <p><strong>Sweat the onion in hot oil.</strong></p>
-          <p style="font-size:11px;color:var(--gray);">⏱ 4 minutes · until translucent</p>
-          <div class="voice">🔊 OpenBMB voice narrates…</div>
-        </div>
-      </div></div>
-      <div class="scenario-label">2. Voice + target image</div>
-    </div>
-    <div>
-      <div class="phone"><div class="screen">
-        <div class="topbar">📸 Am I on track?</div>
-        <div class="body">
-          <div class="illu">🍳👀</div>
-          <p style="background:var(--green-soft);border-radius:6px;padding:6px 10px;color:var(--green);">
-            <strong>✅ Looks perfect.</strong> The onion already looks translucent.
-          </p>
-          <div class="tip">🔊 Cohere voice: "Give it 1 more minute, you're doing fine!"</div>
-        </div>
-      </div></div>
-      <div class="scenario-label">3. Closed-loop visual</div>
-    </div>
-    <div>
-      <div class="phone"><div class="screen">
-        <div class="topbar">🔄 Replan</div>
-        <div class="body">
-          <p>User: <em>"I don't have cilantro."</em></p>
-          <div class="illu" style="background: linear-gradient(135deg,#ffd28b,#a85c2a);">🌮</div>
-          <p>"No problem. We'll use parsley or leave it out. It's still tinga."</p>
-          <div class="voice">🔊 Recipe regenerates · final dish updated</div>
-        </div>
-      </div></div>
-      <div class="scenario-label">4. Live adaptation</div>
-    </div>
-  </div>
-  <h2><span class="num">03</span>Architecture: 5 agents in one Gradio Workflow</h2>
-  <svg viewBox="0 0 1240 540" xmlns="http://www.w3.org/2000/svg">
-    <defs>
-      <marker id="ar" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
-        <path d="M0,0 L10,5 L0,10 z" fill="#2b2018"/>
-      </marker>
-      <marker id="aro" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
-        <path d="M0,0 L10,5 L0,10 z" fill="#a85c2a"/>
-      </marker>
-    </defs>
-    <!-- User input area -->
-    <rect x="20" y="40" width="200" height="240" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
-    <text x="40" y="62" class="node-text node-title" fill="#6b4a2a">USER (kitchen)</text>
-    <rect class="node-box" x="40" y="80" width="160" height="50" rx="6" fill="#ddd1bd"/>
-    <text x="120" y="102" class="node-text node-title" text-anchor="middle">📸 Fridge photo</text>
-    <text x="120" y="118" class="node-text node-sub" text-anchor="middle">initial trigger</text>
-    <rect class="node-box" x="40" y="140" width="160" height="50" rx="6" fill="#ddd1bd"/>
-    <text x="120" y="162" class="node-text node-title" text-anchor="middle">🎙️ Voice question</text>
-    <text x="120" y="178" class="node-text node-sub" text-anchor="middle">"am I on track?"</text>
-    <rect class="node-box" x="40" y="200" width="160" height="50" rx="6" fill="#ddd1bd"/>
-    <text x="120" y="222" class="node-text node-title" text-anchor="middle">📸 Progress photo</text>
-    <text x="120" y="238" class="node-text node-sub" text-anchor="middle">closed-loop</text>
-    <!-- Output area -->
-    <rect x="20" y="320" width="200" height="180" rx="10" fill="#fff3cf" stroke="#d8c9ad" stroke-dasharray="4 3"/>
-    <text x="40" y="342" class="node-text node-title" fill="#6b4a2a">OUTPUT</text>
-    <rect class="node-box" x="40" y="360" width="160" height="50" rx="6" fill="#dbe9d8"/>
-    <text x="120" y="382" class="node-text node-title" text-anchor="middle">🍽️ Final dish + recipe</text>
-    <text x="120" y="398" class="node-text node-sub" text-anchor="middle">image + text</text>
-    <rect class="node-box" x="40" y="420" width="160" height="50" rx="6" fill="#dbe9d8"/>
-    <text x="120" y="442" class="node-text node-title" text-anchor="middle">🔊 Voice per step</text>
-    <text x="120" y="458" class="node-text node-sub" text-anchor="middle">narrator + tips</text>
-    <!-- Pipeline center -->
-    <rect x="260" y="40" width="700" height="460" rx="10" fill="#fffaf0" stroke="#d8c9ad" stroke-width="1.5"/>
-    <text x="610" y="62" class="node-text node-title" text-anchor="middle" fill="#6b4a2a">HF SPACE — Gradio Workflow (5 agents)</text>
-    <!-- Vision (Mise en Place) -->
-    <rect class="node-box" x="280" y="90" width="200" height="80" rx="6" fill="#e6d5ed"/>
-    <text x="380" y="110" class="node-text node-title" text-anchor="middle">👁️ MISE EN PLACE</text>
-    <text x="380" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-V (Q4)</text>
-    <text x="380" y="142" class="node-text node-sub" text-anchor="middle">~2-4B</text>
-    <text x="380" y="160" class="node-text node-sub" text-anchor="middle">identifies ingredients</text>
-    <!-- Recipe Planner -->
-    <rect class="node-box" x="510" y="90" width="200" height="80" rx="6" fill="#fbe4d3"/>
-    <text x="610" y="110" class="node-text node-title" text-anchor="middle">🧠 RECIPE PLANNER</text>
-    <text x="610" y="126" class="node-text node-sub" text-anchor="middle">MiniCPM-4 (LoRA mx)</text>
-    <text x="610" y="142" class="node-text node-sub" text-anchor="middle">~4B</text>
-    <text x="610" y="160" class="node-text node-sub" text-anchor="middle">builds recipe JSON · replan</text>
-    <!-- Step Illustrator -->
-    <rect class="node-box" x="740" y="90" width="200" height="80" rx="6" fill="#f6dccc"/>
-    <text x="840" y="110" class="node-text node-title" text-anchor="middle">🎨 STEP ILLUSTRATOR</text>
-    <text x="840" y="126" class="node-text node-sub" text-anchor="middle">Flux.2 Klein 9B</text>
-    <text x="840" y="142" class="node-text node-sub" text-anchor="middle">on Modal L4 GPU</text>
-    <text x="840" y="160" class="node-text node-sub" text-anchor="middle">target image per step</text>
-    <!-- Sous-Chef Narrator -->
-    <rect class="node-box" x="510" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
-    <text x="610" y="222" class="node-text node-title" text-anchor="middle">🔊 SOUS-CHEF NARRATOR</text>
-    <text x="610" y="238" class="node-text node-sub" text-anchor="middle">OpenBMB voice (~1B)</text>
-    <text x="610" y="254" class="node-text node-sub" text-anchor="middle">warm tone</text>
-    <!-- Tip Giver -->
-    <rect class="node-box" x="740" y="200" width="200" height="70" rx="6" fill="#e9d6f5"/>
-    <text x="840" y="222" class="node-text node-title" text-anchor="middle">🎭 TIP GIVER</text>
-    <text x="840" y="238" class="node-text node-sub" text-anchor="middle">Cohere voice (~1B)</text>
-    <text x="840" y="254" class="node-text node-sub" text-anchor="middle">warnings · energetic</text>
-    <!-- Progress Validator (closed loop) -->
-    <rect class="node-box" x="280" y="290" width="220" height="90" rx="6" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="2"/>
-    <text x="390" y="312" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">✅ PROGRESS VALIDATOR</text>
-    <text x="390" y="328" class="node-text node-sub" text-anchor="middle">MiniCPM-V (reused)</text>
-    <text x="390" y="344" class="node-text node-sub" text-anchor="middle">compares user photo vs</text>
-    <text x="390" y="360" class="node-text node-sub" text-anchor="middle">target image</text>
-    <text x="390" y="376" class="node-text node-sub" text-anchor="middle">CLOSED LOOP 🔄</text>
-    <!-- STT -->
-    <rect class="node-box" x="280" y="200" width="200" height="70" rx="6" fill="#cfe0ee"/>
-    <text x="380" y="222" class="node-text node-title" text-anchor="middle">🎙️ STT (optional)</text>
-    <text x="380" y="238" class="node-text node-sub" text-anchor="middle">Whisper-tiny (~40M)</text>
-    <text x="380" y="254" class="node-text node-sub" text-anchor="middle">"am I on track?" hands-free</text>
-    <!-- Recipe State -->
-    <rect class="node-box" x="510" y="290" width="430" height="90" rx="6" fill="#fff3cf"/>
-    <text x="725" y="312" class="node-text node-title" text-anchor="middle" fill="#8a6a18">📖 RECIPE STATE (dataclass)</text>
-    <text x="725" y="328" class="node-text node-sub" text-anchor="middle">name · final_dish_image · steps · current_step ·</text>
-    <text x="725" y="344" class="node-text node-sub" text-anchor="middle">missing_ingredients · substitutes · user_progress_photos</text>
-    <text x="725" y="362" class="node-text node-sub" text-anchor="middle">every agent reads and writes this object</text>
-    <!-- Page assembler -->
-    <rect class="node-box" x="280" y="400" width="660" height="60" rx="6" fill="#f6dccc"/>
-    <text x="610" y="422" class="node-text node-title" text-anchor="middle">📖 RECIPE CARD ASSEMBLER</text>
-    <text x="610" y="438" class="node-text node-sub" text-anchor="middle">renders the recipe card + per-step cards + playable audio</text>
-    <!-- Modal box -->
-    <rect x="990" y="40" width="240" height="460" rx="10" fill="#dbe9d8" stroke="#3f7a3a" stroke-width="1.5"/>
-    <text x="1110" y="62" class="node-text node-title" text-anchor="middle" fill="#3f7a3a">MODAL</text>
-    <rect class="node-box" x="1010" y="90" width="200" height="80" rx="6" fill="#fff"/>
-    <text x="1110" y="112" class="node-text node-title" text-anchor="middle">Flux endpoint</text>
-    <text x="1110" y="128" class="node-text node-sub" text-anchor="middle">runtime · @app.cls L4</text>
-    <text x="1110" y="144" class="node-text node-sub" text-anchor="middle">scaledown 180s</text>
-    <text x="1110" y="160" class="node-text node-sub" text-anchor="middle">~1-3s/image</text>
-    <rect class="node-box" x="1010" y="190" width="200" height="80" rx="6" fill="#fff"/>
-    <text x="1110" y="212" class="node-text node-title" text-anchor="middle">Mexican cuisine dataset</text>
-    <text x="1110" y="228" class="node-text node-sub" text-anchor="middle">offline · 200 recipes</text>
-    <text x="1110" y="244" class="node-text node-sub" text-anchor="middle">generated via Codex API</text>
-    <text x="1110" y="260" class="node-text node-sub" text-anchor="middle">~$5</text>
-    <rect class="node-box" x="1010" y="290" width="200" height="80" rx="6" fill="#fff"/>
-    <text x="1110" y="312" class="node-text node-title" text-anchor="middle">LoRA Planner</text>
-    <text x="1110" y="328" class="node-text node-sub" text-anchor="middle">offline · A10G ~30 min</text>
-    <text x="1110" y="344" class="node-text node-sub" text-anchor="middle">push GGUF to HF</text>
-    <text x="1110" y="360" class="node-text node-sub" text-anchor="middle">~$1</text>
-    <rect class="node-box" x="1010" y="390" width="200" height="80" rx="6" fill="#fff"/>
-    <text x="1110" y="412" class="node-text node-title" text-anchor="middle">Eval pipeline</text>
-    <text x="1110" y="428" class="node-text node-sub" text-anchor="middle">visual consistency</text>
-    <text x="1110" y="444" class="node-text node-sub" text-anchor="middle">% correct ingredients</text>
-    <text x="1110" y="460" class="node-text node-sub" text-anchor="middle">~$1</text>
-    <!-- Arrows: input → vision -->
-    <path class="arrow" d="M200 105 L278 130" marker-end="url(#ar)"/>
-    <text x="200" y="100" class="arrow-label">fridge</text>
-    <!-- input → STT -->
-    <path class="arrow" d="M200 165 L278 235" marker-end="url(#ar)"/>
-    <text x="205" y="200" class="arrow-label">audio</text>
-    <!-- input progress → validator -->
-    <path class="arrow arrow-loop" d="M200 225 L278 330" marker-end="url(#aro)"/>
-    <text x="200" y="270" class="arrow-label" style="fill:#a85c2a;">progress</text>
-    <!-- Vision → Planner -->
-    <path class="arrow" d="M480 130 L508 130" marker-end="url(#ar)"/>
-    <text x="482" y="120" class="arrow-label">ingredients</text>
-    <!-- Planner → Illustrator -->
-    <path class="arrow" d="M710 130 L738 130" marker-end="url(#ar)"/>
-    <text x="712" y="120" class="arrow-label">visual prompt</text>
-    <!-- Illustrator → Modal -->
-    <path class="arrow dashed" d="M940 130 L1008 130" marker-end="url(#ar)"/>
-    <text x="945" y="120" class="arrow-label">.remote()</text>
-    <!-- Planner → narrator -->
-    <path class="arrow" d="M610 170 L610 198" marker-end="url(#ar)"/>
-    <!-- Planner → tip giver -->
-    <path class="arrow" d="M710 145 C 760 170, 800 180, 800 198" marker-end="url(#ar)"/>
-    <!-- Validator → Planner (loop) -->
-    <path class="arrow arrow-loop" d="M390 290 C 390 240, 470 190, 510 145" marker-end="url(#aro)"/>
-    <text x="395" y="240" class="arrow-label" style="fill:#a85c2a;">verdict · feedback</text>
-    <!-- STT → Validator -->
-    <path class="arrow dashed" d="M380 270 L380 288" marker-end="url(#ar)"/>
-    <!-- Recipe state ↔ all agents -->
-    <path class="arrow dashed" d="M725 290 L725 270" marker-end="url(#ar)"/>
-    <path class="arrow dashed" d="M610 290 L610 270" marker-end="url(#ar)"/>
-    <!-- All → Assembler -->
-    <path class="arrow" d="M610 380 L610 398" marker-end="url(#ar)"/>
-    <!-- Assembler → output -->
-    <path class="arrow" d="M280 425 C 240 425, 220 410, 200 385" marker-end="url(#ar)"/>
-    <path class="arrow" d="M280 440 C 240 440, 220 445, 200 445" marker-end="url(#ar)"/>
-    <!-- Modal → Planner (LoRA pesos offline) -->
-    <path class="arrow dashed" d="M1010 330 C 870 330, 750 280, 710 165" marker-end="url(#ar)"/>
-    <text x="900" y="280" class="arrow-label">LoRA pesos</text>
-  </svg>
-  <p style="font-size: 13px; color: var(--accent2); margin-top: 10px;">
-    <strong>Flecha naranja</strong> = closed-loop visual (la innovación). El usuario toma foto del progreso, MiniCPM-V valida vs imagen-objetivo, el Planner ajusta o avanza. Ningún recipe app del mercado lo hace.
-  </p>
-  <h2><span class="num">04</span>El truco innovador: closed-loop visual cocinero</h2>
-  <div class="grid-3">
-    <div class="card">
-      <h3>1. Imagen-objetivo por paso</h3>
-      <p style="font-size:13px;">Flux.2 genera "así debe verse el sartén/plato/olla en el paso N". No es texto, no es stock photo: es generación context-aware del estado deseado.</p>
-    </div>
-    <div class="card">
-      <h3>2. Validación con foto del usuario</h3>
-      <p style="font-size:13px;">El usuario sube foto de cómo va. MiniCPM-V compara contra la imagen-objetivo y devuelve verdict: <code>go</code> · <code>wait</code> · <code>fix</code>.</p>
-    </div>
-    <div class="card">
-      <h3>3. Replan adaptativo</h3>
-      <p style="font-size:13px;">"No tengo cilantro." → Planner regenera receta + Flux regenera imagen final. El plan no es estático, evoluciona con el estado real.</p>
-    </div>
-  </div>
-  <p style="margin-top:14px; font-size:14px;">
-    <strong>Esta es la sección destacada del README</strong> y el blog post de Field Notes badge: <em>"How visual closed-loop cooking guidance works."</em>
-  </p>
-  <h2><span class="num">05</span>Badges objetivo (5/6)</h2>
-  <div class="badges-grid">
-    <div class="badge-card"><span class="tag">LLAMA.CPP</span><br/><strong>Llama Champion</strong><p>Vision + Planner via <code>llama-cpp-python</code> con GGUF Q4.</p></div>
-    <div class="badge-card"><span class="tag">FINE-TUNED</span><br/><strong>Well-Tuned</strong><p>LoRA en cocina mexicana · publicado en HF.</p></div>
-    <div class="badge-card"><span class="tag">CUSTOM UI</span><br/><strong>Off-Brand</strong><p>UI tarjeta de receta · serif · paleta cálida · modo cocina XL.</p></div>
-    <div class="badge-card"><span class="tag">OPEN TRACE</span><br/><strong>Sharing is Caring</strong><p>Dataset 150 recetas mx + traces + recetas generadas al Hub.</p></div>
-    <div class="badge-card"><span class="tag">TENTATIVE</span><br/><strong>Field Notes</strong><p>Blog: "Le construí un sous-chef a mi mamá".</p></div>
-    <div class="badge-card skip"><span class="tag">LOCAL-FIRST</span><br/><strong>Off the Grid</strong><p>Sacrificado: Flux.2 corre en Modal por calidad.</p></div>
-  </div>
-  <h2><span class="num">06</span>Premios objetivo</h2>
-  <div class="card">
-    <div class="award-row"><span><strong>Backyard AI Track</strong> · $1K–$4K</span><span class="prob prob-h">ALTA</span></div>
-    <div class="award-row"><span><strong>Modal Awards</strong> · $3K–$10K credits</span><span class="prob prob-h">ALTA</span></div>
-    <div class="award-row"><span><strong>OpenBMB Award</strong> · $1K–$2.5K</span><span class="prob prob-h">ALTA</span></div>
-    <div class="award-row"><span><strong>Best Demo</strong> · $1K</span><span class="prob prob-h">ALTA</span></div>
-    <div class="award-row"><span><strong>Community Choice</strong> · $2K</span><span class="prob prob-h">ALTA</span></div>
-    <div class="award-row"><span><strong>Best Agent</strong> · $1K</span><span class="prob prob-h">ALTA — closed-loop multi-agente real</span></div>
-    <div class="award-row"><span><strong>Bonus Quest Champion</strong> · $2K</span><span class="prob prob-m">MEDIA-ALTA · 5/6 badges</span></div>
-    <div class="award-row"><span><strong>Off-Brand</strong> · $1.5K</span><span class="prob prob-m">MEDIA</span></div>
-    <div class="award-row"><span><strong>Tiny Titan</strong> · $1.5K</span><span class="prob prob-l">BAJA · Flux 9B saca del rango</span></div>
-  </div>
-  <p style="font-size: 14px; margin-top: 8px;"><strong>Cota razonable acumulada: $5K–$12K cash + $3K–$10K Modal credits.</strong></p>
-  <h2><span class="num">07</span>Timeline de 10 días</h2>
-  <div class="timeline">
-    <div class="day"><span class="lbl">D1</span><strong>Setup + Modal Flux endpoint</strong><div class="what">"Hola Flux": prompt → imagen de un platillo. Space vacío deployado.</div></div>
-    <div class="day"><span class="lbl">D2</span><strong>Vision: identificación de ingredientes</strong><div class="what">MiniCPM-V Q4 · prueba con 5 fotos reales del refri.</div></div>
-    <div class="day"><span class="lbl">D3</span><strong>Recipe Planner LLM</strong><div class="what">MiniCPM-4 · JSON estructurado · 3 opciones a partir de ingredientes.</div></div>
-    <div class="day"><span class="lbl">D4</span><strong>Step Illustrator (Flux + consistencia)</strong><div class="what">Imagen del plato final + 5 imágenes-objetivo por paso · i2i suave.</div></div>
-    <div class="day"><span class="lbl">D5</span><strong>Voz: narrador + tip-giver</strong><div class="what">OpenBMB voice + Cohere voice · audio pre-renderizado por paso.</div></div>
-    <div class="day"><span class="lbl">D6</span><strong>UI Off-Brand: recipe card</strong><div class="what">gr.Blocks + CSS serif tierra · modo cocina XL hands-free.</div></div>
-    <div class="day"><span class="lbl">D7</span><strong>Gradio Workflows showcase</strong><div class="what">Pipeline reescrita como Workflow visible · pestaña separada.</div></div>
-    <div class="day"><span class="lbl">D8</span><strong>Fine-tune del Planner en cocina mx</strong><div class="what">200 recetas sintéticas · LoRA · GGUF · push HF.</div></div>
-    <div class="day"><span class="lbl">D9</span><strong>STT + Progress Validator + eval</strong><div class="what">Whisper · closed-loop activo · Sharing is Caring badge.</div></div>
-    <div class="day"><span class="lbl">D10</span><strong>Demo + README + blog + submit</strong><div class="what">Mamá real cocinando · 60-90s · subtítulos EN · Field Notes blog.</div></div>
-  </div>
-  <h2><span class="num">08</span>Plan B (corte de scope)</h2>
-  <table>
-    <thead><tr><th>#</th><th>Cortar</th><th>Pierdes</th><th>Conservas</th></tr></thead>
-    <tbody>
-      <tr><td>1</td><td>STT (preguntas voz)</td><td>comodidad demo</td><td>texto + foto</td></tr>
-      <tr><td>2</td><td>2da voz (Cohere tip-giver)</td><td>1 sponsor voice</td><td>narrador único</td></tr>
-      <tr><td>3</td><td>Progress Validator (closed-loop)</td><td><strong>Best Agent</strong> + innovación principal</td><td>demo lineal</td></tr>
-      <tr><td>4</td><td>Fine-tune del Planner</td><td><strong>Well-Tuned</strong></td><td>resto badges</td></tr>
-      <tr><td>5</td><td>Gradio Workflows showcase</td><td>diferenciador "fresh"</td><td>pipeline Python</td></tr>
-      <tr><td>6</td><td>UI super-custom</td><td><strong>Off-Brand</strong></td><td>UI default</td></tr>
-      <tr style="background:#fff3cf;"><td>—</td><td><strong>NUNCA</strong></td><td colspan="2">Vision + Planner + Illustrator + Narrator + UI mínima + video con persona real cocinando</td></tr>
-    </tbody>
-  </table>
-  <h2><span class="num">09</span>Riesgos clave</h2>
-  <table>
-    <thead><tr><th>Riesgo</th><th>Mitigación</th></tr></thead>
-    <tbody>
-      <tr><td>Flux.2 Klein no tiene API/pesos públicos cuando lo necesitas</td><td>Plan B: Flux.1-schnell o SDXL-Lightning. Pierdes posicionamiento sponsor pero idea sobrevive.</td></tr>
-      <tr><td>MiniCPM-V no identifica ingredientes mexicanos (chile poblano, nopales)</td><td>Few-shot en prompt; eventualmente fine-tune ligero del visión sobre 50 fotos etiquetadas</td></tr>
-      <tr><td>Flux.2 genera comida poco apetitosa</td><td>Itera prompts ("recipe magazine, warm light, top-down"); usa imagen final como ref para los pasos</td></tr>
-      <tr><td>Progress Validator da false positives</td><td>Conservador: solo dice "vas bien" si similitud es alta; default es "sigue" sin juicio fuerte</td></tr>
-      <tr><td>Latencia receta &gt; 30s</td><td>Streaming progresivo; paraleliza Flux + TTS</td></tr>
-      <tr><td>Modal cold start ~30-60s en Flux</td><td>Pre-warm 30s antes de filmar · <code>min_containers=1</code> (antes <code>keep_warm=1</code>) el día del demo</td></tr>
-      <tr><td>Persona del demo se quema/cocina mal</td><td>Practica la receta una vez antes · 2-3 candidatos de receta listos</td></tr>
-      <tr><td>Otro equipo presenta "recipe app con AI"</td><td>Diferéncialo con: closed-loop visual + español + cocina mx + dataset publicado + persona real</td></tr>
-    </tbody>
-  </table>
-  <h2><span class="num">10</span>Cómo gastar los créditos</h2>
-  <div class="grid-2">
-    <div class="card">
-      <h3>Modal · $250</h3>
-      <table>
-        <tr><td>Flux dev (días 1-9)</td><td>$5-15</td></tr>
-        <tr><td>Dataset cocina mx</td><td>$3-8</td></tr>
-        <tr><td>LoRA + sweeps</td><td>$4-5</td></tr>
-        <tr><td>Eval</td><td>$1</td></tr>
-        <tr><td>Inferencia grading jueces</td><td>$10-25</td></tr>
-        <tr><th>Subtotal</th><th>$23-54</th></tr>
-        <tr><th>+ Buffer</th><th>$30</th></tr>
-        <tr><th>Proyectado</th><th><strong>~$53-84 / $250</strong></th></tr>
-      </table>
-    </div>
-    <div class="card">
-      <h3>OpenAI Codex · $100</h3>
-      <table>
-        <tr><td>Codex CLI pair-programmer</td><td>$20-40</td></tr>
-        <tr><td>200 recetas mx sintéticas</td><td>$10-25</td></tr>
-        <tr><td>Prompts Flux por paso</td><td>$5-10</td></tr>
-        <tr><td>Reserva</td><td>$30</td></tr>
-        <tr><th>Proyectado</th><th><strong>~$65-105 / $100</strong></th></tr>
-      </table>
-    </div>
-  </div>
-  <div class="footnote">
-    <strong>Mantra del proyecto:</strong> "Una mamá cocinando frente a la cámara. Un platillo que se ve apetitoso. Una voz que la acompaña sin juzgar. Un paso a la vez."
-  </div>
-</div>
-</body>
-</html>

Strategy/estrategia.md DELETED Viewed

@@ -1,496 +0,0 @@
-# Detailed strategy: "Cocina Conmigo"
-> Execution document. Read `plan.md` first for the "what" and the "why".
-> This file is the "how": mental model, multi-agent design, timeline, credit spending, risks, snippets.
----
-## 1. Mental model: the "recipe" as a state object
-The app is not a chatbot. It is a **state machine** built around a `Recipe` object that is updated on every turn.
-```python
-from dataclasses import dataclass
-
-@dataclass
-class Step:                         # defined first so Recipe can reference it
-    n: int
-    instruction_text: str           # "Pica la cebolla en cubos chicos"
-    visual_target: bytes            # Flux image: "this is how the pan should look"
-    duration_estimate: str          # "4 minutos"
-    audio_narration: bytes          # pre-rendered narration
-    tip: str | None                 # "no la quemes"
-    tip_audio: bytes | None         # Cohere voice
-
-@dataclass
-class Recipe:
-    name: str                       # "Tinga de Pollo"
-    final_dish_image: bytes         # Flux image of the final dish
-    available_ingredients: list[str]  # what the camera saw in the fridge
-    missing_ingredients: list[str]    # what is missing + its substitutes
-    steps: list[Step]               # 5-7 steps
-    current_step: int               # the step currently in progress
-    user_progress_photos: list[bytes]  # photos taken by the user
-```
-Advantages of modeling it this way:
-- Every Workflow node takes a `Recipe` and returns a modified `Recipe`, which keeps the pipeline composable and observable.
-- A "replan" (no cilantro) is a single function: `recipe.replan(missing="cilantro") → Recipe`.
-- The "validator" takes a `Recipe` plus a `progress_photo` and returns `feedback`.
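The `Recipe → Recipe` transition can be sketched as follows. This is a minimal illustration, not the project's code: the dataclasses are trimmed to the fields the example needs, `replan` is written as a free function, and the `substitute` argument with its plain text swap is a hypothetical stand-in for a real Planner call.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    n: int
    instruction_text: str


@dataclass(frozen=True)
class Recipe:
    name: str
    available_ingredients: tuple[str, ...]
    missing_ingredients: tuple[str, ...] = ()
    steps: tuple[Step, ...] = ()
    current_step: int = 0


def replan(recipe: Recipe, missing: str, substitute: str) -> Recipe:
    """Return a NEW Recipe in which `missing` is swapped for `substitute`.

    Assumption: a real implementation would ask the Planner model to rewrite
    the steps; the string replacement here only shows the state transition.
    """
    available = tuple(i for i in recipe.available_ingredients if i != missing)
    steps = tuple(
        replace(s, instruction_text=s.instruction_text.replace(missing, substitute))
        for s in recipe.steps
    )
    return replace(
        recipe,
        available_ingredients=available,
        missing_ingredients=recipe.missing_ingredients + (missing,),
        steps=steps,
    )
```

Because the input is never mutated, every intermediate state can be logged or shown in the UI, which is what makes the pipeline observable.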
----
-## 2. The 5 agents (real multi-agent, not simulated)
-| Agent | Responsibility | Trigger | Output |
-|---|---|---|---|
-| **Mise en Place** | Identify ingredients in the fridge photo | fridge photo | `available_ingredients` |
-| **Recipe Planner** | Propose 3 feasible recipes, then build the chosen one | user picks an idea | `Recipe` with steps |
-| **Step Illustrator** | Generate the target image for each step plus the final dish | new recipe | `Step.visual_target` for each step |
-| **Sous-Chef Narrator** | Narrate the instructions by voice | active step | `Step.audio_narration` |
-| **Progress Validator** | Compare the user's photo against the target image | user uploads a mid-cooking photo | `feedback` (text + voice tip) |
-This is a **real multi-agent system**: each agent has its own function and its own model, and they communicate through shared state (`Recipe`). It is not a single agent with tools; it is 5 agents in a pipeline plus a closed loop.
-> **Best Agent badge candidate.** Document this in the README with an explicit diagram.
----
-## 3. El truco innovador: closed-loop visual
-```
-                ┌─────────────────────────────────────┐
-                │                                     │
-                ▼                                     │
-[Step Illustrator]──▶ visual_target ──▶ [Usuario cocina]
-                                              │
-                                              ▼
-                                    📸 progress_photo
-                                              │
-                                              ▼
-                                  [Progress Validator]
-                                    (MiniCPM-V)
-                                              │
-                          ┌───────────────────┤
-                          │                   │
-                    ✅ va bien           ❌ ajustar
-                          │                   │
-                    siguiente paso        [Recipe Planner]
-                                          replan/tip
-                                              │
-                                              └──────▶ vuelta al loop
-```
-This is **the technical innovation** of the project. Most "recipe apps" are static lists. Cocina Conmigo:
-1. Generates *visually* what each step should look like (not just text).
-2. Accepts a photo from the user and *compares* it against the target.
-3. Adapts the plan live if something is off.
-Dedicated README section: *"How visual closed-loop cooking guidance works"*. It is also the Field Notes blog post.
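The control flow of the loop can be sketched as a small transition function over the validator's `go` / `wait` / `fix` verdicts. `LoopState`, `apply_verdict`, and the action names are hypothetical. Treating any unrecognized verdict as `wait` is an assumption, chosen to match the plan's conservative default.

```python
from dataclasses import dataclass

VERDICTS = {"go", "wait", "fix"}


@dataclass(frozen=True)
class LoopState:
    current_step: int   # 0-based index of the step being cooked
    n_steps: int
    done: bool = False


def apply_verdict(state: LoopState, verdict: str) -> tuple[LoopState, str]:
    """Map a validator verdict to (new state, action for the UI)."""
    if verdict not in VERDICTS:
        verdict = "wait"  # assumption: malformed model output never advances
    if verdict == "go":
        nxt = state.current_step + 1
        if nxt >= state.n_steps:
            return LoopState(state.current_step, state.n_steps, done=True), "finish"
        return LoopState(nxt, state.n_steps), "next_step"
    if verdict == "fix":
        # Stay on the same step and hand control back to the Recipe Planner.
        return state, "replan"
    return state, "keep_cooking"
```

Keeping the transition pure (no model calls inside) means the loop can be unit-tested without MiniCPM-V loaded.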
----
-## 4. Cronograma — 10 días
-> ~50-70 horas de trabajo + 1 humano + Codex CLI como pair.
-### Día 1 — Setup + Modal Flux endpoint
-- `pip install gradio modal openai huggingface-hub diffusers llama-cpp-python`
-- `modal setup` y deploya el endpoint Flux que devuelve imagen dada un prompt.
-- Crea Space vacío en HF, push inicial.
-- **Entregable:** Space que muestra una imagen Flux dado un texto.
-### Día 2 — Vision: identificación de ingredientes
-- Carga MiniCPM-V Q4 GGUF en local.
-- Función: `identify_ingredients(fridge_photo) → list[str]`.
-- Prueba con 5 fotos de refri reales (el tuyo, el de tu mamá).
-- **Entregable:** dada foto del refri, devuelve lista correcta de 80%+ ingredientes visibles.
-### Día 3 — Recipe Planner LLM
-- Load MiniCPM-4 Q4 GGUF.
-- Structured prompt template that returns JSON:
-  ```json
-  {
-    "name": "Tinga de Pollo",
-    "options": [{"name": "...", "why": "..."}, ...],
-    "steps": [{"n": 1, "instruction": "...", "duration": "...", "visual": "..."}],
-    "missing": ["cilantro"],
-    "substitutes": {"cilantro": ["perejil", "nada"]}
-  }
-  ```
-- Connect Vision + Planner: fridge photo → 3 recipe options.
-- **Deliverable:** given a photo plus a selection, returns a complete structured recipe.
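The planner's reply is free text from an LLM, so the JSON may arrive wrapped in commentary or missing a key. The sketch below is a defensive parser, assuming the key set from the template above. `parse_planner_output` and `REQUIRED_KEYS` are hypothetical names, not part of the project.

```python
import json

REQUIRED_KEYS = ("name", "options", "steps", "missing", "substitutes")


def parse_planner_output(raw: str) -> dict:
    """Extract and sanity-check the JSON object in a planner reply."""
    # Keep only the outermost {...}; models often add text around it.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in planner output")
    data = json.loads(raw[start:end + 1])

    absent = [k for k in REQUIRED_KEYS if k not in data]
    if absent:
        raise ValueError(f"planner output lacks keys: {absent}")

    # Steps must be numbered 1..N in order so the UI can index them.
    for expected_n, step in enumerate(data["steps"], start=1):
        if step.get("n") != expected_n:
            raise ValueError(f"step {expected_n} has n={step.get('n')!r}")
    return data
```

Raising on malformed output lets the caller retry the generation instead of rendering a half-built recipe card.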
-### Día 4 — Step Illustrator (Flux.2 con consistencia)
-- For each `Step.visual` in the JSON, call the Flux.2 endpoint with the prompt:
-  > *"Top-down view of a kitchen pan with [step.visual]. Mexican cooking style. Warm lighting. Natural ingredients. Photorealistic, recipe magazine style."*
-- To keep the style consistent between steps, pass the previous step's image as `ref` with `strength=0.6` (looser than for stories, because the content changes a lot from step to step).
-- Also generate the final dish image (without `ref`).
-- **Deliverable:** a 5-step recipe where each step has a target image, plus a photo of the final dish.
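The reference chaining can be sketched independently of the image model. Here `render` is an injected callable standing in for the remote Flux call (an assumption, so the logic can be exercised without a GPU), and `illustrate_steps` is a hypothetical name.

```python
from typing import Callable, Optional

# (visual description, optional reference PNG) -> PNG bytes
RenderFn = Callable[[str, Optional[bytes]], bytes]


def illustrate_steps(
    visuals: list[str], final_dish_visual: str, render: RenderFn
) -> tuple[bytes, list[bytes]]:
    """Render the final dish without a reference, then chain the steps."""
    final_img = render(final_dish_visual, None)

    images: list[bytes] = []
    ref: Optional[bytes] = None  # the first step has no previous image
    for visual in visuals:
        img = render(visual, ref)
        images.append(img)
        ref = img  # the next step is conditioned on this one
    return final_img, images
```

The chain makes step rendering sequential. Seeding `ref` with `final_img` instead of `None` would give the variant mentioned in the risk table, where the final dish anchors the style of every step.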
-### Día 5 — Voz: narrador + tip-giver
-- **OpenBMB voice** para `Step.audio_narration`: instrucciones tono cálido y claro.
-- **Cohere Labs voice** para `Step.tip_audio`: tono más enérgico ("¡cuidado!").
-- Genera audio de los 5 pasos por adelantado (no en streaming, evita cold starts molestos).
-- **Entregable:** receta completa con narración audible.
-### Día 6 — UI Off-Brand: tarjeta de receta
-- `gr.Blocks` + CSS custom.
-- Layout: hero con imagen del plato final + título grande, abajo carrusel de pasos cada uno con `imagen objetivo + texto + botón "ya"`, modo cocina hands-free con texto enorme.
-- Estilo: serif elegante (`Lora`), paleta cálida tierra/dorado.
-- **Entregable:** Space que parece tarjeta de revista de cocina, no Gradio.
-### Día 7 — Gradio Workflows showcase
-- Reescribe pipeline como **Gradio Workflow** con nodos visibles.
-- Nodos: `📸 Fridge → 👁️ Vision → 🧠 Planner → 🎨 Illustrator → 🔊 Narrator → 📖 Recipe Card`.
-- Para `Progress Validator`, agrega rama: `📸 Progress Photo → 👁️ Validator → 💬 Feedback`.
-- Pestaña separada en el Space que muestra el grafo del Workflow corriendo en vivo.
-- **Entregable:** Workflow visualmente impresionante en pantalla. Diferenciador para jueces de Gradio.
-### Día 8 — Fine-tune del Planner en cocina mexicana
-- **Dataset sintético en Modal:** Codex API genera 200 recetas mexicanas en formato JSON estructurado (tinga, mole, chiles rellenos, sopes, pozole, etc.). Filtras manualmente las 150 mejores.
-- **LoRA en Modal A10G:** ~30-60 min de fine-tune sobre MiniCPM-4 4B.
-- **GGUF + push HF:** convierte a Q4_K_M, sube a HF Hub.
-- Reemplaza el Planner por la versión fine-tuneada.
-- **Entregable:** modelo `tu-usuario/cocinaconmigo-4b-mx-Q4_K_M-gguf` publicado.
-### Día 9 — STT + Progress Validator + eval
-- `faster-whisper tiny` en español: usuario pregunta hands-free.
-- Implementa **Progress Validator**: foto del usuario → MiniCPM-V compara contra `Step.visual_target` → genera feedback.
-- Eval: 10 recetas generadas, mide:
-  - % ingredientes correctamente identificados.
-  - % pasos con imagen-objetivo coherente.
-  - Calidad subjetiva de validación (5 fotos de progreso).
-- Sube traces al Hub (badge **Sharing is Caring**).
-- **Entregable:** app completa con voz IN, validador, traces publicados.
-### Día 10 — Demo video + README + blog + submit
-- **Filma a una persona real cocinando** una receta sugerida por la app, de principio a fin.
-- 60-90 segundos: foto del refri → 3 opciones → elige → cocina con voz → toma foto mid-cooking → app valida → plato final → la persona come.
-- README: badges declarados, diagrama, link al video, sección "How closed-loop visual cooking guidance works".
-- Blog post (badge **Field Notes**): "Le construí un sous-chef a mi mamá".
-- Submit + post social.
----
-## 5. Decisiones técnicas explícitas
-### 5.1 Por qué Modal en runtime (rompiendo Off the Grid)
-Igual que en planes anteriores: Flux.2 9B en CPU del Space free es inviable (GB de RAM y minutos por imagen). Modal-powered es la elección obligada cuando el centro de la app es generación visual.
-### 5.2 Por qué cocina mexicana específicamente
-- Dataset acotado pero rico. Cubrible en 200 recetas.
-- Diferenciador cultural automático.
-- Se alinea con el público "para mi mamá" (si tu mamá es latina).
-- Si los jueces son mexicanos en Discord/Slack, +1.
-### 5.3 Por qué visual_target con Flux.2 en lugar de imagen stock
-- Stock photos tienen sesgo americano/europeo. Flux genera estilo mexicano si lo prompteas.
-- Stock no se ajusta al ingrediente exacto que tienes (Flux sí).
-- Esto es lo que hace única la app — es el wow factor.
-### 5.4 Why pre-render audio instead of streaming
-- Latency: streaming TTS is slow and looks bad in a demo.
-- Cooking is sequential: all 5 steps are known when the recipe starts, so everything can be pre-rendered in parallel.
-- If the user triggers a replan, only the affected steps are regenerated.
-### 5.5 LoRA y no full fine-tune
-Mismo argumento de planes anteriores: 150-200 ejemplos = LoRA r=16 es suficiente. ~30 min A10G ≈ $1.
-### 5.6 Cómo gastar los $250 de Modal
-| Concepto | Estimado |
-|---|---|
-| Inferencia Flux.2 dev (días 1-9, ~5h GPU L4) | $5-15 |
-| Generación dataset sintético cocina mexicana (~2h) | $3-8 |
-| LoRA fine-tune + sweeps (~3h A10G) | $4-5 |
-| Eval pipeline | $1 |
-| Inferencia durante grading de jueces (~10h) | $10-25 |
-| **Subtotal** | **$23-54** |
-| Buffer | $30 |
-| **Total proyectado** | **~$53-84 / $250** |
-### 5.7 Cómo gastar los $100 de OpenAI Codex
-- Codex CLI durante 10 días como pair-programmer: $20-40.
-- Generación de 200 recetas mexicanas estructuradas (Día 8): $10-25.
-- Generación de prompts de Flux para los pasos (Día 4): $5-10.
-- Reserva: $30.
----
-## 6. Riesgos y mitigaciones
-| Riesgo | Impacto | Mitigación |
-|---|---|---|
-| Flux.2 Klein no tiene API/pesos públicos cuando lo necesitas | Bloquea idea | Plan B: Flux.1-schnell o SDXL-Lightning. Pierdes tag sponsor pero idea sobrevive. |
-| MiniCPM-V no identifica ingredientes mexicanos (chile poblano, chayote, nopales) | Recipe Planner falla | Agrega few-shot examples al prompt; eventualmente fine-tune del visión sobre 50 fotos etiquetadas |
-| Flux.2 genera comida poco apetitosa/uncanny | Mata el demo | Itera prompts (style="recipe magazine, warm light, top-down"); usa imagen de plato final como ref para los pasos |
-| Latencia: receta completa tarda más de 30s en generarse | Demo aburrido | Streaming progresivo (muestra opción + plato final primero, pasos después); paraleliza Flux + TTS |
-| Modal cold start ~30-60s en Flux | Primera demo lenta | Pre-warm 30s antes de filmar; `min_containers=1` (antes `keep_warm=1`) el día del demo |
-| Validador de progreso da false positives ("vas bien" cuando no) | Confunde al usuario | Conservador: solo dice "vas bien" si la similitud es muy alta; default es "sigue" sin juicio fuerte |
-| TTS español sin acento mexicano | Suena raro | Si OpenBMB no tiene es-MX, usa Cohere o Kokoro con voz neutra; pre-graba para video |
-| Usuario del demo cocina mal/se quema | Mata el video | Practica la receta una vez antes de filmar; ten 2-3 candidatos de receta listos |
-| Otro equipo presenta "recipe app con AI" | Compite por premios | Diferénciate con: closed-loop visual + español + cocina mexicana específica + dataset publicado + persona real cocinando + Workflow visible |
-| Workflows de Gradio inestable (lanzado ayer) | Rompe app | Versión sin Workflows como backup. Workflows es decoración. |
----
-## 7. Plan B — corte de scope
-Si en Día 7 ves que no llegas, recorta features en este orden:
-| # | Cortar | Pierdes | Conservas |
-|---|---|---|---|
-| 1 | STT (preguntas hands-free por voz) | comodidad demo | input por texto + foto |
-| 2 | 2da voz (Cohere tip-giver) | un sponsor de voz | narrador único |
-| 3 | Progress Validator (closed-loop) | **Best Agent badge** + innovación principal | demo lineal sin loop |
-| 4 | Fine-tune del Planner | **Well-Tuned badge** | base model + prompting |
-| 5 | Gradio Workflows showcase | diferenciador "fresh" | pipeline Python |
-| 6 | UI super-custom | **Off-Brand badge** | UI default |
-**NUNCA cortar:**
-- Vision + Planner + Step Illustrator + Narrator + UI mínima + video con persona real cocinando.
-Eso solo ya entra fuerte a Backyard AI track.
----
-## 8. Métricas de éxito (auto-evaluación pre-submit)
-Antes de mandar:
-- [ ] Una persona real cocinó una receta entera con la app y se la comió.
-- [ ] El video tiene una cara humana y un plato terminado en al menos 30s de los 90s.
-- [ ] La app identifica correctamente ≥4 de 5 ingredientes en una foto típica de refri.
-- [ ] Las imágenes de Flux para los pasos se ven *apetitosas* (test: si las muestras a alguien sin contexto, dice "se ve rico").
-- [ ] Una receta completa se genera en menos de 30s (texto + 5 imágenes + audio).
-- [ ] El Progress Validator funciona en al menos 5 de 10 fotos de progreso reales.
-- [ ] El README tiene un diagrama y la sección "How closed-loop cooking works".
-- [ ] Hay 3 recetas pre-renderizadas listas para que jueces las vean sin esperar.
-- [ ] Total params declarado y verificado ≤ 32B.
-- [ ] Sin secrets hardcoded.
-Si fallas más de 2, no submitas; arregla.
----
-## 9. Lo que NO debes hacer
-- **No** intentes generar video del platillo. Imagen estática se ve mejor que video AI mediocre.
-- **No** hagas más de 7 pasos por receta. Atención del juez = 60-90s.
-- **No** soportes 100 recetas. Soporta 20 recetas mexicanas excelentes y di "más recetas pronto".
-- **No** subas fotos del refri real con productos identificables (marcas, info personal). Borra labels.
-- **No** persigas Off the Grid. Decisión ya tomada.
-- **No** dejes el video de demo para el último día sin practicar la receta antes.
-- **No** publiques tokens en el repo.
-- **No** generes recetas con ingredientes raros que la mayoría no tenga (cocina accesible > cocina chef).
----
-## 10. Pitch del README (esqueleto)
-```markdown
-# Cocina Conmigo
-> A visual sous-chef that sees what's in your fridge,
-> shows you what each step should look like, and walks you through it
-> with voice — hands-free.
-[60-second demo video embed: tu mamá cocinando tinga]
-## Why it shouldn't exist (but does)
-Every recipe app is a list of steps. Cocina Conmigo is a closed-loop assistant:
-it generates the *target image* of each cooking step with Flux.2, listens
-when you ask "¿voy bien?", and adapts when you say "no tengo cilantro."
-## Tech
-- 👁️ MiniCPM-V — sees your fridge + validates your progress
-- 🧠 MiniCPM-4 4B (LoRA fine-tuned on Mexican cuisine) — recipe planner
-- 🎨 Flux.2 Klein 9B (Modal endpoint) — generates target images per step
-- 🔊 OpenBMB voice — sous-chef narrator
-- 🎭 Cohere voice — tip-giver second voice
-- 🎙️ Whisper-tiny — voice input
-- ⚙️ Gradio Workflows — visible pipeline of nodes
-Total params: ~17B (≤ 32B ✓)
-## Badges
-✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
-## Built for
-My mom. She makes great mole. She can never remember tinga.
-## Try it
-[HF Space link]
-```
----
-## 11. Apéndice: snippets clave
-### 11.1 Mise en Place agent (vision)
-```python
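-import json
-from PIL import Image
-
-# Assumed to exist elsewhere: `mini_cpm_v` (a llama_cpp.Llama loaded with a
-# vision chat handler) and `pil_to_data_url` (PIL image -> data: URL string).
-def identify_ingredients(image: Image.Image) -> list[str]:
-    # The prompt stays in Spanish because the app answers in Spanish.
-    prompt = """Veo esta foto de un refrigerador o despensa.
-    Lista TODOS los ingredientes que se ven, en español, en JSON:
-    {"ingredients": ["pollo", "cebolla", "cilantro", ...]}
-    Solo ingredientes alimentarios, no contenedores."""
-    out = mini_cpm_v.create_chat_completion(messages=[
-        {"role": "user", "content": [
-            {"type": "image_url", "image_url": pil_to_data_url(image)},
-            {"type": "text", "text": prompt}
-        ]}
-    ])
-    # Raises if the model does not return valid JSON with an "ingredients" key.
-    return json.loads(out["choices"][0]["message"]["content"])["ingredients"]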
-```
-### 11.2 Recipe Planner agent (LLM)
-```python
-import json
-
-# `llm` is assumed to be a llama_cpp.Llama instance loaded elsewhere.
-SYS = """Eres un chef mexicano. Generas recetas a partir de ingredientes
-disponibles. Prefiere cocina mexicana tradicional, accesible.
-Salida JSON estricta:
-{
-  "name": "...",
-  "options": [{"name": "...", "why": "..."}],
-  "steps": [
-    {"n": 1, "instruction": "...", "duration": "4 min",
-     "visual": "english visual description for image gen",
-     "tip": "optional warning or tip"}
-  ],
-  "missing": ["cilantro"],
-  "substitutes": {"cilantro": ["perejil", "nada"]},
-  "final_dish_visual": "english visual description of the final plated dish"
-}
-"""
-def plan_recipe(ingredients, choice=None):
-    msgs = [{"role": "system", "content": SYS}]
-    msgs.append({"role": "user", "content":
-        f"Tengo: {', '.join(ingredients)}.\n"
-        + (f"Quiero hacer: {choice}." if choice else "Propón 3 opciones.")})
-    raw = llm.create_chat_completion(messages=msgs, temperature=0.7)
-    return json.loads(raw["choices"][0]["message"]["content"])
-```
-### 11.3 Step Illustrator (Flux endpoint)
-```python
-import modal
-app = modal.App("cocina-flux")
-image = modal.Image.debian_slim().pip_install("torch","diffusers","transformers","accelerate","Pillow")
-# min_containers replaces the deprecated keep_warm argument in current Modal releases.
-@app.cls(image=image, gpu="L4", scaledown_window=180, min_containers=0)
-class FluxKlein:
-    @modal.enter()
-    def load(self):
-        import torch
-        from diffusers import FluxImg2ImgPipeline, FluxPipeline
-        # Model id as written in the plan; confirm the published FLUX.2 Klein
-        # repo and its pipeline class before relying on it.
-        self.pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.2-klein",
-                                                 torch_dtype=torch.bfloat16).to("cuda")
-        # FluxPipeline is text-to-image only; image/strength need the img2img
-        # pipeline, built here from the same loaded components.
-        self.i2i = FluxImg2ImgPipeline.from_pipe(self.pipe)
-    @modal.method()
-    def render_step(self, visual: str, ref_png: bytes | None = None) -> bytes:
-        import io
-        from PIL import Image
-        prompt = (f"Top-down photo of a kitchen pan or plate showing {visual}. "
-                  f"Mexican home cooking, warm natural lighting, recipe magazine "
-                  f"style, photorealistic, appetizing.")
-        if ref_png:
-            # Condition on the previous step's image to keep the style consistent.
-            ref = Image.open(io.BytesIO(ref_png)).convert("RGB")
-            img = self.i2i(prompt=prompt, image=ref, strength=0.6,
-                           num_inference_steps=4).images[0]
-        else:
-            img = self.pipe(prompt=prompt, num_inference_steps=4).images[0]
-        buf = io.BytesIO(); img.save(buf, "PNG"); return buf.getvalue()
-### 11.4 Progress Validator (closed-loop)
-```python
-import json
-from PIL import Image
-
-def validate_progress(target_image: Image.Image, user_photo: Image.Image,
-                      step_instruction: str) -> dict:
-    prompt = f"""Compara estas dos fotos de cocina.
-    Foto 1 (objetivo): cómo debe verse después del paso "{step_instruction}".
-    Foto 2 (usuario): cómo va el usuario.
-    Responde en JSON:
-    {{"verdict": "go|wait|fix", "feedback_es": "...", "tip": "..." | null}}
-    - "go": va bien, siguiente paso
-    - "wait": le falta tiempo
-    - "fix": algo se ve mal, sugiere ajuste
-    """
-    out = mini_cpm_v.create_chat_completion(messages=[
-        {"role": "user", "content": [
-            {"type": "image_url", "image_url": pil_to_data_url(target_image)},
-            {"type": "image_url", "image_url": pil_to_data_url(user_photo)},
-            {"type": "text", "text": prompt}
-        ]}
-    ])
-    return json.loads(out["choices"][0]["message"]["content"])
-```
-### 11.5 UI Off-Brand (recipe card)
-```python
-import gradio as gr
-CSS = """
-@import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&family=Inter:wght@400;600&display=swap');
-.gradio-container {background: #f5ecd9 !important; font-family: 'Inter', sans-serif !important;}
-.recipe-hero {background: #fffbf0; border-radius: 14px; padding: 28px;
-  box-shadow: 0 8px 24px rgba(0,0,0,0.12); border: 1px solid #d8c9ad;}
-.recipe-hero h1 {font-family: 'Lora', serif !important; font-size: 36px !important;
-  margin: 0 0 6px !important; color: #6b4a2a !important;}
-.step-card {background: #fffbf0; border-left: 4px solid #a85c2a;
-  border-radius: 8px; padding: 18px 22px; margin: 12px 0;}
-.step-card h3 {font-family: 'Lora', serif !important; margin: 0 !important;}
-.step-card p {font-size: 17px !important; line-height: 1.6;}
-button.primary {background: #a85c2a !important; font-family: 'Inter', sans-serif !important;
-  font-weight: 600 !important; font-size: 16px !important; padding: 14px 22px !important;}
-"""
-with gr.Blocks(css=CSS, title="Cocina Conmigo") as demo:
-    gr.Markdown("# 👩‍🍳 Cocina Conmigo")
-    fridge = gr.Image(label="📸 Foto de tu refri o despensa", type="pil")
-    btn = gr.Button("¿Qué cocino?", variant="primary")
-    with gr.Column(elem_classes=["recipe-hero"]):
-        title = gr.Markdown()
-        final_img = gr.Image(show_label=False)
-        steps_box = gr.Column()
-    progress = gr.Image(label="📸 Tómame foto de tu progreso", type="pil")
-    feedback = gr.Markdown()
-    # callbacks omitidos
-```
-### 11.6 LoRA fine-tune del Planner en Modal
-```python
-@app.function(image=image_train, gpu="A10G", timeout=60*60*2,
-              volumes={"/cache": modal.Volume.from_name("hf-cache", create_if_missing=True)})
-def train_planner():
-    import os; os.environ["HF_HOME"] = "/cache"
-    from transformers import AutoModelForCausalLM, AutoTokenizer
-    from peft import LoraConfig, get_peft_model
-    from trl import SFTTrainer, SFTConfig
-    from datasets import load_dataset
-    base = "openbmb/MiniCPM-4-Base"
-    tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
-    model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True,
-                                                 device_map="cuda", torch_dtype="bfloat16")
-    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
-                                              target_modules="all-linear"))
-    ds = load_dataset("tu-usuario/recetas-mexicanas-sft", split="train")
-    SFTTrainer(model=model, tokenizer=tok, train_dataset=ds,
-               args=SFTConfig(output_dir="/cache/out", num_train_epochs=2,
-                              per_device_train_batch_size=4, learning_rate=2e-4,
-                              push_to_hub=True,
-                              hub_model_id="tu-usuario/cocinaconmigo-4b-mx")
-               ).train()
-```
----
-## 12. Recommended reading before Day 1
-- `Context/guia-tecnologias.md` (section 3 Modal, section 4 llama.cpp).
-- HF Black Forest Labs: <https://huggingface.co/black-forest-labs> (confirm the Flux.2 Klein version).
-- HF MiniCPM-V: <https://huggingface.co/openbmb> (vision version with GGUF).
-- Modal stable-diffusion example: <https://github.com/modal-labs/modal-examples/tree/main/06_gpu_and_ml/stable_diffusion>.
-- Diffusers img2img: <https://huggingface.co/docs/diffusers/using-diffusers/img2img>.
-- Gradio Workflows: <https://www.gradio.app/guides> (look for the most recent guide).
-- Cohere Labs voice: confirm the exact available model with the sponsor.
-> Cook with your mom once before you start programming. It will clarify what your app needs better than any brainstorm. Good luck.

Strategy/plan.md DELETED

@@ -1,245 +0,0 @@
-# Winning plan: "Cocina Conmigo"
-> A multimodal sous-chef that sees what you have in the fridge, tells you what to cook, shows you how each step should look with Flux.2, and narrates everything by voice while you cook with your hands full.
->
-> Hackathon "Small models / Big adventures", June 2026.
----
-## TL;DR
-**Chosen idea:** **Cocina Conmigo**, a hands-free kitchen copilot that combines vision, reasoning, real-time image generation, and voice to accompany you from start to finish: from *"what can I cook with this?"* to *"am I on track?"*.
-**Why this one and not another:** it is the only idea that **(1) is outside the 11 ideas pre-cooked by OpenBMB**, **(2) uses Flux.2 + voices + Workflows as its core**, and **(3) has real, daily, universal utility**. Nobody cooks as a hobby; everyone cooks out of necessity.
----
-## Why the plan changed from earlier iterations
-| Iteration | Idea | Why it was discarded |
-|---|---|---|
-| v1 | Abuelita (parent phone helper) | **It is on OpenBMB's pre-cooked list for Backyard AI.** 5-15 teams will build the same thing. |
-| v2 | Cuentacuentos (illustrated storyteller) | **It is on OpenBMB's pre-cooked list for Thousand Token Wood ("voice storyteller").** Same saturation problem. |
-| v3 (this one) | **Cocina Conmigo** | A refinement of **your own idea #1**, now genuinely viable thanks to Flux.2. **It is not on any pre-cooked list.** |
-The strategic rule: **use the sponsors' models, do not copy their project templates.**
----
-## The 11 ideas in the forbidden zone (OpenBMB cluster)
-| Backyard AI | Thousand Token Wood |
-|---|---|
-| Parent phone helper | Voice storyteller |
-| Receipt / bill explainer | Visual mystery box |
-| Shop menu / repair manual | AI museum |
-| Offline personal assistant / voice companion | Doodle creature |
-| | Dream postcard gen |
-| | Omni-modal adventure |
-| | Tiny local NPC / character agent |
-And of your 5 original ideas, these also fall inside it:
-- #3 haircuts (you said yourself "it has already been done a lot")
-- #4 museum Q&A (clashes with "AI museum")
-**Still alive, outside the forbidden zone:**
-- #1 Recipes (→ **Cocina Conmigo**, this proposal)
-- #2 Intent detector (does not use Flux.2, boring demo)
-- #5 Outfits from your wardrobe (alternative B, see the end of the document)
----
-## The product in one sentence
-> *"My mom asked me to teach her how to make ramen. I built her a sous-chef that lives on her tablet."*
----
-## The 4 demo stories
-### 1. *"This is what I have in the fridge"*
-```
-👩 Mom takes a photo of the open fridge.
-🤖 [MiniCPM-V] "I see: chicken, onion, tomato, cilantro, tortillas, cheese."
-🤖 [LLM] "I can suggest: chicken tinga, enchiladas, or quesadillas. What are you in the mood for?"
-👩 "Tinga."
-🤖 [Flux.2] generates a beautiful, Mexican-style photo of the final dish.
-🤖 "Perfect. It will take you 35 minutes. Shall we start?"
-```
-### 2. *"Cook step by step"* (hands-free)
-```
-🤖 [Flux.2] shows: a pot with onion turning translucent
-🤖 [OpenBMB voice] "Dice the onion into small cubes and put it in hot oil."
-👩 (cooking, hands dirty)
-👩 "For how long?"
-🤖 [Voice] "Until it looks translucent. About 4 minutes."
-```
-### 3. *"Am I on track?"* (vision in a loop)
-```
-👩 (takes a photo of the pan with onion)
-🤖 [MiniCPM-V] compares it against the target image.
-🤖 [Cohere voice, the "tip-giver"] "It needs a little more. Give it 1 more minute, you are doing fine."
-```
-### 4. *"I don't have cilantro"* (adaptive replan)
-```
-👩 "I don't have cilantro."
-🤖 [LLM] replans on the fly.
-🤖 [Voice] "No problem. We'll use parsley or leave it out. It's still tinga."
-🤖 [Flux.2] regenerates the photo of the final dish, now without cilantro.
-```
-The 4 stories use the **same 5 models**. A single pipeline.
----
-## Why this plan **wins** this hackathon
-### 1. "Build for someone you actually know" → Backyard AI track
-The track's literal description says: *"Solve a real problem for someone you actually know. Pick a person: a neighbor, a parent, a small-business owner..."*. Your mom. Your sister. Your brother who lives alone. **Everyone** cooks. Few hackathon apps will have a user this close and this recurrent.
-### 2. Uses **all** the sponsor assets without copying templates
-| Asset | How it is used |
-|---|---|
-| **Flux.2 Klein 9B** (sponsor) | Generates the target image of the dish + "this is what you should see" at each step · i2i for adjustments |
-| **MiniCPM-V** (OpenBMB) | Vision: identifies ingredients + validates progress ("am I on track?") |
-| **MiniCPM reasoning** (OpenBMB) | Recipe Planner: builds the recipe + adaptive replan |
-| **OpenBMB voice / TTS** | Main sous-chef voice (warm, patient) |
-| **Cohere Labs voice** (sponsor) | Second voice: tips, warnings ("careful, it's burning!") |
-| **Whisper-tiny** | STT: hands-free questions while you cook |
-| **Gradio Workflows** | Visible node UI: Vision → Planner → Illustrator → Narrator → Validator |
-| **Modal $250** | Hosts Flux.2 on GPU + synthetic dataset + LoRA fine-tune |
-| **OpenAI Codex $100** | Pair-programmer and recipe dataset generator |
-Every sponsor touched. Zero copied ideas.
-### 3. **Concrete technical innovation**: the closed visual loop
-Most of the world's "recipe apps" are lists of steps. Cocina Conmigo introduces a **visual closed loop**:
-```
-[Flux.2 shows ideal step]  ──▶  [User cooks]
-              ▲                       │
-              │                       ▼
-[LLM adjusts plan]  ◀──  [MiniCPM-V validates user's photo]
-```
-This is a real agent, not a wrapper. The Best Agent badge is in play.
-### 4. Appetizing demo = viral video
-A real person cooking + a warm voice + live illustrations + "mine came out the same!" + a final dish eaten on camera. Best Demo + Community Choice by inertia. **Nobody will remember submission #14 of "voice storyteller"; they will remember the video where your mom makes tinga with AI.**
-### 5. Sustainable cultural differentiation
-- **Mexican-Spanish-first**: a differentiator in a US-centric hackathon.
-- **Mexican cuisine** as the fine-tune dataset: territory few will touch.
-- "For my mom" as the story: emotional + universal.
----
-## Architecture (summary; see `arquitectura.html`)
-5 nodes in a visible Gradio Workflow:
-```
-   [📸/🎙️ Input]  ──▶  [👁️ Vision MiniCPM-V]  ──▶  [🧠 Recipe Planner]  ──▶  [🎨 Step Illustrator Flux.2]
-                                                          │
-                                                          ▼
-                                                   [🔊 Sous-Chef Narrator OpenBMB]  +  [🎭 Tip-Giver Cohere]
-                                                          │
-                                                          ▼
-                                                  [✅ Progress Validator]  ──▶  loop al usuario
-```
-| Node | Model | Size | Role |
-|---|---|---|---|
-| Vision In | MiniCPM-V 2.6 / 4 (Q4 GGUF) | ~2-4B | Identifies ingredients + validates progress |
-| Planner | MiniCPM-4 4B (LoRA on Mexican cuisine) | ~4B | Generates structured recipe JSON · replan |
-| Illustrator | Flux.2 Klein 9B (Modal GPU) | 9B | Final image + step-by-step, i2i for consistency |
-| Narrator | OpenBMB voice / Kokoro | ~1B | Main voice: instructions |
-| Tip-Giver | Cohere Labs voice | ~1B | Second voice: warnings, encouragement |
-| STT (optional) | Whisper-tiny | ~40M | "am I on track?" "for how long?" |
-**Total: ~17B parameters** (cap 32B ✓)
-**Where it runs:**
-- Vision, Planner, voices, STT → HF Space CPU (llama.cpp + lightweight bindings)
-- **Flux.2 → Modal endpoint with an L4 GPU** (the Space CPU cannot handle it)
-> Same tradeoff as the previous plans: we **break Off the Grid** intentionally to preserve image quality and latency. In exchange we qualify for the Modal Awards.
----
-## Target badges (5/6)
-| Badge | How |
-|---|---|
-| ✓ **Llama Champion** | Vision + Planner via `llama-cpp-python` with GGUF Q4 |
-| ✓ **Well-Tuned** | LoRA of the Planner on a Mexican cuisine dataset, published on HF |
-| ✓ **Off-Brand** | "Recipe card" style UI + hands-free kitchen mode; does not look like default Gradio |
-| ✓ **Sharing is Caring** | Mexican recipe dataset + agent traces + generated recipes, all on the Hub |
-| ✓ **Field Notes** | Blog: "I built my mom a sous-chef" |
-| ✗ **Off the Grid** | Conscious sacrifice: Flux.2 runs on Modal |
-5 badges + strong Modal-powered = competitive for **Bonus Quest Champion ($2K)**.
----
-## Target prizes (projection)
-| Prize | Probability | Why |
-|---|---|---|
-| **Backyard AI Track** ($1K–$4K) | **High** | The idea is the literal text of the track. Emotional demo. |
-| **Modal Awards** ($3K–$10K credits) | **High** | Flux on Modal at runtime + offline training. Textbook Modal-powered. |
-| **OpenBMB Award** ($1K–$2.5K) | **High** | Uses OpenBMB models in 3 roles (vision, planner, voice) without copying a template |
-| **Best Demo** ($1K) | **High** | Person cooking + final food + voice = appetizing video |
-| **Community Choice** ($2K) | **High** | Appeals to a universal emotional memory (your mom cooking) |
-| **Bonus Quest Champion** ($2K) | Medium-high | 5/6 badges is competitive |
-| **Best Agent** ($1K) | Medium-high | Real closed-loop multi-agent system (5 agents) |
-| **Off-Brand** ($1.5K) | Medium | The recipe-card UI has a good chance |
-| **Tiny Titan** ($1.5K) | Low | Flux.2 9B puts us outside the ≤4B range |
-**Reasonable cumulative range:** $5K–$12K cash + $3K–$10K Modal credits.
----
-## The 3 conditions set by Idea.md
-| Condition | How it is met |
-|---|---|
-| **Innovative** | Visual closed loop (Flux generates the ideal → user cooks → vision validates → planner adjusts); it does not exist in recipe apps |
-| **Fresh** | Combines Flux.2 (new) + Workflows (launched yesterday) + multi-sponsor voices + hands-free cooking. No submission will have that combination. |
-| **Useful** | Cooking is daily, universal, recurrent. The app replaces Google + YouTube + guessing. |
----
-## Decisions you have to make yourself
-| Decision | Recommendation |
-|---|---|
-| Cocina Conmigo or Mi Espejo (outfits)? | **Cocina.** Lower technical risk (Flux generating dishes is safer than generating real people wearing clothes). More universal. |
-| Mexican cuisine or general cuisine? | **Mexican.** Differentiator + fine-tune on a bounded, rich dataset. |
-| A real person for the demo? | **Yes, non-negotiable.** Your mom, your partner, your neighbor. Have them eat on camera at the end. |
-| Start with text or with voice/photo? | **Start with a fridge photo + text.** Voice is added on Days 7-9. |
-| How many steps per recipe? | 5-7 steps. More is too long for the demo; fewer is not a recipe. |
----
-## Plan B: the "Mi Espejo" alternative
-If for any reason Cocina Conmigo does not progress (e.g. Flux.2 consistently generates ugly dishes), pivot to **"Mi Espejo"** (a refinement of your idea #5):
-- 📸 You upload a photo of yourself + photos of your wardrobe.
-- 🧠 A stylist LLM combines outfits by occasion + trend.
-- 🎨 **Flux.2 i2i generates you wearing each combination.**
-- 🔊 A voice comments on the look.
-Same badges, same track (Backyard), but higher visual wow and higher risk (uncanny valley with real people). **It is plan B**, not plan A.
----
-## Next step
-Read **`estrategia.md`** (10-day timeline, Modal/Codex spend, risks + mitigations, snippets) and **`arquitectura.html`** (system diagram + the 4 demo stories + visual Workflow). Then open the Codex CLI and do the Day 1 "hello world": a Modal endpoint that returns a Flux.2 image of a dish given a recipe name.
-> *"Cooking is the last thing AI should be able to help you do well. And that is why it is the best thing you can win by doing."*

Strategy/plan_implementacion.md DELETED

@@ -1,674 +0,0 @@
-# Implementation Plan — "Cook With Me"
-> Step-by-step implementation guide for developers building the multimodal cooking sous-chef Gradio app for Hugging Face Spaces.
->
-> **Hackathon:** Small models / Big adventures — June 2026
-> **Read first:** `plan.md` (the *what* and *why*) and `estrategia.md` (the *how* at a strategic level). This document is the *how* at a tactical level — turn this into code.
----
-## 0. Locked decisions (do not re-discuss)
-| Decision | Value | Reason |
-|---|---|---|
-| UI framework | **Gradio** | Hackathon requirement |
-| Hosting | **Hugging Face Space** | Hackathon requirement |
-| Inference runtime (text + vision) | **llama.cpp** via `llama-cpp-python` | Runs inside the Space CPU, no external APIs needed for now. Future: migrate to Modal |
-| Image generation | **FLUX.2 Klein 9B** (`black-forest-labs/FLUX.2-klein-9B`) | Sponsor model; runs in the Space if a GPU Space is rented (or via `enable_model_cpu_offload()` as fallback). Plan to migrate this specific component to Modal post-hackathon |
-| Recipe planner / reasoning | **`openbmb/MiniCPM-V-4`** (GGUF) | Provided requirement |
-| Vision (ingredient ID + progress validator) | **`openbmb/MiniCPM-V-4.6`** (GGUF) | Provided requirement |
-| Text-to-speech | **OpenBMB VoxCPM2** | Provided requirement |
-| Recipe dataset | **`thedevastator/better-recipes-for-a-better-life`** (Kaggle) — international cuisine | Provided requirement; not limited to Mexican food |
-| App language | **English only** | Provided requirement |
-| Final output | **Recipe + step images + voice + nutritional values** | Provided requirement |
-| External API calls at runtime | **None** | "llama.cpp inside the Space" mandate |
----
-## 1. Architecture (final, English-only, llama.cpp-first)
-```
-                          ┌──────────────────────────────────────┐
-                          │     Hugging Face Space (Gradio)      │
-                          │   (CPU + optional GPU upgrade)       │
-                          ├──────────────────────────────────────┤
-   📸 Fridge photo  ─────▶│  [Vision Agent]                      │
-                          │   MiniCPM-V-4.6 GGUF (llama.cpp)     │
-                          │   → list[ingredient]                  │
-                          │              │                        │
-                          │              ▼                        │
-   🥘 User picks dish ───▶│  [Recipe Planner]                    │
-                          │   MiniCPM-V-4 GGUF (llama.cpp)       │
-                          │   + retrieval over Kaggle dataset    │
-                          │   → Recipe JSON (steps, nutrition)   │
-                          │              │                        │
-                          │              ▼                        │
-                          │  [Step Illustrator]                   │
-                          │   FLUX.2 Klein 9B (diffusers)        │
-                          │   → PNG per step + final dish        │
-                          │              │                        │
-                          │              ▼                        │
-                          │  [Narrator]                           │
-                          │   VoxCPM2 → MP3 per step             │
-                          │              │                        │
-                          │              ▼                        │
-   📸 Progress photo ────▶│  [Progress Validator]                │
-                          │   MiniCPM-V-4.6 (vision compare)     │
-                          │   → "go / wait / fix" + tip          │
-                          └──────────────────────────────────────┘
-```
-**Total parameter count (≤ 32B requirement):**
-- MiniCPM-V-4 (reasoning) ≈ 4B
-- MiniCPM-V-4.6 (vision) ≈ 4.6B
-- FLUX.2 Klein ≈ 9B
-- VoxCPM2 ≈ 1B (estimate)
-- **Total ≈ 18.6B ✓**
----
-## 2. Repository layout
-```
-cook-with-me/
-├── app.py                      # Gradio entrypoint (Space looks for this)
-├── requirements.txt
-├── packages.txt                # apt packages (ffmpeg, libsndfile1)
-├── README.md                   # Space card (HF requires YAML frontmatter)
-├── .gitignore
-├── src/
-│   ├── __init__.py
-│   ├── config.py               # paths, model IDs, constants
-│   ├── models/
-│   │   ├── __init__.py
-│   │   ├── vision.py           # MiniCPM-V-4.6 wrapper (llama-cpp)
-│   │   ├── planner.py          # MiniCPM-V-4 wrapper (llama-cpp)
-│   │   ├── illustrator.py      # FLUX.2 Klein wrapper (diffusers)
-│   │   ├── narrator.py         # VoxCPM2 wrapper
-│   │   └── loader.py           # lazy singletons + GGUF download
-│   ├── agents/
-│   │   ├── mise_en_place.py    # ingredient identification
-│   │   ├── recipe_planner.py   # builds Recipe object
-│   │   ├── step_illustrator.py # per-step image gen
-│   │   ├── narrator.py         # per-step TTS
-│   │   └── progress_validator.py
-│   ├── data/
-│   │   ├── recipe_index.py     # loads Kaggle dataset, builds retrieval
-│   │   └── nutrition.py        # USDA-style nutrition computation
-│   ├── pipeline.py             # Recipe state machine, orchestration
-│   ├── prompts/
-│   │   ├── vision_prompt.txt
-│   │   ├── planner_system.txt
-│   │   └── validator_prompt.txt
-│   └── ui/
-│       ├── theme.py            # custom CSS (Off-Brand badge)
-│       └── components.py       # reusable Gradio Blocks pieces
-├── scripts/
-│   ├── download_models.py      # pre-warms GGUF + Flux weights at build time
-│   ├── build_recipe_index.py   # caches Kaggle dataset locally
-│   └── smoke_test.py           # end-to-end validation before push
-└── assets/
-    ├── sample_fridge_1.jpg
-    └── sample_progress_1.jpg
-```
----
-## 3. Phase-by-phase plan (10 days)
-> Each phase has: **goal**, **tasks**, **deliverable**, **verification check**. Do not move to the next phase if verification fails.
----
-### Phase 0 — Day 0 (½ day): Account + tooling setup
-**Goal:** every credential and CLI is ready before writing code.
-**Tasks**
-1. Create or confirm Hugging Face account; generate a **write token** (Settings → Access Tokens). Store as `HF_TOKEN` env var locally.
-2. Install Hugging Face CLI: `pip install -U huggingface_hub` then `huggingface-cli login`.
-3. Install Kaggle CLI: `pip install kaggle`. Place `kaggle.json` (Account → API → Create New Token) in `~/.kaggle/kaggle.json` with `chmod 600`.
-4. Install OpenAI Codex CLI (pair-programmer) and verify your $100 credit is active.
-5. Install local Python 3.11 venv: `python -m venv .venv && source .venv/bin/activate`.
-6. Create the repo locally: `git init cook-with-me && cd cook-with-me`.
-7. Create an empty Hugging Face Space: huggingface.co → New Space → SDK = **Gradio**, Hardware = **CPU basic** (upgrade later if you need GPU for FLUX). Clone it and copy your repo skeleton into it.
-8. Verify model availability: open in a browser and confirm pages exist:
-   - `huggingface.co/openbmb/MiniCPM-V-4`
-   - `huggingface.co/openbmb/MiniCPM-V-4-6`
-   - `huggingface.co/openbmb/VoxCPM2` (or whatever the exact repo name is — search "VoxCPM" on HF)
-   - `huggingface.co/black-forest-labs/FLUX.2-klein-9B`
-**Deliverable:** empty Space deployed showing "Hello World" Gradio.
-**Verify:** `https://huggingface.co/spaces/<you>/cook-with-me` loads.
----
-### Phase 1 — Day 1: Project skeleton + recipe dataset ingestion
-**Goal:** the Kaggle dataset is downloaded, parsed, and cached as a local artifact ready for retrieval.
-**Tasks**
-1. Write `requirements.txt` (initial version — packages will be added as phases progress):
-   ```text
-   gradio>=4.44
-   huggingface_hub>=0.24
-   llama-cpp-python>=0.3.2
-   numpy
-   pandas
-   Pillow
-   pydantic>=2
-   sentence-transformers
-   ```
-2. Write `packages.txt`:
-   ```text
-   ffmpeg
-   libsndfile1
-   ```
-3. Write `scripts/build_recipe_index.py`:
-   - Use `kagglehub.load_dataset(KaggleDatasetAdapter.PANDAS, "thedevastator/better-recipes-for-a-better-life", file_path)` — discover `file_path` by listing the dataset files first via `kagglehub.dataset_download`.
-   - Normalize columns: `name`, `ingredients` (list[str]), `instructions` (list[str]), `cuisine` (str if present, else "international"), `prep_time`, `servings`.
-   - Drop rows missing critical fields. Lowercase + strip ingredient strings.
-   - Save to `data/recipes.parquet` (~5–50MB depending on dataset size).
-   - Build sentence embeddings of the recipe **name + first 3 ingredients** using `sentence-transformers/all-MiniLM-L6-v2` and save to `data/recipes_emb.npy`.
-   - This script runs **once locally**; commit the parquet + npy files to the repo (or to a private HF Dataset, then download in `app.py`). If files exceed 100MB, push to a HF Dataset repo: `<you>/cook-with-me-recipes`.
-4. Write `src/data/recipe_index.py`:
-   - `class RecipeIndex` with `.search(ingredients: list[str], top_k=5) -> list[RecipeRow]`.
-   - Build a query string from ingredients, embed it, cosine-similarity against the cached embeddings, return top-k.
-**Deliverable:** `python -c "from src.data.recipe_index import RecipeIndex; r=RecipeIndex(); print(r.search(['chicken','onion','tomato']))"` prints 5 sensible recipes.
-**Verify:** at least 3 of the top-5 results contain ≥2 of the input ingredients.
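The ranking step inside `RecipeIndex.search` can be sketched in pure Python as below. This is only an illustration under stated assumptions: `cosine` and `top_k` are hypothetical helper names, and the toy 3-dimensional vectors stand in for the MiniLM embeddings that the plan caches in `data/recipes_emb.npy`. A real implementation would more likely use NumPy on the cached array.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[list[float]], k: int = 5) -> list[int]:
    """Return the indices of the k rows most similar to the query, best first."""
    scored = [(cosine(query, row), i) for i, row in enumerate(rows)]
    # Sort by descending score; ties keep the lower row index first.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy stand-ins for cached recipe embeddings (hypothetical values).
EMBEDDINGS = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
best = top_k([1.0, 0.05, 0.0], EMBEDDINGS, k=2)
```

The returned indices would then be mapped back to rows of `data/recipes.parquet` to build the `RecipeRow` results.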
----
-### Phase 2 — Day 2: Vision agent (Mise en Place) — MiniCPM-V-4.6 via llama.cpp
-**Goal:** given a fridge photo, return a clean list of English ingredient names.
-**Background:** llama.cpp supports multimodal models through a vision projector (`mmproj-*.gguf`) plus the language model GGUF. MiniCPM-V family ships both files on the Hub.
-**Tasks**
-1. Find the GGUF release of MiniCPM-V-4.6. Search HF for `MiniCPM-V-4_6-gguf` or `openbmb/MiniCPM-V-4_6-gguf`. You need **two** files:
-   - `Model-Q4_K_M.gguf` (or similar quant)
-   - `mmproj-model-f16.gguf` (the vision projector)
-2. Write `src/models/loader.py`:
-   ```python
-   from huggingface_hub import hf_hub_download
-   from llama_cpp import Llama
-   from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler  # or matching handler
-   _vision = None
-   def get_vision_model():
-       global _vision
-       if _vision is None:
-           model_path = hf_hub_download(
-               repo_id="openbmb/MiniCPM-V-4_6-gguf",  # confirm exact repo
-               filename="Model-Q4_K_M.gguf",
-           )
-           mmproj_path = hf_hub_download(
-               repo_id="openbmb/MiniCPM-V-4_6-gguf",
-               filename="mmproj-model-f16.gguf",
-           )
-           handler = MiniCPMv26ChatHandler(clip_model_path=mmproj_path)
-           _vision = Llama(
-               model_path=model_path,
-               chat_handler=handler,
-               n_ctx=4096,
-               n_threads=4,
-               verbose=False,
-           )
-       return _vision
-   ```
-3. Write `src/agents/mise_en_place.py`:
-   ```python
-   import base64, io, json
-   from PIL import Image
-   from src.models.loader import get_vision_model
-   PROMPT = (
-     "You are an ingredient detector. Look at the fridge/pantry photo and "
-     "list every edible ingredient you can identify. Return strict JSON: "
-     '{"ingredients": ["chicken", "onion", "tomato", ...]} '
-     "Lowercase, English, no brand names, no containers."
-   )
-   def _img_to_data_url(img: Image.Image) -> str:
-       buf = io.BytesIO(); img.convert("RGB").save(buf, "JPEG", quality=85)  # JPEG cannot store alpha/palette modes
-       b64 = base64.b64encode(buf.getvalue()).decode()
-       return f"data:image/jpeg;base64,{b64}"
-   def identify_ingredients(image: Image.Image) -> list[str]:
-       llm = get_vision_model()
-       out = llm.create_chat_completion(messages=[
-           {"role": "user", "content": [
-               {"type": "image_url", "image_url": {"url": _img_to_data_url(image)}},
-               {"type": "text", "text": PROMPT},
-           ]}
-       ], temperature=0.2, response_format={"type": "json_object"})
-       data = json.loads(out["choices"][0]["message"]["content"])
-       return [s.lower().strip() for s in data["ingredients"]]
-   ```
-4. Test locally with 5 sample fridge photos.
-**Deliverable:** the function returns a non-empty English list with ≥80% precision on a clean fridge photo.
-**Verify:** stash these 5 results in `tests/vision_smoke.json` for regression checks.
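One way to turn the "≥80% precision" target into a repeatable check is sketched below. The case layout (a list of dicts with `predicted` and `expected` keys) is an assumption about what `tests/vision_smoke.json` might contain, and `precision` and `passes_smoke` are hypothetical helper names, not part of the plan.

```python
def precision(predicted: list[str], expected: list[str]) -> float:
    """Fraction of predicted ingredients that appear in the hand-labelled list."""
    pred = {p.lower().strip() for p in predicted}
    truth = {e.lower().strip() for e in expected}
    if not pred:
        return 0.0
    return len(pred & truth) / len(pred)


def passes_smoke(cases: list[dict], threshold: float = 0.8) -> bool:
    """True when every case reaches the precision threshold."""
    return all(precision(c["predicted"], c["expected"]) >= threshold for c in cases)


# Hypothetical smoke case: 4 of the 5 predictions are correct.
CASES = [
    {
        "predicted": ["chicken", "onion", "tomato", "cheese", "ketchup"],
        "expected": ["chicken", "onion", "tomato", "cheese", "tortilla"],
    }
]
ok = passes_smoke(CASES)
```

Precision is used here rather than recall because the target in this phase is about avoiding wrong ingredients, not about finding every item in the fridge.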
----
-### Phase 3 — Day 3: Recipe Planner — MiniCPM-V-4 via llama.cpp + retrieval
-**Goal:** given a list of ingredients (and optionally a chosen dish), return a fully structured `Recipe` JSON including steps, durations, visual descriptions, and nutritional values.
-**Tasks**
-1. Find or convert MiniCPM-V-4 to GGUF. Likely repo: `openbmb/MiniCPM-V-4-gguf` or community quants. Pick `Q4_K_M`.
-2. Add to `src/models/loader.py` a `get_planner_model()` (same pattern as vision but without `chat_handler`).
-3. Write `src/agents/recipe_planner.py`:
-   - **Step A (propose):** call the planner with `I have: [ingredients]. Propose 3 dish options that fit. Reply JSON.`
-   - **Step B (retrieve):** for the chosen dish name, call `RecipeIndex.search(...)` and pick the closest match. Use it as a *grounded reference*.
-   - **Step C (restructure):** prompt the planner with both the user's available ingredients and the retrieved reference recipe, asking it to output the canonical `Recipe` JSON schema below. Grounding on the retrieved recipe reduces the risk of hallucinated steps.
-   - **Step D (nutrition):** from the recipe ingredients, compute approximate nutritional values per serving. See Phase 3.5.
-4. Define the canonical schema in `src/pipeline.py` using Pydantic:
-   ```python
-   from pydantic import BaseModel
-   from typing import Optional
-   class Step(BaseModel):
-       n: int
-       instruction: str       # English, imperative
-       duration: str          # "4 minutes"
-       visual: str            # English visual description for FLUX prompt
-       tip: Optional[str] = None
-   class Nutrition(BaseModel):
-       calories: int          # per serving
-       protein_g: float
-       carbs_g: float
-       fat_g: float
-       fiber_g: float
-   class Recipe(BaseModel):
-       name: str
-       cuisine: str
-       servings: int
-       total_time_minutes: int
-       options: list[dict] = []    # only populated on "propose" call; empty otherwise
-       ingredients_have: list[str]
-       ingredients_missing: list[str]
-       substitutes: dict[str, list[str]]
-       steps: list[Step]
-       final_dish_visual: str
-       nutrition_per_serving: Nutrition
-   ```
-5. Write the system prompt (`src/prompts/planner_system.txt`):
-   - Persona: international chef
-   - Hard rule: output JSON only, matching schema
-   - Hard rule: prefer dishes feasible with available ingredients
-   - Hard rule: 5–7 steps, each ≤ 25 words, each with a concrete `visual` field for image generation
-   - Hard rule: include `nutrition_per_serving` (model is allowed to estimate; you'll override with `data/nutrition.py` for accuracy)
-6. Use `response_format={"type": "json_object"}` in the chat completion call. Set `temperature=0.7, top_p=0.95, enable_thinking=True` for the propose step (creative); `temperature=0.4` for the structured-output step (deterministic).
-**Deliverable:** for `["chicken","onion","tomato","tortilla","cheese"]` and chosen dish "chicken tinga", the function returns a valid `Recipe` Pydantic object with 5–7 steps.
-**Verify:** the JSON parses, each step has all required fields, and total inference time on Space CPU < 60 seconds.
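The hard rules from the system prompt (5 to 7 steps, each instruction at most 25 words, required step fields) can also be checked mechanically before the Pydantic model is built. The sketch below uses only the standard library; `check_recipe_rules` is a hypothetical helper, and it checks only the step-level rules named above, not the full `Recipe` schema.

```python
import json


def check_recipe_rules(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the recipe passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON must be an object"]
    problems: list[str] = []
    steps = data.get("steps", [])
    if not 5 <= len(steps) <= 7:
        problems.append(f"expected 5-7 steps, got {len(steps)}")
    for step in steps:
        n = step.get("n", "?")
        for field in ("n", "instruction", "duration", "visual"):
            if field not in step:
                problems.append(f"step {n}: missing '{field}'")
        if len(str(step.get("instruction", "")).split()) > 25:
            problems.append(f"step {n}: instruction longer than 25 words")
    return problems


# Hypothetical planner output with 5 well-formed steps.
example = json.dumps({
    "steps": [
        {"n": i, "instruction": "Dice the onion.", "duration": "4 minutes",
         "visual": "diced onion in a pan"}
        for i in range(1, 6)
    ]
})
violations = check_recipe_rules(example)
```

Returning a list of messages instead of raising makes it easy to feed the violations back to the planner in a retry prompt, if that is the chosen recovery strategy.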
----
-### Phase 3.5 — Day 3 (afternoon): Nutritional values
-**Goal:** the recipe ends with reliable per-serving nutrition (not hallucinated by the LLM).
-**Approach:** small, embedded reference table beats LLM math.
-**Tasks**
-1. Bundle `data/nutrition_table.csv` — a 200-row CSV mapping common English ingredient names to per-100g macros (kcal, protein, carbs, fat, fiber). Source: USDA FoodData Central CSV download (free, public domain). Trim columns; commit to repo.
-2. Write `src/data/nutrition.py`:
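-   - `parse_quantity(line: str) -> (grams, ingredient_name)`: handle "2 cups flour", "200 g chicken", "1 tbsp olive oil". Use a small regex + a unit-to-grams table (cup=240, tbsp=15, tsp=5, oz=28.35). The cup/tbsp/tsp factors are water-equivalent approximations (1 ml ≈ 1 g), so dry goods such as flour will be overestimated unless per-ingredient densities are added.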
-   - `compute_nutrition(ingredient_lines: list[str], servings: int) -> Nutrition` — sum per-100g values weighted by grams, divide by servings.
-   - If a line cannot be parsed, skip it and log; don't crash.
-3. After the planner returns a recipe, **overwrite** `recipe.nutrition_per_serving` with the computed value. Keep the LLM's value only as a fallback when the parser yields zero.
-**Deliverable:** for a known recipe (e.g., spaghetti with tomato sauce, 4 servings), computed calories per serving is within ±25% of online references.
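The parsing and weighting logic described in this phase could look roughly like the sketch below. It handles calories only, the regex covers just the "number unit name" pattern from the examples, and the per-100g values in `KCAL_PER_100G` are hypothetical placeholders rather than USDA figures. `compute_calories` is an illustrative stand-in for the fuller `compute_nutrition`.

```python
import re

# Unit-to-grams factors from the plan; volume units are water-equivalent approximations.
UNIT_GRAMS = {"g": 1.0, "cup": 240.0, "cups": 240.0, "tbsp": 15.0, "tsp": 5.0, "oz": 28.35}

# "<number> <unit> <ingredient name>", e.g. "2 cups flour" or "200 g chicken".
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s+(.+?)\s*$")


def parse_quantity(line: str):
    """Return (grams, ingredient_name), or None when the line cannot be parsed."""
    match = LINE_RE.match(line)
    if not match:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower()
    name = match.group(3).lower()
    if unit not in UNIT_GRAMS:
        return None
    return amount * UNIT_GRAMS[unit], name


def compute_calories(lines: list[str], kcal_per_100g: dict[str, float], servings: int) -> float:
    """Sum per-100g calories weighted by grams, then divide by servings."""
    total = 0.0
    for line in lines:
        parsed = parse_quantity(line)
        if parsed is None or parsed[1] not in kcal_per_100g:
            continue  # skip unparseable or unknown lines instead of crashing
        grams, name = parsed
        total += grams / 100.0 * kcal_per_100g[name]
    return total / servings


# Hypothetical placeholder table (not real USDA values).
KCAL_PER_100G = {"chicken": 100.0, "olive oil": 900.0}
per_serving = compute_calories(
    ["200 g chicken", "1 tbsp olive oil", "a pinch of salt"], KCAL_PER_100G, servings=2
)
```

Lines such as "a pinch of salt" or "3 eggs" fall through to the skip branch, which matches the "skip it and log; don't crash" rule, although this sketch omits the logging.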
----
-### Phase 4 — Day 4: Step Illustrator — FLUX.2 Klein 9B
-**Goal:** generate an appetizing image for the final dish + one image per step.
-**Constraint:** FLUX.2 Klein on CPU is impractical; on a free Space CPU it would take ~10 minutes per image. Two paths:
-- **Path A (recommended for the hackathon):** upgrade the Space to a GPU instance (T4 or A10G — paid, but $20 HF credits cover it for a week of development). Code stays unchanged.
-- **Path B (fallback):** run FLUX in `enable_model_cpu_offload()` mode with `num_inference_steps=4` and accept ~3 min/image — only feasible for pre-rendered demo recipes, not live runs.
-**Tasks**
-1. Add to `requirements.txt`:
-   ```text
-   diffusers>=0.31
-   transformers>=4.45
-   accelerate
-   torch
-   safetensors
-   ```
-2. Write `src/models/illustrator.py`:
-   ```python
-   import torch
-   from diffusers import Flux2KleinPipeline
-   _pipe = None
-   def get_flux():
-       global _pipe
-       if _pipe is None:
-           dtype = torch.bfloat16
-           _pipe = Flux2KleinPipeline.from_pretrained(
-               "black-forest-labs/FLUX.2-klein-9B",
-               torch_dtype=dtype,
-           )
-           _pipe.enable_model_cpu_offload()
-       return _pipe
-   def render(prompt: str, seed: int = 0) -> "PIL.Image.Image":
-       pipe = get_flux()
-       device = "cuda" if torch.cuda.is_available() else "cpu"
-       img = pipe(
-           prompt=prompt,
-           height=1024, width=1024,
-           guidance_scale=1.0,
-           num_inference_steps=4,
-           generator=torch.Generator(device=device).manual_seed(seed),
-       ).images[0]
-       return img
-   ```
-3. Write `src/agents/step_illustrator.py`:
-   - For each `Step.visual`, build a prompt like:
-     > `f"Top-down photo of a kitchen pan or plate showing {visual}. {cuisine} home cooking, warm natural lighting, recipe magazine style, photorealistic, appetizing."`
-   - Generate the **final dish image first**, then the per-step images, all in **one Python loop** (no parallelism — FLUX holds the GPU).
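-   - Cache results on disk keyed by a stable digest of the prompt (e.g., SHA-256; Python's built-in `hash()` is salted per process for strings, so its values change between runs) to avoid re-renders on re-runs.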
-   - Emit Gradio progress updates so the UI doesn't appear frozen.
-4. **Critical tuning:** keep `num_inference_steps=4` (Klein is distilled). Higher step counts increase latency with minimal quality gain at this scale.
-**Deliverable:** for a 5-step recipe, all 6 images (final + 5 steps) render in:
-- < 1 minute on T4 GPU Space
-- < 8 minutes on CPU offload (acceptable only for pre-cached demos)
-**Verify:** show the 6 images to a person who has not been told what to look for; ≥4 should be described as "appetizing".
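The on-disk cache described in task 3 could be sketched roughly as below. This is a minimal illustration, not the project's actual implementation: `cache_key`, `cached_render`, `CACHE_DIR`, and the `render_fn` callback are hypothetical names, the cache location is a placeholder, and including the seed in the key is an added assumption (the plan only mentions the prompt). `render_fn` stands in for the `render` function above and returns PNG bytes here so the sketch carries no model dependencies.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical cache location; Phase 9 of the plan suggests ~/.cache/cook-with-me/flux/.
CACHE_DIR = Path(tempfile.gettempdir()) / "cook-with-me-flux-cache"


def cache_key(prompt: str, seed: int = 0) -> str:
    """Stable key: a SHA-256 digest is identical across processes, unlike hash()."""
    return hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).hexdigest()


def cached_render(
    prompt: str,
    render_fn: Callable[[str, int], bytes],
    seed: int = 0,
    cache_dir: Path = CACHE_DIR,
) -> bytes:
    """Return cached image bytes if present; otherwise render once and store them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(prompt, seed)}.png"
    if path.exists():
        return path.read_bytes()
    data = render_fn(prompt, seed)  # the expensive call happens only on a cache miss
    path.write_bytes(data)
    return data
```

Because the key depends only on the prompt text and seed, re-running the same recipe with unchanged step prompts would skip rendering entirely.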
----
-### Phase 5 — Day 5: Narrator — VoxCPM2
-**Goal:** every step's instruction is rendered to an MP3 in a warm, clear English voice.
-**Tasks**
-1. Confirm the exact VoxCPM2 repo name on HF (`openbmb/VoxCPM2` or similar). Read its README for the inference snippet — TTS APIs vary widely between models.
-2. Add to `requirements.txt`: `soundfile`, `torchaudio`, `numpy`. If VoxCPM2 ships GGUF, use it via `llama-cpp-python` audio extension (if available); otherwise load via `transformers` directly.
-3. Write `src/models/narrator.py`:
-   ```python
-   _tts = None
-   def get_tts():
-       global _tts
-       if _tts is None:
-           # placeholder — replace with the exact VoxCPM2 loading code from its README
-           from transformers import AutoModel, AutoProcessor
-           _tts = ... # load on CPU; VoxCPM2 is small (~1B)
-       return _tts
-   def synthesize(text: str, voice: str = "warm_female_en") -> bytes:
-       """Returns MP3 bytes."""
-       tts = get_tts()
-       wav = tts.generate(text, voice=voice)  # API depends on VoxCPM2
-       # encode wav -> mp3 with soundfile + ffmpeg-python or pydub
-       return mp3_bytes
-   ```
-4. Write `src/agents/narrator.py`:
-   - For each step, synthesize `step.instruction`. If `step.tip` is set, synthesize a separate "tip" clip.
-   - Save MP3 files in a per-recipe temp directory; return file paths to Gradio.
-5. Pre-render all step audio when the recipe is finalized — never stream per-step in the demo (too much UI lag).
-**Deliverable:** clicking "Play" on step 1 in the UI plays clear English narration.
-**Verify:** on a 5-step recipe, total TTS rendering time < 30 seconds on CPU.
----
-### Phase 6 — Day 6: Gradio UI (Off-Brand)
-**Goal:** the Space looks like a recipe magazine, not stock Gradio.
-**Tasks**
-1. Write `src/ui/theme.py`:
-   ```python
-   import gradio as gr
-   theme = gr.themes.Soft(
-       primary_hue="orange",
-       neutral_hue="stone",
-       font=[gr.themes.GoogleFont("Inter"), "sans-serif"],
-       font_mono=[gr.themes.GoogleFont("JetBrains Mono"), "monospace"],
-   )
-   CSS = """
-   .gradio-container { background: #f5ecd9 !important; }
-   .recipe-hero { background:#fffbf0; border-radius:14px; padding:28px; }
-   .recipe-hero h1 { font-family:'Lora',serif!important; font-size:36px!important; color:#6b4a2a!important; }
-   .step-card { background:#fffbf0; border-left:4px solid #a85c2a; border-radius:8px; padding:18px 22px; margin:12px 0; }
-   .nutri-grid { display:grid; grid-template-columns:repeat(5,1fr); gap:12px; margin-top:24px; }
-   .nutri-cell { background:#fffbf0; border:1px solid #d8c9ad; border-radius:10px; padding:12px; text-align:center; }
-   """
-   ```
-2. Write `app.py` with three tabs:
-   - **Tab 1 — Cook**: fridge photo input → ingredient chips → 3 dish options → selected recipe card with hero image, steps (image + text + audio play button each), nutrition grid at the bottom.
-   - **Tab 2 — Check Progress**: upload a progress photo + select active step → validator returns badge (`go/wait/fix`) + tip + audio.
-   - **Tab 3 — About / Tech**: README-style explanation, badges, model list.
-3. Use `gr.Blocks` with `gr.State` to hold the current `Recipe` across UI events. Keep it in the state as a plain `dict`, since Pydantic objects may not round-trip through Gradio state reliably: have each callback return `recipe.model_dump()` as the new state value, and rebuild the object with `Recipe.model_validate(...)` when it is needed.
-4. Wire callbacks:
-   - `btn_propose.click(fn=on_propose, inputs=[fridge_photo], outputs=[ingredient_chips, dish_options, state])`
-   - `dish_options.select(fn=on_pick_dish, inputs=[state, picked_dish], outputs=[recipe_card, hero_img, steps_column, nutrition_grid, state])`
-   - `progress_image.upload(fn=on_validate, inputs=[state, current_step_idx, progress_image], outputs=[verdict_md, tip_audio])`
-**Deliverable:** end-to-end run from a sample fridge photo to a fully rendered recipe card with audio and nutrition. No Gradio default look anywhere.
----
-### Phase 7 — Day 7: Progress Validator (closed loop)
-**Goal:** user uploads a progress photo, app says "go / wait / fix" with a voiced tip.
-**Tasks**
-1. Write `src/agents/progress_validator.py`:
-   ```python
-   PROMPT = """Compare these two cooking photos.
-   Photo 1 (target): how it should look after the step "{instruction}".
-   Photo 2 (user's pan/plate): the user's current progress.
-   Reply strict JSON: {"verdict": "go|wait|fix", "feedback": "...", "tip": "..."}
-   - "go": looks right, move to next step
-   - "wait": needs more time, do not change anything yet
-   - "fix": something is off; suggest a concrete adjustment in one sentence
-   """
-   def validate(target_img, user_img, step_instruction): ...
-   ```
-2. Use the same vision model singleton as Phase 2 — both calls share weights.
-3. Render the verdict as a colored badge (green/amber/red) and play the tip via VoxCPM2.
-**Deliverable:** running the validator on 5 real progress photos returns the correct verdict on ≥3.
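One way the validator's "strict JSON" reply could be handled defensively is sketched below. It is an illustration under stated assumptions, not the project's code: `PROMPT_TEMPLATE`, `build_prompt`, and `parse_verdict` are hypothetical names, the template is a shortened stand-in for the prompt above, and falling back to `wait` on an unreadable reply is an assumed design choice. The sketch also fills the `{instruction}` placeholder with `str.replace`, because `str.format` would treat the literal JSON braces in the prompt as replacement fields.

```python
import json
import re

ALLOWED_VERDICTS = {"go", "wait", "fix"}

# Shortened stand-in for the validator prompt; note the literal JSON braces.
PROMPT_TEMPLATE = (
    'Step: "{instruction}"\n'
    'Reply strict JSON: {"verdict": "go|wait|fix", "feedback": "...", "tip": "..."}'
)


def build_prompt(instruction: str) -> str:
    # str.replace leaves the JSON braces untouched; str.format would raise on them.
    return PROMPT_TEMPLATE.replace("{instruction}", instruction)


def parse_verdict(raw: str) -> dict:
    """Parse the model reply; return a conservative 'wait' verdict if it is unusable."""
    fallback = {"verdict": "wait", "feedback": "Could not read the model reply.", "tip": ""}
    # Take the span from the first '{' to the last '}' to skip any chatter around it.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    verdict = data.get("verdict") if isinstance(data, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return fallback
    return {
        "verdict": verdict,
        "feedback": str(data.get("feedback", "")),
        "tip": str(data.get("tip", "")),
    }
```

Defaulting to `wait` keeps the user from advancing on a garbled reply; a real implementation might prefer to retry the model call instead.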
----
-### Phase 8 — Day 8: Fine-tune the Planner on the Kaggle dataset (Well-Tuned badge)
-> **Important caveat:** The user instruction says "for now keep inference on llama.cpp inside HF Space, future migration to Modal." Fine-tuning still **requires GPU**, so training itself happens on Modal (one-shot, offline) or on a rented Colab/Lambda GPU. Inference of the resulting model stays on llama.cpp inside the Space (as GGUF). This does **not** violate the runtime constraint — only the build pipeline touches a GPU.
-**Goal:** publish a fine-tuned Planner GGUF to the Hub and load it from the Space.
-**Tasks**
-1. **Build SFT dataset** (`scripts/build_sft_dataset.py`):
-   - Load Kaggle `better-recipes` dataset.
-   - For each recipe, build a `(prompt, completion)` pair where `prompt` is `"Available ingredients: X, Y, Z. Propose recipe."` and `completion` is the full canonical `Recipe` JSON.
-   - Generate ~1000 pairs, push to `<you>/cook-with-me-sft` HF Dataset.
-2. **LoRA training** (`scripts/train_planner.py` — to be run on a GPU machine, not the Space):
-   ```python
-   # peft + trl SFTTrainer, base = openbmb/MiniCPM-V-4
-   # r=16, alpha=32, lr=2e-4, epochs=2, batch=4
-   # push_to_hub=True, hub_model_id="<you>/cook-with-me-planner-4b"
-   ```
-3. **Convert to GGUF** (Day 8 evening):
-   - Use `llama.cpp/convert_hf_to_gguf.py` then `quantize` to `Q4_K_M`.
-   - Push GGUF to `<you>/cook-with-me-planner-4b-gguf`.
-4. Update `src/models/loader.py` to point at your GGUF instead of the base model.
-**Deliverable:** the Space loads your fine-tuned Planner GGUF and produces JSON recipes that are noticeably better-formatted than the base model on a held-out test set.
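The `(prompt, completion)` construction from task 1 might look roughly like this. It is a sketch only: `build_sft_pair` is a hypothetical helper, and the recipe fields used here (`name`, `ingredients`, `steps`) are placeholders, since the canonical `Recipe` schema is defined elsewhere in the plan.

```python
import json


def build_sft_pair(recipe: dict) -> dict:
    """Turn one normalized recipe into a prompt/completion pair for SFT."""
    ingredients = ", ".join(recipe["ingredients"])
    prompt = f"Available ingredients: {ingredients}. Propose recipe."
    # Serialize deterministically so identical recipes yield identical completions.
    completion = json.dumps(recipe, ensure_ascii=False, sort_keys=True)
    return {"prompt": prompt, "completion": completion}


if __name__ == "__main__":
    example = {
        "name": "Tomato pasta",
        "ingredients": ["pasta", "tomato"],
        "steps": ["Boil the pasta.", "Simmer the tomato."],
    }
    print(build_sft_pair(example)["prompt"])
```

Sorting keys is an assumed choice; it makes duplicate detection in the generated pairs straightforward.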
----
-### Phase 9 — Day 9: End-to-end test, performance pass, pre-warm cache
-**Goal:** the Space loads in <60s and a full recipe (text + 5 images + 5 audios + nutrition) renders in <2 minutes on the chosen hardware.
-**Tasks**
-1. Write `scripts/smoke_test.py` that runs the full pipeline on 3 sample fridge photos and asserts:
-   - Each ingredient list is non-empty
-   - Each recipe has 5–7 steps
-   - Each step has a non-empty image and audio path
-   - Nutrition has all 5 macros set
-2. Implement **on-disk caching** for FLUX outputs (key = SHA256 of prompt) so re-runs of the same recipe are instant. Save to `~/.cache/cook-with-me/flux/`.
-3. Pre-render and commit **3 fully-prepared demo recipes** (chicken tinga, pasta carbonara, chicken tikka) so judges see results in <5s on first click.
-4. Add error handling at every UI boundary: a model failure should display a friendly message, not a stack trace.
-5. Add a "Loading models..." progress bar on first request — first cold start can take 90s.
-**Deliverable:** smoke test passes on the live Space.
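The smoke-test assertions from task 1 could be collected in a helper along these lines. This is a sketch under assumptions: `check_recipe` and the dictionary keys (`ingredients`, `steps`, `image_path`, `audio_path`, `nutrition`) are hypothetical, and because the plan does not name the five macros, the check only counts populated nutrition entries.

```python
def check_recipe(recipe: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the recipe passes."""
    problems = []
    if not recipe.get("ingredients"):
        problems.append("ingredient list is empty")
    steps = recipe.get("steps", [])
    if not 5 <= len(steps) <= 7:
        problems.append(f"expected 5-7 steps, got {len(steps)}")
    for i, step in enumerate(steps, start=1):
        if not step.get("image_path"):
            problems.append(f"step {i} has no image")
        if not step.get("audio_path"):
            problems.append(f"step {i} has no audio")
    # Count nutrition values that are actually set (None means missing).
    macros = [v for v in recipe.get("nutrition", {}).values() if v is not None]
    if len(macros) < 5:
        problems.append(f"expected 5 macros, got {len(macros)}")
    return problems
```

Returning every problem rather than failing on the first one gives a fuller picture from a single run against the live Space.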
----
-### Phase 10 — Day 10: README, demo video, social post, submit
-**Tasks**
-1. Write `README.md` with the required HF Space frontmatter:
-   ```yaml
-   ---
-   title: Cook With Me
-   emoji: 🍲
-   colorFrom: orange
-   colorTo: yellow
-   sdk: gradio
-   sdk_version: 4.44.0
-   app_file: app.py
-   pinned: false
-   license: apache-2.0
-   ---
-   ```
-   Followed by:
-   - One-paragraph pitch
-   - 60-second demo video embed
-   - Architecture diagram (export from `arquitectura.html` as PNG)
-   - Section: "How closed-loop visual cooking guidance works"
-   - Models used (with HF links + total parameter count)
-   - Badges declared
-   - Build / run instructions
-2. Record a 60–90 second demo video: real person cooks a recipe end-to-end with the app guiding via voice, ending with the cooked plate on camera.
-3. Write the Field Notes blog post: one of the engineering surprises (e.g., "FLUX.2 step images at 4 steps look better than 8 — here's why" or "Closed-loop validation needs the same vision model on both sides").
-4. Social post on X / LinkedIn with the demo video.
-5. Submit on the hackathon platform.
----
-## 4. Tools usage matrix (when to reach for what)
-| Phase | Primary tools | Why |
-|---|---|---|
-| 0 — setup | HF CLI, Kaggle CLI, OpenAI Codex CLI | one-shot config |
-| 1 — data | `kagglehub`, `pandas`, `sentence-transformers` | offline dataset prep |
-| 2 — vision | `llama-cpp-python` + `MiniCPMv26ChatHandler` | runs inside Space, badge: Llama Champion |
-| 3 — planner | `llama-cpp-python` + retrieval over local parquet | grounded JSON output |
-| 3.5 — nutrition | local CSV + regex parser | reliable, no LLM math |
-| 4 — illustrator | `diffusers` + `Flux2KleinPipeline` | sponsor model showcase |
-| 5 — narrator | VoxCPM2 via `transformers` (or its native API) | local TTS |
-| 6 — UI | `gradio` + custom CSS theme | Off-Brand badge |
-| 7 — validator | same vision singleton as phase 2 | closed-loop innovation, Best Agent |
-| 8 — fine-tune | `peft`, `trl`, `llama.cpp` convert/quantize, on a GPU machine | Well-Tuned badge |
-| 9 — test/cache | `pytest`, `hashlib`, on-disk FLUX cache | demo reliability |
-| 10 — submit | HF Spaces, video tool, social | shipping |
----
-## 5. Performance budget on the HF Space
-| Operation | Target latency | Hardware needed |
-|---|---|---|
-| Vision: ingredient ID | < 8 s | CPU 4-thread |
-| Planner: propose 3 dishes | < 12 s | CPU 4-thread |
-| Planner: build full recipe JSON | < 20 s | CPU 4-thread |
-| Nutrition computation | < 0.1 s | CPU |
-| FLUX: 1 image (4 steps) | < 12 s on T4 / < 90 s on CPU offload | GPU strongly recommended |
-| FLUX: 6 images (final + 5 steps) | < 80 s on T4 | GPU |
-| VoxCPM2: 1 step narration | < 5 s | CPU |
-| Validator: 1 progress check | < 8 s | CPU |
-| **Full recipe end-to-end** | **< 2 min on T4 Space** | — |
-**Hardware decision:** rent a T4 Space (~$0.40/hr) for the demo week. The $20 HF credits cover ~50 hours.
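As a rough sanity check on the table above, the per-operation ceilings can be summed for one recipe. The sketch assumes the stages run strictly one after another, that five steps are narrated, and that the progress validator is excluded because it runs on demand; `BUDGET_S` and `sequential_total` are hypothetical names, and the numbers are copied from the table.

```python
# Upper-bound latencies in seconds, taken from the performance budget table.
BUDGET_S = {
    "vision_ingredient_id": 8.0,
    "planner_propose": 12.0,
    "planner_recipe_json": 20.0,
    "nutrition": 0.1,
    "flux_six_images_t4": 80.0,
    "narration_per_step": 5.0,
}
STEPS = 5  # assumed number of narrated steps


def sequential_total(budget: dict, steps: int) -> float:
    """Sum the one-off stages plus one narration clip per step."""
    one_off = sum(v for k, v in budget.items() if k != "narration_per_step")
    return one_off + budget["narration_per_step"] * steps


if __name__ == "__main__":
    total = sequential_total(BUDGET_S, STEPS)
    print(f"sequential worst case: {total:.1f} s (target: 120 s)")
    print(f"GPU hours covered by $20 at $0.40/hr: {20 / 0.40:.0f}")
```

Under these assumptions the ceilings add up to about 145 s, so hitting the 2-minute end-to-end target would require beating some individual ceilings or overlapping stages (for example, narrating on CPU while FLUX holds the GPU).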
----
-## 6. Risks and mitigations (delta from `estrategia.md`)
-| Risk | Mitigation |
-|---|---|
-| MiniCPM-V-4 has no public GGUF | Convert yourself with `llama.cpp/convert_hf_to_gguf.py`. Allow a half-day buffer in Phase 2. |
-| llama-cpp-python's MiniCPM-V chat handler version mismatch | Require `llama-cpp-python>=0.3.2`; test the handler import on Day 2. If it fails, fall back to MiniCPM-V-2.6 GGUF (well-supported) for vision and document the swap. |
-| FLUX.2 Klein 9B too slow on free CPU Space | Upgrade to a paid GPU Space (~$10 for the demo week). Document this in the README so judges expect it. |
-| VoxCPM2 docs sparse | Drop to Kokoro-82M or Piper TTS as a backup. Lose the OpenBMB voice angle but keep the audio. |
-| Kaggle dataset has format quirks (HTML in instructions, missing fields) | The Phase 1 normalization step handles this; budget 2 hours. |
-| Nutrition CSV missing exotic ingredients | Skip-and-log strategy already designed; demo-day recipes use common ingredients only. |
-| Total params >32B if VoxCPM2 turns out to be 7B | Check size in Phase 0; if too large, drop to a smaller TTS. |
----
-## 7. "Day-1 hello world" checklist
-Before writing any agent code, get this minimal end-to-end loop working — it proves your stack:
-1. ☐ Empty Gradio Space deployed, shows "Hello"
-2. ☐ `huggingface-cli login` works locally
-3. ☐ `kaggle datasets download thedevastator/better-recipes-for-a-better-life` succeeds
-4. ☐ `from llama_cpp import Llama` runs in your venv
-5. ☐ Download one tiny GGUF (e.g., TinyLlama Q4) and call it from a Gradio textbox round-trip
-6. ☐ Push the round-trip to the Space; confirm it answers in the cloud
-**Only after all 6 are checked, start Phase 1.**
----
-## 8. Where this plan differs from `estrategia.md` (deltas to communicate)
-| Topic | `estrategia.md` (Spanish, Mexican-cuisine focus) | This document (current requirements) |
-|---|---|---|
-| Language | Spanish-first | **English only** |
-| Cuisine | Mexican | **International** (Kaggle dataset) |
-| Voice models | OpenBMB voice + Cohere Labs | **VoxCPM2** only (single voice) |
-| Vision model | MiniCPM-V 2.6 / 4 | **MiniCPM-V-4.6** |
-| Reasoning model | MiniCPM-4 4B | **MiniCPM-V-4** |
-| FLUX runtime | Modal endpoint | **Inside Space (llama.cpp principle)**; Modal kept as a future migration target only |
-| External APIs at runtime | Allowed (Modal, OpenAI optional) | **None** — full local inference inside Space |
-| Nutritional info | Not specified | **Required** at end of recipe |
-| Fine-tune dataset | 200 synthetic Mexican recipes | **Kaggle better-recipes (international)** |
-If anything in `plan.md` or `estrategia.md` conflicts with this document, **this document wins** — it reflects the latest user requirements.
----
-## 9. Definition of done
-The implementation is complete when **all** of these are true:
-- [ ] Public HF Space `https://huggingface.co/spaces/<you>/cook-with-me` loads
-- [ ] App is fully in English
-- [ ] Fridge photo → ingredient list → 3 dish options → full recipe with images, audio, and nutrition works end-to-end
-- [ ] Progress validator returns sensible verdicts on 3+ test photos
-- [ ] All inference (vision, planner, TTS) runs through llama.cpp / local diffusers — **no external API calls at runtime**
-- [ ] Total parameters declared in README ≤ 32B
-- [ ] Fine-tuned Planner GGUF published to HF Hub (Well-Tuned badge)
-- [ ] Demo video (60–90s) recorded with a real person cooking
-- [ ] Field Notes blog post published
-- [ ] Submitted on the hackathon platform before deadline

app.py CHANGED Viewed

@@ -1,5 +1,4 @@
 import logging
-logging.basicConfig(level=logging.INFO)
 log = logging.getLogger(__name__)
 from typing import Any
@@ -7,11 +6,12 @@ from typing import Any
 import gradio as gr
 from PIL import Image
 from src.agents.mise_en_place import identify_ingredients
-from src.agents.progress_validator import validate
-from src.agents.recipe_planner import plan_recipe, propose_dishes
-from src.agents.step_illustrator import illustrate_recipe
-from src.data.nutrition import compute_nutrition
 from src.ui.components import (
     DishOptions,
     IngredientChips,
@@ -19,265 +19,135 @@ from src.ui.components import (
     RecipeHero,
     StepCard,
     VerdictBadge,
 )
 from src.ui.theme import CSS, theme
-from src.ui.components import DishOptions, IngredientChips, NutritionGrid, RecipeHero, StepCard, VerdictBadge
-# ---------------------------------------------------------------------------
-# Callbacks
-# ---------------------------------------------------------------------------
-def _clean_ingredients(items: list | None) -> list[str]:
-    """Normalize a raw ingredient list (dedup, lowercase, strip empties)."""
-    out, seen = [], set()
-    for it in (items or []):
-        name = str(it).strip().lower()
-        if name and name not in seen:
-            seen.add(name)
-            out.append(name)
-    return out
-def on_propose(fridge_image: Image.Image | None, state: dict | None):
-    """Photo → ingredients → 3 dish options (and fill the editable list)."""
     state = state or {}
-    if fridge_image is None:
-        return (
-            IngredientChips.render({}),
-            DishOptions.render({}),
-            gr.update(choices=[], value=None),
-            state,
-            gr.update(choices=[], value=[]),
-        )
     ingredients = identify_ingredients(fridge_image)
-    options = propose_dishes(ingredients)
-    state.update({
-        "ingredients_have": ingredients,
-        "options": [o.model_dump() for o in options],
-    })
-    radio_choices = [o.name for o in options]
-    return (
-        IngredientChips.render({"have": ingredients, "missing": []}),
-        DishOptions.render({"options": state["options"]}),
-        gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
-        state,
-        gr.update(choices=ingredients, value=ingredients),
-    )
-def on_update_ingredients(state: dict | None, ingredients: list | None):
-    """Manual edit of the ingredient list → refresh chips + re-propose dishes."""
-    state = state or {}
-    ingredients = _clean_ingredients(ingredients)
-    state["ingredients_have"] = ingredients
-    if not ingredients:
-        state["options"] = []
-        return (
-            IngredientChips.render({}),
-            DishOptions.render({}),
-            gr.update(choices=[], value=None),
-            state,
-        )
-    options = propose_dishes(ingredients)
-    state["options"] = [o.model_dump() for o in options]
-    radio_choices = [o.name for o in options]
-    return (
-        IngredientChips.render({"have": ingredients, "missing": []}),
-        DishOptions.render({"options": state["options"]}),
-        gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None),
-        state,
-    )
-def on_cook(state: dict | None, dish_name: str | None, illustrate: bool, ingredients: list | None):
-    """Chosen dish → full recipe + nutrition (+ FLUX images if requested)."""
-    state = state or {}
-    if not dish_name:
-        return (
-            RecipeHero.render({}),
-            StepCard.render({}),
-            NutritionGrid.render({"nutrition": {}}),
-            state,
-        )
-    # Prefer the (possibly hand-edited) ingredient list from the editor.
-    ingredients = _clean_ingredients(ingredients) or state.get("ingredients_have", [])
-    state["ingredients_have"] = ingredients
-    recipe = plan_recipe(dish_name, ingredients)
-    nutrition = compute_nutrition(ingredients, recipe.servings)
-    recipe.nutrition = nutrition
-    state["recipe"] = recipe.model_dump()
-    if illustrate:
-        log.info("Generating FLUX step images via Modal...")
-        recipe = illustrate_recipe(recipe)
-        state["recipe"] = recipe.model_dump()
-    return (
-        RecipeHero.render(recipe.model_dump()),
-        StepCard.render({"steps": [s.model_dump() for s in recipe.steps]}),
-        NutritionGrid.render({"nutrition": nutrition}),
-        state,
-    )
-def on_validate(state: dict | None, step_idx: float, progress_image: Image.Image | None):
-    """Progress photo + step number → verdict badge."""
-    state = state or {}
-    recipe = state.get("recipe", {})
-    steps = recipe.get("steps", [])
-    idx = max(0, int(step_idx) - 1)
-    instruction = steps[idx]["instruction"] if idx < len(steps) else "Cook the dish properly."
-    result = validate(progress_image, instruction)
-    return VerdictBadge.render(result)
-# ---------------------------------------------------------------------------
-# UI
-# ---------------------------------------------------------------------------
 def build_ui() -> gr.Blocks:
     initial_state: dict[str, Any] = {}
-    with gr.Blocks(title="Cook With Me", theme=theme, css=CSS) as demo:
         gr.Markdown(
             "# 🍲 Cook With Me\n"
-            "_Snap your fridge · Pick a dish · Cook step by step · Check your progress._"
         )
         state = gr.State(initial_state)
         with gr.Tabs():
-            # ----------------------------------------------------------------
-            # Tab 1 — Cook
-            # ----------------------------------------------------------------
-            with gr.Tab("🍳 Cook"):
                 with gr.Row():
-                    # Left — inputs
                     with gr.Column(scale=1):
                         fridge_input = gr.Image(
                             label="📸 Photo of your fridge or pantry",
                             type="pil",
-                            height=300,
                         )
-                        propose_btn = gr.Button("🔍 What can I cook?", variant="primary")
                         gr.Markdown("### Ingredients I see")
                         chips = gr.HTML(IngredientChips.render({}))
-                        ingredient_editor = gr.Dropdown(
-                            choices=[],
-                            value=[],
-                            multiselect=True,
-                            allow_custom_value=True,
-                            label="✏️ Add or remove ingredients (type + Enter to add, ✕ to remove)",
-                            interactive=True,
-                        )
-                        update_btn = gr.Button("🔄 Update ingredients & dishes")
                         gr.Markdown("### Pick a dish")
-                        dish_options_html = gr.HTML(DishOptions.render({}))
-                        dish_radio = gr.Radio(
-                            choices=[],
-                            label="Choose one",
-                            interactive=True,
-                        )
-                        with gr.Accordion("⚙️ Generation options", open=False):
-                            illustrate_chk = gr.Checkbox(
-                                value=False,
-                                label="🎨 Generate step images with FLUX.2 (requires Modal deployment)",
-                            )
-                        cook_btn = gr.Button("👨‍🍳 Build my recipe", variant="primary")
-                    # Right — recipe output
                     with gr.Column(scale=2):
                         hero = gr.HTML(RecipeHero.render({}))
                         steps_panel = gr.HTML(StepCard.render({}))
                         nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
-            # ----------------------------------------------------------------
-            # Tab 2 — Check Progress
-            # ----------------------------------------------------------------
-            with gr.Tab("📷 Check Progress"):
-                gr.Markdown(
-                    "Upload a photo of your pan or plate. The vision model compares it "
-                    "against the current recipe step and tells you if you can move on."
-                )
                 with gr.Row():
                     with gr.Column():
                         step_idx = gr.Number(value=1, precision=0, label="Active step #")
-                        progress_input = gr.Image(
-                            label="📸 Your pan / plate",
-                            type="pil",
-                            height=300,
-                        )
-                        validate_btn = gr.Button("✅ How am I doing?", variant="primary")
                     with gr.Column():
                         verdict_panel = gr.HTML(VerdictBadge.render({}))
-            # ----------------------------------------------------------------
-            # Tab 3 — About
-            # ----------------------------------------------------------------
-            with gr.Tab("ℹ️ About"):
                 gr.Markdown(
                     """
-### How it works
-1. **Snap** your fridge — the fine-tuned vision model (MiniCPM-V-4.6) identifies every ingredient.
-2. **Pick** one of three AI-suggested dishes tailored to what you have.
-3. **Cook** step by step with a generated recipe, per-serving nutrition, and optional FLUX.2 step images.
-4. **Check** your progress — upload a photo of your pan and get a *go / wait / fix* verdict.
-### Models
-| Role | Model | Params |
-|---|---|---|
-| Vision (ingredients + validator) | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B |
-| Recipe Planner | `openbmb/MiniCPM4.1-8B` (fine-tuned on Kaggle recipes) | ~8B |
-| Step Illustrator | `FLUX.2-klein-9B` via Modal | ~9B |
-**Total ≤ 21.6B params** (cap: 32B ✓)
-### Badges targeted
-✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
-### Hackathon
-Hugging Face Small Models / Big Adventures · June 2026 · Track: Backyard AI
                     """
                 )
-        # --------------------------------------------------------------------
-        # Wire callbacks
-        # --------------------------------------------------------------------
         propose_btn.click(
             fn=on_propose,
             inputs=[fridge_input, state],
-            outputs=[chips, dish_options_html, dish_radio, state, ingredient_editor],
-        )
-        update_btn.click(
-            fn=on_update_ingredients,
-            inputs=[state, ingredient_editor],
-            outputs=[chips, dish_options_html, dish_radio, state],
-        )
-        cook_btn.click(
-            fn=on_cook,
-            inputs=[state, dish_radio, illustrate_chk, ingredient_editor],
-            outputs=[hero, steps_panel, nutrition_panel, state],
-        )
-        validate_btn.click(
-            fn=on_validate,
-            inputs=[state, step_idx, progress_input],
-            outputs=[verdict_panel],
         )
     return demo
@@ -289,4 +159,6 @@ if __name__ == "__main__":
         server_port=int(__import__("os").environ.get("PORT", 7860)),
         show_error=True,
         inbrowser=True,
-    )

 import logging
 log = logging.getLogger(__name__)
 from typing import Any
 import gradio as gr
 from PIL import Image
+# from src import config
 from src.agents.mise_en_place import identify_ingredients
+# from src.agents.progress_validator import validate
+# from src.agents.recipe_planner import plan_recipe, propose_dishes
+# from src.data.nutrition import compute_nutrition
+# from src.pipeline import Recipe
 from src.ui.components import (
     DishOptions,
     IngredientChips,
     RecipeHero,
     StepCard,
     VerdictBadge,
+    recipe_to_state,
 )
 from src.ui.theme import CSS, theme
+def on_propose(fridge_image: Image.Image | None, state: dict | None) -> tuple[str, str, list[str], dict]:
+    """Photo → ingredients → 3 dish options."""
     state = state or {}
     ingredients = identify_ingredients(fridge_image)
+    # options = propose_dishes(ingredients)
+    # state.update({
+    #     "ingredients_have": ingredients,
+    #     "ingredients_missing": [],
+    #     "options": [o.model_dump() for o in options],
+    # })
+    chips_html = IngredientChips.render({"have": ingredients, "missing": []})
+    log.info(ingredients)
+    # options_html = DishOptions.render({"options": state["options"]})
+    # radio_choices = [o.name for o in options]
+    # return chips_html, options_html, gr.update(choices=radio_choices, value=radio_choices[0] if radio_choices else None), state
+    return chips_html
+# ----------------
+# UI definition
+# ----------------
 def build_ui() -> gr.Blocks:
     initial_state: dict[str, Any] = {}
+    with gr.Blocks(title="Cook With Me") as demo:
         gr.Markdown(
             "# 🍲 Cook With Me\n"
+            "_A multimodal sous-chef. See it. Plan it. Show it. Cook it._"
         )
         state = gr.State(initial_state)
         with gr.Tabs():
+            # --- Tab 1: Cook ------------------------------------------------
+            with gr.Tab("Cook"):
                 with gr.Row():
                     with gr.Column(scale=1):
                         fridge_input = gr.Image(
                             label="📸 Photo of your fridge or pantry",
                             type="pil",
+                            height=320,
                         )
+                        propose_btn = gr.Button("What can I cook?", variant="primary")
                         gr.Markdown("### Ingredients I see")
                         chips = gr.HTML(IngredientChips.render({}))
                         gr.Markdown("### Pick a dish")
+                        options = gr.HTML(DishOptions.render({}))
+                        dish_radio = gr.Radio(choices=[], label="Choose one", interactive=True)
+                        with gr.Accordion("Generation options", open=False):
+                            illustrate_chk = gr.Checkbox(value=False, label="Render step images (FLUX, slow on CPU)")
+                            narrate_chk = gr.Checkbox(value=False, label="Generate voice narration (VoxCPM2)")
+                        cook_btn = gr.Button("Build recipe", variant="primary")
                     with gr.Column(scale=2):
                         hero = gr.HTML(RecipeHero.render({}))
                         steps_panel = gr.HTML(StepCard.render({}))
                         nutrition_panel = gr.HTML(NutritionGrid.render({"nutrition": {}}))
+            # --- Tab 2: Check Progress -------------------------------------
+            with gr.Tab("Check Progress"):
+                gr.Markdown("Upload a photo of your pan or plate; the same vision model that planned your recipe will compare it against the target step.")
                 with gr.Row():
                     with gr.Column():
                         step_idx = gr.Number(value=1, precision=0, label="Active step #")
+                        progress_input = gr.Image(label="📸 Your pan / plate", type="pil", height=320)
+                        validate_btn = gr.Button("How am I doing?", variant="primary")
                     with gr.Column():
                         verdict_panel = gr.HTML(VerdictBadge.render({}))
+                        verdict_audio = gr.Audio(label="Tip (voice)", autoplay=False)
+            # --- Tab 3: About ----------------------------------------------
+            with gr.Tab("About"):
                 gr.Markdown(
                     """
+                    ### Models
+                    - **Vision** — `openbmb/MiniCPM-V-4_6-gguf` via `llama-cpp-python` (~4.6B)
+                    - **Planner** — `openbmb/MiniCPM-V-4-gguf` via `llama-cpp-python` (~4B)
+                    - **Illustrator** — `black-forest-labs/FLUX.2-klein-9B` via `diffusers` (9B)
+                    - **Narrator** — `openbmb/VoxCPM2` via `transformers` (~1B)
+                    - **Retrieval** — `sentence-transformers/all-MiniLM-L6-v2` (22M)
+                    **Total ≈ 18.6B params** (≤ 32B requirement ✓).
+                    ### Pipeline
+                    ```
+                    Fridge photo → Vision → ingredients
+                                            │
+                                            ▼
+                                    Planner (+ Kaggle retrieval) → Recipe JSON
+                                            │
+                                            ▼
+                                    Illustrator (FLUX) → hero + per-step images
+                                            │
+                                            ▼
+                                    Narrator (VoxCPM2) → MP3 per step
+                                            │
+                                            ▼
+                    Progress photo → Validator (same vision model) → go|wait|fix
+                    ```
+                    ### Badges targeted
+                    ✓ Llama Champion · ✓ Well-Tuned · ✓ Off-Brand · ✓ Sharing is Caring · ✓ Field Notes
                     """
                 )
+        # Wire callbacks ----------------------------------------------------
         propose_btn.click(
             fn=on_propose,
             inputs=[fridge_input, state],
+            # outputs=[chips, options, dish_radio, state],
+            outputs=[chips],
         )
+        # cook_btn.click(
+        #     fn=on_pick_dish,
+        #     inputs=[state, dish_radio, illustrate_chk, narrate_chk],
+        #     outputs=[hero, steps_panel, nutrition_panel, chips, state],
+        # )
+        # validate_btn.click(
+        #     fn=on_validate,
+        #     inputs=[state, step_idx, progress_input],
+        #     outputs=[verdict_panel, verdict_audio],
+        # )
     return demo
         server_port=int(__import__("os").environ.get("PORT", 7860)),
         show_error=True,
         inbrowser=True,
+        theme=theme,
+        css=CSS
+    )

modal_app/__init__.py DELETED

File without changes

modal_app/flux_endpoint.py DELETED

@@ -1,124 +0,0 @@
-"""Modal FLUX.2 Klein endpoint.
-Deploy once with:
-    modal deploy modal_app/flux_endpoint.py
-Then the HF Space calls it via modal.Function.lookup().
-"""
-import io
-import modal
-# ---------------------------------------------------------------------------
-# App & image
-# ---------------------------------------------------------------------------
-app = modal.App("cook-with-me-flux")
-image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install(
-        "torch==2.7.0",          # >=2.5 needed: diffusers custom-op schema uses PEP604 unions
-        "torchvision==0.22.0",   # matches torch 2.7.0; silences diffusers image-processor fallback
-        "diffusers>=0.38",       # FLUX.2 support
-        "transformers>=4.45",
-        "accelerate",
-        "safetensors",
-        "Pillow",
-        "huggingface_hub>=1.17",
-        "sentencepiece",
-    )
-)
-# HF token secret so Modal can pull gated/private model weights
-hf_secret = modal.Secret.from_name("huggingface-secret")
-# Tried in order. FLUX models are gated (need license acceptance on HF);
-# SDXL-Turbo is public and always works, so it's the guaranteed fallback.
-FLUX_MODEL = "black-forest-labs/FLUX.2-klein-9B"
-FLUX_FALLBACK = "black-forest-labs/FLUX.1-schnell"
-SDXL_TURBO = "stabilityai/sdxl-turbo"   # non-gated, fast (1-2 steps)
-# ---------------------------------------------------------------------------
-# GPU class
-# ---------------------------------------------------------------------------
-@app.cls(
-    image=image,
-    gpu="L4",
-    scaledown_window=180,   # keep warm 3 min after last request
-    secrets=[hf_secret],
-)
-class FluxKlein:
-    @modal.enter()
-    def load(self):
-        import torch
-        dtype = torch.bfloat16
-        self.steps = 4
-        # 1) FLUX.2-klein (gated) ------------------------------------------------
-        try:
-            from diffusers import FluxPipeline
-            self.pipe = FluxPipeline.from_pretrained(FLUX_MODEL, torch_dtype=dtype).to("cuda")
-            self.guidance, self.steps, self.backend = 1.0, 4, "FLUX.2-klein-9B"
-            print(f"Loaded {self.backend}")
-            return
-        except Exception as e:
-            print(f"FLUX.2-klein unavailable ({type(e).__name__}); trying FLUX.1-schnell...")
-        # 2) FLUX.1-schnell (gated) ---------------------------------------------
-        try:
-            from diffusers import FluxPipeline
-            self.pipe = FluxPipeline.from_pretrained(FLUX_FALLBACK, torch_dtype=dtype).to("cuda")
-            self.guidance, self.steps, self.backend = 0.0, 4, "FLUX.1-schnell"
-            print(f"Loaded {self.backend}")
-            return
-        except Exception as e:
-            print(f"FLUX.1-schnell unavailable ({type(e).__name__}); falling back to SDXL-Turbo...")
-        # 3) SDXL-Turbo (public, always works) ----------------------------------
-        from diffusers import AutoPipelineForText2Image
-        self.pipe = AutoPipelineForText2Image.from_pretrained(
-            SDXL_TURBO, torch_dtype=torch.float16, variant="fp16"
-        ).to("cuda")
-        self.guidance, self.steps, self.backend = 0.0, 2, "SDXL-Turbo"
-        print(f"Loaded {self.backend}")
-    @modal.method()
-    def render_step(self, prompt: str, seed: int = 42) -> bytes:
-        """Generate a 512×512 PNG and return its raw bytes."""
-        import torch
-        img = self.pipe(
-            prompt=prompt,
-            height=512,
-            width=512,
-            guidance_scale=self.guidance,
-            num_inference_steps=self.steps,
-            generator=torch.Generator(device="cuda").manual_seed(seed),
-        ).images[0]
-        buf = io.BytesIO()
-        img.save(buf, format="PNG")
-        return buf.getvalue()
-# ---------------------------------------------------------------------------
-# Local test entrypoint
-# ---------------------------------------------------------------------------
-@app.local_entrypoint()
-def test():
-    import os
-    flux = FluxKlein()
-    png = flux.render_step.remote(
-        "Top-down photo of a kitchen pan with sautéed onions. "
-        "Mexican cooking. Warm lighting. Photorealistic.",
-        seed=0,
-    )
-    out = os.path.join(os.path.dirname(__file__), "..", "data", "test_flux.png")
-    out = os.path.abspath(out)
-    os.makedirs(os.path.dirname(out), exist_ok=True)
-    with open(out, "wb") as f:
-        f.write(png)
-    print(f"Saved {out} ({len(png)} bytes)")

modal_app/planner_endpoint.py DELETED

@@ -1,117 +0,0 @@
-"""Modal endpoint for the fine-tuned MiniCPM4.1-8B recipe planner.
-Runs in its OWN container because MiniCPM4.1's custom code requires
-transformers 4.x (CacheLayerMixin + is_torch_fx_available), which conflicts
-with the MiniCPM-V-4.6 vision model in the main app (needs transformers 5.x).
-Deploy:
-    modal deploy modal_app/planner_endpoint.py
-The Gradio app calls it via modal.Cls.from_name("cook-with-me-planner",
-"Planner").infer.remote(prompt, ...).
-"""
-from __future__ import annotations
-import os
-import modal
-app = modal.App("cook-with-me-planner")
-# 8B bf16 weights cached on a volume so cold starts don't re-download ~16GB.
-hf_cache = modal.Volume.from_name("cook-with-me-planner-cache", create_if_missing=True)
-hf_secret = modal.Secret.from_name("huggingface-secret")
-image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install(
-        "torch==2.4.0",
-        # MiniCPM4.1 custom code needs BOTH CacheLayerMixin (>=4.54) and
-        # is_torch_fx_available (removed in 5.0) — only 4.54..4.x has both.
-        "transformers>=4.54,<5.0",
-        "huggingface_hub>=0.26,<1.0",
-        "accelerate",
-        "sentencepiece",
-        "safetensors",
-    )
-    .env({"HF_HOME": "/cache/hf"})
-)
-# Fine-tuned weights; tokenizer pulled from base (FT tokenizer_config was saved
-# by transformers 5.x and is not readable by 4.x).
-PLANNER_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "eldinosaur/cook-with-me-planner-8b")
-BASE_REPO = "openbmb/MiniCPM4.1-8B"
-@app.cls(
-    image=image,
-    gpu="L4",
-    volumes={"/cache": hf_cache},
-    secrets=[hf_secret],
-    scaledown_window=240,
-    timeout=600,
-)
-class Planner:
-    @modal.enter()
-    def load(self):
-        import torch
-        from transformers import AutoModelForCausalLM, AutoTokenizer
-        print(f"Loading planner weights from {PLANNER_REPO}...")
-        self.tokenizer = AutoTokenizer.from_pretrained(BASE_REPO, trust_remote_code=True)
-        if self.tokenizer.pad_token is None:
-            self.tokenizer.pad_token = self.tokenizer.eos_token
-        self.model = AutoModelForCausalLM.from_pretrained(
-            PLANNER_REPO,
-            torch_dtype=torch.bfloat16,
-            trust_remote_code=True,
-            device_map="cuda",
-        ).eval()
-        print("Planner ready.")
-    @modal.method()
-    def infer(self, prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
-        import torch
-        messages = [{"role": "user", "content": prompt}]
-        # enable_thinking=False -> direct JSON, no <think> reasoning preamble
-        try:
-            enc = self.tokenizer.apply_chat_template(
-                messages,
-                add_generation_prompt=True,
-                tokenize=True,
-                return_tensors="pt",
-                return_dict=True,
-                enable_thinking=False,
-            )
-        except TypeError:
-            enc = self.tokenizer.apply_chat_template(
-                messages, add_generation_prompt=True, tokenize=True,
-                return_tensors="pt", return_dict=True,
-            )
-        input_ids = enc["input_ids"].to(self.model.device)
-        input_len = input_ids.shape[1]
-        gen_inputs = {"input_ids": input_ids}
-        if enc.get("attention_mask") is not None:
-            gen_inputs["attention_mask"] = enc["attention_mask"].to(self.model.device)
-        gen_kwargs = dict(max_new_tokens=max_new_tokens, repetition_penalty=1.05)
-        if temperature and temperature > 0:
-            gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.9)
-        else:
-            gen_kwargs.update(do_sample=False)
-        with torch.no_grad():
-            out = self.model.generate(**gen_inputs, **gen_kwargs)
-        return self.tokenizer.decode(out[0][input_len:], skip_special_tokens=True)
-@app.local_entrypoint()
-def test():
-    prompt = (
-        "You are a creative chef. Available ingredients: tomato, onion, garlic, pasta, olive oil.\n"
-        'Respond ONLY with JSON: {"options": [{"name": "...", "why": "..."}, {"name": "...", "why": "..."}, {"name": "...", "why": "..."}]}'
-    )
-    out = Planner().infer.remote(prompt, max_new_tokens=400)
-    print("OUTPUT:\n", out)
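Note on `infer` above: the greedy-versus-sampling switch is the only branching logic in the method, so it can be checked without loading the model. A minimal sketch, assuming the same defaults as the deleted code; `build_gen_kwargs` is a hypothetical name, not a function from the original file.

```python
def build_gen_kwargs(max_new_tokens: int = 1024, temperature: float = 0.0) -> dict:
    """Return keyword arguments for `generate`, mirroring the branch in `infer`.

    temperature <= 0 (or None) means deterministic greedy decoding;
    any positive temperature enables nucleus sampling with top_p=0.9.
    """
    kwargs = {"max_new_tokens": max_new_tokens, "repetition_penalty": 1.05}
    if temperature and temperature > 0:
        kwargs.update(do_sample=True, temperature=temperature, top_p=0.9)
    else:
        kwargs.update(do_sample=False)
    return kwargs


greedy = build_gen_kwargs(max_new_tokens=400)
sampled = build_gen_kwargs(temperature=0.7)
print(greedy)
print(sampled)
```

Keeping `temperature` out of the kwargs in the greedy case avoids passing a sampling parameter alongside `do_sample=False`.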

modal_app/serve_app.py DELETED

@@ -1,102 +0,0 @@
-"""Serve the full Cook With Me Gradio app on Modal GPU.
-This gives a permanent public URL (*.modal.run) that runs the real models:
-  - MiniCPM-V-4.6  (vision: ingredients + progress validation)
-  - MiniCPM4.1-8B  (planner: dish proposals + recipes)
-  - FLUX.2-klein   (step images, via the separate cook-with-me-flux endpoint)
-Deploy with:
-    modal deploy modal_app/serve_app.py
-Or run a temporary dev session (auto-stops on Ctrl-C):
-    modal serve modal_app/serve_app.py
-Both models live in one GPU container (~25GB VRAM total); the decorator below requests an L40S.
-Set the fine-tuned planner repo via the COOK_WITH_ME_PLANNER_FT_REPO env
-on the Modal function once training finishes.
-"""
-from __future__ import annotations
-from pathlib import Path
-import modal
-LOCAL_ROOT = Path(__file__).resolve().parent.parent
-REMOTE_ROOT = "/root/cook"
-app = modal.App("cook-with-me-app")
-# HF model cache persisted across restarts (avoids re-downloading ~25GB)
-hf_cache = modal.Volume.from_name("cook-with-me-hf-cache", create_if_missing=True)
-hf_secret = modal.Secret.from_name("huggingface-secret")
-image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install(
-        "torch==2.4.0",
-        "torchvision==0.19.0",
-        "transformers>=5.0",
-        "accelerate",
-        "safetensors",
-        "sentencepiece",
-        "Pillow",
-        "av",
-        "pydantic>=2",
-        "gradio==6.15.2",
-        "huggingface_hub>=1.17",
-        "modal",
-    )
-    .env({
-        "COOK_WITH_ME_CACHE": "/cache/cook",
-        # Use the fine-tuned planner pushed by scripts/train_planner.py
-        "COOK_WITH_ME_PLANNER_FT_REPO": "eldinosaur/cook-with-me-planner-8b",
-    })
-    .add_local_dir(
-        str(LOCAL_ROOT),
-        REMOTE_ROOT,
-        ignore=[
-            "data/*", ".git/*", "**/__pycache__", "**/*.pyc",
-            "assets/*", ".venv/*", "venv/*",
-        ],
-    )
-)
-@app.function(
-    image=image,
-    gpu="L40S",
-    secrets=[hf_secret],
-    volumes={"/cache": hf_cache},
-    timeout=3600,
-    scaledown_window=300,   # stay warm 5 min after last request
-    max_containers=1,
-)
-@modal.concurrent(max_inputs=20)
-@modal.asgi_app()
-def serve():
-    import os
-    import sys
-    import types
-    # --- env: cache model downloads on the volume, before any HF import ---
-    os.environ["HF_HOME"] = "/cache/hf"
-    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "0")
-    # --- mock `spaces` so @spaces.GPU becomes a no-op (we're already on GPU) ---
-    spaces_mock = types.ModuleType("spaces")
-    spaces_mock.GPU = lambda *a, **k: (lambda fn: fn)
-    sys.modules["spaces"] = spaces_mock
-    # --- make the mounted project importable ---
-    sys.path.insert(0, REMOTE_ROOT)
-    import gradio as gr
-    from fastapi import FastAPI
-    # Importing app triggers the vision model load (module-level singleton).
-    from app import build_ui
-    demo = build_ui()
-    demo.queue(max_size=20)
-    fastapi_app = FastAPI()
-    return gr.mount_gradio_app(app=fastapi_app, blocks=demo, path="/")
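Note on the `spaces` mock above: the one-line lambda covers the parameterised form `@spaces.GPU(duration=45)`, which is how the agents in this repo use it. A minimal sketch of a stub that also tolerates the bare form `@spaces.GPU`; handling the bare form is an addition for illustration, not something the original file does, and `_gpu`, `validate`, and `describe` are hypothetical names.

```python
import sys
import types


def _gpu(*args, **kwargs):
    """No-op replacement for spaces.GPU that accepts both decorator forms."""
    # Bare form: @spaces.GPU passes the decorated function directly.
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]
    # Parameterised form: @spaces.GPU(duration=45) must return a decorator.
    return lambda fn: fn


# Register the stub before anything imports `spaces`.
spaces_stub = types.ModuleType("spaces")
spaces_stub.GPU = _gpu
sys.modules["spaces"] = spaces_stub

import spaces  # resolves to the stub registered above


@spaces.GPU(duration=45)
def validate(x):
    return x * 2


@spaces.GPU
def describe(x):
    return f"got {x}"


print(validate(21), describe("pan"))
```

The stub has to be placed in `sys.modules` before the first `import spaces` anywhere in the process, which is why the original does it ahead of `from app import build_ui`.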

packages.txt DELETED

@@ -1,2 +0,0 @@
-ffmpeg
-libsndfile1

requirements.txt CHANGED

@@ -1,22 +1,15 @@
-# --- UI ---
 gradio==6.15.2
 huggingface_hub>=1.17
-pydantic>=2
-# --- Vision model (MiniCPM-V-4.6 runs on the Space's ZeroGPU) ---
-# transformers>=5.0 required: MiniCPMV4_6ForConditionalGeneration is a native
-# class added in the 5.x line.
 torch
 torchvision
-transformers>=5.0
-accelerate
-safetensors
-sentencepiece
-av
 spaces
 Pillow
-# --- Remote model calls ---
-# The recipe planner (MiniCPM4.1-8B) and step illustrator (FLUX/SDXL) run on
-# separate Modal apps; the Space calls them via the `modal` client.
-modal

+# --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+# llama-cpp-python
 gradio==6.15.2
 huggingface_hub>=1.17
+# --- Libraries added and unblocked for MiniCPM-V-4.6 ---
 torch
 torchvision
 spaces
 Pillow
+transformers>=4.45
+accelerate
+safetensors
+av

scripts/build_recipe_dataset.py DELETED

@@ -1,281 +0,0 @@
-"""Build the SFT dataset for the MiniCPM4.1-8B recipe planner.
-Reads the Kaggle "better-recipes-for-a-better-life" dataset and produces
-supervised fine-tuning pairs for BOTH planner tasks, matching the exact
-prompt formats the app uses (src/prompts/planner_propose.txt and
-planner_recipe.txt):
-  1. propose  : ingredients -> {"options": [{name, why} x3]}
-  2. recipe   : dish + ingredients -> {"name", "cuisine", "servings",
-                "total_time_minutes", "final_dish_visual", "steps":[...]}
-Run locally (once) before fine-tuning:
-    python scripts/build_recipe_dataset.py
-Requires:
-    pip install kagglehub pandas pyarrow datasets huggingface_hub tqdm
-    ~/.kaggle/kaggle.json with your credentials
-"""
-from __future__ import annotations
-import json
-import random
-import re
-import sys
-from pathlib import Path
-ROOT = Path(__file__).resolve().parent.parent
-sys.path.insert(0, str(ROOT))
-import pandas as pd
-from tqdm import tqdm
-from src import config
-random.seed(42)
-HF_DATASET_REPO = "eldinosaur/cook-with-me-recipes-sft"
-# ---------------------------------------------------------------------------
-# 1. Download (use ONLY recipes.csv — test_recipes.csv has a different schema
-#    whose capitalized columns shadowed the real data in the old version)
-# ---------------------------------------------------------------------------
-print("Pulling Kaggle dataset…")
-import kagglehub
-raw_path = Path(kagglehub.dataset_download(config.KAGGLE_DATASET))
-main_csv = raw_path / "recipes.csv"
-print(f"Reading {main_csv}")
-# cp1252 decodes the fraction/symbol bytes that show up as � under utf-8
-try:
-    raw_df = pd.read_csv(main_csv, encoding="cp1252", on_bad_lines="skip")
-except Exception:
-    raw_df = pd.read_csv(main_csv, encoding="utf-8", on_bad_lines="skip")
-print(f"Rows: {len(raw_df)}  columns: {list(raw_df.columns)}")
-# ---------------------------------------------------------------------------
-# 2. Cleaning helpers
-# ---------------------------------------------------------------------------
-_UNIT = (
-    r"(cups?|tablespoons?|tbsps?|teaspoons?|tsps?|pounds?|lbs?|ounces?|ozs?|"
-    r"grams?|kgs?|mls?|liters?|pinch(?:es)?|dash(?:es)?|cloves?|cans?|"
-    r"packages?|pkgs?|sheets?|slices?|sticks?|quarts?|pints?|jars?|bunch(?:es)?|"
-    r"heads?|stalks?|sprigs?|pieces?|fillets?)"
-)
-_PREP_WORDS = {
-    "peeled", "chopped", "diced", "sliced", "minced", "cored", "thawed",
-    "drained", "rinsed", "softened", "melted", "beaten", "divided", "cubed",
-    "to taste", "optional", "or more", "plus more", "for garnish", "for serving",
-    "lightly beaten", "room temperature", "at room temperature", "finely chopped",
-    "thinly sliced", "cut into", "more", "and", "or other", "such as",
-}
-def _clean_text(val: str) -> str:
-    if not isinstance(val, str):
-        return ""
-    # drop any remaining replacement chars and collapse whitespace
-    val = val.replace("�", " ")
-    return re.sub(r"[ \t]+", " ", val).strip()
-def _simplify_ingredient(raw: str) -> str:
-    s = re.sub(r"\([^)]*\)", "", raw)             # remove parentheticals
-    s = _clean_text(s).lower()
-    s = re.sub(r"^[\d\s./¼½¾⅓⅔⅛+-]+", "", s)       # leading quantities
-    s = re.sub(rf"^{_UNIT}\b\.?\s*", "", s)         # leading unit word
-    s = re.sub(r"^(of|the|a|an)\s+", "", s)
-    s = s.split(",")[0]                              # drop trailing prep clause
-    s = re.sub(r"[^a-z\s-]", "", s)                  # keep letters only
-    s = re.sub(r"\s+", " ", s).strip()
-    return s
-def _ingredient_list(raw: str) -> list[str]:
-    if not isinstance(raw, str):
-        return []
-    out, seen = [], set()
-    for part in raw.split(","):
-        name = _simplify_ingredient(part)
-        if not name or len(name) < 3 or len(name.split()) > 4:
-            continue
-        if name in _PREP_WORDS or name in seen:
-            continue
-        seen.add(name)
-        out.append(name)
-    return out
-def _steps_from_directions(raw: str) -> list[str]:
-    if not isinstance(raw, str):
-        return []
-    raw = _clean_text(raw.replace("\r", "\n"))
-    # Prefer explicit newlines; otherwise split into sentences.
-    parts = [p.strip() for p in raw.split("\n") if p.strip()]
-    if len(parts) < 2:
-        parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+(?=[A-Z])", raw) if p.strip()]
-    # merge very short fragments into the previous step
-    steps: list[str] = []
-    for p in parts:
-        if steps and len(p) < 25:
-            steps[-1] = steps[-1] + " " + p
-        else:
-            steps.append(p)
-    return [s for s in steps if len(s) > 15]
-def _minutes(row) -> int:
-    for col in ("total_time", "cook_time", "prep_time"):
-        v = row.get(col)
-        if isinstance(v, str):
-            h = re.search(r"(\d+)\s*hr", v)
-            m = re.search(r"(\d+)\s*min", v)
-            total = (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)
-            if total:
-                return total
-    return 0
-def _cuisine(row) -> str:
-    cp = row.get("cuisine_path")
-    if isinstance(cp, str):
-        segs = [s for s in cp.split("/") if s]
-        if segs:
-            return segs[0].replace("-", " ").strip().title()
-    return "International"
-def _distribute(total: int, n: int) -> list[int]:
-    if n <= 0:
-        return []
-    if total <= 0:
-        total = n * 6
-    base = max(2, total // n)
-    durs = [base] * n
-    durs[-1] = max(2, total - base * (n - 1))
-    return durs
-# ---------------------------------------------------------------------------
-# 3. Normalize into clean recipe records
-# ---------------------------------------------------------------------------
-recipes: list[dict] = []
-for _, r in tqdm(raw_df.iterrows(), total=len(raw_df), desc="Normalizing"):
-    name = _clean_text(r.get("recipe_name", ""))
-    ings = _ingredient_list(r.get("ingredients", ""))
-    steps = _steps_from_directions(r.get("directions", ""))
-    if not name or len(ings) < 3 or len(steps) < 2:
-        continue
-    steps = steps[:7]
-    if len(steps) < 4 and len(steps) >= 2:
-        pass  # keep short recipes too, 2-3 steps is fine
-    minutes = _minutes(r) or len(steps) * 6
-    try:
-        servings = int(float(str(r.get("servings", "2")).split()[0]))
-    except Exception:
-        servings = 2
-    servings = min(max(servings, 1), 12)
-    recipes.append({
-        "name": name,
-        "ingredients": ings[:14],
-        "steps": steps,
-        "cuisine": _cuisine(r),
-        "minutes": int(minutes),
-        "servings": servings,
-    })
-print(f"\nClean recipes: {len(recipes)}")
-config.DATA_DIR.mkdir(parents=True, exist_ok=True)
-pd.DataFrame(recipes).to_parquet(config.RECIPES_PARQUET, index=False)
-print(f"Saved -> {config.RECIPES_PARQUET}")
-# ---------------------------------------------------------------------------
-# 4. Build SFT pairs matching the app's exact prompt formats
-# ---------------------------------------------------------------------------
-PROPOSE_TMPL = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
-RECIPE_TMPL = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
-_WHY = [
-    "Uses your {a} and {b} for a quick, satisfying result.",
-    "A fresh way to combine {a} with {b}.",
-    "Turns {a} and {b} into a comforting classic.",
-    "Light and flavorful, built around {a} and {b}.",
-    "Makes the most of {a}, {b} and a few pantry staples.",
-]
-def _recipe_json(rec: dict) -> str:
-    durs = _distribute(rec["minutes"], len(rec["steps"]))
-    steps = [
-        {"n": i + 1, "instruction": s, "duration": f"{d} min", "tip": None}
-        for i, (s, d) in enumerate(zip(rec["steps"], durs))
-    ]
-    obj = {
-        "name": rec["name"],
-        "cuisine": rec["cuisine"],
-        "servings": rec["servings"],
-        "total_time_minutes": rec["minutes"],
-        "final_dish_visual": f"A beautifully plated {rec['name'].lower()}, ready to serve.",
-        "steps": steps,
-    }
-    return json.dumps(obj, ensure_ascii=False)
-def _propose_json(rec: dict, others: list[dict]) -> str:
-    a = rec["ingredients"][0] if rec["ingredients"] else "your ingredients"
-    b = rec["ingredients"][1] if len(rec["ingredients"]) > 1 else "pantry staples"
-    options = [{"name": rec["name"], "why": random.choice(_WHY).format(a=a, b=b)}]
-    for o in others:
-        oa = o["ingredients"][0] if o["ingredients"] else a
-        ob = o["ingredients"][1] if len(o["ingredients"]) > 1 else b
-        options.append({"name": o["name"], "why": random.choice(_WHY).format(a=oa, b=ob)})
-    return json.dumps({"options": options}, ensure_ascii=False)
-sft_path = config.DATA_DIR / "recipes_sft.jsonl"
-n_recipe = n_propose = 0
-with open(sft_path, "w", encoding="utf-8") as f:
-    for idx, rec in enumerate(tqdm(recipes, desc="Building SFT")):
-        ing_str = ", ".join(rec["ingredients"])
-        # --- recipe task ---
-        user_recipe = RECIPE_TMPL.replace("{dish_name}", rec["name"]).replace("{ingredients}", ing_str)
-        f.write(json.dumps({"messages": [
-            {"role": "user", "content": user_recipe},
-            {"role": "assistant", "content": _recipe_json(rec)},
-        ]}, ensure_ascii=False) + "\n")
-        n_recipe += 1
-        # --- propose task (use two other recipes as alternative options) ---
-        others = [recipes[(idx + 7) % len(recipes)], recipes[(idx + 53) % len(recipes)]]
-        user_propose = PROPOSE_TMPL.replace("{ingredients}", ing_str)
-        f.write(json.dumps({"messages": [
-            {"role": "user", "content": user_propose},
-            {"role": "assistant", "content": _propose_json(rec, others)},
-        ]}, ensure_ascii=False) + "\n")
-        n_propose += 1
-print(f"\nSFT pairs: {n_recipe} recipe + {n_propose} propose = {n_recipe + n_propose} -> {sft_path}")
-# ---------------------------------------------------------------------------
-# 5. Push to HF Hub
-# ---------------------------------------------------------------------------
-if HF_DATASET_REPO:
-    from datasets import load_dataset
-    ds = load_dataset("json", data_files=str(sft_path), split="train")
-    ds.push_to_hub(HF_DATASET_REPO)
-    print(f"Pushed {len(ds)} rows to {HF_DATASET_REPO}")
-print("\nDone.")
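Note on the timing helpers above: `_minutes` and `_distribute` decide the per-step `duration` strings in the SFT targets, so their edge cases end up in the training data. A minimal standalone sketch of the same logic, assuming time strings shaped like "1 hrs 30 mins" (the exact format of the Kaggle column is an assumption); `parse_minutes` and `distribute` are renamed copies for illustration.

```python
import re


def parse_minutes(text: str) -> int:
    """Convert a string such as '1 hrs 30 mins' to total minutes (0 if none found)."""
    h = re.search(r"(\d+)\s*hr", text)
    m = re.search(r"(\d+)\s*min", text)
    return (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)


def distribute(total: int, n: int) -> list[int]:
    """Split `total` minutes over `n` steps, with a floor of 2 minutes per step."""
    if n <= 0:
        return []
    if total <= 0:
        total = n * 6          # unknown time: assume 6 minutes per step
    base = max(2, total // n)
    durs = [base] * n
    durs[-1] = max(2, total - base * (n - 1))  # last step absorbs the remainder
    return durs


total = parse_minutes("1 hrs 30 mins")
print(total, distribute(total, 4))
print(distribute(0, 3))
print(distribute(5, 4))
```

Because of the 2-minute floor, short recipes can get step durations that sum to more than `total_time_minutes` (the last call above yields 8 minutes for a 5-minute recipe), so the two fields in the generated JSON are not guaranteed to agree.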

scripts/diag_planner.py DELETED

@@ -1,73 +0,0 @@
-"""Diagnose why the fine-tuned planner produces empty generations.
-    modal run scripts/diag_planner.py
-"""
-import modal
-app = modal.App("cook-with-me-diag")
-image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install(
-        "torch==2.4.0",
-        "transformers>=4.54,<5.0",        # window with BOTH CacheLayerMixin and is_torch_fx_available
-        "huggingface_hub>=0.26,<1.0",
-        "accelerate",
-        "sentencepiece",
-    )
-)
-hf_secret = modal.Secret.from_name("huggingface-secret")
-MODEL_ID = "eldinosaur/cook-with-me-planner-8b"   # fine-tuned model under transformers 4.x
-@app.function(image=image, gpu="L4", secrets=[hf_secret], timeout=900)
-def diag():
-    import torch
-    import transformers
-    print("transformers version:", transformers.__version__)
-    from transformers import AutoModelForCausalLM, AutoTokenizer
-    print("Loading tokenizer (from base) + model (from FT)...")
-    tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM4.1-8B", trust_remote_code=True)
-    model = AutoModelForCausalLM.from_pretrained(
-        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cuda"
-    ).eval()
-    print("has generate:", hasattr(model, "generate"))
-    print("class mro:", [c.__name__ for c in type(model).__mro__])
-    prompt = (
-        "You are a chef. Given ingredients: tomato, onion, garlic, pasta, olive oil.\n"
-        'Return ONLY JSON: {"options": [{"name": "...", "why": "..."}, ...]} with 3 dish ideas.'
-    )
-    messages = [{"role": "user", "content": prompt}]
-    # Mirror the fixed planner.py path
-    try:
-        enc = tok.apply_chat_template(
-            messages, add_generation_prompt=True, tokenize=True,
-            return_tensors="pt", return_dict=True,
-        )
-        input_ids = enc["input_ids"].to("cuda")
-        input_len = input_ids.shape[1]
-        gen_inputs = {"input_ids": input_ids}
-        if enc.get("attention_mask") is not None:
-            gen_inputs["attention_mask"] = enc["attention_mask"].to("cuda")
-        print("input length:", input_len)
-        with torch.no_grad():
-            out = model.generate(**gen_inputs, max_new_tokens=400, do_sample=False)
-        text = tok.decode(out[0][input_len:], skip_special_tokens=True)
-        print("=== GENERATION OK (transformers 4.x, cache on) ===")
-        print("OUTPUT:", repr(text[:1000]))
-    except Exception as e:
-        import traceback
-        print("=== GENERATION FAILED ===")
-        print("Exception type:", type(e).__name__)
-        print("Exception repr:", repr(e))
-        traceback.print_exc()
-@app.local_entrypoint()
-def main():
-    diag.remote()

scripts/train_planner.py DELETED

@@ -1,172 +0,0 @@
-"""Fine-tune MiniCPM4.1-8B on the recipe SFT dataset via Modal (A10G GPU).
-Usage:
-    modal run scripts/train_planner.py
-After training, the adapter is merged and the full model is pushed to HF Hub
-as   <HF_USERNAME>/cook-with-me-planner-8b
-Set HF_USERNAME below (or export HF_TOKEN env var before running).
-"""
-from __future__ import annotations
-import modal
-# ---------------------------------------------------------------------------
-# Config — change these two values
-# ---------------------------------------------------------------------------
-HF_USERNAME = "eldinosaur"
-SFT_DATASET_REPO = f"{HF_USERNAME}/cook-with-me-recipes-sft"
-OUTPUT_REPO = f"{HF_USERNAME}/cook-with-me-planner-8b"
-BASE_MODEL = "openbmb/MiniCPM4.1-8B"
-# ---------------------------------------------------------------------------
-app = modal.App("cook-with-me-train")
-volume = modal.Volume.from_name("cook-with-me-train-vol", create_if_missing=True)
-train_image = (
-    modal.Image.debian_slim(python_version="3.12")
-    .pip_install(
-        "torch==2.4.0",
-        "transformers>=5.0",
-        "peft>=0.12",
-        "trl>=0.10",
-        "accelerate",
-        "datasets",
-        "huggingface_hub>=1.17",
-        "bitsandbytes",
-        "sentencepiece",
-        "safetensors",
-    )
-)
-hf_secret = modal.Secret.from_name("huggingface-secret")
-@app.function(
-    image=train_image,
-    gpu="A10G",
-    timeout=60 * 60 * 3,          # 3-hour hard cap
-    secrets=[hf_secret],
-    volumes={"/vol": volume},
-)
-def train():
-    import os
-    import torch
-    from datasets import load_dataset
-    from peft import LoraConfig, get_peft_model, TaskType
-    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
-    from trl import SFTTrainer, SFTConfig
-    os.environ.setdefault("HF_HOME", "/vol/hf_cache")
-    # MiniCPM4.1-8B custom code references is_torch_fx_available which was
-    # removed in transformers 5.x. Patch it back before loading the model.
-    import transformers.utils.import_utils as _iutils
-    if not hasattr(_iutils, "is_torch_fx_available"):
-        def _is_torch_fx_available():
-            try:
-                import torch.fx  # noqa: F401
-                return True
-            except ImportError:
-                return False
-        _iutils.is_torch_fx_available = _is_torch_fx_available
-    # ---- Load tokenizer & model ----
-    print(f"Loading {BASE_MODEL}…")
-    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
-    if tokenizer.pad_token is None:
-        tokenizer.pad_token = tokenizer.eos_token
-    model = AutoModelForCausalLM.from_pretrained(
-        BASE_MODEL,
-        torch_dtype=torch.bfloat16,
-        trust_remote_code=True,
-        device_map="cuda",
-    )
-    # ---- LoRA config ----
-    lora_cfg = LoraConfig(
-        task_type=TaskType.CAUSAL_LM,
-        r=16,
-        lora_alpha=32,
-        lora_dropout=0.05,
-        target_modules="all-linear",
-        bias="none",
-    )
-    model = get_peft_model(model, lora_cfg)
-    model.print_trainable_parameters()
-    # ---- Dataset ----
-    print(f"Loading dataset {SFT_DATASET_REPO}…")
-    ds = load_dataset(SFT_DATASET_REPO, split="train")
-    def _format(example):
-        return {"text": tokenizer.apply_chat_template(
-            example["messages"], tokenize=False, add_generation_prompt=False
-        )}
-    ds = ds.map(_format, remove_columns=ds.column_names)
-    # ---- Training ----
-    output_dir = "/vol/planner_out"
-    trainer = SFTTrainer(
-        model=model,
-        processing_class=tokenizer,
-        train_dataset=ds,
-        args=SFTConfig(
-            output_dir=output_dir,
-            num_train_epochs=3,   # 2046 examples — 3 epochs converges without overfitting
-            per_device_train_batch_size=2,
-            gradient_accumulation_steps=4,
-            learning_rate=2e-4,
-            lr_scheduler_type="cosine",
-            warmup_ratio=0.05,
-            bf16=True,
-            logging_steps=20,
-            save_steps=200,
-            max_length=2048,
-            dataset_text_field="text",
-        ),
-    )
-    trainer.train()
-    trainer.save_model(output_dir)
-    # ---- Merge LoRA + push ----
-    print("Merging LoRA adapter…")
-    from peft import PeftModel
-    base = AutoModelForCausalLM.from_pretrained(
-        BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="cpu"
-    )
-    merged = PeftModel.from_pretrained(base, output_dir)
-    merged = merged.merge_and_unload()
-    # MiniCPM custom code declares `_tied_weights_keys` as a list, but
-    # transformers 5.x's save path calls `.keys()` on it. Patch the walker
-    # to tolerate both list and dict formats before saving/pushing.
-    import transformers.modeling_utils as _mu
-    def _safe_get_tied_weight_keys(model, *args, **kwargs):
-        keys = []
-        for module_name, module in model.named_modules():
-            tied = getattr(module, "_tied_weights_keys", None)
-            if not tied:
-                continue
-            names = tied.keys() if isinstance(tied, dict) else tied
-            for k in names:
-                keys.append(f"{module_name}.{k}" if module_name else k)
-        return keys
-    _mu._get_tied_weight_keys = _safe_get_tied_weight_keys
-    print(f"Pushing merged model to {OUTPUT_REPO}…")
-    merged.push_to_hub(OUTPUT_REPO, private=False)
-    tokenizer.push_to_hub(OUTPUT_REPO, private=False)
-    print("Done.")
-@app.local_entrypoint()
-def main():
-    train.remote()
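
Note on the removed training configuration: the schedule implied by the `SFTConfig` values can be worked out by hand. The sketch below assumes a single GPU, no sequence packing, and the 2046-example count quoted in the code comment. The exact rounding of optimizer and warmup steps differs between `transformers` versions, so the numbers are approximate.

```python
import math

# Values copied from the removed SFTConfig; num_examples comes from the code comment.
num_examples = 2046
per_device_batch = 2
grad_accum = 4
epochs = 3
warmup_ratio = 0.05
save_steps = 200

# Assumption: one device, so the effective batch is batch size times accumulation.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Intermediate checkpoints written by save_steps before the final save_model call.
checkpoints = list(range(save_steps, total_steps + 1, save_steps))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps, checkpoints)
```

Under these assumptions the run is about 768 optimizer steps at an effective batch of 8, with intermediate checkpoints at steps 200, 400 and 600.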

src/agents/progress_validator.py DELETED

@@ -1,84 +0,0 @@
-"""Progress validation agent: compare cooking photo against target step."""
-from __future__ import annotations
-import logging
-from typing import Optional
-import spaces
-import torch
-from PIL import Image
-from src import config
-from src.agents.mise_en_place import model, processor
-from src.agents.recipe_planner import _extract_json
-log = logging.getLogger(__name__)
-_VALIDATOR_PROMPT = (config.PROMPTS_DIR / "validator_prompt.txt").read_text(encoding="utf-8")
-@spaces.GPU(duration=45)
-def validate(image: Optional[Image.Image], step_instruction: str) -> dict:
-    """Compare a cooking-progress photo to the target step description.
-    Returns a dict with keys: verdict ('go'|'wait'|'fix'), feedback, tip.
-    """
-    if image is None:
-        return {
-            "verdict": "wait",
-            "feedback": "No image provided.",
-            "tip": "Upload a photo of your cooking progress to get feedback.",
-        }
-    try:
-        img = image.convert("RGB")
-        prompt = _VALIDATOR_PROMPT.replace("{step_instruction}", step_instruction)
-        messages = [{"role": "user", "content": [
-            {"type": "image", "image": img},
-            {"type": "text", "text": prompt},
-        ]}]
-        inputs = processor.apply_chat_template(
-            messages,
-            add_generation_prompt=True,
-            tokenize=True,
-            return_dict=True,
-            return_tensors="pt",
-            enable_thinking=False,
-            processor_kwargs={"downsample_mode": "16x", "max_slice_nums": 9, "use_image_id": True},
-        )
-        device = model.device
-        inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
-        for k, v in inputs.items():
-            if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
-                inputs[k] = v.to(dtype=torch.bfloat16)
-        with torch.no_grad():
-            generated_ids = model.generate(
-                **inputs,
-                max_new_tokens=256,
-                do_sample=False,
-                downsample_mode="16x",
-            )
-        trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
-        raw = processor.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-        log.info("validate raw: %s", raw[:400])
-        data = _extract_json(raw)
-        verdict = str(data.get("verdict", "wait"))
-        if verdict not in ("go", "wait", "fix"):
-            verdict = "wait"
-        return {
-            "verdict": verdict,
-            "feedback": str(data.get("feedback", "")),
-            "tip": str(data.get("tip", "")),
-        }
-    except Exception as exc:
-        log.warning("validate failed: %s", exc)
-        return {
-            "verdict": "wait",
-            "feedback": "Could not analyse the photo.",
-            "tip": "Make sure the image is well-lit and in focus.",
-        }

src/agents/recipe_planner.py DELETED

@@ -1,167 +0,0 @@
-"""Recipe planner agent: propose dishes + generate step-by-step recipe.
-Uses openbmb/MiniCPM4.1-8B (text-only) as the primary planner.
-Falls back to the shared vision model (MiniCPM-V-4.6) when the planner
-model is unavailable (e.g. insufficient RAM on the Space).
-"""
-from __future__ import annotations
-import json
-import logging
-import re
-import spaces
-import torch
-from src import config
-from src.pipeline import DishOption, Recipe, RecipeStep
-log = logging.getLogger(__name__)
-_PROPOSE_PROMPT = (config.PROMPTS_DIR / "planner_propose.txt").read_text(encoding="utf-8")
-_RECIPE_PROMPT = (config.PROMPTS_DIR / "planner_recipe.txt").read_text(encoding="utf-8")
-# ---------------------------------------------------------------------------
-# JSON extraction helpers
-# ---------------------------------------------------------------------------
-def _extract_json(text: str) -> dict:
-    """Robustly extract the first JSON object from raw model output."""
-    text = text.strip()
-    try:
-        return json.loads(text)
-    except Exception:
-        pass
-    # Markdown code-block
-    m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
-    if m:
-        try:
-            return json.loads(m.group(1))
-        except Exception:
-            pass
-    # First {...} block with minor auto-fixes
-    m = re.search(r"\{.*\}", text, re.DOTALL)
-    if m:
-        candidate = m.group(0)
-        candidate = candidate.replace("'", '"')
-        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
-        try:
-            return json.loads(candidate)
-        except Exception:
-            pass
-    log.warning("Could not extract JSON from output (first 300 chars): %.300s", text)
-    return {}
-# ---------------------------------------------------------------------------
-# Inference dispatcher
-# ---------------------------------------------------------------------------
-def _infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
-    """Run text inference.
-    Primary: the dedicated MiniCPM4.1-8B planner Modal endpoint (transformers
-    4.x). Falls back to the local vision model (text-only) if the endpoint is
-    unavailable or returns nothing.
-    """
-    try:
-        import modal
-        cls = modal.Cls.from_name(config.PLANNER_MODAL_APP, config.PLANNER_MODAL_CLS)
-        out = cls().infer.remote(prompt, max_new_tokens=max_new_tokens, temperature=temperature)
-        if out and out.strip():
-            return out
-        log.warning("Planner endpoint returned empty — falling back to vision model.")
-    except Exception as exc:
-        log.warning("Planner endpoint call failed: %s — falling back to vision model.", exc)
-    # Fallback: use the vision model in text-only mode
-    log.warning("Using vision model as text fallback.")
-    from src.agents.mise_en_place import model as vis_model, processor as vis_proc
-    messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
-    inputs = vis_proc.apply_chat_template(
-        messages,
-        add_generation_prompt=True,
-        tokenize=True,
-        return_dict=True,
-        return_tensors="pt",
-        enable_thinking=False,
-    )
-    device = vis_model.device
-    inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
-    for k, v in inputs.items():
-        if isinstance(v, torch.Tensor) and torch.is_floating_point(v):
-            inputs[k] = v.to(dtype=torch.bfloat16)
-    with torch.no_grad():
-        generated_ids = vis_model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
-    trimmed = [out[len(inp):] for inp, out in zip(inputs["input_ids"], generated_ids)]
-    return vis_proc.batch_decode(trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-# ---------------------------------------------------------------------------
-# Public agent functions
-# ---------------------------------------------------------------------------
-@spaces.GPU(duration=90)
-def propose_dishes(ingredients: list[str]) -> list[DishOption]:
-    """Given detected ingredients, return up to 3 dish proposals."""
-    try:
-        prompt = _PROPOSE_PROMPT.replace("{ingredients}", ", ".join(ingredients))
-        raw = _infer(prompt, max_new_tokens=512, temperature=0.7)
-        log.info("propose_dishes raw: %.500s", raw)
-        data = _extract_json(raw)
-        options = data.get("options", [])
-        return [
-            DishOption(name=str(o.get("name", "Dish")), why=str(o.get("why", "")))
-            for o in options[:3]
-            if o.get("name")
-        ] or [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
-    except Exception as exc:
-        log.warning("propose_dishes failed: %s", exc)
-        return [DishOption(name="Simple Stir-fry", why="Quick and adaptable to most ingredients.")]
-@spaces.GPU(duration=120)
-def plan_recipe(dish_name: str, ingredients: list[str]) -> Recipe:
-    """Generate a full step-by-step recipe for the chosen dish."""
-    try:
-        prompt = (
-            _RECIPE_PROMPT
-            .replace("{dish_name}", dish_name)
-            .replace("{ingredients}", ", ".join(ingredients))
-        )
-        raw = _infer(prompt, max_new_tokens=1024, temperature=0.0)
-        log.info("plan_recipe raw: %.800s", raw)
-        data = _extract_json(raw)
-        raw_steps = data.get("steps", [])
-        steps = []
-        for i, s in enumerate(raw_steps, start=1):
-            if not s.get("instruction"):
-                continue
-            tip_val = s.get("tip")
-            steps.append(RecipeStep(
-                n=int(s.get("n", i)),
-                instruction=str(s["instruction"]),
-                duration=str(s.get("duration", "5 min")),
-                tip=str(tip_val) if tip_val and str(tip_val).lower() not in ("null", "none") else None,
-                visual=str(s.get("visual", "")),
-            ))
-        return Recipe(
-            name=str(data.get("name", dish_name)),
-            cuisine=str(data.get("cuisine", "International")),
-            servings=int(data.get("servings", 2)),
-            total_time_minutes=int(data.get("total_time_minutes", 30)),
-            final_dish_visual=str(data.get("final_dish_visual", "")),
-            steps=steps or [RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
-        )
-    except Exception as exc:
-        log.warning("plan_recipe failed: %s", exc)
-        return Recipe(
-            name=dish_name,
-            steps=[RecipeStep(n=1, instruction="Prepare and cook ingredients to taste.", duration="20 min")],
-        )
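
Note on the removed `_extract_json`: its last fallback replaces every `'` with `"` before parsing, so an otherwise recoverable reply such as `{"feedback": "Don't stir yet",}` becomes invalid and the helper returns `{}`. The sketch below is one possible alternative, not the author's code: `extract_json_lenient` is a hypothetical name, and it tries the trailing-comma fix on its own before falling back to `ast.literal_eval` for single-quoted, Python-style dicts.

```python
import ast
import json
import re


def extract_json_lenient(text: str) -> dict:
    """Hypothetical variant: never rewrites quotes inside the candidate."""
    m = re.search(r"\{.*\}", text, re.DOTALL)
    if not m:
        return {}
    candidate = m.group(0)
    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this regex can also touch a ", }" sequence inside a string value.
    no_trailing = re.sub(r",\s*([}\]])", r"\1", candidate)

    # Attempts 1 and 2: strict JSON, with and without the trailing-comma fix.
    for attempt in (candidate, no_trailing):
        try:
            parsed = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    # Attempt 3: Python-literal style output such as {'verdict': 'go'}.
    # JSON-only tokens (null, true, false) are rejected here and yield {}.
    try:
        parsed = ast.literal_eval(no_trailing)
    except (ValueError, SyntaxError, TypeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


with_apostrophe = extract_json_lenient('{"feedback": "Don\'t stir yet",}')
single_quoted = extract_json_lenient("Sure! {'verdict': 'go'}")
no_object = extract_json_lenient("no json here")
```

The trade-off is that a single-quoted reply containing `null` is no longer repaired, whereas the removed code could repair it as long as no string contained an apostrophe.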

src/agents/step_illustrator.py DELETED

@@ -1,81 +0,0 @@
-"""Step image generator — delegates to the deployed Modal FLUX.2 endpoint."""
-from __future__ import annotations
-import base64
-import logging
-from typing import Optional
-from src import config
-from src.pipeline import Recipe, RecipeStep
-log = logging.getLogger(__name__)
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-def _b64(png_bytes: bytes) -> str:
-    return base64.b64encode(png_bytes).decode()
-def _step_prompt(visual: str, cuisine: str, n: int) -> str:
-    desc = visual.strip() or f"cooking step {n}"
-    return (
-        f"Top-down photo of a kitchen pan or plate showing {desc}. "
-        f"{cuisine} home cooking. Warm natural lighting. "
-        "Recipe magazine style. Photorealistic. Appetizing."
-    )
-def _dish_prompt(visual: str, cuisine: str) -> str:
-    desc = visual.strip() or "the finished plated dish, garnished and beautifully presented"
-    return (
-        f"Top-down photo of a {desc} on a rustic wooden table. "
-        f"{cuisine} home cooking. Warm natural lighting. "
-        "Recipe magazine style. Photorealistic. Appetizing."
-    )
-# ---------------------------------------------------------------------------
-# Modal call
-# ---------------------------------------------------------------------------
-def _call_modal(prompt: str, seed: int = 42) -> Optional[bytes]:
-    """Call the deployed Modal FLUX endpoint. Returns PNG bytes or None."""
-    try:
-        import modal
-        cls = modal.Cls.from_name(config.MODAL_APP_NAME, config.MODAL_CLS_NAME)
-        return cls().render_step.remote(prompt, seed=seed)
-    except Exception as exc:
-        log.warning("Modal FLUX call failed: %s", exc)
-        return None
-# ---------------------------------------------------------------------------
-# Public function
-# ---------------------------------------------------------------------------
-def illustrate_recipe(recipe: Recipe) -> Recipe:
-    """Generate FLUX images for every step + final dish.
-    Mutates and returns the same Recipe with image_b64 fields populated
-    (or left as None when Modal is unavailable).
-    """
-    cuisine = recipe.cuisine or "International"
-    # Final dish hero image
-    final_bytes = _call_modal(_dish_prompt(recipe.final_dish_visual, cuisine), seed=0)
-    if final_bytes:
-        recipe.final_dish_image_b64 = _b64(final_bytes)
-        log.info("Generated final dish image.")
-    # Per-step images (sequential to respect GPU limits on Modal)
-    for step in recipe.steps:
-        prompt = _step_prompt(step.visual, cuisine, step.n)
-        step_bytes = _call_modal(prompt, seed=step.n)
-        if step_bytes:
-            step.image_b64 = _b64(step_bytes)
-            log.info("Generated image for step %d.", step.n)
-    return recipe

src/config.py CHANGED

@@ -21,21 +21,10 @@ VISION_REPO = "openbmb/MiniCPM-V-4_6-GGUF"
 VISION_MODEL_FILE = "MiniCPM-V-4_6-Q4_K_M.gguf"
 VISION_MMPROJ_FILE = "mmproj-model-f16.gguf"
-# Base model; set COOK_WITH_ME_PLANNER_REPO to point at a fine-tuned HF repo
-PLANNER_REPO = os.environ.get("COOK_WITH_ME_PLANNER_REPO", "openbmb/MiniCPM4.1-8B")
-PLANNER_FINETUNED_REPO = os.environ.get("COOK_WITH_ME_PLANNER_FT_REPO", "")  # set after fine-tune
-# Modal app names
-MODAL_APP_NAME = "cook-with-me-flux"
-MODAL_CLS_NAME = "FluxKlein"
-# Planner runs in its own Modal app (transformers 4.x, conflicts with the
-# vision model's transformers 5.x — so it can't live in the same container).
-PLANNER_MODAL_APP = "cook-with-me-planner"
-PLANNER_MODAL_CLS = "Planner"
-FLUX_REPO = os.environ.get("COOK_WITH_ME_FLUX_REPO", "black-forest-labs/FLUX.2-klein-9B")
-FLUX_FALLBACK_REPO = "black-forest-labs/FLUX.1-schnell"
 NARRATOR_REPO = "openbmb/VoxCPM2"
 EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

 VISION_MODEL_FILE = "MiniCPM-V-4_6-Q4_K_M.gguf"
 VISION_MMPROJ_FILE = "mmproj-model-f16.gguf"
+PLANNER_REPO = "openbmb/MiniCPM-V-4-gguf"
+PLANNER_MODEL_FILE = "Model-Q4_K_M.gguf"
+FLUX_REPO = "black-forest-labs/FLUX.2-klein-9B"
 NARRATOR_REPO = "openbmb/VoxCPM2"
 EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

src/data/__init__.py DELETED

File without changes

src/data/nutrition.py DELETED

@@ -1,112 +0,0 @@
-"""Per-serving macro estimator — ingredient lookup, no extra model call needed."""
-from __future__ import annotations
-# (calories kcal, protein g, carbs g, fat g, fiber g) per 100 g
-_MACROS: dict[str, tuple[float, float, float, float, float]] = {
-    # proteins
-    "chicken": (165, 31, 0, 3.6, 0),
-    "beef": (250, 26, 0, 16, 0),
-    "pork": (242, 27, 0, 14, 0),
-    "fish": (130, 20, 0, 5, 0),
-    "salmon": (208, 20, 0, 13, 0),
-    "tuna": (130, 29, 0, 0.5, 0),
-    "shrimp": (99, 24, 0, 0.3, 0),
-    "egg": (155, 13, 1.1, 11, 0),
-    "eggs": (155, 13, 1.1, 11, 0),
-    "tofu": (76, 8, 1.9, 4.8, 0.3),
-    # dairy
-    "milk": (61, 3.2, 4.8, 3.3, 0),
-    "cheese": (402, 25, 1.3, 33, 0),
-    "butter": (717, 0.9, 0.1, 81, 0),
-    "yogurt": (59, 3.5, 4.7, 3.3, 0),
-    "cream": (340, 2.1, 2.8, 36, 0),
-    # starches
-    "rice": (130, 2.7, 28, 0.3, 0.4),
-    "pasta": (158, 5.8, 31, 0.9, 1.8),
-    "bread": (265, 9, 49, 3.2, 2.7),
-    "potato": (77, 2, 17, 0.1, 2.2),
-    "potatoes": (77, 2, 17, 0.1, 2.2),
-    "flour": (364, 10, 76, 1, 2.7),
-    "oats": (389, 17, 66, 7, 10.6),
-    "quinoa": (120, 4.1, 21, 1.9, 2.8),
-    "lentils": (116, 9, 20, 0.4, 7.9),
-    "beans": (347, 21, 60, 1.2, 15),
-    "chickpeas": (164, 8.9, 27, 2.6, 7.6),
-    # vegetables
-    "tomato": (18, 0.9, 3.9, 0.2, 1.2),
-    "tomatoes": (18, 0.9, 3.9, 0.2, 1.2),
-    "onion": (40, 1.1, 9.3, 0.1, 1.7),
-    "onions": (40, 1.1, 9.3, 0.1, 1.7),
-    "garlic": (149, 6.4, 33, 0.5, 2.1),
-    "carrot": (41, 0.9, 10, 0.2, 2.8),
-    "carrots": (41, 0.9, 10, 0.2, 2.8),
-    "broccoli": (34, 2.8, 7, 0.4, 2.6),
-    "spinach": (23, 2.9, 3.6, 0.4, 2.2),
-    "pepper": (31, 1, 6, 0.3, 2.1),
-    "peppers": (31, 1, 6, 0.3, 2.1),
-    "mushroom": (22, 3.1, 3.3, 0.3, 1),
-    "mushrooms": (22, 3.1, 3.3, 0.3, 1),
-    "zucchini": (17, 1.2, 3.1, 0.3, 1),
-    "corn": (86, 3.3, 19, 1.4, 2.7),
-    "lettuce": (15, 1.4, 2.9, 0.2, 1.3),
-    "cucumber": (16, 0.7, 3.6, 0.1, 0.5),
-    "eggplant": (25, 1, 5.9, 0.2, 3),
-    "cabbage": (25, 1.3, 5.8, 0.1, 2.5),
-    "celery": (16, 0.7, 3, 0.2, 1.6),
-    "leek": (61, 1.5, 14, 0.3, 1.8),
-    # fruits
-    "apple": (52, 0.3, 14, 0.2, 2.4),
-    "banana": (89, 1.1, 23, 0.3, 2.6),
-    "lemon": (29, 1.1, 9.3, 0.3, 2.8),
-    "lime": (30, 0.7, 10.5, 0.2, 2.8),
-    "orange": (47, 0.9, 12, 0.1, 2.4),
-    # fats & condiments
-    "olive oil": (884, 0, 0, 100, 0),
-    "oil": (884, 0, 0, 100, 0),
-    "soy sauce": (53, 8.1, 4.9, 0.1, 0.8),
-    "honey": (304, 0.3, 82, 0, 0.2),
-    "sugar": (387, 0, 100, 0, 0),
-    "salt": (0, 0, 0, 0, 0),
-    "vinegar": (18, 0, 0.9, 0, 0),
-}
-# Typical portion weight per ingredient (grams)
-_GRAMS: dict[str, int] = {
-    "egg": 50, "eggs": 100,
-    "butter": 15,
-    "olive oil": 14, "oil": 14,
-    "soy sauce": 15,
-    "salt": 3,
-    "garlic": 10,
-    "honey": 21,
-    "sugar": 12,
-    "lemon": 30, "lime": 30,
-}
-_DEFAULT_GRAMS = 80
-def compute_nutrition(ingredients: list[str], servings: int = 2) -> dict[str, float]:
-    """Return per-serving macro estimates keyed to the NutritionGrid format."""
-    cal = prot = carb = fat = fib = 0.0
-    for ing in ingredients:
-        key = ing.lower().strip()
-        row = _MACROS.get(key) or _MACROS.get(key.split()[0]) if key else None
-        if row is None:
-            continue
-        grams = _GRAMS.get(key, _DEFAULT_GRAMS)
-        f = grams / 100
-        c, p, cb, ft, fb = row
-        cal += c * f
-        prot += p * f
-        carb += cb * f
-        fat += ft * f
-        fib += fb * f
-    sv = max(servings, 1)
-    return {
-        "calories": round(cal / sv),
-        "protein_g": round(prot / sv, 1),
-        "carbs_g": round(carb / sv, 1),
-        "fat_g": round(fat / sv, 1),
-        "fiber_g": round(fib / sv, 1),
-    }
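
Worked example for the removed `compute_nutrition`: the sketch below re-states the same arithmetic with a three-entry copy of the tables, to show how the first-word fallback and the default 80 g portion interact. `estimate` and the upper-case table names are illustrative stand-ins, not names from the repository.

```python
# Reduced copies of the removed lookup tables (three entries only).
# (kcal, protein g, carbs g, fat g, fiber g) per 100 g
MACROS = {
    "chicken": (165, 31, 0, 3.6, 0),
    "rice": (130, 2.7, 28, 0.3, 0.4),
    "olive oil": (884, 0, 0, 100, 0),
}
GRAMS = {"olive oil": 14}  # typical portion in grams
DEFAULT_GRAMS = 80


def estimate(ingredients, servings=2):
    totals = [0.0] * 5
    for ing in ingredients:
        key = ing.lower().strip()
        if not key:
            continue
        # Exact match first, then fall back to the first word of the name.
        row = MACROS.get(key) or MACROS.get(key.split()[0])
        if row is None:
            continue  # unknown ingredients contribute nothing
        # The portion lookup uses the full key, so "chicken breast" gets 80 g.
        factor = GRAMS.get(key, DEFAULT_GRAMS) / 100
        totals = [t + v * factor for t, v in zip(totals, row)]
    sv = max(servings, 1)
    cal, prot, carb, fat, fib = (t / sv for t in totals)
    return {
        "calories": round(cal),
        "protein_g": round(prot, 1),
        "carbs_g": round(carb, 1),
        "fat_g": round(fat, 1),
        "fiber_g": round(fib, 1),
    }


result = estimate(["Chicken breast", "rice", "olive oil", "saffron"])
print(result)
```

Here 80 g of chicken (132 kcal), 80 g of rice (104 kcal) and 14 g of olive oil (123.76 kcal) sum to 359.76 kcal, or about 180 kcal per serving for two servings. "saffron" has no table entry and is skipped silently.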

src/models/planner.py DELETED

@@ -1,103 +0,0 @@
-"""MiniCPM4.1-8B text-only planner — lazy singleton."""
-from __future__ import annotations
-import logging
-import os
-from typing import Any, Optional, Tuple
-import torch
-from src import config
-log = logging.getLogger(__name__)
-_model: Any = None
-_tokenizer: Any = None
-def get_planner() -> Tuple[Optional[Any], Optional[Any]]:
-    """Return (model, tokenizer).  Loads once; returns (None, None) on failure."""
-    global _model, _tokenizer
-    if _model is not None:
-        return _model, _tokenizer
-    # Prefer fine-tuned repo when available
-    model_id = config.PLANNER_FINETUNED_REPO or config.PLANNER_REPO
-    try:
-        # MiniCPM4.1 custom code imports is_torch_fx_available, which was
-        # removed in transformers 5.x. Patch it back before loading.
-        import transformers.utils.import_utils as _iutils
-        if not hasattr(_iutils, "is_torch_fx_available"):
-            def _is_torch_fx_available():
-                try:
-                    import torch.fx  # noqa: F401
-                    return True
-                except ImportError:
-                    return False
-            _iutils.is_torch_fx_available = _is_torch_fx_available
-        from transformers import AutoModelForCausalLM, AutoTokenizer
-        device_map = "auto" if os.environ.get("SPACE_ID") else (
-            "cuda" if torch.cuda.is_available() else "cpu"
-        )
-        log.info("Loading planner model %s (device_map=%s)...", model_id, device_map)
-        _tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-        _model = AutoModelForCausalLM.from_pretrained(
-            model_id,
-            torch_dtype=torch.bfloat16,
-            trust_remote_code=True,
-            device_map=device_map,
-        ).eval()
-        log.info("Planner model ready.")
-    except Exception as exc:
-        log.error("Could not load planner model '%s': %s", model_id, exc)
-        _model = None
-        _tokenizer = None
-    return _model, _tokenizer
-def infer(prompt: str, max_new_tokens: int = 1024, temperature: float = 0.0) -> str:
-    """Run text inference with the planner model.
-    Returns empty string if the model is unavailable.
-    """
-    model, tokenizer = get_planner()
-    if model is None or tokenizer is None:
-        return ""
-    try:
-        messages = [{"role": "user", "content": prompt}]
-        # return_dict=True yields a BatchEncoding (dict-like) with input_ids +
-        # attention_mask. NOTE: BatchEncoding is NOT a `dict` instance, so we
-        # must access it via mapping keys, never via tensor attrs like .shape.
-        enc = tokenizer.apply_chat_template(
-            messages,
-            add_generation_prompt=True,
-            tokenize=True,
-            return_tensors="pt",
-            return_dict=True,
-        )
-        input_ids = enc["input_ids"].to(model.device)
-        input_len = input_ids.shape[1]
-        gen_inputs = {"input_ids": input_ids}
-        attn = enc.get("attention_mask")
-        if attn is not None:
-            gen_inputs["attention_mask"] = attn.to(model.device)
-        gen_kwargs: dict = dict(max_new_tokens=max_new_tokens, do_sample=False)
-        if temperature > 0:
-            gen_kwargs.update(do_sample=True, temperature=temperature, top_p=0.95)
-        with torch.no_grad():
-            output = model.generate(**gen_inputs, **gen_kwargs)
-        token_ids = output[0][input_len:]
-        return tokenizer.decode(token_ids, skip_special_tokens=True)
-    except Exception as exc:
-        log.error("Planner inference error: %r", exc, exc_info=True)
-        return ""
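
Note on the removed `get_planner`: because the cache check is `_model is not None`, a failed load leaves both globals as `None`, so every later call attempts the full load again. Whether that is desirable depends on whether failures are transient. The sketch below shows one way to remember a failed attempt; `LazyModel` and `failing_loader` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Optional, Tuple


class LazyModel:
    """Load once, and remember a failed attempt instead of retrying."""

    def __init__(self, loader: Callable[[], Tuple[Any, Any]]):
        self._loader = loader
        self._attempted = False
        self._pair: Tuple[Optional[Any], Optional[Any]] = (None, None)

    def get(self) -> Tuple[Optional[Any], Optional[Any]]:
        if not self._attempted:
            # Set the flag first so a raising loader is not called again.
            self._attempted = True
            try:
                self._pair = self._loader()
            except Exception:
                self._pair = (None, None)
        return self._pair


calls = {"n": 0}


def failing_loader():
    # Stand-in for an expensive from_pretrained call that fails.
    calls["n"] += 1
    raise RuntimeError("not enough memory")


planner = LazyModel(failing_loader)
first = planner.get()
second = planner.get()
print(first, second, calls["n"])
```

With this shape the loader runs once even when it fails, and callers keep receiving `(None, None)` so the existing empty-string fallback in `infer` would still apply.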

src/pipeline.py DELETED

@@ -1,32 +0,0 @@
-"""Shared data models for the Cook-with-Me pipeline."""
-from __future__ import annotations
-from typing import Optional
-from pydantic import BaseModel, Field
-class DishOption(BaseModel):
-    name: str
-    why: str = ""
-class RecipeStep(BaseModel):
-    n: int = 1
-    instruction: str
-    duration: str = "5 min"
-    tip: Optional[str] = None
-    visual: str = ""
-    image_path: Optional[str] = None
-    image_b64: Optional[str] = None  # base64 PNG from FLUX
-class Recipe(BaseModel):
-    name: str
-    cuisine: str = "International"
-    servings: int = 2
-    total_time_minutes: int = 30
-    steps: list[RecipeStep] = Field(default_factory=list)
-    nutrition: dict = Field(default_factory=dict)
-    final_dish_visual: str = ""
-    final_dish_image_path: Optional[str] = None
-    final_dish_image_b64: Optional[str] = None  # base64 PNG from FLUX

src/prompts/planner_propose.txt DELETED

@@ -1,11 +0,0 @@
-You are a creative chef assistant. Given a list of available ingredients, suggest exactly 3 diverse and delicious dishes.
-Available ingredients: {ingredients}
-Rules:
-- Each dish must be realistic to make with the listed ingredients
-- Vary the style: aim for different cuisines or preparations
-- Be specific with dish names (e.g., "Garlic Butter Shrimp Pasta" not "Pasta")
-Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
-{"options": [{"name": "Dish Name 1", "why": "One sentence on why this works with the ingredients"}, {"name": "Dish Name 2", "why": "..."}, {"name": "Dish Name 3", "why": "..."}]}

src/prompts/planner_recipe.txt DELETED

@@ -1,11 +0,0 @@
-You are a professional chef writing a clear, detailed recipe.
-Dish to prepare: {dish_name}
-Available ingredients: {ingredients}
-Create a complete recipe with 4 to 7 steps. Each step must be specific and actionable.
-Respond ONLY with valid JSON and nothing else — no explanation, no markdown fences:
-{"name": "Full Recipe Title", "cuisine": "Cuisine type", "servings": 2, "total_time_minutes": 30, "final_dish_visual": "One evocative sentence describing how the finished dish looks and smells", "steps": [{"n": 1, "instruction": "Detailed step description.", "duration": "5 min", "tip": "Optional chef tip or null"}, {"n": 2, "instruction": "...", "duration": "3 min", "tip": null}]}
-Important: tip must be a string or null, never omit it.

src/prompts/validator_prompt.txt DELETED

@@ -1,14 +0,0 @@
-You are a supportive cooking coach reviewing a student's progress photo.
-The step they are working on:
-"{step_instruction}"
-Look carefully at the photo and decide:
-- "go"   → the step is correctly completed, they can move on
-- "wait" → it's progressing but needs more time (undercooked, still mixing, etc.)
-- "fix"  → there is a clear mistake that needs correction right now
-Respond ONLY with valid JSON and nothing else:
-{"verdict": "go", "feedback": "One sentence describing exactly what you see in the photo.", "tip": "One specific, actionable piece of advice for the cook."}
-verdict must be exactly one of: go, wait, fix.

src/ui/components.py CHANGED

@@ -80,7 +80,7 @@ class TemplatedHTML(gr.HTML):
 class RecipeHero(TemplatedHTML):
     css_template = """
 .cwm-hero {
-  background: #fffbf0 !important;
   border: 1px solid #d8c9ad;
   border-radius: 16px;
   padding: 32px;
@@ -94,15 +94,15 @@ class RecipeHero(TemplatedHTML):
   background: #efe3c8;
 }
 .cwm-hero h1 {
-  font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a !important;
   margin: 0 0 8px;
 }
 .cwm-hero .meta {
-  color: #8a6a3a !important; font-size: 14px; letter-spacing: 0.04em;
   text-transform: uppercase; margin-bottom: 18px;
 }
 .cwm-hero .visual {
-  font-family: 'Lora', serif; font-style: italic; color: #6b4a2a !important;
   font-size: 17px; line-height: 1.55;
 }
 @media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
@@ -115,14 +115,11 @@ class RecipeHero(TemplatedHTML):
         servings = state.get("servings") or 0
         time = state.get("total_time_minutes") or 0
         visual = html.escape(state.get("final_dish_visual") or "")
-        img_b64 = state.get("final_dish_image_b64") or ""
-        img_path = state.get("final_dish_image_path") or ""
-        if img_b64:
-            img_tag = f'<img src="data:image/png;base64,{img_b64}" alt="final dish"/>'
-        elif img_path:
-            img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
-        else:
-            img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
         return f"""
 <div class="cwm-hero">
   <div>{img_tag}</div>
@@ -189,15 +186,15 @@ class IngredientChips(TemplatedHTML):
 class DishOptions(TemplatedHTML):
     css_template = """
 .cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
-.cwm-options .cwm-option {
-  background: #fffbf0 !important; border: 1px solid #d8c9ad; border-radius: 12px;
   padding: 18px; text-align: left;
 }
-.cwm-options .cwm-option h3 {
-  font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a !important;
   margin: 0 0 6px;
 }
-.cwm-options .cwm-option p { color: #7a5a35 !important; font-size: 14px; line-height: 1.45; margin: 0; }
 @media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
 """
@@ -220,32 +217,32 @@ class DishOptions(TemplatedHTML):
 class StepCard(TemplatedHTML):
     css_template = """
 .cwm-steps { display: flex; flex-direction: column; gap: 16px; }
-.cwm-steps .cwm-step {
   display: grid; grid-template-columns: 220px 1fr; gap: 22px;
-  background: #fffbf0 !important; border-left: 4px solid #a85c2a; border-radius: 10px;
   padding: 18px 22px;
 }
-.cwm-steps .cwm-step img {
   width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
   background: #efe3c8;
 }
-.cwm-steps .cwm-step .placeholder {
   width: 220px; height: 160px; border-radius: 8px;
   background: linear-gradient(135deg,#efe3c8,#dccaa3);
   display:flex; align-items:center; justify-content:center;
-  color: #8a6a3a !important; font-family: 'Lora', serif; font-size: 14px;
 }
-.cwm-steps .cwm-step h3 {
-  font-family: 'Lora', serif; color: #6b4a2a !important; margin: 0 0 6px; font-size: 22px;
 }
-.cwm-steps .cwm-step p { font-size: 16px; line-height: 1.55; color: #4a3722 !important; margin: 0 0 8px; }
-.cwm-steps .cwm-step .duration {
-  display: inline-block; background: #a85c2a !important; color: #fffbf0 !important;
   border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
 }
-.cwm-steps .cwm-step .tip {
-  margin-top: 10px; padding: 10px 12px; background: #fff3d8 !important;
-  border-radius: 8px; font-size: 14px; color: #6b4a2a !important;
 }
 .cwm-step .tip::before { content: "💡 "; }
 @media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
@@ -263,14 +260,11 @@ class StepCard(TemplatedHTML):
             dur = html.escape(s.get("duration", ""))
             tip = s.get("tip")
             visual = html.escape(s.get("visual", ""))
-            img_b64 = s.get("image_b64") or ""
-            img_path = s.get("image_path") or ""
-            if img_b64:
-                img_block = f'<img src="data:image/png;base64,{img_b64}" alt="step {n}"/>'
-            elif img_path:
-                img_block = f'<img src="/file={html.escape(img_path)}" alt="step {n}"/>'
-            else:
-                img_block = f'<div class="placeholder">{visual[:80] if visual else f"Step {n}"}</div>'
             tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
             cards.append(f"""
 <div class="cwm-step">
@@ -293,22 +287,22 @@ class NutritionGrid(TemplatedHTML):
     css_template = """
 .cwm-nutri-wrap { margin-top: 10px; }
 .cwm-nutri-title {
-  font-family: 'Lora', serif; color: #6b4a2a !important; font-size: 22px; margin: 0 0 14px;
 }
 .cwm-nutri {
   display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
 }
-.cwm-nutri .cwm-nutri-cell {
-  background: #fffbf0 !important; border: 1px solid #d8c9ad; border-radius: 10px;
   padding: 14px 10px; text-align: center;
 }
-.cwm-nutri .cwm-nutri-cell .v {
-  font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a !important;
   display: block;
 }
-.cwm-nutri .cwm-nutri-cell .l {
   font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
-  color: #8a6a3a !important; margin-top: 4px;
 }
 @media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
 """
@@ -343,7 +337,7 @@ class VerdictBadge(TemplatedHTML):
     css_template = """
 .cwm-verdict {
   display: flex; align-items: center; gap: 18px;
-  background: #fffbf0 !important; border-radius: 12px; padding: 18px 22px;
   border: 1px solid #d8c9ad;
 }
 .cwm-verdict.go    { border-left: 6px solid #4f8b4a; }
@@ -357,8 +351,8 @@ class VerdictBadge(TemplatedHTML):
 .cwm-verdict.go    .cwm-verdict-pill { background: #4f8b4a; }
 .cwm-verdict.wait  .cwm-verdict-pill { background: #d4a23c; }
 .cwm-verdict.fix   .cwm-verdict-pill { background: #b94a3a; }
-.cwm-verdict-text  { font-size: 16px; color: #4a3722 !important; line-height: 1.5; }
-.cwm-verdict-text small { color: #8a6a3a !important; display: block; margin-top: 4px; }
 .cwm-verdict-empty {
   color: #b39870; font-style: italic; padding: 14px 0;
 }

 class RecipeHero(TemplatedHTML):
     css_template = """
 .cwm-hero {
+  background: #fffbf0;
   border: 1px solid #d8c9ad;
   border-radius: 16px;
   padding: 32px;
   background: #efe3c8;
 }
 .cwm-hero h1 {
+  font-family: 'Lora', serif; font-size: 38px; color: #6b4a2a;
   margin: 0 0 8px;
 }
 .cwm-hero .meta {
+  color: #8a6a3a; font-size: 14px; letter-spacing: 0.04em;
   text-transform: uppercase; margin-bottom: 18px;
 }
 .cwm-hero .visual {
+  font-family: 'Lora', serif; font-style: italic; color: #6b4a2a;
   font-size: 17px; line-height: 1.55;
 }
 @media (max-width: 720px) { .cwm-hero { grid-template-columns: 1fr; } }
         servings = state.get("servings") or 0
         time = state.get("total_time_minutes") or 0
         visual = html.escape(state.get("final_dish_visual") or "")
+        img = state.get("final_dish_image_path") or ""
+        img_tag = (
+            f'<img src="/file={html.escape(img)}" alt="final dish"/>'
+            if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
+        )
         return f"""
 <div class="cwm-hero">
   <div>{img_tag}</div>
 class DishOptions(TemplatedHTML):
     css_template = """
 .cwm-options { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; }
+.cwm-option {
+  background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 12px;
   padding: 18px; text-align: left;
 }
+.cwm-option h3 {
+  font-family: 'Lora', serif; font-size: 19px; color: #6b4a2a;
   margin: 0 0 6px;
 }
+.cwm-option p { color: #7a5a35; font-size: 14px; line-height: 1.45; margin: 0; }
 @media (max-width: 720px) { .cwm-options { grid-template-columns: 1fr; } }
 """
 class StepCard(TemplatedHTML):
     css_template = """
 .cwm-steps { display: flex; flex-direction: column; gap: 16px; }
+.cwm-step {
   display: grid; grid-template-columns: 220px 1fr; gap: 22px;
+  background: #fffbf0; border-left: 4px solid #a85c2a; border-radius: 10px;
   padding: 18px 22px;
 }
+.cwm-step img {
   width: 220px; height: 160px; object-fit: cover; border-radius: 8px;
   background: #efe3c8;
 }
+.cwm-step .placeholder {
   width: 220px; height: 160px; border-radius: 8px;
   background: linear-gradient(135deg,#efe3c8,#dccaa3);
   display:flex; align-items:center; justify-content:center;
+  color: #8a6a3a; font-family: 'Lora', serif; font-size: 14px;
 }
+.cwm-step h3 {
+  font-family: 'Lora', serif; color: #6b4a2a; margin: 0 0 6px; font-size: 22px;
 }
+.cwm-step p { font-size: 16px; line-height: 1.55; color: #4a3722; margin: 0 0 8px; }
+.cwm-step .duration {
+  display: inline-block; background: #a85c2a; color: #fffbf0;
   border-radius: 999px; padding: 3px 10px; font-size: 12px; letter-spacing: 0.04em;
 }
+.cwm-step .tip {
+  margin-top: 10px; padding: 10px 12px; background: #fff3d8;
+  border-radius: 8px; font-size: 14px; color: #6b4a2a;
 }
 .cwm-step .tip::before { content: "💡 "; }
 @media (max-width: 720px) { .cwm-step { grid-template-columns: 1fr; } .cwm-step img, .cwm-step .placeholder { width: 100%; } }
             dur = html.escape(s.get("duration", ""))
             tip = s.get("tip")
             visual = html.escape(s.get("visual", ""))
+            img = s.get("image_path")
+            img_block = (
+                f'<img src="/file={html.escape(img)}" alt="step {n}"/>'
+                if img else f'<div class="placeholder">{visual[:80]}</div>'
+            )
             tip_block = f'<div class="tip">{html.escape(tip)}</div>' if tip else ""
             cards.append(f"""
 <div class="cwm-step">
     css_template = """
 .cwm-nutri-wrap { margin-top: 10px; }
 .cwm-nutri-title {
+  font-family: 'Lora', serif; color: #6b4a2a; font-size: 22px; margin: 0 0 14px;
 }
 .cwm-nutri {
   display: grid; grid-template-columns: repeat(5, 1fr); gap: 12px;
 }
+.cwm-nutri-cell {
+  background: #fffbf0; border: 1px solid #d8c9ad; border-radius: 10px;
   padding: 14px 10px; text-align: center;
 }
+.cwm-nutri-cell .v {
+  font-family: 'Lora', serif; font-size: 24px; font-weight: 700; color: #6b4a2a;
   display: block;
 }
+.cwm-nutri-cell .l {
   font-size: 11px; letter-spacing: 0.08em; text-transform: uppercase;
+  color: #8a6a3a; margin-top: 4px;
 }
 @media (max-width: 720px) { .cwm-nutri { grid-template-columns: repeat(2, 1fr); } }
 """
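Note on the `+` lines above: they drop `!important` and shorten selectors such as `.cwm-nutri .cwm-nutri-cell .v` to `.cwm-nutri-cell .v`, which lowers their specificity. Whether the host page's own rules then override these colours depends on a stylesheet that is not shown in this diff. The sketch below only illustrates how the two selector forms compare; `specificity` is a hypothetical helper, not part of the PR, and it handles only `#id`, `.class` and bare type names.

```python
import re


def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, types) for a simple selector.

    Assumption: only #id, .class and bare element names separated by
    spaces are supported. Attribute selectors, pseudo-classes and
    combinators such as '>' are out of scope for this sketch.
    """
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Remove ids and classes, then count the remaining bare element names.
    rest = re.sub(r"[#.][\w-]+", " ", selector)
    types = len(re.findall(r"[A-Za-z][\w-]*", rest))
    return ids, classes, types


old = specificity(".cwm-nutri .cwm-nutri-cell .v")  # selector on the - side
new = specificity(".cwm-nutri-cell .v")             # selector on the + side
print(old, new, old > new)
```

Tuples compare left to right, which matches how specificity is ranked when neither declaration is marked `!important`.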
     css_template = """
 .cwm-verdict {
   display: flex; align-items: center; gap: 18px;
+  background: #fffbf0; border-radius: 12px; padding: 18px 22px;
   border: 1px solid #d8c9ad;
 }
 .cwm-verdict.go    { border-left: 6px solid #4f8b4a; }
 .cwm-verdict.go    .cwm-verdict-pill { background: #4f8b4a; }
 .cwm-verdict.wait  .cwm-verdict-pill { background: #d4a23c; }
 .cwm-verdict.fix   .cwm-verdict-pill { background: #b94a3a; }
+.cwm-verdict-text  { font-size: 16px; color: #4a3722; line-height: 1.5; }
+.cwm-verdict-text small { color: #8a6a3a; display: block; margin-top: 4px; }
 .cwm-verdict-empty {
   color: #b39870; font-style: italic; padding: 14px 0;
 }
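Note on the `StepCard` change above: the `+` side escapes `visual` first and slices it to 80 characters afterwards, which can cut an HTML entity such as `&amp;` in half, and it no longer falls back to `Step {n}` when `visual` is empty. A minimal sketch of one way to keep both properties, assuming a hypothetical helper name `step_image_block` and a hypothetical `limit` parameter (neither appears in the PR):

```python
import html


def step_image_block(step: dict, n: int, limit: int = 80) -> str:
    """Hypothetical helper: build the <img> or placeholder markup for one step."""
    img = step.get("image_path")
    if img:
        # html.escape also escapes double quotes by default, so the value
        # cannot break out of the src="..." attribute.
        return f'<img src="/file={html.escape(img)}" alt="step {n}"/>'
    # Truncate the raw text first, then escape, so no entity is split.
    raw = (step.get("visual") or "")[:limit]
    label = html.escape(raw) if raw else f"Step {n}"
    return f'<div class="placeholder">{label}</div>'
```

With this ordering the placeholder text may render slightly longer than `limit` characters of markup, but it is always valid HTML.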

src/ui/components.pyi CHANGED

@@ -63,14 +63,11 @@ class RecipeHero(TemplatedHTML):
         servings = state.get("servings") or 0
         time = state.get("total_time_minutes") or 0
         visual = html.escape(state.get("final_dish_visual") or "")
-        img_b64 = state.get("final_dish_image_b64") or ""
-        img_path = state.get("final_dish_image_path") or ""
-        if img_b64:
-            img_tag = f'<img src="data:image/png;base64,{img_b64}" alt="final dish"/>'
-        elif img_path:
-            img_tag = f'<img src="/file={html.escape(img_path)}" alt="final dish"/>'
-        else:
-            img_tag = '<div style="background:#efe3c8;border-radius:12px;height:320px;display:flex;align-items:center;justify-content:center;color:#8a6a3a;font-family:\'Lora\',serif;font-style:italic;">Image will appear here</div>'
         return f"""
 <div class="cwm-hero">
   <div>{img_tag}</div>

         servings = state.get("servings") or 0
         time = state.get("total_time_minutes") or 0
         visual = html.escape(state.get("final_dish_visual") or "")
+        img = state.get("final_dish_image_path") or ""
+        img_tag = (
+            f'<img src="/file={html.escape(img)}" alt="final dish"/>'
+            if img else '<div class="cwm-hero" style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
+        )
         return f"""
 <div class="cwm-hero">
   <div>{img_tag}</div>
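Note on the `img_tag` change above: on the `+` side the empty-state placeholder reuses the `cwm-hero` class, so it would pick up the outer container's border and 32px padding as a nested box. A hedged sketch of an alternative, assuming a hypothetical helper `hero_image_tag` and a hypothetical class name `cwm-hero-placeholder` that would need its own CSS rule:

```python
import html


def hero_image_tag(state: dict) -> str:
    """Hypothetical helper mirroring the img_tag expression in RecipeHero."""
    img = state.get("final_dish_image_path") or ""
    if img:
        return f'<img src="/file={html.escape(img)}" alt="final dish"/>'
    # Assumed class name; it is not defined anywhere in the PR.
    return (
        '<div class="cwm-hero-placeholder" '
        'style="background:#efe3c8;border-radius:12px;height:320px;"></div>'
    )
```

The inline style is copied from the `+` lines, so the placeholder keeps the same size and colour while no longer nesting one `cwm-hero` box inside another.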

src/ui/theme.py CHANGED

@@ -1,138 +1,33 @@
 import gradio as gr
-# A soft, modern theme overriding Gradio's default harsh borders
 theme = gr.themes.Soft(
-    primary_hue="emerald",
-    neutral_hue="slate",
-    font=[gr.themes.GoogleFont("Inter"), "ui-sans-serif", "system-ui", "sans-serif"],
-    radius_size=gr.themes.sizes.radius_xxl,
-).set(
-    body_background_fill="#f7f9fc",
-    block_background_fill="#ffffff",
-    block_border_width="0px",
-    block_shadow="0 4px 6px -1px rgb(0 0 0 / 0.05), 0 2px 4px -2px rgb(0 0 0 / 0.05)",
-    button_primary_background_fill="*primary_500",
-    button_primary_background_fill_hover="*primary_600"
 )
-# Custom CSS for your injected HTML components to match the mockups
-CSS = """
-/* Global Container Resets */
-.gradio-container {
-    max-width: 1200px !important;
-    margin: auto;
-}
-/* Safe way to add button shadows without breaking Python */
-button.primary {
-    box-shadow: 0 4px 6px -1px rgb(0 0 0 / 0.1) !important;
-}
-/* Card Styling (Inspired by the pastel app mockup) */
-.glass-card {
-    background: #ffffff;
-    border-radius: 24px;
-    padding: 20px;
-    box-shadow: 0 10px 25px rgba(0,0,0,0.03);
-    margin-bottom: 16px;
-    transition: transform 0.2s ease;
-}
-/* Nutrition Pills (Inspired by the green salad mockup) */
-.nutrition-row {
-    display: flex;
-    gap: 12px;
-    justify-content: space-around;
-    flex-wrap: wrap;
-    margin-top: 10px;
-}
-.pill-badge {
-    display: flex;
-    flex-direction: column;
-    align-items: center;
-    justify-content: center;
-    background: #e6f4ea; /* Light sage green */
-    color: #1e4620; /* Dark green text */
-    border-radius: 40px;
-    padding: 16px 20px;
-    min-width: 80px;
-    box-shadow: 0 4px 10px rgba(0,0,0,0.02);
-}
-.pill-value {
-    font-size: 24px;
-    font-weight: 800;
-    line-height: 1;
-    margin-bottom: 4px;
-}
-.pill-label {
-    font-size: 12px;
-    font-weight: 600;
-    text-transform: uppercase;
-    letter-spacing: 0.5px;
-}
-/* Dish Proposal Cards */
-.dish-grid {
-    display: grid;
-    grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
-    gap: 16px;
-    margin-top: 10px;
-}
-.dish-card {
-    background: linear-gradient(135deg, #f0fff4 0%, #e6fffa 100%);
-    border-radius: 20px;
-    padding: 16px;
-    text-align: center;
-    border: 1px solid #c6f6d5;
-}
-.dish-card h4 {
-    margin: 0 0 8px 0;
-    font-size: 18px;
-    color: #2d3748;
-}
-/* Hero Image Header (Inspired by the spaghetti mockup) */
-.hero-container {
-    position: relative;
-    border-radius: 24px;
-    overflow: hidden;
-    margin-bottom: 24px;
-    box-shadow: 0 10px 30px rgba(0,0,0,0.08);
-}
-.hero-image {
-    width: 100%;
-    height: 300px;
-    object-fit: cover;
-    display: block;
-}
-.hero-overlay {
-    position: absolute;
-    bottom: 0;
-    left: 0;
-    right: 0;
-    background: linear-gradient(to top, rgba(0,0,0,0.8), transparent);
-    padding: 30px 20px 20px;
-    color: white;
-}
-.hero-overlay h2 {
-    margin: 0;
-    font-size: 32px;
-    font-weight: 800;
-    color: white;
-}
-/* Ingredient Chips */
-.chip-container {
-    display: flex;
-    flex-wrap: wrap;
-    gap: 8px;
-}
-.chip {
-    background: #edf2f7;
-    color: #4a5568;
-    padding: 6px 14px;
-    border-radius: 20px;
-    font-size: 14px;
-    font-weight: 500;
-}
-"""

+"""Gradio theme + global CSS for the recipe-card look."""
+from __future__ import annotations
 import gradio as gr
 theme = gr.themes.Soft(
+    primary_hue="orange",
+    neutral_hue="stone",
+    font=[gr.themes.GoogleFont("Inter"), "sans-serif"],
+    font_mono=[gr.themes.GoogleFont("JetBrains Mono"), "monospace"],
 )
+CSS = """
+@import url('https://fonts.googleapis.com/css2?family=Lora:wght@400;700&display=swap');
+.gradio-container { background: #f5ecd9 !important; }
+.gradio-container .prose h1,
+.gradio-container .prose h2,
+.gradio-container .prose h3 { font-family: 'Lora', serif !important; color: #6b4a2a; }
+/* Generic container shared by every HTMLComponent */
+.cwm-card {
+  border: 1px solid #d8c9ad;
+  border-radius: 14px;
+  padding: 22px 26px;
+  box-shadow: 0 8px 24px rgba(107, 74, 42, 0.08);
+}
+button.primary, .gr-button-primary {
+  background: #a85c2a !important;
+  font-weight: 600 !important;
+  font-size: 16px !important;
+  padding: 12px 22px !important;
+}
+"""