README: add hackathon tags + links section, correct to 3-model architecture

by eldinosaur - opened about 17 hours ago

base: refs/heads/main ← from: refs/pr/7

README.md +108 -86

@@ -1,86 +1,108 @@
----
-title: Cook With A LLM
-emoji: 🍲
-colorFrom: red
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.15.2
-python_version: '3.12'
-app_file: app.py
-pinned: false
-license: apache-2.0
----
-# 🍲 Cook With Me — Multimodal Sous-Chef
-> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
-A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
----
-# Contributors
-1. **eldinosaur** - Carlos Castañeda Mora
-1. **Fred1e4** - Fredin Vazquez
----
-## How it works
-```
-📸 Fridge photo  ──▶  [Vision Agent]          identify ingredients
-                            │
-                            ▼
-                      [Recipe Planner]         propose 3 dishes → full recipe JSON
-                            │
-                            ▼
-                      [Nutrition Engine]       per-serving macros (lookup, no hallucination)
-                            │
-                            ▼
-📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict
-```
-1. **Snap** your fridge or pantry — the fine-tuned vision model identifies every ingredient.
-2. **Pick** one of three AI-suggested dishes tailored to what you have.
-3. **Cook** step by step with a generated recipe and per-serving nutrition info.
-4. **Check** your progress by uploading a photo of your pan — the model tells you *go*, *wait*, or *fix*.
----
-## Models
-| Role | Model | Params | Runtime |
-|---|---|---|---|
-| Vision + Planner + Validator | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
-**Total: ~4.6B parameters** (≤ 32B cap ✓ — significant headroom)
-The ingredient-identification model is **fine-tuned** on fridge/pantry photos for higher precision.
----
-## Badges targeted
-| Badge | Status | How |
-|---|---|---|
-| 🎯 Well-Tuned | ✓ | Fine-tuned MiniCPM-V-4.6 for ingredient detection, published to Hub |
-| 🎨 Off-Brand | ✓ | Recipe-card UI with custom CSS — Lora serif, warm parchment palette |
-| 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
-| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
----
-## Architecture highlights
-- **Single model, three roles:** MiniCPM-V-4.6 handles vision (ingredients + progress) *and* text planning (recipe JSON generation) — no redundant model downloads.
-- **Closed-loop visual validation:** Flux generates step targets → user cooks → vision model compares — a real agent loop, not a wrapper.
-- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
-- **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
----
-## Track
-**Chapter One — Backyard AI** · "Build something for someone you actually know."
-Submission for the Hugging Face Hackathon · June 5–15, 2026.

+---
+title: Cook With A LLM
+emoji: 🍲
+colorFrom: red
+colorTo: yellow
+sdk: gradio
+sdk_version: 6.15.2
+python_version: '3.12'
+app_file: app.py
+pinned: false
+license: apache-2.0
+tags:
+  - backyard-ai
+  - well-tuned
+  - off-brand
+  - sharing-is-caring
+  - field-notes
+---
+# 🍲 Cook With Me — Multimodal Sous-Chef
+> *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
+A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
+---
+# Contributors
+1. **eldinosaur** - Carlos Castañeda Mora
+1. **Fred1e4** - Fredin Vazquez
+---
+## 🔗 Links
+- 🎥 **Demo video:** <!-- TODO: replace with your YouTube/public video URL --> `[ADD DEMO VIDEO URL]`
+- 📱 **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
+- 🤗 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
+- 🧠 **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
+- 📊 **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft
+---
+## How it works
+```
+📸 Fridge photo  ──▶  [Vision Agent]          identify ingredients
+                            │
+                            ▼
+                      [Recipe Planner]         propose 3 dishes → full recipe JSON
+                            │
+                            ▼
+                      [Nutrition Engine]       per-serving macros (lookup, no hallucination)
+                            │
+                            ▼
+📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict
+```
+1. **Snap** your fridge or pantry — the fine-tuned vision model identifies every ingredient.
+2. **Pick** one of three AI-suggested dishes tailored to what you have.
+3. **Cook** step by step with a generated recipe and per-serving nutrition info.
+4. **Check** your progress by uploading a photo of your pan — the model tells you *go*, *wait*, or *fix*.
+---
+## Models
+| Role | Model | Params | Runtime |
+|---|---|---|---|
+| Vision — ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
+| Recipe planner — dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
+| Step illustrator — per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |
+**Total: ~21.6B parameters** (≤ 32B cap ✓)
+**Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
+detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
+The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
+`transformers` 4.x while the vision model needs 5.x, so they live in separate containers).
+---
+## Badges targeted
+| Badge | Status | How |
+|---|---|---|
+| 🎯 Well-Tuned | ✓ | **Two** fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
+| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
+| 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
+| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
+---
+## Architecture highlights
+- **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images — each on the runtime it needs (ZeroGPU + Modal endpoints).
+- **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → user cooks → the vision model compares the pan photo and returns *go / wait / fix* — a real agent loop, not a wrapper.
+- **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
+- **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
+---
+## Track
+**Chapter One — Backyard AI** · "Build something for someone you actually know."
+Submission for the Hugging Face Hackathon · June 5–15, 2026.
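
Note on the "Robust JSON extraction" bullet: the README says the parser handles markdown fences, single quotes, and trailing commas, but the diff does not show the implementation. The following is a minimal sketch of what such a multi-strategy parser could look like, not the code in `app.py`. The function name `extract_json`, the two regexes, and the order of the strategies are all assumptions made for illustration.

```python
import ast
import json
import re

# Hypothetical helpers; the Space's real parser is not shown in this PR.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def extract_json(text):
    """Return the first JSON object or array found in model output, or None."""
    # Strategy 1: prefer the contents of a markdown fence if one exists.
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text

    # Strategy 2: trim surrounding chatter down to the outermost bracket pair.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    if not starts:
        return None
    start = min(starts)
    end = max(candidate.rfind("}"), candidate.rfind("]"))
    if end <= start:
        return None
    candidate = candidate[start:end + 1]

    # Strategy 3: drop trailing commas before a closing bracket.
    # Caveat: this can also touch a ",]" sequence inside a string value.
    cleaned = TRAILING_COMMA_RE.sub(r"\1", candidate)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Strategy 4: single-quoted output often parses as a Python literal.
    try:
        value = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return None  # degrade gracefully instead of raising
    return value if isinstance(value, (dict, list)) else None
```

Returning `None` rather than raising lets the caller decide how to degrade, for example by retrying the planner or showing a fallback message. JSON-only tokens such as `true` or `null` inside single-quoted output are not recovered by the last strategy in this sketch.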