README: add hackathon tags + links section, correct to 3-model architecture

#7
by eldinosaur - opened
Files changed (1)
  1. README.md +108 -86
README.md CHANGED
@@ -1,86 +1,108 @@
- ---
- title: Cook With A LLM
- emoji: 🍲
- colorFrom: red
- colorTo: yellow
- sdk: gradio
- sdk_version: 6.15.2
- python_version: '3.12'
- app_file: app.py
- pinned: false
- license: apache-2.0
- ---
-
- # 🍲 Cook With Me: Multimodal Sous-Chef
-
- > *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
-
- A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
-
- ---
-
- # Contributors
-
- 1. **eldinosaur** - Carlos Castañeda Mora
- 1. **Fred1e4** - Fredin Vazquez
- ---
-
- ## How it works
-
- ```
- 📸 Fridge photo ──▶ [Vision Agent] identify ingredients
- │
- ▼
- [Recipe Planner] propose 3 dishes → full recipe JSON
- │
- ▼
- [Nutrition Engine] per-serving macros (lookup, no hallucination)
- │
- ▼
- 📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
- ```
-
- 1. **Snap** your fridge or pantry; the fine-tuned vision model identifies every ingredient.
- 2. **Pick** one of three AI-suggested dishes tailored to what you have.
- 3. **Cook** step by step with a generated recipe and per-serving nutrition info.
- 4. **Check** your progress by uploading a photo of your pan; the model tells you *go*, *wait*, or *fix*.
-
- ---
-
- ## Models
-
- | Role | Model | Params | Runtime |
- |---|---|---|---|
- | Vision + Planner + Validator | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
-
- **Total: ~4.6B parameters** (≤ 32B cap ✓, significant headroom)
-
- The ingredient-identification model is **fine-tuned** on fridge/pantry photos for higher precision.
-
- ---
-
- ## Badges targeted
-
- | Badge | Status | How |
- |---|---|---|
- | 🎯 Well-Tuned | ✓ | Fine-tuned MiniCPM-V-4.6 for ingredient detection, published to Hub |
- | 🎨 Off-Brand | ✓ | Recipe-card UI with custom CSS: Lora serif, warm parchment palette |
- | 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
- | 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
-
- ---
-
- ## Architecture highlights
-
- - **Single model, three roles:** MiniCPM-V-4.6 handles vision (ingredients + progress) *and* text planning (recipe JSON generation), with no redundant model downloads.
- - **Closed-loop visual validation:** Flux generates step targets → user cooks → vision model compares: a real agent loop, not a wrapper.
- - **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
- - **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
-
- ---
-
- ## Track
-
- **Chapter One: Backyard AI** · "Build something for someone you actually know."
-
- Submission for the Hugging Face Hackathon · June 5–15, 2026.
+ ---
+ title: Cook With A LLM
+ emoji: 🍲
+ colorFrom: red
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 6.15.2
+ python_version: '3.12'
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ tags:
+ - backyard-ai
+ - well-tuned
+ - off-brand
+ - sharing-is-caring
+ - field-notes
+ ---
+
+ # 🍲 Cook With Me: Multimodal Sous-Chef
+
+ > *Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.*
+
+ A closed-loop multimodal cooking assistant built for the **Hugging Face Small Models / Big Adventures Hackathon (June 2026)**.
+
+ ---
+
+ # Contributors
+
+ 1. **eldinosaur** - Carlos Castañeda Mora
+ 1. **Fred1e4** - Fredin Vazquez
+
+ ---
+
+ ## 🔗 Links
+
+ - 🎥 **Demo video:** <!-- TODO: replace with your YouTube/public video URL --> `[ADD DEMO VIDEO URL]`
+ - 📱 **Social post:** https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
+ - 🤗 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
+ - 🧠 **Fine-tuned planner:** https://huggingface.co/eldinosaur/cook-with-me-planner-8b
+ - 📊 **SFT dataset:** https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft
+
+ ---
+
+ ## How it works
+
+ ```
+ 📸 Fridge photo ──▶ [Vision Agent] identify ingredients
+ │
+ ▼
+ [Recipe Planner] propose 3 dishes → full recipe JSON
+ │
+ ▼
+ [Nutrition Engine] per-serving macros (lookup, no hallucination)
+ │
+ ▼
+ 📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
+ ```
+
+ 1. **Snap** your fridge or pantry; the fine-tuned vision model identifies every ingredient.
+ 2. **Pick** one of three AI-suggested dishes tailored to what you have.
+ 3. **Cook** step by step with a generated recipe and per-serving nutrition info.
+ 4. **Check** your progress by uploading a photo of your pan; the model tells you *go*, *wait*, or *fix*.
+
+ ---
+
+ ## Models
+
+ | Role | Model | Params | Runtime |
+ |---|---|---|---|
+ | Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
+ | Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (transformers 4.x) |
+ | Step illustrator: per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |
+
+ **Total: ~21.6B parameters** (≤ 32B cap ✓)
+
+ **Two models are fine-tuned:** the vision model on fridge/pantry photos for ingredient
+ detection, and the planner on **2,046 recipe pairs** for reliable recipe-JSON generation.
+ The planner and illustrator run on dedicated **Modal** GPU endpoints (the planner needs
+ `transformers` 4.x while the vision model needs 5.x, so they live in separate containers).
+
+ ---
+
+ ## Badges targeted
+
+ | Badge | Status | How |
+ |---|---|---|
+ | 🎯 Well-Tuned | ✓ | **Two** fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
+ | 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
+ | 📡 Sharing is Caring | ✓ | Agent traces shared on Hub |
+ | 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |
+
+ ---
+
+ ## Architecture highlights
+
+ - **Specialized small models, one pipeline:** a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
+ - **Closed-loop visual validation:** the planner writes the steps → the illustrator renders each step → user cooks → the vision model compares the pan photo and returns *go / wait / fix*: a real agent loop, not a wrapper.
+ - **Hallucination-free nutrition:** macros come from a lookup table, not LLM arithmetic.
+ - **Robust JSON extraction:** multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
+
+ ---
+
+ ## Track
+
+ **Chapter One: Backyard AI** · "Build something for someone you actually know."
+
+ Submission for the Hugging Face Hackathon · June 5–15, 2026.
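
The "Robust JSON extraction" bullet in the README names a multi-strategy parser but the diff does not show it. The following is a minimal sketch of how such a parser might be written, not the code in this Space: `extract_json` and its helpers are hypothetical names, and the order of strategies (strict JSON, then trailing-comma cleanup, then a Python-literal fallback for single quotes) is an assumption.

```python
import ast
import json
import re
from typing import Any, Optional

# Hypothetical helpers for illustration; names are not taken from the Space's app.py.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _candidate(text: str) -> str:
    """Return the most likely JSON substring of a raw model reply."""
    fenced = _FENCE_RE.search(text)
    if fenced:
        # Prefer the contents of the first markdown code fence, if any.
        text = fenced.group(1)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return text.strip()
    # Keep only the outermost {...} span, dropping surrounding chatter.
    return text[start : end + 1]


def extract_json(text: str) -> Optional[dict[str, Any]]:
    """Try progressively more lenient parses; return None if all of them fail."""
    raw = _candidate(text)
    no_trailing = _TRAILING_COMMA_RE.sub(r"\1", raw)

    # Strategies 1 and 2: strict JSON, then JSON with trailing commas removed.
    for attempt in (raw, no_trailing):
        try:
            parsed = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    # Strategy 3: single-quoted, Python-style dict literal.
    try:
        parsed = ast.literal_eval(no_trailing)
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` instead of raising lets the caller decide how to degrade, for example by retrying generation or showing a fallback message. One known limitation of this sketch is that the trailing-comma regex does not distinguish commas inside string values, so a value containing `", }"` would be altered.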