---
title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
- backyard-ai
- well-tuned
- off-brand
- sharing-is-caring
- field-notes
---

# 🍲 Cook With Me: Multimodal Sous-Chef

> Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.

A closed-loop multimodal cooking assistant built for the Hugging Face Small Models / Big Adventures Hackathon (June 2026).

---

## Contributors

1. eldinosaur - Carlos Castañeda Mora
1. Fred1e4 - Fredin Vazquez

---

## 🔗 Links

- 🎥 Demo video: https://youtube.com/shorts/c3PikNvKAjQ
- 📱 Social post: https://www.instagram.com/fd_albert14/p/DZnz-oaGorr/
- 🤗 Live Space: https://huggingface.co/spaces/build-small-hackathon/Cook_with_a_LLM
- 🧠 Fine-tuned planner: https://huggingface.co/eldinosaur/cook-with-me-planner-8b
- 📊 SFT dataset: https://huggingface.co/datasets/eldinosaur/cook-with-me-recipes-sft

---

## How it works

```
📸 Fridge photo ──▶ [Vision Agent] identify ingredients
                          │
                          ▼
                    [Recipe Planner] propose 3 dishes → full recipe JSON
                          │
                          ▼
                    [Nutrition Engine] per-serving macros (lookup, no hallucination)
                          │
                          ▼
📸 Progress photo ──▶ [Progress Validator] go / wait / fix verdict
```
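A minimal Python sketch of the first three stages of the diagram above, assuming the model calls can be wrapped as plain callables. The names `plan_meal`, `identify`, `propose`, and `write_recipe`, and the recipe keys `ingredients_g` and `servings`, are hypothetical stand-ins, not the app's actual code or schema. The Nutrition Engine is modeled as a pure table lookup; the per-100 g figures are illustrative placeholders, not the app's data.

```python
from typing import Callable

# Illustrative per-100 g values: (kcal, protein g, carbs g, fat g).
# Placeholder numbers for this sketch, not the app's nutrition table.
NUTRITION_PER_100G: dict[str, tuple[float, float, float, float]] = {
    "egg": (143.0, 12.6, 0.7, 9.5),
    "spinach": (23.0, 2.9, 3.6, 0.4),
    "butter": (717.0, 0.9, 0.1, 81.0),
}
MACRO_KEYS = ("kcal", "protein_g", "carbs_g", "fat_g")


def per_serving_macros(grams_by_ingredient: dict[str, float], servings: int) -> dict[str, float]:
    """Nutrition Engine: scale table rows by weight, sum, divide by servings. No model involved."""
    totals = [0.0] * len(MACRO_KEYS)
    for name, grams in grams_by_ingredient.items():
        row = NUTRITION_PER_100G.get(name.lower())
        if row is None:
            continue  # unknown ingredient: skip it rather than guess a value
        for i, per_100g in enumerate(row):
            totals[i] += per_100g * grams / 100.0
    return {key: round(total / servings, 1) for key, total in zip(MACRO_KEYS, totals)}


def plan_meal(
    fridge_photo: bytes,
    identify: Callable[[bytes], list[str]],          # Vision Agent (hypothetical wrapper)
    propose: Callable[[list[str]], list[str]],       # Recipe Planner: dish ideas
    write_recipe: Callable[[str, list[str]], dict],  # Recipe Planner: full recipe dict
    choose: Callable[[list[str]], str] = lambda dishes: dishes[0],  # stands in for the user's pick
) -> dict:
    ingredients = identify(fridge_photo)
    dishes = propose(ingredients)[:3]  # the UI offers three options
    recipe = write_recipe(choose(dishes), ingredients)
    recipe["nutrition"] = per_serving_macros(recipe["ingredients_g"], recipe["servings"])
    return recipe


recipe = plan_meal(
    b"<jpeg bytes>",
    identify=lambda photo: ["egg", "spinach"],
    propose=lambda items: ["Spinach omelette", "Frittata", "Shakshuka"],
    write_recipe=lambda dish, items: {
        "title": dish, "servings": 2, "ingredients_g": {"egg": 100, "spinach": 100}},
)
print(recipe["title"], recipe["nutrition"]["kcal"])  # Spinach omelette 83.0
```

Because the macros come from the table, an ingredient missing from it contributes nothing instead of a guessed value; a real table would need much wider coverage and unit handling beyond grams.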

1. Snap your fridge or pantry: the fine-tuned vision model identifies the ingredients it can see.
2. Pick one of three AI-suggested dishes tailored to what you have.
3. Cook step by step with a generated recipe, an illustration for each step, and per-serving nutrition info.
4. Check your progress by uploading a photo of your pan: the vision model returns a go, wait, or fix verdict.
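Step 4 depends on turning the vision model's free-text reply into one of three verdicts. The sketch below assumes the prompt asks the model to include a line such as `Verdict: wait`; that convention, the `parse_verdict` helper, and the choice of `wait` as the fallback for missing or ambiguous replies are assumptions for illustration, not the app's implementation.

```python
import re
from enum import Enum


class Verdict(str, Enum):
    GO = "go"      # step looks done, move on
    WAIT = "wait"  # keep cooking
    FIX = "fix"    # something needs correcting


# Assumed prompt convention: the model is asked to include "Verdict: <go|wait|fix>".
_LABELED = re.compile(r"verdict\s*[:=-]\s*(go|wait|fix)\b", re.IGNORECASE)
_KEYWORD = re.compile(r"\b(go|wait|fix)\b", re.IGNORECASE)


def parse_verdict(reply: str, default: Verdict = Verdict.WAIT) -> Verdict:
    """Map the vision model's free-text reply onto go / wait / fix."""
    labeled = _LABELED.search(reply)
    if labeled:
        return Verdict(labeled.group(1).lower())
    # No label: accept a bare keyword only if exactly one distinct verdict word appears.
    found = {word.lower() for word in _KEYWORD.findall(reply)}
    if len(found) == 1:
        return Verdict(found.pop())
    return default  # missing or ambiguous


print(parse_verdict("Verdict: FIX. The garlic is browning too fast.").value)  # fix
print(parse_verdict("Don't go yet, wait two more minutes.").value)  # wait (ambiguous, default)
```

Falling back to `wait` rather than `go` errs on the side of not moving the user past a step the model was unsure about.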

---

## Models

| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | `openbmb/MiniCPM-V-4.6` (fine-tuned) | ~4.6B | `transformers` / ZeroGPU |
| Recipe planner: dishes + recipe JSON | `openbmb/MiniCPM4.1-8B` → [`eldinosaur/cook-with-me-planner-8b`](https://huggingface.co/eldinosaur/cook-with-me-planner-8b) (fine-tuned) | ~8B | Modal (`transformers` 4.x) |
| Step illustrator: per-step images | `FLUX.2-klein-9B` (SDXL-Turbo fallback) | ~9B | Modal (L4) |

Total: ~21.6B parameters (4.6B + 8B + 9B), within the 32B cap ✓
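The total is a plain sum of the approximate counts in the Models table. A quick check of that arithmetic, with the 32B cap taken from the line above:

```python
# Approximate parameter counts in billions, copied from the Models table.
PARAMS_B = {
    "vision: MiniCPM-V-4.6 (fine-tuned)": 4.6,
    "planner: cook-with-me-planner-8b": 8.0,
    "illustrator: FLUX.2-klein-9B": 9.0,
}
CAP_B = 32.0  # total-parameter cap noted above

total_b = sum(PARAMS_B.values())
assert total_b <= CAP_B, "model stack exceeds the parameter cap"
print(f"total ≈ {total_b:.1f}B, headroom ≈ {CAP_B - total_b:.1f}B")
# prints: total ≈ 21.6B, headroom ≈ 10.4B
```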

Two models are fine-tuned: the vision model on fridge/pantry photos for ingredient
detection, and the planner on 2,046 recipe pairs for reliable recipe-JSON generation.
The planner and illustrator run on dedicated Modal GPU endpoints. The planner needs
`transformers` 4.x while the vision model needs 5.x, so the two cannot share a container.
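Because the planner and illustrator are remote endpoints, a failed call should not stall the recipe. The Models table lists SDXL-Turbo as the illustrator's fallback; the sketch below shows one way to express that as an ordered list of backends. The `ImageBackend` type, the `illustrate_step` helper, the backend names, and the stub functions are hypothetical placeholders for real endpoint calls, not Modal's API or the app's code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ImageBackend:
    """A named text-to-image callable, e.g. a wrapper around a remote endpoint (hypothetical)."""
    name: str
    generate: Callable[[str], bytes]  # prompt -> encoded image bytes


def illustrate_step(prompt: str, backends: Sequence[ImageBackend]) -> tuple[str, bytes]:
    """Try each backend in priority order; return (backend name, image) from the first success."""
    errors: list[str] = []
    for backend in backends:
        try:
            return backend.name, backend.generate(prompt)
        except Exception as exc:  # e.g. timeout, cold-start failure, out-of-memory
            errors.append(f"{backend.name}: {exc!r}")
    raise RuntimeError("all image backends failed: " + "; ".join(errors))


# Usage with stub backends: the primary times out, so the fallback answers.
def flux_stub(prompt: str) -> bytes:
    raise TimeoutError("endpoint did not respond")


def sdxl_turbo_stub(prompt: str) -> bytes:
    return b"fake-png-bytes"


name, image = illustrate_step(
    "Whisk three eggs in a bowl",
    [ImageBackend("flux2-klein", flux_stub), ImageBackend("sdxl-turbo", sdxl_turbo_stub)],
)
print(name)  # sdxl-turbo
```

Returning the backend name alongside the image lets the UI or logs record when the fallback was used.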

---

## Badges targeted

| Badge | Status | How |
|---|---|---|
| 🎯 Well-Tuned | ✓ | Two fine-tuned models on the Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| 📡 Sharing is Caring | ✓ | Agent traces shared on the Hub |
| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |

---

## Architecture highlights

- Specialized small models, one pipeline: a fine-tuned vision model for ingredients and progress checks, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU or a Modal endpoint).
- Closed-loop visual validation: the planner writes the steps, the illustrator renders each step, the user cooks, and the vision model checks a photo of the pan and returns go / wait / fix. This is a real agent loop, not a wrapper.
- Hallucination-free nutrition: macros come from a lookup table, not LLM arithmetic.
- Robust JSON extraction: a multi-strategy parser handles markdown fences, single quotes, and trailing commas, so malformed generations degrade gracefully instead of failing outright.
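A minimal standard-library sketch of a multi-strategy extractor in the spirit of the last bullet. It is not the app's parser: the `extract_json` name, the order of strategies, the regexes, and the use of `ast.literal_eval` for single-quoted output are assumptions. Two known limits: `ast.literal_eval` rejects JSON's `true`/`false`/`null`, and the trailing-comma regex would also alter a string value that happens to contain `,}` or `,]`.

```python
import ast
import json
import re
from typing import Union

FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def _candidates(text: str) -> list[str]:
    """Substrings that might hold the payload: whole text, fenced body, outermost {...}."""
    out = [text.strip()]
    fenced = FENCE_RE.search(text)
    if fenced:
        out.append(fenced.group(1).strip())
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        out.append(text[start:end + 1])
    return out


def extract_json(text: str) -> Union[dict, list, None]:
    """Return the first dict/list recoverable from model output, or None."""
    for cand in _candidates(text):
        cleaned = TRAILING_COMMA_RE.sub(r"\1", cand)
        # Strategies 1-2: strict JSON, then JSON with trailing commas removed.
        for attempt in (cand, cleaned):
            try:
                value = json.loads(attempt)
            except json.JSONDecodeError:
                continue
            if isinstance(value, (dict, list)):
                return value
        # Strategy 3: Python-literal syntax, which accepts single-quoted strings.
        try:
            value = ast.literal_eval(cleaned)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(value, (dict, list)):
            return value
    return None


raw = "Here is your recipe:\n```json\n{'title': 'Pancakes', 'servings': 4,}\n```"
print(extract_json(raw))  # {'title': 'Pancakes', 'servings': 4}
```

Returning `None` instead of raising lets the caller retry generation or show a friendly error, which is what "degrade gracefully" asks for.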

---

## Track

Chapter One: Backyard AI · "Build something for someone you actually know."

Submission for the Hugging Face Hackathon · June 5–15, 2026.