metadata
title: Cook With A LLM
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - backyard-ai
  - well-tuned
  - off-brand
  - sharing-is-caring
  - field-notes

🍲 Cook With Me: Multimodal Sous-Chef

Snap your fridge. Pick a dish. Cook step by step. Check your progress with a photo.

A closed-loop multimodal cooking assistant built for the Hugging Face Small Models / Big Adventures Hackathon (June 2026).


Contributors

  1. eldinosaur - Carlos Castañeda Mora
  2. Fred1e4 - Fredin Vazquez

🔗 Links


How it works

📸 Fridge photo   ──▶  [Vision Agent]          identify ingredients
                             │
                             ▼
                       [Recipe Planner]         propose 3 dishes → full recipe JSON
                             │
                             ▼
                       [Nutrition Engine]       per-serving macros (lookup, no hallucination)
                             │
                             ▼
📸 Progress photo ──▶  [Progress Validator]    go / wait / fix verdict

  1. Snap your fridge or pantry: the fine-tuned vision model identifies every ingredient.
  2. Pick one of three AI-suggested dishes tailored to what you have.
  3. Cook step by step with a generated recipe and per-serving nutrition info.
  4. Check your progress by uploading a photo of your pan: the model tells you go, wait, or fix.
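
The flow above can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the names `Verdict`, `Recipe`, `MACROS_PER_100G`, `nutrition_per_serving`, and `run_session` are hypothetical, the model stages are passed in as stub callables (the real ones run on ZeroGPU and Modal), and the macro values are placeholders rather than the project's data. The only stage implemented for real is the nutrition lookup, to show how macros can come from a table instead of model output.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    """The three outcomes of a progress check."""
    GO = "go"
    WAIT = "wait"
    FIX = "fix"


@dataclass
class Recipe:
    dish: str
    servings: int
    ingredients: dict[str, float]  # ingredient name -> grams for the whole recipe
    steps: list[str]


# Hypothetical lookup table: (kcal, protein in g) per 100 g. Illustrative values only.
MACROS_PER_100G = {
    "egg": (143.0, 12.6),
    "rice": (130.0, 2.7),
}


def nutrition_per_serving(recipe: Recipe) -> dict[str, float]:
    """Sum macros from the table; ingredients missing from it are skipped, never guessed."""
    kcal = protein = 0.0
    for name, grams in recipe.ingredients.items():
        if name in MACROS_PER_100G:
            kcal_100, protein_100 = MACROS_PER_100G[name]
            kcal += kcal_100 * grams / 100
            protein += protein_100 * grams / 100
    return {
        "kcal": round(kcal / recipe.servings, 1),
        "protein_g": round(protein / recipe.servings, 1),
    }


def run_session(
    fridge_photo: bytes,
    progress_photos: list[bytes],
    identify: Callable[[bytes], list[str]],        # stands in for the vision model
    propose: Callable[[list[str]], list[Recipe]],  # stands in for the planner
    validate: Callable[[bytes, str], Verdict],     # stands in for the progress check
    choice: int = 0,
):
    """Run one pass: ingredients -> dishes -> chosen recipe -> macros -> verdicts."""
    ingredients = identify(fridge_photo)
    recipe = propose(ingredients)[choice]
    macros = nutrition_per_serving(recipe)
    # One verdict per (progress photo, recipe step) pair, in cooking order.
    verdicts = [validate(photo, step) for photo, step in zip(progress_photos, recipe.steps)]
    return recipe, macros, verdicts
```

Passing the stages in as callables keeps the sketch runnable without any model: each stub can later be swapped for a call to the corresponding endpoint.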

Models

| Role | Model | Params | Runtime |
|---|---|---|---|
| Vision: ingredients + progress validation | openbmb/MiniCPM-V-4.6 (fine-tuned) | ~4.6B | transformers / ZeroGPU |
| Recipe planner: dishes + recipe JSON | openbmb/MiniCPM4.1-8B → eldinosaur/cook-with-me-planner-8b (fine-tuned) | ~8B | Modal (transformers 4.x) |
| Step illustrator: per-step images | FLUX.2-klein-9B (SDXL-Turbo fallback) | ~9B | Modal (L4) |

Total: ~21.6B parameters (≤ 32B cap ✓)
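
The total can be checked with a few lines of arithmetic. The sketch below uses the approximate counts listed for the three primary models; the dictionary keys and variable names are illustrative, and the SDXL-Turbo fallback is not counted.

```python
# Approximate parameter counts in billions, as listed for each model above.
PARAMS_B = {"vision": 4.6, "planner": 8.0, "illustrator": 9.0}
CAP_B = 32.0  # hackathon parameter cap, in billions

total_b = round(sum(PARAMS_B.values()), 1)
assert total_b <= CAP_B, "model stack exceeds the parameter cap"
print(f"total ~{total_b}B, headroom ~{CAP_B - total_b:.1f}B")
```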

Two models are fine-tuned: the vision model on fridge/pantry photos for ingredient detection, and the planner on 2,046 recipe pairs for reliable recipe-JSON generation. The planner and illustrator run on dedicated Modal GPU endpoints (the planner needs transformers 4.x while the vision model needs 5.x, so they live in separate containers).


Badges targeted

| Badge | Status | How |
|---|---|---|
| 🎯 Well-Tuned | ✓ | Two fine-tuned models on Hub: MiniCPM-V-4.6 (ingredient detection) + MiniCPM4.1-8B (recipe planner, SFT on 2,046 pairs) |
| 🎨 Off-Brand | ✓ | Custom recipe-card UI with bespoke CSS components (chips, dish cards, step cards, nutrition pills) |
| 📑 Sharing is Caring | ✓ | Agent traces shared on Hub |
| 📓 Field Notes | ✓ | Blog post: "Building a closed-loop visual cooking coach" |

Architecture highlights

  • Specialized small models, one pipeline: a fine-tuned vision model for ingredients/progress, a separately fine-tuned 8B planner for recipe JSON, and a diffusion model for step images, each on the runtime it needs (ZeroGPU + Modal endpoints).
  • Closed-loop visual validation: the planner writes the steps → the illustrator renders each step → the user cooks → the vision model compares the pan photo and returns go / wait / fix. This is a real agent loop, not a wrapper.
  • Hallucination-free nutrition: macros come from a lookup table, not LLM arithmetic.
  • Robust JSON extraction: multi-strategy parser handles markdown fences, single quotes, and trailing commas so generation failures degrade gracefully.
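
A multi-strategy parser of the kind described in the last bullet might look like the sketch below. It is an assumption about the approach, not the project's actual parser: the function name `extract_json` and the order of the strategies are hypothetical. It uses only the standard library (`re`, `json`, `ast`) and returns `None` when every strategy fails, so the caller can retry or fall back.

```python
import ast
import json
import re

# Matches a markdown code fence, optionally tagged "json", and captures its body.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)
# Matches a comma that directly precedes a closing brace or bracket.
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def extract_json(text: str):
    """Return the first JSON object found in model output, or None if parsing fails."""
    # Strategy 1: unwrap a markdown fence if the model added one.
    fence = FENCE_RE.search(text)
    candidate = fence.group(1) if fence else text

    # Keep only the outermost {...} span, dropping any chatter around it.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = candidate[start:end + 1]

    # Strategy 2: strict JSON.
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass

    # Strategy 3: remove trailing commas, then retry strict JSON.
    cleaned = TRAILING_COMMA_RE.sub(r"\1", candidate)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Strategy 4: single-quoted output parsed as a Python literal (never eval).
    try:
        value = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return None
    return value if isinstance(value, dict) else None
```

The trailing-comma pass is a plain regex, so it would also alter a `,}` sequence inside a string value; that is acceptable for a sketch but worth handling if recipe text can contain such sequences. The literal fallback also cannot read `true`, `false`, or `null`, which is why strict JSON is tried first.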

Track

Chapter One: Backyard AI · "Build something for someone you actually know."

Submission for the Hugging Face Hackathon · June 5–15, 2026.