metadata
title: Nutrition Multi-Agent System
emoji: 🥗
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
license: cc-by-nc-4.0
short_description: Multi-agent nutrition planner with LangGraph + Gemini

🥗 MealGraph: Nutrition Multi-Agent System

Live demo: https://huggingface.co/spaces/moazeldegwy/mealgraph

A clinical-nutrition planner built on LangGraph and Gemini 3.x (gemini-pro-latest · gemini-flash-latest · gemini-flash-lite-latest).

Three agents (Coach, MedicalAssessmentAgent, and PlannerAgent) sit on top of two safe-by-construction tools: a PuLP linear-program meal solver and a Gemini-grounded web search. Clinical math runs through closed-form Python formulas (Mifflin-St Jeor BMR, ACSM activity multipliers); the LLM interprets the numbers but never recomputes them. The Planner runs a deterministic plan check after the LP solver and self-revises on allergy / calorie / macro violations before returning. The Coach does an LLM-graded self-review (medical flag respect, citation presence, cultural fit) before composing its response.

                ┌───────────────────────────────────────────┐
                │                  Coach                    │
                │  one typed action per turn (LangGraph)    │
                │  + self-review of the Planner's output    │
                └──────────────┬────────────────────────────┘
                               │ call_agent / ask_user
                               │ write_memory / compose_response
                ┌──────────────┴──────────────┐
                ▼                             ▼
       MedicalAssessment                  Planner
       deterministic math                draft -> LP -> check_plan
       + LLM enrichment                  (≤ 2 internal revisions)
                │                             │
                ▼                             ▼
         nutrition_formulas        QuantitiesFinder (PuLP)
         (BMI / BMR / TDEE)        WebSearchTool (grounded)
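
To make the closed-form step concrete, here is a minimal sketch of the BMI / BMR / TDEE chain. It is not the project's nutrition_formulas module: the function names, the Assessment dataclass, and the activity multipliers (the widely quoted 1.2 to 1.9 ladder) are assumptions for illustration. Only the Mifflin-St Jeor equation itself is standard.

```python
from dataclasses import dataclass

# Commonly quoted activity multipliers (assumption: the project's own
# ACSM-derived table may use different labels or values).
ACTIVITY_MULTIPLIERS = {
    "sedentary": 1.2,
    "light": 1.375,
    "moderate": 1.55,
    "active": 1.725,
    "very_active": 1.9,
}


@dataclass(frozen=True)
class Assessment:
    bmi: float   # kg / m^2
    bmr: float   # kcal / day at rest
    tdee: float  # kcal / day including activity


def body_mass_index(weight_kg: float, height_cm: float) -> float:
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def mifflin_st_jeor_bmr(weight_kg: float, height_cm: float,
                        age_years: float, sex: str) -> float:
    # Mifflin-St Jeor: 10*W + 6.25*H - 5*A, then +5 (male) or -161 (female).
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + 5 if sex == "male" else base - 161


def assess(weight_kg: float, height_cm: float, age_years: float,
           sex: str, activity: str) -> Assessment:
    bmr = mifflin_st_jeor_bmr(weight_kg, height_cm, age_years, sex)
    return Assessment(
        bmi=body_mass_index(weight_kg, height_cm),
        bmr=bmr,
        tdee=bmr * ACTIVITY_MULTIPLIERS[activity],
    )
```

Because every value is a pure function of the profile, the same inputs always give the same numbers, which is the property the LLM step relies on.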

Quick start

pip install -r requirements.txt
python app.py

Open the Gradio UI, paste a Gemini API key, fill in the profile sidebar, and ask for a plan.

The same repository deploys directly as a Hugging Face Space: the YAML front-matter above is the Space manifest, and app.py is the auto-detected entry point.

Architecture

| Component | Role | Key file |
| --- | --- | --- |
| CoachAgent | Orchestrator. Picks one action per turn (call_agent / ask_user / write_memory / compose_response). After the Planner returns, runs an LLM-graded self-review (medical-flag respect, citation presence, cultural fit) and triggers a revision when needed. | agents.py |
| MedicalAssessmentAgent | Deterministic clinical math first (full_assessment → BMI / BMR / TDEE / macros), then an LLM step that emits flags / recommendations / evidence. The agent overwrites the LLM's calculations with the deterministic values so the math is exact by construction. | agents.py |
| PlannerAgent | Drafts meals, batches nutrition lookups via the grounded WebSearchTool, runs the QuantitiesFinder LP, then runs check_plan (allergy / calorie / macro tolerances) inline. Up to two internal revisions resolve any blocking issue before returning. | agents.py |
| check_plan | Deterministic post-LP critic. Allergy → severity high (hard block); calorie ±3 % / macro ±5 % → severity medium; disliked food → severity low. Same code path the Planner uses internally and the eval harness asserts against. | agents.py |
| QuantitiesFinder | PuLP linear-program meal-quantity solver. Default per-food bounds min = max(20, est × 0.3), max = min(400, est × 2.5) keep the LP from suggesting 1 g of butter or 900 g of broccoli. Estimate-anchor weight is 0.3. | tools.py |
| WebSearchTool | Single round-trip wrapper around Gemini's built-in google_search grounding. Returns answer + citations + queries from grounding_metadata. Prompt biases toward USDA / WHO / ADA / EFSA / NICE / FDA / MedlinePlus. | tools.py |
| LongTermMemory | SQLite-backed semantic / procedural / episodic tiers. | memory.py |
| Guardrails | Prompt-injection sniff, PII redaction, HITL escalation marker (<<HITL:CLINICIAN_REVIEW_REQUIRED>>). | guardrails.py |
| MCP server | Exposes QuantitiesFinder and assess_user to Claude Desktop, Cursor, and any MCP-aware client. | mcp_server.py |
| Agent cards | A2A capability descriptors (three cards) with an in-process registry. | agent_cards.py |
| Observability | LangSmith passthrough + in-process metrics surface. | observability.py |
| Eval harness | Three fixture personas; runs offline (no Gemini calls) against check_plan. | evals/ |
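
The default per-food bounds quoted for QuantitiesFinder can be sketched in a few lines. This is an illustration, not the code in tools.py: the function name default_bounds is hypothetical, and the final clamp is an assumption added here because the rule as written can produce a lower bound above the upper bound for very small or very large estimates (for example est = 5 g gives min 20, max 12.5).

```python
def default_bounds(estimate_g: float) -> tuple[float, float]:
    """Return (lower, upper) gram bounds for one food's LP variable."""
    if estimate_g <= 0:
        raise ValueError("estimate must be positive")
    lower = max(20.0, estimate_g * 0.3)   # never less than 20 g
    upper = min(400.0, estimate_g * 2.5)  # never more than 400 g
    # Assumption: keep the interval non-empty when the two rules cross.
    upper = max(upper, lower)
    return lower, upper
```

For a 100 g estimate this yields roughly 30 g to 250 g. In PuLP, such a pair would typically be passed as lowBound and upBound when creating the food's LpVariable.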

Models and rate limits

Three Gemini 3.x rolling aliases, mapped per role. The free-tier limits below (RPM = requests per minute, RPD = requests per day) are conservative defaults; if you have a paid plan, override them with enable_rate_limiting=False (or pass your paid quota).

| Alias | RPM | RPD | Default role |
| --- | --- | --- | --- |
| gemini-pro-latest | 5 | 100 | Coach, Medical, Planner |
| gemini-flash-latest | 10 | 250 | Available for overrides |
| gemini-flash-lite-latest | 15 | 500 | Tools (WebSearch), simulator |
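
One way to picture how an RPM / RPD pair could be enforced is a sliding 60-second window plus a per-UTC-day counter. The sketch below is an assumption about the general technique, not the project's rate limiter; the ModelLimiter class and try_acquire method are hypothetical names, and only the numbers in LIMITS come from the table.

```python
from collections import deque
from dataclasses import dataclass, field

# Free-tier limits from the table above: alias -> (RPM, RPD).
LIMITS = {
    "gemini-pro-latest": (5, 100),
    "gemini-flash-latest": (10, 250),
    "gemini-flash-lite-latest": (15, 500),
}


@dataclass
class ModelLimiter:
    rpm: int
    rpd: int
    minute_window: deque = field(default_factory=deque)  # recent timestamps
    day: int = -1        # UTC day index of the current counter
    day_count: int = 0   # requests made during that day

    def try_acquire(self, now: float) -> bool:
        """Record a request at UNIX time `now` if both limits allow it."""
        today = int(now // 86400)  # UNIX time is UTC-based
        if today != self.day:
            self.day, self.day_count = today, 0
        # Drop timestamps that have left the 60-second window.
        while self.minute_window and now - self.minute_window[0] >= 60:
            self.minute_window.popleft()
        if len(self.minute_window) >= self.rpm or self.day_count >= self.rpd:
            return False
        self.minute_window.append(now)
        self.day_count += 1
        return True
```

A caller would pass time.time() as `now`; taking the clock as an argument keeps the sketch easy to test.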

Safety guarantees

| Guarantee | Where it lives |
| --- | --- |
| Allergies never appear in the plan | Planner's check_plan: severity high → hard block → internal revision. |
| Calorie target hit within ±3 % | Planner's check_plan: severity medium → revision. |
| Each macro hit within ±5 % | Planner's check_plan: severity medium → revision. |
| Medical flags respected | Coach's self-review turn (LLM-graded). |
| Clinical claims carry citations | WebSearchTool returns grounding_chunks natively; Coach checks for them in self-review. |
| Serious cases escalate | Medical sets requires_professional_consultation=True; Coach appends <<HITL:CLINICIAN_REVIEW_REQUIRED>>. |
| No RCE via LLM-generated code | No code-from-LLM path exists at all. nutrition_formulas is closed-form Python; QuantitiesFinder is a pure LP. |
| Deterministic math | full_assessment() runs server-side; the Medical agent overwrites whatever the LLM emitted for calculations. |
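
The first three guarantees reduce to a small deterministic check. The following is a rough sketch of that idea under stated assumptions, not the check_plan in agents.py: the Issue dataclass, the argument shapes, substring allergen matching, and the needs_revision helper are all hypothetical. Only the tolerances and severity levels are taken from this README.

```python
from dataclasses import dataclass

CALORIE_TOL = 0.03  # ±3 % on calories
MACRO_TOL = 0.05    # ±5 % on each macro


@dataclass(frozen=True)
class Issue:
    severity: str  # "high", "medium", or "low"
    message: str


def check_plan_sketch(foods, totals, targets, allergies=(), dislikes=()):
    """Return the list of issues found in a solved plan."""
    issues = []
    names = [food.lower() for food in foods]
    # Allergens are a hard block.
    for allergen in allergies:
        for name in names:
            if allergen.lower() in name:
                issues.append(Issue("high", f"allergen '{allergen}' in '{name}'"))
    # Calories and macros must land within their relative tolerance.
    for key, target in targets.items():
        tol = CALORIE_TOL if key == "calories" else MACRO_TOL
        if target and abs(totals.get(key, 0.0) - target) / target > tol:
            issues.append(Issue("medium", f"{key} off target by more than {tol:.0%}"))
    # Disliked foods are reported but do not force a revision here.
    for disliked in dislikes:
        if any(disliked.lower() in name for name in names):
            issues.append(Issue("low", f"disliked food '{disliked}' present"))
    return issues


def needs_revision(issues):
    # Assumption: high and medium severities both trigger a revision.
    return any(issue.severity in {"high", "medium"} for issue in issues)
```

Keeping the check free of LLM calls is what lets an offline harness assert against the same logic the Planner runs.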

Run the offline eval harness

Three persona fixtures (athlete, diabetic, vegan-budget) exercise the deterministic surface; no Gemini calls are needed:

python -m evals.runner

Run the test suite

pytest -ra

Coverage: schemas, solver behaviour, safety surface, rate-limit pool, memory tiers, and full Coach ↔ specialist loops via a mock LLM. The post-LP allergy revision and deterministic-calculation overwrite are both unit-tested.
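
As an illustration of what a test for the deterministic-calculation overwrite might assert, consider the sketch below. It is not taken from the project's test suite: the overwrite_calculations helper, the "calculations" and "flags" keys, and the sample values are assumptions.

```python
def overwrite_calculations(llm_output: dict, deterministic: dict) -> dict:
    """Return a copy of the LLM output whose numbers come from the formulas."""
    merged = dict(llm_output)
    merged["calculations"] = dict(deterministic)  # LLM numbers are discarded
    return merged


def test_overwrite_discards_llm_numbers():
    # The mock LLM "hallucinates" a BMR; the deterministic value must win.
    llm_output = {"flags": ["elevated BMI"], "calculations": {"bmr": 1999.0}}
    deterministic = {"bmr": 1780.0, "tdee": 2759.0}

    result = overwrite_calculations(llm_output, deterministic)

    assert result["calculations"] == deterministic
    assert result["flags"] == ["elevated BMI"]                # enrichment kept
    assert llm_output["calculations"] == {"bmr": 1999.0}      # input untouched
```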


Library usage

The same code runs as a library. Import the mealgraph module, provide API keys, and call a few setup functions.

1. Import

import mealgraph

2. API keys

Provide a list of keys; the system rotates through them and respects each model's RPM / RPD limit. A single key is enough for evaluation.

api_keys = [
    "your_api_key1",
    "your_api_key2",
]

3. (Optional) Model overrides

Override model_name or params per role. Other configuration is fixed in the module.

model_overrides = {
    "main":       {"model_name": "gemini-pro-latest", "params": {"temperature": 0.5}},
    "agents_llm": {"model_name": "gemini-flash-latest", "params": {"max_tokens": 6000}},
}

4. Initialise

mealgraph.create_llm_instances(api_keys, model_overrides, enable_rate_limiting=True)
mealgraph.initialize_tools()
mealgraph.initialize_agents()
mealgraph.setup_workflow()

5. Run

Run in either interactive mode (collects user data via stdin) or simulation mode (drives one or more synthetic users through a fixed question list):

mealgraph.run(simulate=False)
# or
mealgraph.run(simulate=True, simulated_users=[...])
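
For orientation, one plausible shape for a simulated_users entry is sketched here. Only the three top-level keys (user_profile, medical_history, questions) are documented in the behaviour notes; every nested field name and value, and the validate_simulated_users helper, are hypothetical.

```python
# Hypothetical example entry; nested field names are assumptions.
simulated_users = [
    {
        "user_profile": {"age": 30, "sex": "male", "weight_kg": 80, "height_cm": 180},
        "medical_history": {"conditions": [], "allergies": ["peanut"]},
        "questions": [
            "Build me a one-day meal plan.",
            "Can you swap the breakfast for something quicker?",
        ],
    },
]

REQUIRED_KEYS = {"user_profile", "medical_history", "questions"}


def validate_simulated_users(users):
    """Fail early if an entry lacks one of the documented top-level keys."""
    for index, user in enumerate(users):
        missing = REQUIRED_KEYS - set(user)
        if missing:
            raise ValueError(f"simulated user {index} is missing {sorted(missing)}")
    return users
```

A list checked this way could then be passed as the simulated_users argument.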

Behaviour notes

  • User mode output: high-level progress lines, one per agent / tool action.
  • Debug mode output: raw LLM input / output (or output only), scoped per agent / tool. Enable with mealgraph.debug(level='full', scopes={'agents': ['CoachAgent'], 'tools': ['all']}).
  • API-key pooling: the manager rotates keys and (when rate limiting is on) enforces per-model RPM / RPD. Keys that exhaust their daily quota are dropped from the pool until the next UTC day.
  • Interactive mode: prompts for profile fields, then accepts free questions; type exit to quit.
  • Simulation mode: each entry in simulated_users is a dict with user_profile, medical_history, and questions; the loop drives each user's questions sequentially.
  • Error handling: at least one API key is required; otherwise a ValueError is raised. Each initialisation function checks that its predecessor has been called.
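
The key-pooling behaviour described in the notes above (round-robin rotation, exhausted keys sidelined until the next UTC day, ValueError on an empty list) could look roughly like the sketch below. The KeyPool class and its method names are hypothetical and are not the module's actual manager.

```python
from datetime import datetime, timezone


class KeyPool:
    """Round-robin API-key pool that skips keys exhausted today (UTC)."""

    def __init__(self, api_keys):
        if not api_keys:
            raise ValueError("provide at least one API key")
        self._keys = list(api_keys)
        self._exhausted_on = {}  # key -> UTC date on which its quota ran out
        self._next = 0

    def mark_exhausted(self, key, now):
        """Sideline `key` for the rest of the UTC day containing `now`."""
        self._exhausted_on[key] = now.astimezone(timezone.utc).date()

    def next_key(self, now):
        """Return the next usable key, or None if all are exhausted today.

        `now` must be a timezone-aware datetime.
        """
        today = now.astimezone(timezone.utc).date()
        for _ in range(len(self._keys)):
            key = self._keys[self._next]
            self._next = (self._next + 1) % len(self._keys)
            if self._exhausted_on.get(key) != today:
                self._exhausted_on.pop(key, None)  # a new day restores the key
                return key
        return None
```

Typical use would be pool.next_key(datetime.now(timezone.utc)) before each request, and mark_exhausted when the API reports that the daily quota is spent.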