---
library_name: transformers
datasets:
- Codatta/MM-Food-100K
language:
- en
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-to-text
---

# 🥗 NutriScan-3B (MedGemma Edition)

**NutriScan-3B** is a specialized Vision-Language Model (VLM) that analyzes food images and outputs structured nutritional data. Built for the **MedGemma Impact Challenge**, it acts as the intelligent "Vision Layer" for AI health pipelines. It is fine-tuned from **Qwen2.5-VL-3B-Instruct**, bridging the gap between raw culinary images and medical-grade nutritional analysis.

---

### 🚀 Key Features

* **Food Recognition:** Identifies specific dishes (e.g., "Cheeseburger") rather than generic labels.
* **Ingredient Breakdown:** Detects visible components (e.g., "lentils, cream, cilantro garnish").
* **Structured Output:** Generates clean, parsable **JSON** containing calories, macronutrients, and portion estimates.
* **Efficient:** Optimized for consumer hardware (runs on a T4 or RTX 3050) via 4-bit quantization.

---

### 📊 Dataset & Transparency

This model was fine-tuned on the **Codatta/MM-Food-100K** dataset. To ensure high data quality and download reliability during the hackathon, we curated a specific subset:

* **Total Training Images:** **9,281** high-quality samples.
* **Filename Note:** Image filenames (e.g., `food_099996.jpg`) preserve their **original index** from the source dataset.
  * *Clarification:* You may see filenames with high numbers (like 99k) even though the dataset holds only ~9.2k images. This is expected: the number is the image's original global ID in the source dataset, not a sign of missing files.

---

### 🐍 Quick Start

Install the latest `transformers` (from source) plus the supporting libraries required for Qwen2.5-VL:
```bash
pip install git+https://github.com/huggingface/transformers
pip install peft accelerate bitsandbytes qwen-vl-utils
```

#### Inference Code

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info

# 1. Load the base model and attach the NutriScan LoRA adapter
base_model = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_model = "HackerAditya56/NutriScan-3B"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_model)
processor = AutoProcessor.from_pretrained(
    base_model,
    min_pixels=256 * 28 * 28,
    max_pixels=1024 * 28 * 28
)

# 2. Run analysis
def scan_food(image_path):
    image = Image.open(image_path).convert("RGB")

    # A specific prompt steers the model toward JSON output
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "You are a nutritionist. Identify this dish, list ingredients, and estimate nutrition in JSON format."}
        ]
    }]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt"
    ).to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=512)
    # Trim the prompt tokens so only the model's answer is decoded
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

# Test
print(scan_food("my_lunch.jpg"))
```

---

### 📈 Example Output

**Input:** Image of a pepperoni pizza.

**Model Output:**

```json
{
  "dish_name": "Pepperoni Pizza",
  "ingredients": ["pizza dough", "tomato sauce", "mozzarella cheese", "pepperoni slices", "oregano"],
  "nutritional_profile": {
    "calories_per_slice": 280,
    "protein": "12g",
    "fat": "10g",
    "carbs": "35g"
  },
  "health_note": "Contains processed meat and high sodium."
}
```

---

### 🔧 Technical Specs

* **Base Architecture:** Qwen2.5-VL (Vision-Language)
* **Fine-Tuning Method:** QLoRA (rank 16, alpha 16)
* **Precision:** 4-bit NF4 (NormalFloat4)
* **Training Hardware:** NVIDIA T4 GPUs (Kaggle)

---

### ⚠️ Disclaimer

**Not medical advice.** This model estimates nutrition from visual features alone. It cannot detect hidden ingredients (sugar, salt, oils) or allergens with 100% accuracy. Use for educational and tracking purposes only.

---

### 👨‍💻 Author

**Aditya Nandan** (HackerAditya56)
*Developed for the MedGemma Hackathon 2026*
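
---

### 🧩 Parsing the Output

The model is trained to emit JSON, but like most VLMs it may occasionally wrap the object in a markdown fence or add surrounding commentary. A minimal sketch of a parsing helper for downstream pipelines (the function name `extract_nutrition_json` is our own illustration, not part of this repository):

```python
import json

def extract_nutrition_json(model_output: str) -> dict:
    """Pull the first {...} block out of the model's reply and parse it.

    We locate the outermost braces instead of calling json.loads on the
    raw string, so markdown fences or extra prose around the JSON are
    tolerated.
    """
    start = model_output.find("{")
    end = model_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    return json.loads(model_output[start:end + 1])

# Example with a reply wrapped in a markdown fence
reply = '```json\n{"dish_name": "Pepperoni Pizza", "nutritional_profile": {"calories_per_slice": 280}}\n```'
data = extract_nutrition_json(reply)
print(data["dish_name"])                                  # Pepperoni Pizza
print(data["nutritional_profile"]["calories_per_slice"])  # 280
```

If parsing fails (e.g., the model truncated its answer at `max_new_tokens`), a simple fallback is to re-run `scan_food` or log the raw text for inspection.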