---
library_name: transformers
datasets:
- Codatta/MM-Food-100K
language:
- en
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-to-text
---

# 🥗 NutriScan-3B (MedGemma Edition)

**NutriScan-3B** is a specialized Vision-Language Model (VLM) that analyzes food images and outputs structured nutritional data. Built for the **MedGemma Impact Challenge**, it acts as the intelligent "Vision Layer" for AI health pipelines. It is fine-tuned from **Qwen2.5-VL-3B-Instruct**, bridging the gap between raw culinary images and medical-grade nutritional analysis.

---

### 🚀 Key Features

* **Food Recognition:** Identifies specific dishes (e.g., "Cheeseburger") rather than generic labels.
* **Ingredient Breakdown:** Detects visible components (e.g., "lentils, cream, cilantro garnish").
* **Structured Output:** Generates clean, parsable **JSON** containing calories, macronutrients, and portion estimates.
* **Efficient:** Optimized for consumer hardware (runs on a T4 or RTX 3050) via 4-bit quantization.

---

### 📊 Dataset & Transparency

This model was fine-tuned on the **Codatta/MM-Food-100K** dataset. To ensure high data quality and download reliability during the hackathon, we curated a specific subset:

* **Total Training Images:** **9,281** high-quality samples.
* **Filename Note:** Image filenames (e.g., `food_099996.jpg`) preserve their **original index** from the source dataset.
  * *Clarification:* You may see filenames with high numbers (like 99k) even though the dataset holds only ~9.2k images. This is expected: the number is the image's original global ID in the source dataset, not a sign of missing files.

---

### 🐍 Quick Start

Install the latest `transformers` (from source) plus the supporting libraries required for Qwen2.5-VL:
```bash
pip install git+https://github.com/huggingface/transformers
pip install peft accelerate bitsandbytes qwen-vl-utils
```

#### Inference Code

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info

# 1. Load the base model and attach the NutriScan LoRA adapter
base_model = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_model = "HackerAditya56/NutriScan-3B"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_model)
processor = AutoProcessor.from_pretrained(
    base_model,
    min_pixels=256 * 28 * 28,
    max_pixels=1024 * 28 * 28
)

# 2. Run analysis
def scan_food(image_path):
    image = Image.open(image_path).convert("RGB")

    # A specific prompt steers the model toward JSON output
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "You are a nutritionist. Identify this dish, list ingredients, and estimate nutrition in JSON format."}
        ]
    }]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt"
    ).to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=512)
    # Trim the prompt tokens so only the model's answer is decoded
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

# Test
print(scan_food("my_lunch.jpg"))
```

---

### 📈 Example Output

**Input:** Image of a pepperoni pizza.

**Model Output:**

```json
{
  "dish_name": "Pepperoni Pizza",
  "ingredients": ["pizza dough", "tomato sauce", "mozzarella cheese", "pepperoni slices", "oregano"],
  "nutritional_profile": {
    "calories_per_slice": 280,
    "protein": "12g",
    "fat": "10g",
    "carbs": "35g"
  },
  "health_note": "Contains processed meat and high sodium."
}
```

---

### 🔧 Technical Specs

* **Base Architecture:** Qwen2.5-VL (Vision-Language)
* **Fine-Tuning Method:** QLoRA (rank 16, alpha 16)
* **Precision:** 4-bit NF4 (NormalFloat4)
* **Training Hardware:** NVIDIA T4 GPUs (Kaggle)

---

### ⚠️ Disclaimer

**Not medical advice.** This model estimates nutrition from visual features alone. It cannot detect hidden ingredients (sugar, salt, oils) or allergens with 100% accuracy. Use for educational and tracking purposes only.

---

### 👨‍💻 Author

**Aditya Nandan** (HackerAditya56)
*Developed for the MedGemma Hackathon 2026*
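
---

### 🧩 Parsing the Output

The model is trained to emit JSON, but like most VLMs it may occasionally wrap the object in a markdown fence or add surrounding commentary. A minimal sketch of a parsing helper for downstream pipelines (the function name `extract_nutrition_json` is our own illustration, not part of this repository):

```python
import json

def extract_nutrition_json(model_output: str) -> dict:
    """Pull the first {...} block out of the model's reply and parse it.

    We locate the outermost braces instead of calling json.loads on the
    raw string, so markdown fences or extra prose around the JSON are
    tolerated.
    """
    start = model_output.find("{")
    end = model_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    return json.loads(model_output[start:end + 1])

# Example with a reply wrapped in a markdown fence
reply = '```json\n{"dish_name": "Pepperoni Pizza", "nutritional_profile": {"calories_per_slice": 280}}\n```'
data = extract_nutrition_json(reply)
print(data["dish_name"])                                  # Pepperoni Pizza
print(data["nutritional_profile"]["calories_per_slice"])  # 280
```

If parsing fails (e.g., the model truncated its answer at `max_new_tokens`), a simple fallback is to re-run `scan_food` or log the raw text for inspection.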