---
license: mit
tags:
- vision
- food-recognition
- ingredients
- utensils
- portion-size
- computer-vision
- mobile
- ug-food-dataset
---

# UG Food Detection Model

This model identifies food ingredients and kitchen utensils and estimates portion sizes from images.

## Model Description

This Vision Transformer (ViT) model is trained on the UG Food Dataset to recognize:

- Food ingredients: Various food items and ingredients
- Kitchen utensils: Cooking tools and equipment
- Portion sizes: Measurement estimates

## Classes

The model can identify 40 classes.

## Usage

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
processor = ViTImageProcessor.from_pretrained("ssevan/ug-food-detector")
model = ViTForImageClassification.from_pretrained("ssevan/ug-food-detector")
model.eval()

# Process image (convert to RGB so grayscale or RGBA files are handled too)
image = Image.open('food_image.jpg').convert('RGB')
inputs = processor(image, return_tensors='pt')

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class_idx = torch.argmax(probabilities, dim=-1).item()

print(f'Predicted class index: {predicted_class_idx}')
# id2label comes from the checkpoint's config; names are generic if none were saved
print(f'Predicted label: {model.config.id2label[predicted_class_idx]}')
```
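
The snippet above reports only the single most likely class. When several classes are plausible for one image, it can help to look at the top few predictions. The following is a minimal sketch in plain Python, not part of the model's own code: `top_k_predictions` is a hypothetical helper, and the example logits and label names are made up for illustration.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats, one per class.
    `id2label` maps class index -> class name.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Fall back to the index as a string if a label name is missing.
    return [(id2label.get(i, str(i)), probs[i]) for i in ranked]


# Hypothetical logits and label names, for illustration only.
example_logits = [2.0, 0.5, 1.0]
example_labels = {0: "class_a", 1: "class_b", 2: "class_c"}
top2 = top_k_predictions(example_logits, example_labels, k=2)

for label, prob in top2:
    print(f"{label}: {prob:.3f}")
```

With the objects from the usage snippet, the call would be `top_k_predictions(outputs.logits[0].tolist(), model.config.id2label)`, assuming the checkpoint's config carries meaningful label names.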

## Mobile Usage

This model is optimized for mobile deployment.