whis-22/bankai-image

This is a fine-tuned Vision Transformer (ViT) model for food image classification.

Model Details

  • Model type: Vision Transformer (ViT)
  • License: MIT
  • Finetuned from: google/vit-base-patch16-224
  • Dataset: Food-101 (subset)
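The base checkpoint name encodes the input geometry: `google/vit-base-patch16-224` expects 224×224 RGB images split into 16×16 patches. A quick sanity check of the resulting token sequence length (a sketch using only the sizes from the model name, not loaded from the checkpoint):

```python
# Sizes encoded in the checkpoint name google/vit-base-patch16-224
image_size = 224
patch_size = 16

# Number of non-overlapping patches per image
num_patches = (image_size // patch_size) ** 2
print(num_patches)  # 196

# ViT prepends one [CLS] token, so the transformer sees 197 tokens
seq_len = num_patches + 1
print(seq_len)  # 197
```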

Intended Uses & Limitations

This model is intended for classifying food images into Food-101 categories. Because it was fine-tuned on a subset of Food-101, accuracy on food types outside that subset, or on non-food images, is not guaranteed.

How to Use

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

processor = AutoImageProcessor.from_pretrained("whis-22/bankai-image")
model = AutoModelForImageClassification.from_pretrained("whis-22/bankai-image")

# Load and preprocess the image
image = Image.open("path_to_your_image.jpg").convert("RGB")  # processor expects RGB
inputs = processor(images=image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()
    
# Get the predicted class
print(f"Predicted class: {model.config.id2label[predicted_class_idx]}")
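To report a confidence score alongside the label, apply a softmax to the logits and take the top-k entries. A minimal sketch on stand-in logits (in practice you would use `outputs.logits` from the model call above and map each index through `model.config.id2label`; the 5-class tensor here is a made-up example):

```python
import torch

# Stand-in logits for 5 classes; substitute `outputs.logits` in real use.
logits = torch.tensor([[1.2, 0.3, 3.1, -0.5, 0.9]])

# Convert logits to probabilities, then take the 3 most likely classes
probs = torch.softmax(logits, dim=-1)
top_probs, top_idxs = torch.topk(probs, k=3, dim=-1)

for p, i in zip(top_probs[0], top_idxs[0]):
    # With the real model: label = model.config.id2label[i.item()]
    print(f"class {i.item()}: {p.item():.3f}")
```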
Technical Details

  • Model size: 85.9M params
  • Tensor type: F32 (Safetensors)