BLIP-IPVQA (Agricultural Visual Question Answering)

This model is a fine-tuned version of Salesforce/blip-vqa-base trained on the Agri-VQA Plus (IPVQA) dataset.

The model answers natural-language questions about agricultural images, covering topics such as pest identification, crop diseases, and field conditions.


Dataset

Agri-VQA Plus (IPVQA)

  • Image–Question–Answer pairs
  • Agricultural domain (pests, crops, diseases)
  • Optional image descriptions used during training

Training Details

  • Base Model: Salesforce/blip-vqa-base
  • Framework: PyTorch + Hugging Face Transformers
  • GPU: NVIDIA P100 (Kaggle)
  • Epochs: 3
  • Optimizer: AdamW
  • Learning Rate: 5e-5
  • Mixed Precision Training (AMP)
  • Gradient Accumulation used for memory efficiency
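The training script itself is not included in this card, but the loop mechanics implied by the list above (AdamW, mixed precision via AMP, gradient accumulation) can be sketched as follows. This is an illustrative stand-in only: the tiny linear model, the synthetic batches, and the `accum_steps` value are placeholders, not the actual BLIP fine-tuning code.

```python
import torch
from torch import nn

# Placeholder model/data; the real run fine-tunes Salesforce/blip-vqa-base
# on Agri-VQA Plus image-question-answer batches.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

use_amp = torch.cuda.is_available()  # AMP needs a GPU; disabled on CPU
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
accum_steps = 4                      # gradient accumulation factor (illustrative)

loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(2, 16), torch.randint(0, 4, (2,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Forward pass in mixed precision when a GPU is available
    with torch.autocast(device_type="cuda" if use_amp else "cpu", enabled=use_amp):
        loss = loss_fn(model(x), y) / accum_steps  # scale loss for accumulation
    scaler.scale(loss).backward()
    # Step the optimizer only every accum_steps micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales gradients, then steps
        scaler.update()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the effective gradient equivalent to one large batch, which is how accumulation trades memory for batch size on a single P100.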

How to Use

from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import torch

# Load the fine-tuned processor and model from the Hugging Face Hub
processor = BlipProcessor.from_pretrained("Ananta025/blip-ipvqa-final")
model = BlipForQuestionAnswering.from_pretrained("Ananta025/blip-ipvqa-final")
model.eval()

# Prepare an image-question pair
image = Image.open("sample.jpg").convert("RGB")
question = "What pest is shown in the image?"

inputs = processor(image, question, return_tensors="pt")

# Generate an answer without tracking gradients
with torch.no_grad():
    output = model.generate(**inputs)

answer = processor.decode(output[0], skip_special_tokens=True)
print(answer)

Model Details

  • Model size: 0.4B parameters
  • Tensor type: F32 (Safetensors)