# BLIP-IPVQA (Agricultural Visual Question Answering)
This model is a fine-tuned version of Salesforce/blip-vqa-base trained on the Agri-VQA Plus (IPVQA) dataset.
The model answers natural language questions about agricultural images, such as pest identification, crop diseases, and field conditions.
## Dataset

**Agri-VQA Plus (IPVQA)**
- Image–Question–Answer pairs
- Agricultural domain (pests, crops, diseases)
- Optional image descriptions used during training
## Training Details
- Base Model: Salesforce/blip-vqa-base
- Framework: PyTorch + Hugging Face Transformers
- GPU: NVIDIA P100 (Kaggle)
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Mixed Precision Training (AMP)
- Gradient Accumulation used for memory efficiency
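Gradient accumulation keeps memory usage low by summing gradients over several mini-batches and stepping the optimizer only once per group, so the effective batch size is the per-device batch size times the accumulation factor. A minimal sketch of the stepping schedule (the accumulation factor of 4 is an illustrative assumption, not a value stated on this card):

```python
def optimizer_step_indices(num_batches, accum_steps):
    """Return the mini-batch indices after which the optimizer steps.

    The optimizer steps every `accum_steps` mini-batches, plus one
    final step if leftover gradients remain at the end of the epoch.
    """
    steps = [i for i in range(num_batches) if (i + 1) % accum_steps == 0]
    if num_batches % accum_steps != 0:
        steps.append(num_batches - 1)  # flush remaining gradients
    return steps

# With 10 mini-batches and accumulation over 4, the optimizer steps
# after batches 3, 7, and 9 (roughly a 4x effective batch per step).
print(optimizer_step_indices(10, 4))  # → [3, 7, 9]
```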
## How to Use
```python
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import torch

# Load the fine-tuned processor and model
processor = BlipProcessor.from_pretrained("Ananta025/blip-ipvqa-final")
model = BlipForQuestionAnswering.from_pretrained("Ananta025/blip-ipvqa-final")

# Prepare an image and a question
image = Image.open("sample.jpg").convert("RGB")
question = "What pest is shown in the image?"
inputs = processor(image, question, return_tensors="pt")

# Generate and decode the answer
with torch.no_grad():
    output = model.generate(**inputs)
answer = processor.decode(output[0], skip_special_tokens=True)
print(answer)
```
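By default, `generate` performs greedy decoding. The Transformers `generate` method also accepts standard decoding arguments such as `num_beams` and `max_new_tokens`; a minimal sketch with illustrative values (these are assumptions, not settings taken from this model card):

```python
# Optional decoding arguments for model.generate(). The values below
# are illustrative assumptions, not settings used for this model.
gen_kwargs = {
    "num_beams": 3,        # beam search instead of greedy decoding
    "max_new_tokens": 20,  # upper bound on the generated answer length
}

# Passed through to the call above, e.g.:
# output = model.generate(**inputs, **gen_kwargs)
print(gen_kwargs)
```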