---
license: apache-2.0
tags:
- vision-language
- visual-question-answering
- agriculture
- pest-detection
- blip
pipeline_tag: visual-question-answering
base_model: Salesforce/blip-vqa-base
---
# BLIP-IPVQA (Agricultural Visual Question Answering)
This model is a fine-tuned version of **Salesforce/blip-vqa-base** trained on the **Agri-VQA Plus (IPVQA)** dataset.
The model answers natural language questions about agricultural images, such as pest identification, crop diseases, and field conditions.
---
## Dataset
**Agri-VQA Plus (IPVQA)**
- Image–Question–Answer pairs
- Agricultural domain (pests, crops, diseases)
- Optional image descriptions used during training
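As a rough illustration of the image–question–answer structure, the data could be wrapped in a PyTorch `Dataset` like the sketch below. The class name `IPVQADataset` and the record field names (`image_path`, `question`, `answer`) are assumptions for illustration, not the actual training code; the demo record uses a synthetic image.

```python
import os
import tempfile

from PIL import Image
from torch.utils.data import Dataset


class IPVQADataset(Dataset):
    """Minimal sketch of an image-question-answer dataset.

    Field names are hypothetical; adapt to the real IPVQA layout.
    """

    def __init__(self, records):
        self.records = records  # list of dicts with image_path / question / answer

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        r = self.records[idx]
        image = Image.open(r["image_path"]).convert("RGB")
        return {"image": image, "question": r["question"], "answer": r["answer"]}


# Demo with a synthetic 224x224 image standing in for a field photo
tmp_path = os.path.join(tempfile.mkdtemp(), "leaf.jpg")
Image.new("RGB", (224, 224), "green").save(tmp_path)

ds = IPVQADataset([
    {"image_path": tmp_path, "question": "What pest is shown?", "answer": "aphid"}
])
sample = ds[0]
```

Each sample keeps the raw PIL image so the BLIP processor can handle resizing and normalization at collation time.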
---
## Training Details
- Base Model: `Salesforce/blip-vqa-base`
- Framework: PyTorch + Hugging Face Transformers
- GPU: NVIDIA P100 (Kaggle)
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Mixed precision training (AMP)
- Gradient accumulation for memory efficiency
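The AdamW + AMP + gradient-accumulation recipe above can be sketched as a generic PyTorch training step. This is not the actual training script: the tiny linear model and random batches stand in for BLIP and the IPVQA dataloader, and `ACCUM_STEPS` is an assumed value (the card does not state the accumulation factor).

```python
import torch
from torch import nn

ACCUM_STEPS = 4  # assumed; the card does not specify the actual factor
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # AMP is only meaningful on GPU

model = nn.Linear(16, 2).to(device)  # stand-in for BlipForQuestionAnswering
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

losses = []
optimizer.zero_grad()
for step in range(8):  # stand-in for iterating the dataloader
    x = torch.randn(4, 16, device=device)
    y = torch.randint(0, 2, (4,), device=device)

    # Forward pass under autocast (no-op on CPU)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y)

    # Divide so accumulated gradients average over the effective batch
    scaler.scale(loss / ACCUM_STEPS).backward()

    # Step the optimizer only every ACCUM_STEPS micro-batches
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

    losses.append(loss.item())
```

Dividing the loss by `ACCUM_STEPS` before `backward()` keeps the accumulated gradient equal to the average over the larger effective batch.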
---
## How to Use
```python
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import torch

# Load the fine-tuned processor and model
processor = BlipProcessor.from_pretrained("Ananta025/blip-ipvqa-final")
model = BlipForQuestionAnswering.from_pretrained("Ananta025/blip-ipvqa-final")

# Prepare an image and a question
image = Image.open("sample.jpg").convert("RGB")
question = "What pest is shown in the image?"
inputs = processor(image, question, return_tensors="pt")

# Generate and decode the answer
with torch.no_grad():
    output = model.generate(**inputs)

answer = processor.decode(output[0], skip_special_tokens=True)
print(answer)
```