---
license: apache-2.0
tags:
- vision-language
- visual-question-answering
- agriculture
- pest-detection
- blip
pipeline_tag: visual-question-answering
base_model: Salesforce/blip-vqa-base
---

# BLIP-IPVQA (Agricultural Visual Question Answering)

This model is a fine-tuned version of **Salesforce/blip-vqa-base** trained on the **Agri-VQA Plus (IPVQA)** dataset. It answers natural-language questions about agricultural images, covering pest identification, crop diseases, and field conditions.

---

## Dataset

**Agri-VQA Plus (IPVQA)**
- Image–Question–Answer pairs
- Agricultural domain (pests, crops, diseases)
- Optional image descriptions used during training

---

## Training Details

- Base Model: `Salesforce/blip-vqa-base`
- Framework: PyTorch + Hugging Face Transformers
- GPU: NVIDIA P100 (Kaggle)
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Mixed-precision training (AMP)
- Gradient accumulation used for memory efficiency (an illustrative training-loop sketch is included at the end of this card)

---

## How to Use

```python
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import torch

processor = BlipProcessor.from_pretrained("Ananta025/blip-ipvqa-final")
model = BlipForQuestionAnswering.from_pretrained("Ananta025/blip-ipvqa-final")

image = Image.open("sample.jpg").convert("RGB")
question = "What pest is shown in the image?"

inputs = processor(image, question, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs)

answer = processor.decode(output[0], skip_special_tokens=True)
print(answer)
```
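If a GPU is available, the model and the processed inputs can be moved to it before generation. This is an optional sketch that continues from the example above; it is not part of the original usage snippet.

```python
# Optional: run inference on GPU when available (continues from the example above).
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = processor(image, question, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(**inputs)
print(processor.decode(output[0], skip_special_tokens=True))
```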
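## Training Loop Sketch (Illustrative)

The training details above mention AdamW at a 5e-5 learning rate, mixed-precision training (AMP), and gradient accumulation. The sketch below shows roughly how such a step can be written with PyTorch and Transformers; it is an assumption-based illustration, not the script actually used to produce this checkpoint. `train_loader` and `accum_steps` are hypothetical placeholders, and the batches are assumed to already contain processor outputs plus tokenized answer `labels`.

```python
# Illustrative training step only; not the original training script.
# Assumes `train_loader` yields dicts with pixel_values, input_ids, attention_mask, labels.
import torch
from torch.cuda.amp import autocast, GradScaler
from transformers import BlipForQuestionAnswering

model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = GradScaler()
accum_steps = 4  # assumed value for gradient accumulation

model.train()
for step, batch in enumerate(train_loader):
    batch = {k: v.cuda() for k, v in batch.items()}
    with autocast():                      # mixed-precision forward pass (AMP)
        loss = model(**batch).loss / accum_steps
    scaler.scale(loss).backward()         # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:     # update weights every accum_steps batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```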