---
license: apache-2.0
tags:
- vision-language
- visual-question-answering
- agriculture
- pest-detection
- blip
pipeline_tag: visual-question-answering
base_model: Salesforce/blip-vqa-base
---

# BLIP-IPVQA (Agricultural Visual Question Answering)

This model is a fine-tuned version of **Salesforce/blip-vqa-base** trained on the **Agri-VQA Plus (IPVQA)** dataset.

The model answers natural-language questions about agricultural images, such as pest identification, crop diseases, and field conditions.

---

## Dataset

**Agri-VQA Plus (IPVQA)**
- Image–question–answer pairs
- Agricultural domain (pests, crops, diseases)
- Optional image descriptions used during training

---

## Training Details

- Base model: `Salesforce/blip-vqa-base`
- Framework: PyTorch + Hugging Face Transformers
- GPU: NVIDIA P100 (Kaggle)
- Epochs: 3
- Optimizer: AdamW
- Learning rate: 5e-5
- Mixed-precision training (AMP)
- Gradient accumulation used for memory efficiency

---

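The AMP and gradient-accumulation setup listed above can be sketched generically. This is a minimal illustration of the pattern, not the actual training script: the model, data, and accumulation factor here are placeholder assumptions (a tiny `nn.Linear` stands in for BLIP), while the optimizer and learning rate match the values above.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real BLIP model and dataloader;
# only the AMP + gradient-accumulation pattern is illustrated here.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

use_amp = torch.cuda.is_available()  # AMP is only enabled on GPU
device = "cuda" if use_amp else "cpu"
model.to(device)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

accum_steps = 4  # assumed accumulation factor (not stated in the card)
data = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    x, y = x.to(device), y.to(device)
    # Forward pass runs in reduced precision when AMP is enabled
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y) / accum_steps
    # Scale the loss before backward to avoid fp16 gradient underflow
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # unscales gradients, then steps
        scaler.update()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the effective gradient equal to that of one large batch, which is how accumulation trades memory for batch size.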
## How to Use

```python
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image
import torch

# Load the fine-tuned processor and model from the Hub
processor = BlipProcessor.from_pretrained("Ananta025/blip-ipvqa-final")
model = BlipForQuestionAnswering.from_pretrained("Ananta025/blip-ipvqa-final")

image = Image.open("sample.jpg").convert("RGB")
question = "What pest is shown in the image?"

# Encode the image-question pair into model inputs
inputs = processor(image, question, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs)

answer = processor.decode(output[0], skip_special_tokens=True)
print(answer)
```