---
pipeline_tag: image-to-text
datasets:
- Mouwiya/image-in-Words400
---

# BLIP Image Captioning

## Model Description

BLIP_image_captioning is an image-captioning model based on the BLIP (Bootstrapping Language-Image Pre-training) architecture. It was fine-tuned on the Mouwiya/image-in-Words400 dataset, which pairs images with descriptive captions, and it combines visual and textual features to generate accurate, contextually relevant captions.

## Model Details

- **Model Architecture**: BLIP (Bootstrapping Language-Image Pre-training)
- **Base Model**: Salesforce/blip-image-captioning-base
- **Fine-tuning Dataset**: Mouwiya/image-in-Words400
- **Number of Parameters**: 109 million

## Training Data

The model was fine-tuned on a shuffled subset of the **Mouwiya/image-in-Words400** dataset; a total of 400 examples were used to keep iteration and development fast.

## Training Procedure

- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16
- **Epochs**: 3
- **Evaluation Metric**: BLEU score

## Usage

To caption an image with this model, load it with the Hugging Face transformers library and run inference:

```python
from io import BytesIO

import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and model
model_name = "Mouwiya/BLIP_image_captioning"
processor = BlipProcessor.from_pretrained(model_name)
model = BlipForConditionalGeneration.from_pretrained(model_name)

# Fetch an example image
image_url = "URL_OF_THE_IMAGE"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# Generate and decode the caption
inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs)
caption = processor.decode(outputs[0], skip_special_tokens=True)
print(caption)
```

## Evaluation

The model was evaluated on a subset of the Mouwiya/image-in-Words400 dataset using the BLEU score:

- **Average BLEU Score**: 0.35

This score reflects how closely the generated captions match the reference descriptions in terms of overlapping n-grams.
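For readers who want a feel for the metric, the sketch below computes a sentence-level BLEU from scratch (uniform weights over 1- to 4-grams, brevity penalty, add-one smoothing). It is an illustration of the metric only, not the evaluation script used for this model, and the smoothing choice is an assumption.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights, brevity penalty,
    and add-one smoothing (so one empty n-gram order doesn't zero the score)."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

score = sentence_bleu("a dog runs in the park", "a dog runs in the park")
print(round(score, 2))  # prints 1.0 for an exact match
```

Partial overlaps score between 0 and 1; production evaluations typically use an established implementation (e.g. NLTK or sacrebleu) rather than a hand-rolled one.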

## Limitations

- **Dataset Size**: Fine-tuning used a relatively small subset of the dataset (400 examples), which may limit the model's generalization.
- **Domain Specificity**: The model was trained on a single dataset and may perform worse on images from other domains.

## Contact

**Mouwiya S. A. Al-Qaisieh**
mo3awiya@gmail.com