YaekobB/blip-caption-model


YaekobB committed 23 days ago

Commit 730ded8 · verified · 1 Parent(s): 386f5a8

Add model card documentation

Files changed (1)

README.md ADDED +96 -0

	@@ -0,0 +1,96 @@

+---
+license: mit
+tags:
+  - image-captioning
+  - blip
+  - vision-language-model
+  - multimodal-ai
+  - computer-vision
+  - deep-learning
+  - transformers
+  - pytorch
+pipeline_tag: image-to-text
+library_name: transformers
+---
+# BLIP Caption Model
+This repository contains a BLIP-based image captioning model used to generate natural-language captions from uploaded images.
+The model is connected to a live Hugging Face Space demo:
+👉 [Multimodal Image Captioning with BLIP Demo](https://huggingface.co/spaces/YaekobB/image-captioning-blip-demo)
+## Model Description
+This model is designed for automatic image captioning. Given an input image, it generates a short textual description of the visual content.
+The project demonstrates the use of vision-language models for multimodal AI applications, combining computer vision and natural language generation.
+## Intended Use
+This model can be used for:
+- Image caption generation
+- Vision-language AI demonstrations
+- Multimodal learning experiments
+- Educational and portfolio projects
+- Prototyping image-to-text applications
+## How to Use
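+```python
+from transformers import BlipProcessor, BlipForConditionalGeneration
+from PIL import Image
+import torch
+
+# Load the processor (image preprocessing and tokenizer) and the model weights.
+model_id = "YaekobB/blip-caption-model"
+processor = BlipProcessor.from_pretrained(model_id)
+model = BlipForConditionalGeneration.from_pretrained(model_id)
+
+# Convert to RGB so the processor always receives a 3-channel image.
+image = Image.open("your_image.jpg").convert("RGB")
+inputs = processor(images=image, return_tensors="pt")
+
+# Generate without tracking gradients; cap the caption at 50 new tokens.
+with torch.no_grad():
+    output = model.generate(**inputs, max_new_tokens=50)
+
+# Turn the generated token IDs back into text.
+caption = processor.decode(output[0], skip_special_tokens=True)
+print(caption)
+```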
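The README snippet can also be wrapped in a small helper that takes an already-loaded processor and model, so loading happens once and captioning can be called repeatedly. This is a minimal sketch and not part of the committed README: `caption_image` and its `prompt` parameter are hypothetical names, and the optional text prefix assumes this checkpoint supports conditional captioning the way standard BLIP captioning checkpoints do, which the model card does not confirm.

```python
from typing import Any, Optional


def caption_image(
    image: Any,
    processor: Any,
    model: Any,
    prompt: Optional[str] = None,
    max_new_tokens: int = 50,
) -> str:
    """Generate one caption for one image.

    `processor` and `model` are expected to be the BlipProcessor and
    BlipForConditionalGeneration objects loaded as in the README snippet.
    `prompt` is an optional text prefix (assumption: the checkpoint
    supports conditional captioning).
    """
    if prompt:
        # Conditional captioning: the caption continues from the prefix.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        # Unconditional captioning: only the image is passed.
        inputs = processor(images=image, return_tensors="pt")

    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output[0], skip_special_tokens=True).strip()
```

With `processor`, `model`, and `image` loaded as in the README, `caption_image(image, processor, model)` returns the caption string, and passing `prompt="a photo of"` tries the prefixed variant.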
+## Live Demo
+A live inference demo is available on Hugging Face Spaces:
+[https://huggingface.co/spaces/YaekobB/image-captioning-blip-demo](https://huggingface.co/spaces/YaekobB/image-captioning-blip-demo)
+The demo allows users to upload one or more images and generate captions using the model.
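Captioning several images locally can be sketched in the same spirit. How the Space itself handles multiple uploads is not described in this README, so the code below is an assumption-laden illustration: `chunked`, `caption_many`, and `make_blip_batch_captioner` are hypothetical helper names, and the batch size of 4 is arbitrary. The Transformers and Pillow imports are kept inside the factory function so the batching helpers stay usable on their own.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def caption_many(
    paths: Sequence[str],
    caption_batch: Callable[[List[str]], List[str]],
    batch_size: int = 4,
) -> Dict[str, str]:
    """Map each image path to its caption, processing paths in batches."""
    captions: Dict[str, str] = {}
    for batch in chunked(paths, batch_size):
        for path, text in zip(batch, caption_batch(batch)):
            captions[path] = text
    return captions


def make_blip_batch_captioner(
    model_id: str = "YaekobB/blip-caption-model",
) -> Callable[[List[str]], List[str]]:
    """Build a `caption_batch` callable backed by the BLIP checkpoint."""
    # Heavy imports stay local so the helpers above work without them.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    def caption_batch(paths: List[str]) -> List[str]:
        images = [Image.open(p).convert("RGB") for p in paths]
        # The processor stacks the images into one batch of pixel values.
        inputs = processor(images=images, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=50)
        texts = processor.batch_decode(output, skip_special_tokens=True)
        return [t.strip() for t in texts]

    return caption_batch
```

A call such as `caption_many(["a.jpg", "b.jpg"], make_blip_batch_captioner())` would return a dictionary keyed by file path; smaller batches trade speed for lower memory use.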
+## Limitations
+This model may generate inaccurate or incomplete captions, especially for:
+- Complex scenes with many objects or people
+- Small or unclear objects
+- Low-quality or blurry images
+- Culturally specific contexts
+- Images requiring detailed reasoning or domain expertise
+Treat generated captions as unverified model output, not as guaranteed factual annotations.
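One way to keep that distinction visible in downstream data is to store each caption with its provenance and a review flag. The sketch below is purely illustrative and not part of this repository: `CaptionRecord`, `approve`, `to_json_line`, and the field names are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CaptionRecord:
    """A caption plus the information needed to judge how far to trust it."""

    image_path: str
    caption: str
    source: str = "model"   # who produced the current caption text
    reviewed: bool = False  # True only after a person has checked it


def approve(record: CaptionRecord, corrected: Optional[str] = None) -> CaptionRecord:
    """Return a reviewed copy; a corrected caption is attributed to a human."""
    if corrected is not None:
        return replace(record, caption=corrected, source="human", reviewed=True)
    return replace(record, reviewed=True)


def to_json_line(record: CaptionRecord) -> str:
    """Serialize a record as one JSON line, for example for a JSONL file."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

New records default to `source="model"` and `reviewed=False`, so anything that has not passed through `approve` remains marked as unverified.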
+## Ethical Considerations
+This model should not be used as the sole source of truth for safety-critical, medical, legal, or identity-sensitive decisions.
+It may produce biased, incomplete, or incorrect descriptions depending on the input image and training data limitations.
+## Author
+**Yaekob Beyene Yowhanns**
+M.Sc. Artificial Intelligence and Computer Science
+University of Calabria
+GitHub: [yaekobB](https://github.com/yaekobB)
+Hugging Face: [YaekobB](https://huggingface.co/YaekobB)