---
library_name: transformers
license: apache-2.0
tags:
- vision
- image-captioning
- blip
- multimodal
- fashion
datasets:
- Marqo/fashion200k
base_model:
- Salesforce/blip-image-captioning-large
---

# Fine-Tuned BLIP Model for Fashion Image Captioning

This is a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model specialized for **fashion image captioning**. It was fine-tuned on the **Marqo Fashion200k dataset** to generate descriptive, contextually relevant captions for fashion images.
## Model Details

- **Model Type:** BLIP (vision-language pre-training)
- **Architecture:** BLIP uses a multimodal mixture of encoder-decoder (MED) transformer architecture to jointly model visual and textual information.
- **Base Model:** [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
- **Fine-Tuning Dataset:** [Marqo/fashion200k](https://huggingface.co/datasets/Marqo/fashion200k) (fashion product images paired with descriptive captions)
- **Task:** Fashion Image Captioning
- **License:** Apache 2.0
## Usage

You can use this model with the Hugging Face `transformers` library for fashion image captioning.
### Installation

First, install the required libraries (Pillow is needed to load images):

```bash
pip install transformers torch pillow
```
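
### Example

Once installed, you can generate a caption with the standard BLIP captioning classes in `transformers`. This is a minimal sketch: the repository id `your-username/blip-fashion-captioning` and the image path are placeholders, so substitute this model's actual Hugging Face id and your own fashion image.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Placeholder repo id -- replace with this model's actual Hugging Face id.
model_id = "your-username/blip-fashion-captioning"

processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Load a fashion image from disk (the path is illustrative).
image = Image.open("fashion_item.jpg").convert("RGB")

# Preprocess the image and autoregressively generate a caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

For faster inference on a GPU, move both the model and the inputs to the device with `.to("cuda")` before calling `generate`.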