---
library_name: transformers
license: apache-2.0
tags:
- vision
- image-captioning
- blip
- multimodal
- fashion
datasets:
- Marqo/fashion200k
base_model:
- Salesforce/blip-image-captioning-large
---

# Fine-Tuned BLIP Model for Fashion Image Captioning

This is a BLIP (Bootstrapping Language-Image Pre-training) model fine-tuned for **fashion image captioning**. It was fine-tuned on the **Marqo Fashion Dataset** to generate descriptive, contextually relevant captions for fashion-related images.

## Model Details

- **Model Type:** BLIP (Vision-Language Pre-training)
- **Architecture:** BLIP uses a multimodal transformer architecture to jointly model visual and textual information.
- **Fine-Tuning Dataset:** [Marqo Fashion Dataset (Marqo/fashion200k)](https://huggingface.co/datasets/Marqo/fashion200k) (a dataset containing fashion images and corresponding captions)
- **Task:** Fashion Image Captioning
- **License:** Apache 2.0

## Usage

You can use this model with the Hugging Face `transformers` library for fashion image captioning tasks.
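
Once the dependencies listed in the next subsection are installed, inference might look roughly like the sketch below. It assumes the standard BLIP captioning classes in `transformers`; the Hub repository ID and image URL are placeholders, not values taken from this card.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Placeholder Hub ID -- replace with this repository's actual model ID.
model_id = "your-username/blip-fashion-captioning"

processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Placeholder image URL -- any RGB photo of a garment or outfit works.
url = "https://example.com/fashion-image.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and generate a caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```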

### Installation

First, install the required libraries:

```bash
pip install transformers torch