How to use Hams1234/blip2-flickr8k-captioning-merged with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("visual-question-answering", model="Hams1234/blip2-flickr8k-captioning-merged")
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering

processor = AutoProcessor.from_pretrained("Hams1234/blip2-flickr8k-captioning-merged")
model = AutoModelForVisualQuestionAnswering.from_pretrained("Hams1234/blip2-flickr8k-captioning-merged")
```
Model Card for Hams1234/blip2-flickr8k-captioning-merged
This is a fine-tuned version of Salesforce's BLIP-2 model, adapted for the task of image captioning using the QLoRA methodology for parameter-efficient fine-tuning. The model is trained on the Flickr8k dataset to generate descriptive, human-like captions for a wide variety of images.
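As a minimal sketch of how a caption might be generated with this checkpoint, the helper below assumes the merged weights load as a standard BLIP-2 conditional-generation model through `Blip2ForConditionalGeneration`, and that `torch`, `transformers`, and `Pillow` are installed. The function name `caption_image`, the `max_new_tokens` default, and the example file path are illustrative choices, not settings documented by the model authors.

```python
MODEL_ID = "Hams1234/blip2-flickr8k-captioning-merged"


def caption_image(image_path: str, max_new_tokens: int = 30) -> str:
    """Generate one caption for the image stored at image_path."""
    # Heavy dependencies are imported here so that defining the helper stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Blip2ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Blip2ForConditionalGeneration.from_pretrained(MODEL_ID).to(device)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # No text prompt is passed, so the model is asked for a plain caption.
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


# Example call; "example.jpg" is a placeholder path:
# print(caption_image("example.jpg"))
```

The helper reloads the model on every call to keep the sketch short. For repeated use, loading the processor and model once and reusing them would avoid that cost.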
Model Details
Model Description
This model is an adaptation of the BLIP-2 vision-language architecture, specifically the Salesforce/blip2-opt-2.7b variant. It has been fine-tuned to specialize in generating accurate and contextually relevant captions for images.
The fine-tuning was performed using QLoRA (Quantized Low-Rank Adaptation), an efficient technique that reduces the compute and memory required for training. The base model is quantized to 4 bits and kept frozen, and only small, low-rank adapter matrices are trained, so the vast majority of the original model's parameters are never updated. This approach makes it possible to adapt large-scale models on consumer-grade hardware while preserving high performance.
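To make the savings concrete, the following back-of-the-envelope sketch counts trainable parameters for a single linear layer under full fine-tuning versus a low-rank adapter, and estimates weight storage at 16 and 4 bits. The layer size, the rank of 16, and the 2.7 billion parameter count (read off the "2.7b" in the base model name) are illustrative assumptions. They are not the hyperparameters used to train this model, and the storage estimate ignores quantization constants, the vision encoder, and activations.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in           # updating the dense weight W directly
    lora = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
    return full, lora


def weight_bytes(n_params: float, bits: int) -> float:
    """Approximate storage for n_params weights at the given bit width."""
    return n_params * bits / 8


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=2560, d_in=2560, rank=16)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA adapter:     {lora:,} trainable parameters ({lora / full:.2%} of full)")

# Rough weight storage for a 2.7-billion-parameter language model.
n_params = 2.7e9
print(f"16-bit weights: {weight_bytes(n_params, 16) / 1e9:.2f} GB")
print(f" 4-bit weights: {weight_bytes(n_params, 4) / 1e9:.2f} GB")
```

Under these assumptions the adapter trains 81,920 parameters against 6,553,600 for the dense layer, and 4-bit storage needs a quarter of the 16-bit footprint. Together these two effects are what let the method fit on smaller GPUs.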
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Salesforce
- Model type: Vision-Language Model (VLM) based on BLIP-2
- Language(s) (NLP): English (en)
- License: Apache 2.0
- Finetuned from model: Salesforce/blip2-opt-2.7b