Image-to-Text
Transformers
PyTorch
Safetensors
vision-encoder-decoder
image-text-to-text
image-captioning
Instructions to use bipin/image-caption-generator with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bipin/image-caption-generator with Transformers:
    # Use a pipeline as a high-level helper
    # Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
    # You must load the model directly (see below) or downgrade to v4.x with:
    # 'pip install "transformers<5.0.0"'
    from transformers import pipeline

    pipe = pipeline("image-to-text", model="bipin/image-caption-generator")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMultimodalLM

    tokenizer = AutoTokenizer.from_pretrained("bipin/image-caption-generator")
    model = AutoModelForMultimodalLM.from_pretrained("bipin/image-caption-generator")
- Notebooks
- Google Colab
- Kaggle
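The loading snippet above stops before producing a caption. A minimal sketch of the remaining steps follows, assuming the checkpoint loads as a `VisionEncoderDecoderModel` (suggested by the `vision-encoder-decoder` tag) and that the repository ships a preprocessor config that `AutoImageProcessor` can read. The helper names `clean_captions` and `caption_image`, and the `max_length` and `num_beams` defaults, are illustrative choices, not settings taken from the model card.

```python
from typing import List

MODEL_ID = "bipin/image-caption-generator"


def clean_captions(decoded: List[str]) -> List[str]:
    """Collapse runs of whitespace and trim each decoded caption."""
    return [" ".join(text.split()) for text in decoded]


def caption_image(image_path: str, max_length: int = 32, num_beams: int = 4) -> str:
    """Generate one caption for the image at image_path (hypothetical helper)."""
    # Heavy dependencies are imported here so the module itself imports cheaply.
    import torch
    from PIL import Image
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionEncoderDecoderModel,
    )

    # Assumption: the repo provides image-processor and tokenizer configs.
    image_processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    # Convert to RGB so grayscale or RGBA files do not break preprocessing.
    image = Image.open(image_path).convert("RGB")
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

    # Beam search decoding; the values are illustrative, not tuned.
    with torch.no_grad():
        output_ids = model.generate(
            pixel_values, max_length=max_length, num_beams=num_beams
        )

    decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return clean_captions(decoded)[0]


if __name__ == "__main__":
    import sys

    # Usage: python caption.py path/to/image.jpg
    if len(sys.argv) > 1:
        print(caption_image(sys.argv[1]))
```

Running this on a handful of your own images is a quick way to see how the captions hold up on pictures unlike the training data.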
This model is not working accurately
#4
by humairr313 - opened
Hi @humairr313 , thanks for the heads up!
The model was created to demo the training of an image caption model in my book. The training doesn't make extensive use of techniques that ensure the model generalizes to any given image, so there is a chance that it will output random captions for certain images. Hope that makes sense :)
bipin changed discussion status to closed
