Fine-Tuned CLIP-GPT2 Model for Image Captioning
This is a fine-tuned version of CLIP-GPT2 for real-time image captioning, intended to assist visually impaired users.
Model Details:
- Base Model: CLIP ViT-B/32
- Fine-Tuned On: VizWiz dataset
- Format: SafeTensors
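- Usage (computing image embeddings with the fine-tuned CLIP weights):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Fine-tuned weights from this repository; preprocessing from the base CLIP checkpoint
model = CLIPModel.from_pretrained("vidi-deshp/clip-gpt2-finetuned")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sample.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# CLIPModel's full forward pass also requires text input_ids, so for an
# image-only input call get_image_features, which returns the image embedding
with torch.no_grad():
    image_features = model.get_image_features(**inputs)
```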
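The processor in the usage snippet is loaded from `openai/clip-vit-base-patch32`, the ViT-B/32 base checkpoint. As a rough illustration of what happens to a photo before it reaches the encoder, the sketch below computes the resize and center-crop geometry and the resulting number of vision tokens. It assumes a 224-pixel input resolution, shortest-side resizing followed by a square center crop, and 32-pixel patches plus one class token. `preprocess_geometry` is a hypothetical helper that approximately mirrors the processor's behavior; it is not the library's code.

```python
def preprocess_geometry(width: int, height: int, size: int = 224, patch: int = 32) -> dict:
    """Approximate resize/crop geometry for a CLIP ViT-B/32-style processor.

    Assumption: the shorter side is resized to `size` (aspect ratio kept),
    a size x size center crop follows, and the crop is cut into patches.
    """
    # Resize so the shorter side equals `size`; the longer side is truncated to an int
    if width <= height:
        new_w, new_h = size, int(size * height / width)
    else:
        new_w, new_h = int(size * width / height), size

    # Center crop a size x size square from the resized image
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    crop_box = (left, top, left + size, top + size)

    # The ViT splits the crop into non-overlapping patch x patch tiles
    # and prepends a single class token
    patches_per_side = size // patch
    num_tokens = patches_per_side ** 2 + 1

    return {"resized": (new_w, new_h), "crop_box": crop_box, "num_tokens": num_tokens}
```

For a 640x480 photo, `preprocess_geometry(640, 480)` returns `{'resized': (298, 224), 'crop_box': (37, 0, 261, 224), 'num_tokens': 50}`. Under these assumptions, roughly 80 pixel columns at each side of the original image fall outside the crop, so content near the edges of wide photos is not seen by the encoder.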
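The usage snippet encodes an image with CLIP but does not show what to do with the result. CLIP-style models compare an image with text by taking the cosine similarity of L2-normalized embeddings, multiplying by a learned temperature, and applying a softmax over the candidate texts. A minimal plain-Python sketch of that scoring follows, assuming embeddings arrive as lists of floats and using a fixed `logit_scale` of 100.0 as a stand-in for the model's learned value. `clip_label_probs` is a hypothetical helper, not part of `transformers`.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def clip_label_probs(
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # assumed stand-in for the learned temperature
) -> List[float]:
    """Softmax over scaled cosine similarities between one image and several texts."""
    img = normalize(image_emb)
    logits = []
    for t in text_embs:
        txt = normalize(t)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With the model from the usage snippet, the inputs would be something like `model.get_image_features(**inputs)[0].tolist()` and the rows of `model.get_text_features(**processor(text=labels, padding=True, return_tensors="pt"))`, with `model.logit_scale.exp().item()` in place of the fixed scale. This assumes the checkpoint's text encoder is usable, which the card does not state.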
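The weights are distributed as SafeTensors. A SafeTensors file starts with an 8-byte little-endian unsigned integer giving the length of a JSON header; the header maps each tensor name to its `dtype`, `shape` and `data_offsets` (with an optional `__metadata__` entry), and the raw tensor bytes follow. The sketch below reads only that header with the standard library, which is a quick way to inspect parameter names and shapes without loading the weights. The helper names `parse_header` and `read_header` are hypothetical.

```python
import json
import struct


def parse_header(blob: bytes) -> dict:
    """Decode the JSON header at the start of a SafeTensors byte string."""
    # First 8 bytes: header length as an unsigned little-endian 64-bit integer
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    # Free-form metadata lives under a reserved key; drop it to keep only tensors
    header.pop("__metadata__", None)
    return header


def read_header(path: str) -> dict:
    """Read just the header of a .safetensors file, skipping the tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))
```

For example, after downloading the repository's weights file (assumed here to be named `model.safetensors`), `read_header("model.safetensors")` returns a dict keyed by parameter name, whose entries can be checked against the layout `CLIPModel` expects.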