Hamza66628/clip-gpt-captioner

image-captioning

computer-vision

CLIPxGPT Captioner - S Model

Image captioning model based on CLIP and GPT-2.

Model Details

Model Size: S
Architecture: CLIP Vision Encoder + Mapping Module + GPT-2 Text Decoder
Dataset: Flickr30k
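
The architecture line above describes a three-stage pipeline: the CLIP vision encoder turns an image into an embedding, the mapping module converts that embedding into inputs the decoder can read, and GPT-2 generates the caption from them. The card does not say how the mapping module is built, so the snippet below is only a minimal sketch of the general idea. It assumes a single linear projection, and the class name `PrefixMapper`, the dimensions 512 and 768, and the prefix length 4 are all illustrative assumptions, not values taken from this checkpoint.

```python
import torch
from torch import nn


class PrefixMapper(nn.Module):
    """Hypothetical mapping module (not the one in model.pt):
    projects one image embedding to a short sequence of decoder-sized embeddings."""

    def __init__(self, clip_dim: int = 512, gpt_dim: int = 768, prefix_len: int = 4):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        # Assumption: a single linear layer; the real module may be more elaborate.
        self.proj = nn.Linear(clip_dim, prefix_len * gpt_dim)

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        # (batch, clip_dim) -> (batch, prefix_len, gpt_dim)
        return self.proj(image_emb).view(-1, self.prefix_len, self.gpt_dim)


mapper = PrefixMapper()
# Random tensor standing in for the CLIP vision encoder output of two images.
fake_image_emb = torch.randn(2, 512)
prefix = mapper(fake_image_emb)
print(prefix.shape)  # torch.Size([2, 4, 768])
```

In a full captioner, a tensor shaped like `prefix` would be fed to the text decoder as input embeddings, and the decoder would continue the sequence with caption tokens.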

Usage

See the Kaggle notebook for usage examples.

Files

model.pt: Model checkpoint (state_dict)
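
Because `model.pt` is described as a state_dict, it holds parameter tensors only and has to be loaded into a matching model class, which this card does not include. As a first step, the checkpoint can be inspected without that class. The following is a sketch, assuming the file is a plain mapping from parameter names to tensors; the helper `summarize_checkpoint` is a hypothetical name, and the demo uses a tiny stand-in file rather than the real checkpoint.

```python
import os
import tempfile

import torch


def summarize_checkpoint(path: str) -> dict:
    """Load a state_dict on CPU and return {parameter name: shape}."""
    # weights_only=True restricts unpickling to tensors and basic containers.
    state_dict = torch.load(path, map_location="cpu", weights_only=True)
    return {name: tuple(tensor.shape) for name, tensor in state_dict.items()}


# Demo on a stand-in checkpoint; point `demo_path` at the real model.pt instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "model.pt")
    torch.save(torch.nn.Linear(3, 2).state_dict(), demo_path)
    summary = summarize_checkpoint(demo_path)

print(summary)  # {'weight': (2, 3), 'bias': (2,)}
```

The printed names and shapes indicate which layers the model class must define before `load_state_dict` can succeed.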
