CLIPxGPT Captioner - S Model

Image captioning model based on CLIP and GPT-2.

Model Details

  • Model Size: S
  • Architecture: CLIP Vision Encoder + Mapping Module + GPT-2 Text Decoder
  • Dataset: Flickr30k
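
As a rough illustration of how the three architecture components connect, the sketch below shows only the middle piece: a mapping module that turns one image embedding into a short sequence of decoder-sized vectors that a GPT-2 style decoder could consume as a prefix. This is not the checkpoint's actual implementation. The class name MappingSketch, the simple MLP design, the dimensions (512 for the image embedding, 768 for the decoder) and the prefix length of 4 are all assumptions chosen for illustration, and a random tensor stands in for the CLIP vision encoder output.

```python
import torch
import torch.nn as nn


class MappingSketch(nn.Module):
    """Hypothetical mapping module: image embedding -> prefix of decoder inputs."""

    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=4):
        super().__init__()
        # All three sizes are assumed values, not read from model.pt.
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
        )

    def forward(self, image_emb):
        # (batch, clip_dim) -> (batch, prefix_len * gpt_dim)
        out = self.proj(image_emb)
        # Reshape into a sequence of prefix_len vectors of size gpt_dim.
        return out.view(-1, self.prefix_len, self.gpt_dim)


# Random stand-in for the output of a CLIP vision encoder on 2 images.
image_emb = torch.randn(2, 512)
prefix = MappingSketch()(image_emb)
print(prefix.shape)
```

In a full captioner, the prefix would be passed to the text decoder as input embeddings, and the caption tokens would be generated after it.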

Usage

See the Kaggle notebook for usage examples.

Files

  • model.pt: Model checkpoint (state_dict)
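
Because model.pt is described as a state_dict, it holds parameter tensors keyed by name rather than a complete pickled model. A minimal sketch of inspecting such a file with PyTorch follows. The helper name summarize_checkpoint is hypothetical, and the sketch assumes the file is a flat mapping from parameter names to tensors; the weights_only argument needs a reasonably recent PyTorch release.

```python
import torch


def summarize_checkpoint(path):
    """Return parameter names and the total parameter count of a state_dict file."""
    # map_location="cpu" lets the file load on machines without a GPU.
    # weights_only=True restricts unpickling to tensors and plain containers.
    state_dict = torch.load(path, map_location="cpu", weights_only=True)
    names = list(state_dict.keys())
    total = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
    return names, total


# Example call (assumes model.pt has been downloaded to the working directory):
# names, total = summarize_checkpoint("model.pt")
# print(len(names), "entries,", total, "parameters")
```

To run inference, the state_dict still has to be loaded into an instance of the matching model class with load_state_dict; that class is not part of this file listing, so it has to come from the usage notebook.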