CLIPxGPT Captioner - S Model
Image captioning model based on CLIP and GPT-2.
Model Details
- Model Size: S
- Architecture: CLIP Vision Encoder + Mapping Module + GPT-2 Text Decoder
- Dataset: Flickr30k
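
This card does not describe the internals of the mapping module, so the following is only a shape-level sketch of one common way such a module can connect an image encoder to a text decoder: a single image embedding is projected into a short sequence of decoder-sized "prefix" vectors that the decoder then continues as text. The function name `map_to_prefix`, the plain linear projection, and all dimensions are assumptions chosen for illustration, not the values or layers used in this checkpoint.

```python
import random


def map_to_prefix(image_embedding, weights, prefix_len, token_dim):
    """Project one image embedding into `prefix_len` vectors of size `token_dim`."""
    # Linear projection: one output value per weight row.
    flat = [sum(w * x for w, x in zip(row, image_embedding)) for row in weights]
    # Split the flat projection into a sequence of decoder-sized vectors.
    return [flat[i * token_dim:(i + 1) * token_dim] for i in range(prefix_len)]


# Toy sizes for illustration only (hypothetical, not the real model's dimensions).
IMAGE_DIM, TOKEN_DIM, PREFIX_LEN = 8, 4, 3

rng = random.Random(0)
# Stand-in for the vector a CLIP vision encoder would produce for one image.
image_embedding = [rng.uniform(-1, 1) for _ in range(IMAGE_DIM)]
# Stand-in for learned mapping weights: (PREFIX_LEN * TOKEN_DIM) rows of IMAGE_DIM values.
weights = [
    [rng.uniform(-1, 1) for _ in range(IMAGE_DIM)]
    for _ in range(PREFIX_LEN * TOKEN_DIM)
]

prefix = map_to_prefix(image_embedding, weights, PREFIX_LEN, TOKEN_DIM)
print(len(prefix), len(prefix[0]))  # 3 4
```

In a real pipeline the prefix vectors would be fed to GPT-2 as input embeddings, and the caption would be generated token by token after them.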
Usage
See the Kaggle notebook for usage examples.
Files
- model.pt: Model checkpoint (PyTorch state_dict)
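
Because the checkpoint is a state_dict rather than a full model object, a quick way to inspect it is to load it on the CPU and look at the parameter names. The sketch below assumes PyTorch is installed and that `model.pt` is in the working directory. The helper names `load_state_dict` and `count_by_prefix` and the key names in the example are hypothetical; the real names come from the checkpoint itself.

```python
from collections import Counter


def load_state_dict(path="model.pt"):
    """Load the checkpoint onto the CPU. Requires PyTorch."""
    import torch  # imported lazily so count_by_prefix works without PyTorch

    return torch.load(path, map_location="cpu")


def count_by_prefix(state_dict):
    """Count entries per top-level module name (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in state_dict)


# Example with hypothetical key names, standing in for a loaded state_dict.
example = {"mapper.weight": None, "mapper.bias": None, "decoder.weight": None}
print(count_by_prefix(example))
```

To run inference, the weights still need to be loaded into an instance of the matching model class with `model.load_state_dict(state_dict)`; the model definition is not part of this file.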