---
license: mit
tags:
- image-captioning
- clip
- gpt2
- vision-language
---

# CLIP Prefix Caption Model - COCO

This model generates captions for images by feeding CLIP image embeddings to the GPT-2 language model as a prefix.

## Model Details

- **Model Type**: CLIP Prefix Caption
- **Dataset**: COCO
- **Prefix Length**: 10 tokens
- **CLIP Model**: ViT-B/32
- **Language Model**: GPT-2
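The usage snippet below references a `ClipCaptionModel` class that is not defined in this card. As a reference, here is a minimal sketch of a ClipCap-style architecture consistent with the details above (prefix length 10, ViT-B/32's 512-dimensional embedding, GPT-2); the class name, MLP shape, and attribute names are assumptions, not necessarily the checkpoint's exact module layout:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel


class ClipCaptionModel(nn.Module):
    """Sketch of a ClipCap-style model: an MLP maps a CLIP image
    embedding to a sequence of prefix embeddings that condition GPT-2."""

    def __init__(self, prefix_length: int = 10, clip_dim: int = 512):
        super().__init__()
        self.prefix_length = prefix_length
        self.gpt = GPT2LMHeadModel.from_pretrained("gpt2")
        gpt_dim = self.gpt.transformer.wte.weight.shape[1]  # 768 for GPT-2
        # Mapping network: one CLIP vector -> prefix_length GPT-2 embeddings
        self.clip_project = nn.Sequential(
            nn.Linear(clip_dim, (gpt_dim * prefix_length) // 2),
            nn.Tanh(),
            nn.Linear((gpt_dim * prefix_length) // 2, gpt_dim * prefix_length),
        )

    def forward(self, clip_embed: torch.Tensor, tokens: torch.Tensor):
        # Project the CLIP embedding into prefix_length token embeddings
        gpt_dim = self.gpt.transformer.wte.weight.shape[1]
        prefix = self.clip_project(clip_embed).view(-1, self.prefix_length, gpt_dim)
        # Prepend the prefix to the caption token embeddings and run GPT-2
        embeds = torch.cat((prefix, self.gpt.transformer.wte(tokens)), dim=1)
        return self.gpt(inputs_embeds=embeds)
```

Since the snippet below loads the checkpoint with `strict=False`, mismatched attribute names in a sketch like this would be silently skipped; for faithful results, use the exact class definition from the training code or notebook.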
## Usage

```python
from huggingface_hub import hf_hub_download
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import clip

# Load model
checkpoint_path = hf_hub_download(
    repo_id="Hamza66628/clip-prefix-caption-coco",
    filename="model.pt"
)
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Initialize model (use same architecture as training)
model = ClipCaptionModel(prefix_length=10)
model.load_state_dict(checkpoint, strict=False)
model.eval()

# Generate caption
# (See full usage in the notebook)
```
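The generation step itself is deferred to the notebook. Continuing from the snippet above, here is a minimal greedy-decoding sketch; it assumes the `clip_project` and `gpt` attribute names from the architecture sketch earlier, and `image.jpg` is a placeholder input path:

```python
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Embed the image with CLIP ViT-B/32 ("image.jpg" is a placeholder path)
clip_model, preprocess = clip.load("ViT-B/32", device=device)
image = preprocess(Image.open("image.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    clip_embed = clip_model.encode_image(image).float()
    # Map the CLIP embedding to 10 GPT-2 prefix embeddings (assumed names)
    embeds = model.clip_project(clip_embed).view(1, 10, -1)

    token_ids = []
    for _ in range(30):  # cap the caption length at 30 tokens
        logits = model.gpt(inputs_embeds=embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)  # greedy: take the most likely token
        if next_id.item() == tokenizer.eos_token_id:
            break
        token_ids.append(next_id.item())
        # Append the new token's embedding and continue decoding
        next_embed = model.gpt.transformer.wte(next_id.unsqueeze(0))
        embeds = torch.cat((embeds, next_embed), dim=1)

print(tokenizer.decode(token_ids))
```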
## Citation

If you use this model, please cite the original CLIP Prefix Caption (ClipCap) paper:
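```bibtex
@article{mokady2021clipcap,
  title={ClipCap: CLIP Prefix for Image Captioning},
  author={Mokady, Ron and Hertz, Amir and Bermano, Amit H},
  journal={arXiv preprint arXiv:2111.09734},
  year={2021}
}
```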