Hamza66628 committed
Commit 8a011f8 · verified · 1 Parent(s): df013c2

Add README.md

Files changed (1): README.md +30 -26
README.md CHANGED
@@ -3,42 +3,46 @@ license: mit
 tags:
 - image-captioning
 - clip
-- gpt-2
-- computer-vision
-- nlp
-- clipcap
+- gpt2
+- vision-language
 ---
 
-# CLIP Prefix Caption - Coco Model
+# CLIP Prefix Caption Model - COCO
 
-Image captioning model based on CLIP and GPT-2, trained on Coco dataset.
+This model generates captions for images using CLIP image embeddings and GPT-2 language model.
 
 ## Model Details
 
-- **Model Type**: CLIP Prefix Captioning
-- **Architecture**: CLIP Vision Encoder + MLP Mapping + GPT-2 Text Decoder
-- **Dataset**: Coco
-- **Prefix Length**: 10 tokens
+- **Model Type**: CLIP Prefix Caption
+- **Dataset**: COCO
+- **Prefix Length**: 10
 - **CLIP Model**: ViT-B/32
-- **GPT-2 Model**: gpt2
+- **Language Model**: GPT-2
 
 ## Usage
 
-See the test notebook for usage examples.
-
-## Files
-
-- `model.pt`: Model checkpoint (state_dict)
+```python
+from huggingface_hub import hf_hub_download
+import torch
+from transformers import GPT2Tokenizer, GPT2LMHeadModel
+import clip
+
+# Load model
+checkpoint_path = hf_hub_download(
+    repo_id="Hamza66628/clip-prefix-caption-coco",
+    filename="model.pt"
+)
+checkpoint = torch.load(checkpoint_path, map_location="cpu")
+
+# Initialize model (use same architecture as training)
+model = ClipCaptionModel(prefix_length=10)
+model.load_state_dict(checkpoint, strict=False)
+model.eval()
+
+# Generate caption
+# (See full usage in the notebook)
+```
 
 ## Citation
 
-If you use this model, please cite:
-
-```bibtex
-@article{mokady2021clipcap,
-  title={ClipCap: CLIP Prefix for Image Captioning},
-  author={Mokady, Ron and Hertz, Amir and Bermano, Amit H},
-  journal={arXiv preprint arXiv:2111.09734},
-  year={2021}
-}
-```
+If you use this model, please cite the original CLIP Prefix Caption paper.
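The usage snippet added above calls a `ClipCaptionModel` class that the card itself never defines (per the snippet's comments, it lives in the training notebook). For readers without the notebook, below is a minimal sketch of what such a class and a greedy decoding loop typically look like in ClipCap-style models, following the architecture the old card described (CLIP Vision Encoder + MLP Mapping + GPT-2 Text Decoder). The layer names, hidden sizes, and the `generate_caption` helper are assumptions based on the ClipCap paper, not read from `model.pt`; possible mismatches are exactly why the README loads with `strict=False`.

```python
# Sketch only: a ClipCap-style model. Assumes a two-layer MLP maps a 512-dim
# CLIP ViT-B/32 image embedding to `prefix_length` GPT-2 token embeddings.
# Layer names/sizes may not match the actual checkpoint in model.pt.
import torch
import torch.nn as nn
import clip
from PIL import Image
from transformers import GPT2LMHeadModel, GPT2Tokenizer


class ClipCaptionModel(nn.Module):
    def __init__(self, prefix_length: int = 10, clip_dim: int = 512):
        super().__init__()
        self.prefix_length = prefix_length
        self.gpt = GPT2LMHeadModel.from_pretrained("gpt2")
        gpt_dim = self.gpt.config.n_embd  # 768 for base GPT-2
        # MLP mapping network: CLIP embedding -> prefix_length GPT-2 embeddings
        self.clip_project = nn.Sequential(
            nn.Linear(clip_dim, (gpt_dim * prefix_length) // 2),
            nn.Tanh(),
            nn.Linear((gpt_dim * prefix_length) // 2, gpt_dim * prefix_length),
        )

    def forward(self, clip_embed: torch.Tensor) -> torch.Tensor:
        # (batch, clip_dim) -> (batch, prefix_length, gpt_dim)
        return self.clip_project(clip_embed).view(
            -1, self.prefix_length, self.gpt.config.n_embd
        )


@torch.no_grad()
def generate_caption(model: ClipCaptionModel, clip_embed: torch.Tensor,
                     tokenizer: GPT2Tokenizer, max_tokens: int = 30) -> str:
    # Greedy decoding: feed the mapped prefix to GPT-2 via inputs_embeds, then
    # repeatedly append the embedding of the most likely next token.
    model.eval()
    embeds = model(clip_embed)
    token_ids = []
    for _ in range(max_tokens):
        logits = model.gpt(inputs_embeds=embeds).logits
        next_id = logits[:, -1, :].argmax(dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
        token_ids.append(next_id.item())
        next_embed = model.gpt.transformer.wte(next_id).unsqueeze(1)
        embeds = torch.cat([embeds, next_embed], dim=1)
    return tokenizer.decode(token_ids).strip()


# Example: embed an image with CLIP ViT-B/32 and caption it
# ("example.jpg" is a placeholder path).
model = ClipCaptionModel(prefix_length=10)
# Load the checkpoint as in the README snippet above, e.g.:
# model.load_state_dict(torch.load("model.pt", map_location="cpu"), strict=False)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
clip_model, preprocess = clip.load("ViT-B/32", device="cpu")
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
with torch.no_grad():
    clip_embed = clip_model.encode_image(image).float()
print(generate_caption(model, clip_embed, tokenizer))
```

Since `model.pt` is a plain state_dict, inspecting `checkpoint.keys()` after `torch.load` is the quickest way to confirm the real layer names and sizes before committing to a class definition.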