ceyda committed on
Commit 70fef6c · 1 Parent(s): ba2979b

Update README.md

Files changed (1)
  1. README.md +35 -9
README.md CHANGED
@@ -1,12 +1,12 @@
  # Searching Reaction GIFs with CLIP
 
- ![header gif](/assets/main.gif)
 
  Reaction GIFs are an integral part of today's communication. They convey complex emotions with many levels, in a short compact format.
 
  If a picture is worth a thousand words then a GIF is worth more.
 
- We might even say that the level of complexity and expressiveness goes like this:
 
  `Emoji < Memes/Image < GIFs`
 
@@ -16,13 +16,14 @@ Although we started out with the more ambitious goal of GIF/Image generation we
  Which is needed to properly drive a generation model (like VQGAN).
  Available CLIP models wouldn't be suitable to use without this finetuning as explained in the challenges below.
 
- ## Challenges
 
  Classic (Image,Text) tasks like, image search, caption generation all focus on cases where the text is a description of the image.
  This is mainly because large scale datasets available like COCO,WIT happen to be of that format.
- So it is interesting to see if models can also capture some more higher level relations
- like sentiment-> image. Where there is greater variation on both sides.
  We can think of reaction gif/images to be sentiment like, in fact the dataset we use was also gathered for sentiment analysis.
 
  # Dataset
 
@@ -54,7 +55,7 @@ This model is `cardiffnlp/twitter-roberta-base` further fine-tuned on emoji clas
 
  Also tried `vit-base-patch32-384`, `vit-base-patch16-384` for the vision models, but results were inconclusive.
 
- ### Training Logs
 
  Training logs can be found [here](https://wandb.ai/cceyda/flax-clip?workspace=user-cceyda)
  It was really easy to overfit since it was a tiny dataset. Used early stopping.
@@ -66,7 +67,7 @@ Other parameters:
  --warmup_steps="150"
  ```
 
- # Future Potential
 
  It is possible to generate a very large training set by scraping twitter.(Couldn't do during the event because of twitter rate limit)
 
@@ -78,6 +79,31 @@ I will definitely be trying out training a similar model for emoji & meme data.
 
  Training CLIP is just the first step, if we have a well trained CLIP generation is within reach 🚀
 
  # TL;DR The task
 
  Input: Some sentence (like a tweet)
@@ -86,7 +112,7 @@ Output: The most suitable reaction GIF image (Ranking)
  Example:
  - Input: I miss you
  - Output: ![hug](./assets/example_gif.jpg)
-
  # Demo
 
- https://huggingface.co/spaces/flax-community/clip-reply-demo
 
  # Searching Reaction GIFs with CLIP
 
+ ![header gif](./assets/main.gif)
 
  Reaction GIFs are an integral part of today's communication. They convey complex emotions with many levels, in a short compact format.
 
  If a picture is worth a thousand words then a GIF is worth more.
 
+ We might even say that the level of complexity and expressiveness increases like:
 
  `Emoji < Memes/Image < GIFs`
 
 
  Which is needed to properly drive a generation model (like VQGAN).
  Available CLIP models wouldn't be suitable to use without this finetuning as explained in the challenges below.
 
+ ## 📝 Challenges
 
  Classic (Image,Text) tasks like image search and caption generation all focus on cases where the text is a description of the image.
  This is mainly because large scale datasets available like COCO,WIT happen to be of that format.
+ So it is interesting to see if models can also capture higher-level relations,
+ like a sentiment -> image mapping, where there is great variation on both sides.
  We can think of reaction gifs/images as sentiment-like; in fact, the dataset we use was also gathered for sentiment analysis.
+ There is no one correct reaction GIF, which also makes evaluation challenging.
 
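Since several GIFs can be acceptable replies, one hedged way to score such a model is recall@k: count a query as a hit when the GIF originally posted with the tweet lands in the top k of the ranking. This is only a sketch of that metric, not the project's actual evaluation protocol:

```python
def recall_at_k(rankings, gold, k=5):
    """rankings: per-query lists of candidate GIF ids, best first.
    gold: the id of the GIF actually posted with each query tweet.
    Returns the fraction of queries whose gold GIF appears in the top k."""
    hits = sum(g in ranked[:k] for ranked, g in zip(rankings, gold))
    return hits / len(gold)
```

For example, `recall_at_k([["a", "b", "c"]], ["b"], k=2)` gives `1.0`, since "b" is in the top 2.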
  # Dataset
 
 
 
  Also tried `vit-base-patch32-384`, `vit-base-patch16-384` for the vision models, but results were inconclusive.
 
+ ### 📈 Training Logs
 
  Training logs can be found [here](https://wandb.ai/cceyda/flax-clip?workspace=user-cceyda)
  It was really easy to overfit since it was a tiny dataset. Used early stopping.
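The early stopping mentioned above can be sketched roughly like this (a hypothetical helper, not the actual training script): stop once the eval loss has failed to improve for `patience` consecutive evaluations.

```python
def should_stop(eval_losses, patience=3):
    """True when none of the last `patience` eval losses beat the
    best loss seen before them (i.e. no improvement for `patience` evals)."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return min(eval_losses[-patience:]) >= best_before
```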
 
  --warmup_steps="150"
  ```
 
+ # 💡 Future Potential
 
  It is possible to generate a very large training set by scraping Twitter. (Couldn't do this during the event because of the Twitter rate limit.)
 
 
 
  Training CLIP is just the first step; if we have a well-trained CLIP, generation is within reach 🚀
 
+ # How to use
+
+ ```py
+ from PIL import Image
+ import jax
+ import jax.numpy as jnp
+
+ from model import FlaxHybridCLIP  # see demo
+ from transformers import AutoTokenizer, CLIPProcessor
+
+ model = FlaxHybridCLIP.from_pretrained("ceyda/clip-reply")
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+ processor.tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base")
+
+ def query(image_paths, query_text):
+     images = [Image.open(im).convert("RGB") for im in image_paths]
+     inputs = processor(text=[query_text], images=images, return_tensors="jax", padding=True)
+     inputs["pixel_values"] = jnp.transpose(inputs["pixel_values"], axes=[0, 2, 3, 1])
+     outputs = model(**inputs)
+     logits_per_image = outputs.logits_per_image.reshape(-1)
+     probs = jax.nn.softmax(logits_per_image)
+     return probs
+ ```
+
+ # Created By
+
+ Ceyda Cinarel [@ceyda](https://huggingface.co/ceyda)
+
+ Made during the flax community [event](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104/58)
+
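To turn the probabilities `query` computes into an actual ranking, here is a minimal sketch of the same softmax-and-sort step in plain Python (the file names and logits are dummies standing in for real model outputs):

```python
import math

def rank_images(image_paths, logits):
    """Softmax over per-image logits, then sort candidates by probability, best first."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract max for numerical stability
    total = sum(exps)
    return sorted(zip(image_paths, (e / total for e in exps)),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates; with a real model these logits come from query()
ranking = rank_images(["hug.gif", "wave.gif", "eyeroll.gif"], [4.2, 1.3, 0.5])
```

The first entry of `ranking` is the best-matching reaction GIF for the query text.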
  # TL;DR The task
 
  Input: Some sentence (like a tweet)
 
  Example:
  - Input: I miss you
  - Output: ![hug](./assets/example_gif.jpg)
+
  # Demo
 
+ https://huggingface.co/spaces/flax-community/clip-reply-demo