Add model card

#1
by nielsr - opened
Files changed (1):

1. README.md (ADDED, +51 -0)
---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit
---

# DiffCLIP: Differential Attention Meets CLIP

This repository contains the DiffCLIP model presented in [DiffCLIP: Differential Attention Meets CLIP](https://huggingface.co/papers/2503.06626).

Project Page: https://hammoudhasan.github.io/DiffCLIP

Code: https://github.com/hammoudhasan/DiffCLIP

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/hammoudhasan/DiffCLIP.git
cd DiffCLIP

# Install dependencies
pip install -r requirements.txt
```

### Basic Usage

```python
import torch
from diff_clip import DiffCLIP_VITB16

# Create model
model = DiffCLIP_VITB16()

# Process image and text
image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77))  # Tokenized text

# Get embeddings
with torch.no_grad():
    outputs = model(image, text)

print(outputs["image_embed"].shape)  # Should be [1, 512]
print(outputs["text_embed"].shape)   # Should be [1, 512]
```
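The two embeddings above can be compared directly with cosine similarity, as in standard CLIP. A minimal sketch of that comparison, using random stand-in tensors of the same shape rather than real model outputs:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings shaped like the model outputs above ([1, 512])
image_embed = torch.randn(1, 512)
text_embed = torch.randn(1, 512)

# L2-normalize so the dot product equals the cosine similarity
image_embed = F.normalize(image_embed, dim=-1)
text_embed = F.normalize(text_embed, dim=-1)

similarity = image_embed @ text_embed.T  # shape [1, 1], values in [-1, 1]
print(similarity.shape)
```

With real embeddings from the model, a higher similarity indicates a better image-text match.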
### Zero-Shot Classification

You can use the provided `test_models.py` script to perform zero-shot classification. See the [GitHub README](https://github.com/hammoudhasan/DiffCLIP) for details.
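As a rough illustration of the underlying procedure (prompt construction and checkpoint loading are handled by the script itself), CLIP-style zero-shot classification scores the image embedding against one text embedding per class prompt and takes a softmax over the scaled similarities. A minimal sketch with stand-in embeddings; the `logit_scale` value is an assumption, standing in for the model's learned temperature:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: one image, one text embedding per class prompt
# (in practice these come from the model, e.g. prompts like "a photo of a {class}")
num_classes = 3
image_embed = F.normalize(torch.randn(1, 512), dim=-1)
text_embeds = F.normalize(torch.randn(num_classes, 512), dim=-1)

# CLIP-style logits: scaled cosine similarities over the class prompts
logit_scale = 100.0  # assumed value; CLIP models learn this temperature
logits = logit_scale * image_embed @ text_embeds.T  # shape [1, num_classes]
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1)

print(probs.shape, pred.item())
```

The predicted class is the prompt whose text embedding is most similar to the image embedding.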