Add model card

#1
by nielsr - opened
Files changed (1):

1. README.md (ADDED, +51 -0)
---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit
---

# DiffCLIP: Differential Attention Meets CLIP

This repository contains the DiffCLIP model presented in [DiffCLIP: Differential Attention Meets CLIP](https://huggingface.co/papers/2503.06626).

Project Page: https://hammoudhasan.github.io/DiffCLIP

Code: https://github.com/hammoudhasan/DiffCLIP

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/hammoudhasan/DiffCLIP.git
cd DiffCLIP

# Install dependencies
pip install -r requirements.txt
```

### Basic Usage

```python
import torch
from diff_clip import DiffCLIP_VITB16

# Create model
model = DiffCLIP_VITB16()

# Process image and text
image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77))  # Tokenized text

# Get embeddings
with torch.no_grad():
    outputs = model(image, text)

print(outputs["image_embed"].shape)  # Should be [1, 512]
print(outputs["text_embed"].shape)   # Should be [1, 512]
```
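The two embeddings above can be compared directly with cosine similarity, as in standard CLIP. A minimal sketch of that comparison, using random stand-in tensors of the same shape rather than real model outputs:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings shaped like the model outputs above ([1, 512])
image_embed = torch.randn(1, 512)
text_embed = torch.randn(1, 512)

# L2-normalize so the dot product equals the cosine similarity
image_embed = F.normalize(image_embed, dim=-1)
text_embed = F.normalize(text_embed, dim=-1)

similarity = image_embed @ text_embed.T  # shape [1, 1], values in [-1, 1]
print(similarity.shape)
```

With real embeddings from the model, a higher similarity indicates a better image-text match.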
### Zero-Shot Classification

You can use the provided `test_models.py` script to perform zero-shot classification. See the [GitHub README](https://github.com/hammoudhasan/DiffCLIP) for details.
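As a rough illustration of the underlying procedure (prompt construction and checkpoint loading are handled by the script itself), CLIP-style zero-shot classification scores the image embedding against one text embedding per class prompt and takes a softmax over the scaled similarities. A minimal sketch with stand-in embeddings; the `logit_scale` value is an assumption, standing in for the model's learned temperature:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: one image, one text embedding per class prompt
# (in practice these come from the model, e.g. prompts like "a photo of a {class}")
num_classes = 3
image_embed = F.normalize(torch.randn(1, 512), dim=-1)
text_embeds = F.normalize(torch.randn(num_classes, 512), dim=-1)

# CLIP-style logits: scaled cosine similarities over the class prompts
logit_scale = 100.0  # assumed value; CLIP models learn this temperature
logits = logit_scale * image_embed @ text_embeds.T  # shape [1, num_classes]
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1)

print(probs.shape, pred.item())
```

The predicted class is the prompt whose text embedding is most similar to the image embedding.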