---
pipeline_tag: image-text-to-text
library_name: transformers
license: mit
---

# DiffCLIP: Differential Attention Meets CLIP

This repository contains the DiffCLIP model presented in the paper [DiffCLIP: Differential Attention Meets CLIP](https://huggingface.co/papers/2503.06626).

Project Page: https://hammoudhasan.github.io/DiffCLIP

Code: https://github.com/hammoudhasan/DiffCLIP

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/hammoudhasan/DiffCLIP.git
cd DiffCLIP

# Install dependencies
pip install -r requirements.txt
```

### Basic Usage

```python
import torch
from diff_clip import DiffCLIP_VITB16

# Create model
model = DiffCLIP_VITB16()

# Process image and text (dummy inputs shown here)
image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77))  # Tokenized text (77-token context, vocab size 49408)

# Get embeddings
with torch.no_grad():
    outputs = model(image, text)

print(outputs["image_embed"].shape)  # Should be [1, 512]
print(outputs["text_embed"].shape)   # Should be [1, 512]
```
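
As a follow-up to the snippet above, the two embeddings can be compared directly. This is a minimal sketch, assuming the returned embeddings are not already unit-normalized (normalizing an already-normalized vector is harmless):

```python
import torch.nn.functional as F

# Normalize the embeddings from the previous snippet
# (assumption: they may not come back unit-normalized)
image_embed = F.normalize(outputs["image_embed"], dim=-1)  # [1, 512]
text_embed = F.normalize(outputs["text_embed"], dim=-1)    # [1, 512]

# Cosine similarity between the image and the text
similarity = (image_embed @ text_embed.T).item()
print(f"image-text cosine similarity: {similarity:.3f}")
```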

### Zero-Shot Classification

You can use the provided `test_models.py` script to perform zero-shot classification. See the [GitHub README](https://github.com/hammoudhasan/DiffCLIP) for details.
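
If you prefer to do it inline, the sketch below follows the Basic Usage example above. It assumes a CLIP-style setup where images and texts are encoded independently, so the text batch (one prompt per class) can be larger than the image batch; the `tokenize` helper is a placeholder for whatever tokenizer the repository uses (77-token context, vocabulary size 49408) and is not part of this model card.

```python
import torch
import torch.nn.functional as F
from diff_clip import DiffCLIP_VITB16

model = DiffCLIP_VITB16()
model.eval()

class_names = ["cat", "dog", "car"]
prompts = [f"a photo of a {name}" for name in class_names]

def tokenize(texts):
    # Placeholder: substitute the repository's tokenizer (CLIP-style, 77-token context)
    return torch.randint(0, 49408, (len(texts), 77))

image = torch.randn(1, 3, 224, 224)  # replace with a preprocessed image tensor
text_tokens = tokenize(prompts)      # [num_classes, 77]

with torch.no_grad():
    outputs = model(image, text_tokens)

image_embed = F.normalize(outputs["image_embed"], dim=-1)  # [1, 512]
text_embed = F.normalize(outputs["text_embed"], dim=-1)    # [num_classes, 512]

# Scaled cosine similarities turned into class probabilities
probs = (100.0 * image_embed @ text_embed.T).softmax(dim=-1)  # [1, num_classes]
print(dict(zip(class_names, probs.squeeze(0).tolist())))
```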