---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- IQA
- computer_vision
- perceptual_tasks
- CLIP
- KonIQ-10k
---
**PerceptCLIP-IQA** is a model designed to predict an **image quality assessment (IQA) score**. This is the official model from the paper:
📄 **["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260)**.
We apply **LoRA adaptation** to the **CLIP visual encoder** and add an **MLP head** for IQA score prediction. Our model achieves **state-of-the-art results**.
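
The actual model class ships in the repository's `modeling.py` (loaded dynamically in the Usage section below). The following is only an illustrative sketch of the described architecture, assuming a standard `peft` LoRA setup on the ViT-L/14 vision encoder plus an MLP regression head; the class name, LoRA rank/alpha, target modules, and head sizes are assumptions, not the paper's exact values:

```python
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

class CLIPLoRAForIQA(nn.Module):
    """Illustrative sketch only; the official class is `clip_lora_model` in modeling.py."""
    def __init__(self):
        super().__init__()
        vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        lora_cfg = LoraConfig(r=16, lora_alpha=32,                    # assumed rank/alpha
                              target_modules=["q_proj", "v_proj"])   # assumed target modules
        self.vision = get_peft_model(vision, lora_cfg)                # LoRA-adapted CLIP encoder
        hidden = vision.config.hidden_size                            # 1024 for ViT-L/14
        self.head = nn.Sequential(nn.Linear(hidden, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, pixel_values):
        feats = self.vision(pixel_values=pixel_values).pooler_output  # pooled visual features
        return self.head(feats).squeeze(-1)                           # one quality score per image
```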

## Training Details

- *Dataset*: [KonIQ-10k](https://arxiv.org/pdf/1910.06180)
- *Architecture*: CLIP vision encoder (ViT-L/14) with *LoRA adaptation*
- *Loss Function*: Pearson correlation-induced loss (see the sketch after this list)
  <img src="https://huggingface.co/PerceptCLIP/PerceptCLIP_IQA/resolve/main/loss_formula.png" width="150" />
- *Optimizer*: AdamW
- *Learning Rate*: 5e-05
- *Batch Size*: 32
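
The exact formula is given in the image above. A common Pearson-correlation-induced loss takes the form `(1 - r) / 2`, where `r` is the Pearson correlation between predicted and ground-truth scores over a batch; the sketch below assumes that form and is not necessarily the paper's exact definition:

```python
import torch

def pearson_loss(pred, target, eps=1e-8):
    # Assumed form: (1 - r) / 2, where r is the Pearson correlation between
    # predicted scores and ground-truth MOS values over the batch.
    pred = pred - pred.mean()
    target = target - target.mean()
    r = (pred * target).sum() / (pred.norm() * target.norm() + eps)
    return (1.0 - r) / 2.0
```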

## Installation & Requirements

You can set up the environment using `environment.yml` (a sketch follows the list below) or manually install the dependencies:
- python=3.9.15
- cudatoolkit=11.7
- torchvision=0.14.0
- transformers=4.45.2
- peft=0.14.0
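
The authoritative `environment.yml` lives in the repository; a minimal sketch of an equivalent conda environment file, built only from the dependencies listed above (the environment name, channels, and pip split are assumptions), could look like:

```yaml
# Illustrative sketch only -- use the environment.yml shipped with the repository.
name: perceptclip-iqa        # assumed environment name
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.9.15
  - cudatoolkit=11.7
  - torchvision=0.14.0
  - pip
  - pip:
      - transformers==4.45.2
      - peft==0.14.0
      - huggingface_hub
```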

## Usage

To use the model for inference:

```python
from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import importlib.util

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model class definition dynamically from the repo's modeling.py
class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", class_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

# Initialize the model
ModelClass = modeling.clip_lora_model
model = ModelClass().to(device)

# Load the pretrained weights
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="perceptCLIP_IQA.pth")
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Load an image
image = Image.open("image_path.jpg").convert("RGB")

# Preprocessing: resize, center-crop to 224x224, and normalize with CLIP statistics
def IQA_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711))
    ])
    return transform

image = IQA_preprocess()(image).unsqueeze(0).to(device)

# Predict the quality score
with torch.no_grad():
    iqa_score = model(image).item()

print(f"Predicted quality score: {iqa_score:.4f}")
```
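
To score a folder of images, you can stack the preprocessed tensors into a single batch, reusing `model` and `IQA_preprocess` from the snippet above. This is a minimal sketch: the glob pattern is a placeholder, and it assumes the model returns one score per image in the batch.

```python
import glob

# Score every JPEG in a folder (path is a placeholder).
paths = sorted(glob.glob("/path/to/images/*.jpg"))
transform = IQA_preprocess()
batch = torch.stack([transform(Image.open(p).convert("RGB")) for p in paths]).to(device)

with torch.no_grad():
    scores = model(batch).squeeze(-1).tolist()  # one quality score per image

for path, score in zip(paths, scores):
    print(f"{path}: {score:.4f}")
```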