---
license: mit
tags:
- vision
- clip
- lora
- multilabel-classification
- image-classification
- bitsandbytes
- 8bit
---

# CLIP-ViT-Large LoRA Adapter for Multi-Label Image Classification

This is a lightweight multi-label image classifier built on [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14), fine-tuned with LoRA (Low-Rank Adaptation) on top of an 8-bit quantized backbone (via `bitsandbytes`). It is intended for multi-label classification over 20 distinct image categories.

This repo contains only:
- The **LoRA adapter** weights (`adapter_model.safetensors`)
- The **classifier head** weights (`classifier_head.pt`)
- A sample loading script in this README

---

## 🧠 Model Architecture

- Backbone: `openai/clip-vit-large-patch14`
- Quantization: 8-bit (`load_in_8bit=True`)
- Fine-tuning method: LoRA (r=16, alpha=32) via `peft`
- Classification head: `LayerNorm → Dropout → Linear(num_labels=20)`

---

## 🧪 Training Details

- LoRA was applied to the attention projection modules: `q_proj`, `k_proj`, `v_proj`, `out_proj` (configuration sketch below)
- Optimizer: AdamW
- Loss: Asymmetric Focal Loss (γ⁻ = 2; loss sketch below)
- Epochs: 2 (learning rate and `gamma_neg` selected by grid search)
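
For reference, here is a minimal sketch of how such an adapter could be configured with `peft`, using the hyperparameters listed above. This is illustrative, not the original training script; `lora_dropout` and `bias` are assumed defaults.

```python
from peft import LoraConfig, get_peft_model
from transformers import CLIPModel

# r=16, alpha=32, and the attention projections come from the details above;
# lora_dropout and bias are assumptions, not documented values.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    lora_dropout=0.1,
    bias="none",
)

base = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
peft_model = get_peft_model(base, lora_cfg)
peft_model.print_trainable_parameters()
```

Likewise, a hedged sketch of an asymmetric focal loss with `gamma_neg=2`. Only γ⁻ is stated above, so `gamma_pos=0` (plain BCE on positive labels) is an assumption:

```python
import torch

def asymmetric_focal_loss(logits, targets, gamma_neg=2.0, gamma_pos=0.0, eps=1e-8):
    """Multi-label asymmetric focal loss; gamma_neg down-weights easy negatives."""
    probs = torch.sigmoid(logits)
    # Positive term: -(1 - p)^gamma_pos * log(p), applied to positive labels
    pos_loss = targets * (1 - probs) ** gamma_pos * torch.log(probs.clamp(min=eps))
    # Negative term: -p^gamma_neg * log(1 - p), applied to negative labels
    neg_loss = (1 - targets) * probs ** gamma_neg * torch.log((1 - probs).clamp(min=eps))
    return -(pos_loss + neg_loss).mean()
```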

---

## 📊 Class Labels

The model outputs 20 categories:
```
Class 0, Class 1, Class 2, ..., Class 19
```
These are placeholder names; replace them with the label names from your own dataset.

---

## 🚀 How to Use

### 📦 Install dependencies

```bash
pip install transformers peft bitsandbytes accelerate
```

### 🧩 Load model

```python
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, BitsAndBytesConfig
from peft import PeftModel

class CLIPForMultiLabel(nn.Module):
    def __init__(self, backbone, num_labels=20, dropout=0.1):
        super().__init__()
        self.backbone = backbone
        # get_image_features returns projected embeddings of size projection_dim
        # (768 for CLIP ViT-L/14)
        hidden_size = backbone.config.projection_dim
        self.classifier = nn.Sequential(
            nn.LayerNorm(hidden_size),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, pixel_values):
        image_feats = self.backbone.get_image_features(pixel_values=pixel_values)
        return self.classifier(image_feats)

# Load the 8-bit quantized CLIP backbone and attach the LoRA adapter
quant_cfg = BitsAndBytesConfig(load_in_8bit=True)
base = CLIPModel.from_pretrained(
    "openai/clip-vit-large-patch14",
    quantization_config=quant_cfg,
    device_map="auto",  # 8-bit weights need accelerate to place them on a device
)
backbone = PeftModel.from_pretrained(base, "YOUR_USERNAME/clip-lora-multilabel")

# Load the classifier head (stored separately, since PEFT saves only the adapter)
model = CLIPForMultiLabel(backbone, num_labels=20)
state_dict = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/YOUR_USERNAME/clip-lora-multilabel/resolve/main/classifier_head.pt",
    map_location="cpu",
)
model.classifier.load_state_dict(state_dict)
model.classifier.to(base.device)  # keep the head on the same device as the backbone
model.eval()

# Load the matching image processor
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
```

### 🖼️ Predict on an image

```python
from PIL import Image

# Preprocess a single image with the CLIP processor
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to(base.device)  # match the backbone's device

with torch.no_grad():
    logits = model(pixel_values)
    probs = torch.sigmoid(logits)
    preds = (probs > 0.5).int().cpu().numpy()  # 0.5 threshold per label

print("Predicted multi-hot vector:", preds)
```
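
To turn the multi-hot vector into readable names, pair it with your label list from the Class Labels section. `LABELS` below is a placeholder assumption; substitute your dataset's class names:

```python
# Placeholder label names; replace with your dataset's 20 classes.
LABELS = [f"Class {i}" for i in range(20)]

# preds has shape (1, 20) for a single image
predicted_labels = [LABELS[i] for i, flag in enumerate(preds[0]) if flag == 1]
print("Predicted labels:", predicted_labels)
```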

---

## 📄 License

This model is released under the MIT license.

---

## 💬 Citation

If you use this model in your work, please cite this repository or acknowledge it appropriately.