# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion, GAN, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-real detection, combining a CLIP Vision Transformer backbone with a lightweight classification head. It is optimized for generalization to unseen generative sources and for large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.

---
# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on training configuration

### **Classifier Head**

* Two-layer MLP:
  * Input: CLIP image embedding (1024-d)
  * Hidden layer: 512 units with GELU activation
  * Output layer: 1-unit sigmoid classifier producing the probability of AI-generated content
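The head described above can be sketched in PyTorch. This is a minimal illustration of the stated dimensions (1024 → 512 → 1 with GELU and sigmoid); the class name and exact wiring in `src.model` may differ:

```python
import torch
import torch.nn as nn


class DetectorHead(nn.Module):
    """Two-layer MLP over CLIP image embeddings (illustrative sketch)."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 512, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        # Returns the probability that each image is AI-generated.
        return torch.sigmoid(self.net(clip_embedding)).squeeze(-1)
```

The backbone's pooled image embedding is fed directly into this head; dropout sits between the hidden and output layers as listed under Regularization below.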
### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post hoc on validation logits
* Optional threshold tuning using evaluation metrics or unknown-source analysis
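Post-hoc temperature calibration divides logits by a learned scalar T before the sigmoid, fitted on held-out validation logits. A minimal sketch (the actual calibration code is not part of this repository):

```python
import torch
import torch.nn.functional as F


def fit_temperature(logits: torch.Tensor, labels: torch.Tensor,
                    steps: int = 200, lr: float = 0.01) -> float:
    """Fit a scalar temperature T > 0 minimizing BCE on validation logits."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        scaled = logits / log_t.exp()
        loss = F.binary_cross_entropy_with_logits(scaled, labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()
```

T > 1 softens overconfident predictions; the classifier's logits are divided by the fitted T at inference time before applying the sigmoid.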
### **Training Objective**

* Binary cross-entropy
* Oversampling and class balancing across multi-source synthetic datasets
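Class balancing of the kind listed above is commonly implemented with a weighted sampler; a sketch assuming per-example binary labels (the repository's training code is not included here):

```python
import torch
from torch.utils.data import WeightedRandomSampler


def balanced_sampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Oversample the minority class so real/fake examples are drawn ~equally often."""
    class_counts = torch.bincount(labels, minlength=2).float()
    # Each example is weighted inversely to the size of its class.
    weights = 1.0 / class_counts[labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```

Passing the sampler to a `DataLoader` yields roughly class-balanced batches even when one source dominates the training mix.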
---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* GLIDE
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.

---
# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

Performance is dataset-dependent: the model flags many synthetic sources with high confidence, but recall drops markedly on recent diffusion models with strong photorealism.
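The summary metrics above follow the standard definitions; for reference, precision and recall for the AI class can be computed from binary predictions as:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (AI) class from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The high-precision/low-recall profile reported here is consistent with a conservative decision threshold: the model rarely mislabels real images (low FPR) at the cost of missing many AI-generated ones.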
---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---
# **Limitations**

* Lower recall on highly realistic diffusion models.
* The model can produce false positives on:
  * Overprocessed images
  * Heavy JPEG compression
  * Artistic filters
* Not calibrated for forensic authenticity analysis.

---
# **How to Use**

## In Python

```python
from src.model import AIImageDetector
from PIL import Image
import torch

# Build the detector and load the released checkpoint.
model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device="cuda",
    dropout=0.1,
)
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

# Convert to RGB so grayscale or RGBA inputs match the CLIP preprocessing.
img = Image.open("your_image.jpg").convert("RGB")

with torch.no_grad():
    prob = model.predict(img)  # probability that the image is AI-generated
print(prob)
```