# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion models, GANs, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-real detection, combining a CLIP Vision Transformer backbone with a lightweight classification head. It is optimized for generalization to unseen generative sources and for use in large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.

---

# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on training configuration

### **Classifier Head**

* Two-layer MLP:

  * Input: CLIP image embedding (1024-d)
  * Hidden layer: 512 units with GELU activation
  * Output layer: a single sigmoid unit that produces the probability that the image is AI-generated
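
To make the shapes in the list above concrete, here is a framework-free sketch of the head's forward pass at inference time. It is not the repository's implementation: the function names, the random initialization, and the choice of the exact (erf-based) GELU are assumptions made for illustration. Dropout is left out because it is inactive in evaluation mode.

```python
import math
import random

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def init_head(in_dim: int = 1024, hidden_dim: int = 512, seed: int = 0):
    """Random weights for illustration only; these are NOT the trained weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden_dim)) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2

def head_forward(embedding, params) -> float:
    """Embedding -> hidden layer (GELU) -> one logit -> sigmoid probability."""
    w1, b1, w2, b2 = params
    hidden = [gelu(sum(w * x for w, x in zip(row, embedding)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)

def count_parameters(in_dim: int = 1024, hidden_dim: int = 512) -> int:
    """Weights plus biases of both linear layers."""
    return (in_dim * hidden_dim + hidden_dim) + (hidden_dim + 1)
```

With the sizes listed above, the head has 1024·512 + 512 + 512 + 1 = 525,313 parameters, which is small next to the ViT-H/14 backbone.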

### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post-hoc using validation logits
* Optional threshold tuning based on evaluation metrics or unknown-source analysis
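
Post-hoc temperature calibration can be sketched as a one-parameter fit: find the temperature `T` that minimizes binary cross-entropy of `sigmoid(logit / T)` on validation data. The source does not say which optimizer is used, so the golden-section search below, its search range, and the function names are assumptions.

```python
import math

def bce_from_logit(logit: float, label: int) -> float:
    """Binary cross-entropy from a raw logit: softplus(z) - y * z."""
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - label * logit

def mean_nll(logits, labels, temperature: float) -> float:
    """Average loss after dividing every logit by the temperature."""
    total = sum(bce_from_logit(z / temperature, y) for z, y in zip(logits, labels))
    return total / len(logits)

def fit_temperature(logits, labels, lo: float = 0.05, hi: float = 20.0,
                    iters: int = 60) -> float:
    """Golden-section search over log(T) in [log(lo), log(hi)].

    The loss is convex in 1/T, hence unimodal in T, so the search converges
    to the minimizer inside the bracket.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if mean_nll(logits, labels, math.exp(c)) < mean_nll(logits, labels, math.exp(d)):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return math.exp((a + b) / 2.0)
```

The calibrated probability is then `sigmoid(logit / T)`. A fitted `T` above 1 softens overconfident scores, and a `T` below 1 sharpens them. Temperature scaling leaves the ranking of images unchanged, so it does not affect ROC-AUC.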

### **Training Objective**

* Binary cross-entropy
* Oversampling and class-balancing for multi-source synthetic datasets
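
The exact balancing scheme is not specified here. One plausible reading, sketched below, gives each label half of the sampling mass and splits that mass evenly across the sources within the label, so small generators are not drowned out by large ones. The function names and the `(label, source)` sample format are assumptions.

```python
import random
from collections import Counter

def balanced_weights(samples):
    """samples: list of (label, source) pairs. Returns one sampling weight per sample.

    Each label receives equal total mass, and each source within a label
    receives an equal share of that label's mass.
    """
    per_source = Counter(samples)                           # images per (label, source)
    sources_per_label = Counter(label for label, _ in per_source)
    n_labels = len(sources_per_label)
    return [
        1.0 / (n_labels * sources_per_label[label] * per_source[(label, source)])
        for label, source in samples
    ]

def oversample(samples, k: int, seed: int = 0):
    """Draw k samples with replacement according to the balanced weights."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=balanced_weights(samples), k=k)
```

For example, with 900 real images from one source and 100 fake images split 90/10 across two generators, draws are roughly half real and half fake, and the two generators appear about equally often among the fakes.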

---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* Glide
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.

---

# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

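Performance is dataset-dependent: the model is confident on many synthetic sources but has lower recall on advanced, highly photorealistic diffusion models. At the reported operating point it is conservative: it flags very few real images (FPR 0.0073) but recovers only 8.6% of AI-generated images overall.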
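
The reported metrics are tied together by the confusion matrix, which allows a quick sanity check. The sketch below derives the share of AI-generated images implied by precision, recall, and FPR, then recomputes accuracy from it. The evaluation-set composition is not stated in this card, so the implied share is an inference, not a reported figure.

```python
def implied_prevalence(precision: float, recall: float, fpr: float) -> float:
    """Share of AI images implied by the three metrics.

    Starting from precision = r*pi / (r*pi + f*(1 - pi)), solve for pi.
    """
    odds = precision * fpr / (recall * (1.0 - precision))  # AI : real odds
    return odds / (1.0 + odds)

def implied_accuracy(precision: float, recall: float, fpr: float) -> float:
    """Accuracy = pi * recall + (1 - pi) * (1 - FPR)."""
    pi = implied_prevalence(precision, recall, fpr)
    return pi * recall + (1.0 - pi) * (1.0 - fpr)

pi = implied_prevalence(0.853, 0.086, 0.0073)
acc = implied_accuracy(0.853, 0.086, 0.0073)
print(f"implied AI share: {pi:.3f}, implied accuracy: {acc:.3f}")
```

With the numbers above, this gives an AI share of about 0.330 and an accuracy of about 0.693, matching the reported accuracy. Most correct predictions are therefore real images classified as real.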

---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---

# **Limitations**

* Recall is lower on highly realistic diffusion models.
* The model can produce false positives on:

  * Overprocessed images
  * Heavily JPEG-compressed images
  * Images with artistic filters
* Not calibrated for forensic authenticity analysis.
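
One way to bound false positives on a particular kind of real image (for example, heavily compressed photos) is to pick the decision threshold from scores on a held-out set of such images. The sketch below assumes you already have those scores; the function names and the strict `>` decision rule are assumptions, not part of this repository.

```python
import math

def threshold_for_fpr(real_scores, target_fpr: float) -> float:
    """Threshold t such that at most target_fpr of real_scores are > t."""
    ordered = sorted(real_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ordered))  # real images we may flag
    if allowed >= len(ordered):
        return float("-inf")  # any score may be flagged
    # Scores strictly above the (allowed + 1)-th largest value are flagged,
    # so at most `allowed` real images exceed the threshold (ties flag fewer).
    return ordered[allowed]

def is_flagged(score: float, threshold: float) -> bool:
    """Decision rule paired with threshold_for_fpr: strictly greater than."""
    return score > threshold
```

A threshold chosen this way only controls the FPR on data that resembles the held-out set. Raising it to protect real images lowers recall on AI-generated images further.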

---

# **How to Use**

## In Python

```python
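import torch
from PIL import Image

from src.model import AIImageDetector  # detector class from this repository

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device=device,
    dropout=0.1,
)

# Load the checkpoint on CPU; load_state_dict copies it into the model's parameters.
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()  # disable dropout for inference

# Convert to RGB so grayscale, palette, or RGBA files have three channels.
img = Image.open("your_image.jpg").convert("RGB")
prob = model.predict(img)  # returns probability of AI generation
print(prob)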
```
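
For dataset filtering, the single-image call above can be wrapped in a small helper that splits a collection by score. This is a sketch: `split_by_score` is a hypothetical helper, not part of this repository, and the 0.5 default threshold is an assumption that should be replaced with a tuned value.

```python
from typing import Callable, Iterable, List, Tuple

Scored = Tuple[str, float]

def split_by_score(
    paths: Iterable[str],
    predict: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[Scored], List[Scored]]:
    """Return (kept, flagged); a path is flagged when its score exceeds threshold."""
    kept: List[Scored] = []
    flagged: List[Scored] = []
    for path in paths:
        prob = float(predict(path))  # probability of AI generation
        (flagged if prob > threshold else kept).append((path, prob))
    return kept, flagged
```

To use it with the detector, pass a callable that opens a path and scores it, such as `lambda p: model.predict(Image.open(p).convert("RGB"))`. Because overall recall is low, the `kept` list should be treated as "not flagged", not as verified real.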