# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion models, GANs, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-Real detection by leveraging a CLIP Vision Transformer backbone and a lightweight classification head. It is optimized for generalization across unseen generative sources and large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.

---

# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on training configuration

### **Classifier Head**

* Two-layer MLP:

  * Input: CLIP image embedding (1024-d)
  * Hidden Layer: 512 with GELU activation
  * Output Layer: 1-unit sigmoid classifier producing probability of AI-generated content
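
The head described above can be sketched as follows. This is a hypothetical reimplementation for illustration only: the class and argument names are assumptions, and the actual module ships as `AIImageDetector` in this repository's `src.model`.

```python
import torch
import torch.nn as nn

class DetectorHead(nn.Module):
    """Illustrative two-layer MLP head over a CLIP image embedding:
    1024-d input -> 512 hidden units with GELU -> 1-unit sigmoid output."""

    def __init__(self, embed_dim=1024, hidden_dim=512, dropout=0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, embedding):
        # Sigmoid maps the single logit to P(image is AI-generated).
        return torch.sigmoid(self.mlp(embedding)).squeeze(-1)
```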

### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post-hoc using validation logits
* Optional threshold tuning using validation metrics or unknown-source analysis
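
Temperature calibration fits a single scalar T on held-out validation logits so that `sigmoid(logit / T)` is better calibrated. A minimal grid-search sketch, assuming binary labels; the function name, grid range, and search method are illustrative, not the pipeline used for this model:

```python
import numpy as np

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature T that minimizes binary cross-entropy of
    sigmoid(logit / T) on validation data (simple grid search)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if grid is None:
        grid = np.linspace(0.1, 5.0, 200)

    def nll(t):
        p = 1.0 / (1.0 + np.exp(-logits / t))
        p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    return min(grid, key=nll)
```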

### **Training Objective**

* Binary cross-entropy
* Oversampling and class-balancing for multi-source synthetic datasets
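
A common way to implement this kind of class balancing in PyTorch is a `WeightedRandomSampler`; the sketch below is an assumption about how it could be done, not this repository's actual training code. Each example is drawn with probability inversely proportional to its class frequency, so real/fake batches come out roughly balanced:

```python
import torch
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    """Build a sampler that rebalances binary labels (0 = real, 1 = fake).
    Rare-class examples get proportionally larger sampling weights."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels)          # per-class example counts
    weights = 1.0 / counts[labels].float()   # inverse-frequency weight per example
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```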

---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* Glide
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.

---

# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

Performance is dataset-dependent: high confidence on many synthetic sources, lower recall on advanced diffusion models exhibiting strong photorealism.
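
The headline numbers above can be recomputed from per-image probabilities along these lines. The function name and the 0.5 threshold are placeholders for illustration, not the evaluation pipeline used for this card:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def summarize(probs, labels, threshold=0.5):
    """Compute ROC-AUC, PR-AUC, accuracy, precision/recall on the AI class,
    and FPR on real images from probabilities and binary labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= threshold).astype(int)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    return {
        "roc_auc": roc_auc_score(labels, probs),
        "pr_auc_ai": average_precision_score(labels, probs),
        "accuracy": float(np.mean(preds == labels)),
        "precision_ai": tp / max(int(preds.sum()), 1),
        "recall_ai": tp / max(int(labels.sum()), 1),
        "fpr_real": float(np.mean(preds[labels == 0])),
    }
```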

---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---

# **Limitations**

* Lower recall on highly realistic diffusion models.
* The model can produce false positives on:

  * Overprocessed images
  * Heavy JPEG compression
  * Artistic filters
* Not calibrated for forensic authenticity analysis.

---

# **How to Use**

## In Python

```python
from src.model import AIImageDetector
from PIL import Image
import torch

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device="cuda",
    dropout=0.1
)

state_dict = torch.load("clip_vith14_argus.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

img = Image.open("your_image.jpg").convert("RGB")  # CLIP expects 3-channel RGB
prob = model.predict(img)  # probability that the image is AI-generated
print(prob)
```
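
Since `predict` returns a probability, downstream use typically applies an operating threshold. A minimal helper for that step; the 0.5 default is a placeholder, and per the calibration notes above the threshold should be tuned on validation data rather than assumed:

```python
def classify(prob, threshold=0.5):
    """Map the detector's probability to a binary label.
    The 0.5 threshold is a placeholder; tune it on validation data."""
    return "ai-generated" if prob >= threshold else "real"
```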