# **CLIP_aievals: AI-Generated Image Detector**
This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion models, GANs, and hybrid architectures.
## Overview
`CLIP_aievals` is designed for robust AI-vs-real detection, pairing a CLIP Vision Transformer backbone with a lightweight classification head. It is optimized for generalization to unseen generative sources and for large-scale evaluation pipelines.
This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.
---
# **Model Architecture**
### **Backbone**
* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on the training configuration (see the freezing sketch below)
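
The backbone can be frozen by disabling gradients on the vision tower. A minimal sketch, assuming the `open_clip` package and its `ViT-H-14` / `laion2b_s32b_b79k` weights (the repository's actual training code may differ):

```python
import open_clip

# Load the CLIP ViT-H/14 vision encoder pretrained on LAION-2B (assumed weights tag).
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)

# Fully frozen: no gradients flow into the vision tower.
for p in clip_model.visual.parameters():
    p.requires_grad = False

# Partially unfrozen variant: re-enable gradients for the last transformer block.
for p in clip_model.visual.transformer.resblocks[-1].parameters():
    p.requires_grad = True
```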
### **Classifier Head**
* Two-layer MLP (sketched below):
  * Input: CLIP image embedding (1024-d)
  * Hidden layer: 512 units with GELU activation
  * Output layer: 1-unit sigmoid classifier producing the probability of AI-generated content
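
A minimal PyTorch sketch of the head described above; the class and attribute names are illustrative, not the repository's actual modules:

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Two-layer MLP mapping a CLIP embedding to P(AI-generated). Illustrative only."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 512, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),  # 1024-d CLIP embedding -> 512
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),          # single logit
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(clip_embedding))  # probability of AI-generated
```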
### **Regularization and Calibration**
* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post-hoc using validation logits (a calibration sketch follows this list)
* Optional threshold tuning using evaluation metrics or unknown-source analysis
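
Temperature scaling fits a single scalar that rescales validation logits to improve probability calibration. A minimal sketch, assuming `val_logits` and `val_labels` tensors collected on held-out data (the repository's calibration code may differ):

```python
import torch

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    # Optimize log-temperature so the fitted temperature stays positive.
    log_t = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    bce = torch.nn.BCEWithLogitsLoss()

    def closure():
        optimizer.zero_grad()
        loss = bce(val_logits / log_t.exp(), val_labels.float())
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()  # divide logits by this temperature at inference
```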
### **Training Objective**
* Binary cross-entropy
* Oversampling and class balancing for multi-source synthetic datasets (see the sampling sketch below)
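
One common way to realize the oversampling is a `WeightedRandomSampler` that draws the rarer class more often. A minimal sketch with toy tensors standing in for the real features and `0 = real` / `1 = fake` labels (the actual pipeline may additionally balance across generative sources):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

features = torch.randn(1000, 1024)                   # stand-in for CLIP embeddings
labels = torch.randint(0, 2, (1000,))                # 0 = real, 1 = fake

class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()  # rarer class drawn more often
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

loader = DataLoader(TensorDataset(features, labels), batch_size=64, sampler=sampler)
criterion = torch.nn.BCEWithLogitsLoss()             # binary cross-entropy on raw logits
```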
---
# **Datasets**
The training pipeline uses a mixture of curated datasets:
### **Real Data**
* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages
### **Fake Data**
* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* GLIDE
* Generative Inpainting (partial and full)
Labels are binary: `0 = real`, `1 = fake`.
---
# **Performance Summary**
Evaluated on 850k+ mixed-source images:
* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086
Performance is dataset-dependent: high confidence on many synthetic sources, lower recall on advanced diffusion models exhibiting strong photorealism.
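
These summary metrics can be reproduced from stored predictions with scikit-learn. A minimal sketch, assuming arrays `y_true` (0 = real, 1 = fake) and `y_prob` of per-image probabilities (toy values shown):

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_recall_fscore_support,
    roc_auc_score,
)

y_true = np.array([0, 0, 1, 1, 1])             # toy ground-truth labels
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9])  # toy predicted probabilities

roc_auc = roc_auc_score(y_true, y_prob)
pr_auc = average_precision_score(y_true, y_prob)  # PR-AUC for the AI class
y_pred = (y_prob >= 0.5).astype(int)              # default 0.5 decision threshold
precision, recall, _, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
fpr = ((y_pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()  # FPR on real images
```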
---
# **Intended Use**
### **Primary**
* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines
### **Secondary**
* Research on generative model detection
* Cross-model robustness evaluation
### **Not Intended For**
* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection
---
# **Limitations**
* Lower recall on highly realistic diffusion models.
* The model can produce false positives on:
  * Overprocessed images
  * Heavy JPEG compression
  * Artistic filters
* Not calibrated for forensic authenticity analysis.
---
# **How to Use**
## In Python
```python
from src.model import AIImageDetector
from PIL import Image
import torch

# Build the detector with the same configuration used for training.
model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device="cuda",
    dropout=0.1,
)

# Load the released checkpoint; map_location="cpu" keeps loading device-agnostic.
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

img = Image.open("your_image.jpg")
prob = model.predict(img)  # returns probability of AI generation
print(prob)
```
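
The returned value is a probability, so a decision threshold is needed to produce a binary label. The default of 0.5 can be replaced by a threshold tuned as described under regularization and calibration:

```python
THRESHOLD = 0.5  # or a threshold tuned on validation data
label = "fake" if prob >= THRESHOLD else "real"
print(f"{label} (p={prob:.3f})")
```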