srini-dash committed · Commit 9286e72 (verified) · Parent(s): bf43187

Commit message: --basic model card

Files changed (1): README.md (+145, -0)
# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets from diffusion, GAN, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-real detection by leveraging a CLIP Vision Transformer backbone and a lightweight classification head. It is optimized for generalization across unseen generative sources and large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.

---

# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on the training configuration

### **Classifier Head**

* Two-layer MLP:
  * Input: CLIP image embedding (1024-d)
  * Hidden layer: 512 units with GELU activation
  * Output layer: 1-unit sigmoid classifier producing the probability of AI-generated content

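The head described above can be sketched in PyTorch. This is a minimal illustration of the stated shapes (1024 → 512 → 1) with the stated dropout; the class name and exact wiring are assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class DetectorHead(nn.Module):
    """Hypothetical two-layer MLP head over a CLIP image embedding."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 512, dropout: float = 0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),  # single logit; sigmoid applied at inference
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(clip_embedding)

head = DetectorHead().eval()
fake_embedding = torch.randn(4, 1024)       # stand-in for ViT-H/14 image embeddings
probs = torch.sigmoid(head(fake_embedding))  # probability of AI-generated content
```

The backbone itself is omitted here: in this design only the embedding flows into the head, so the CLIP encoder can stay frozen while the MLP is trained.
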
### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post hoc on validation logits
* Optional threshold tuning using evaluation metrics or unknown-source analysis

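The post-hoc temperature step amounts to fitting a single scalar `T` on held-out validation logits and dividing logits by `T` before the sigmoid. The sketch below is a generic recipe for this, not the repository's actual calibration code:

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit one scalar temperature on validation logits by minimizing BCE."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=100)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits / log_t.exp(), labels.float()
        )
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp())

# At inference: calibrated_prob = torch.sigmoid(raw_logit / T)
```

Calibration only rescales confidences; it does not change the ranking of images, so ROC-AUC is unaffected while probabilities become more trustworthy.
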
### **Training Objective**

* Binary cross-entropy
* Oversampling and class-balancing for multi-source synthetic datasets

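One standard way to realize the class-balancing step (a generic sketch; the actual pipeline is not published here) is a `WeightedRandomSampler` that draws real and fake examples with equal probability regardless of their raw counts:

```python
import torch
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    """Weight each example by the inverse frequency of its class (0 = real, 1 = fake)."""
    labels_t = torch.tensor(labels)
    class_counts = torch.bincount(labels_t, minlength=2).float()
    weights = 1.0 / class_counts[labels_t]  # rare class gets proportionally larger weight
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Usage: DataLoader(dataset, batch_size=64, sampler=balanced_sampler(all_labels))
```

The same idea extends to per-source balancing by treating each (class, source) pair as its own bucket when computing counts.
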
---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* GLIDE
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.

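The labeling convention is mechanical when mixing sources: anything from a real-image corpus gets `0`, anything generated gets `1`. A schematic wrapper (the repo's real loading code is not shown here, and the source names are illustrative) might look like:

```python
from torch.utils.data import Dataset

class MixedRealFakeDataset(Dataset):
    """Attach binary labels (0 = real, 1 = fake) to (path, source) records."""

    REAL_SOURCES = {"ffhq", "coco", "imagenet", "afhq", "div2k", "openimages"}

    def __init__(self, records):
        # records: iterable of (image_path, source_name) pairs
        self.records = list(records)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        path, source = self.records[idx]
        label = 0 if source.lower() in self.REAL_SOURCES else 1
        return path, label
```
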
---

# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

Performance is dataset-dependent: confidence is high on many synthetic sources, but recall drops on advanced diffusion models exhibiting strong photorealism.

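Metrics like those above can be recomputed from per-image probabilities with scikit-learn. This helper is illustrative (the repo's own evaluation code is not published here); the AI class is treated as the positive class throughout:

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

def summarize(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict:
    """Compute the headline detection metrics from probabilities and binary labels."""
    preds = (probs >= threshold).astype(int)
    real = labels == 0
    return {
        "roc_auc": roc_auc_score(labels, probs),
        "pr_auc": average_precision_score(labels, probs),  # AI class positive
        "fpr_real": float(preds[real].mean()),             # fraction of real images flagged
        "accuracy": float((preds == labels).mean()),
        "precision_ai": precision_score(labels, preds, zero_division=0),
        "recall_ai": recall_score(labels, preds, zero_division=0),
    }
```

Note the precision/recall pattern in the table above: at the default operating point the detector flags few images, so precision is high and recall is low; lowering the threshold trades one for the other.
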
---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---

# **Limitations**

* Lower recall on highly realistic diffusion models.
* The model can produce false positives on:
  * Overprocessed images
  * Heavy JPEG compression
  * Artistic filters
* Not calibrated for forensic authenticity analysis.

---

# **How to Use**

## In Python

```python
from src.model import AIImageDetector
from PIL import Image
import torch

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device="cuda",
    dropout=0.1,
)

model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

img = Image.open("your_image.jpg").convert("RGB")  # normalize mode before inference
prob = model.predict(img)  # returns probability of AI generation
print(prob)
```