# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion, GAN, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-real detection, combining a CLIP Vision Transformer backbone with a lightweight classification head. It is optimized for generalization to unseen generative sources and for large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and the supporting configuration files used for inference.

---

# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen, depending on the training configuration

### **Classifier Head**

* Two-layer MLP (see the sketch below):

  * Input: CLIP image embedding (1024-d)
  * Hidden layer: 512 units with GELU activation
  * Output layer: 1-unit sigmoid classifier producing the probability that an image is AI-generated
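
For concreteness, a minimal PyTorch sketch of a head with this shape. The class name `ClassifierHead` and the exact layer ordering are illustrative, not the repository's actual `src.model` implementation:

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Two-layer MLP mapping a CLIP embedding to P(AI-generated). Illustrative sketch."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 512, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),  # 1024 -> 512
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),          # 512 -> 1 logit
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        # Sigmoid turns the single logit into a probability in [0, 1].
        return torch.sigmoid(self.net(clip_embedding))
```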

### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post hoc on validation logits (see the sketch below)
* Optional threshold tuning based on evaluation metrics or unknown-source analysis
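
A minimal sketch of what post-hoc temperature scaling on validation logits can look like. The helper name `fit_temperature` and the use of `LBFGS` are assumptions; the card does not specify the exact calibration routine:

```python
import torch

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Fit a single scalar temperature T minimizing BCE on held-out logits."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    loss_fn = torch.nn.BCEWithLogitsLoss()

    def closure():
        optimizer.zero_grad()
        loss = loss_fn(val_logits / log_t.exp(), val_labels.float())
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# At inference time, the calibrated probability is sigmoid(logit / T).
```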

### **Training Objective**

* Binary cross-entropy
* Oversampling and class balancing across the multi-source synthetic datasets (see the sketch below)
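
A minimal sketch of inverse-frequency class balancing with PyTorch's `WeightedRandomSampler`, using placeholder tensors in place of the actual multi-source image dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder data: 0 = real, 1 = fake (stand-ins for the actual datasets).
features = torch.randn(10_000, 1024)        # e.g. precomputed CLIP embeddings
labels = (torch.rand(10_000) < 0.3).long()  # imbalanced: ~30% fake
dataset = TensorDataset(features, labels)

# Inverse-frequency weights: the rarer class is sampled more often.
class_counts = torch.bincount(labels, minlength=2).float()
sample_weights = (1.0 / class_counts)[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

loader = DataLoader(dataset, batch_size=256, sampler=sampler)
loss_fn = torch.nn.BCEWithLogitsLoss()  # binary cross-entropy on the head's logit
```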

---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* GLIDE
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.

---

# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

Performance is dataset-dependent: the model is confident on many synthetic sources but shows lower recall on recent diffusion models with strong photorealism. The very low false-positive rate combined with low recall reflects a conservative operating point.
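
These metrics can be reproduced with scikit-learn given ground-truth labels and predicted probabilities; a sketch follows. The arrays and the 0.5 threshold are placeholders, since the card does not state the operating threshold used:

```python
import numpy as np
from sklearn.metrics import (
    roc_auc_score, average_precision_score,
    accuracy_score, precision_score, recall_score,
)

# Placeholders: y_true in {0 = real, 1 = fake}, y_prob = model probabilities.
y_true = np.random.randint(0, 2, 1000)
y_prob = np.random.rand(1000)
y_pred = (y_prob >= 0.5).astype(int)  # assumed decision threshold

print("ROC-AUC:   ", roc_auc_score(y_true, y_prob))
print("PR-AUC:    ", average_precision_score(y_true, y_prob))
print("Accuracy:  ", accuracy_score(y_true, y_pred))
print("Precision: ", precision_score(y_true, y_pred))
print("Recall:    ", recall_score(y_true, y_pred))
# FPR on real images: fraction of real (0) images predicted fake (1).
fpr = ((y_pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()
print("FPR (real):", fpr)
```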

---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---

# **Limitations**

* Lower recall on highly realistic diffusion models.
* The model can produce false positives on:

  * Overprocessed images
  * Heavily JPEG-compressed images
  * Images with artistic filters
* Not calibrated for forensic authenticity analysis.

---

# **How to Use**

## In Python

```python
from src.model import AIImageDetector
from PIL import Image
import torch

# Build the detector around the CLIP ViT-H/14 backbone.
model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device="cuda",
    dropout=0.1,
)

# Load the released weights and switch to inference mode.
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

img = Image.open("your_image.jpg").convert("RGB")  # ensure a 3-channel input
prob = model.predict(img)  # probability that the image is AI-generated
print(prob)
```
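
For the dataset-curation use case listed under Intended Use, a sketch of batch filtering built on the example above. The `images/` directory and the 0.5 cutoff are placeholders; tune the threshold against your own FPR and recall requirements:

```python
from pathlib import Path
from PIL import Image

THRESHOLD = 0.5  # assumed cutoff; lower it to trade precision for recall

kept, flagged = [], []
for path in Path("images/").glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    prob = model.predict(img)  # reuses the model loaded above
    (flagged if prob >= THRESHOLD else kept).append(path)

print(f"kept {len(kept)} images, flagged {len(flagged)} as likely AI-generated")
```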