# **CLIP_aievals: AI-Generated Image Detector**

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained using a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets from diffusion, GANs, and hybrid architectures.

## Overview

`CLIP_aievals` is designed for robust AI-vs-Real detection by leveraging a CLIP Vision Transformer backbone and a lightweight classification head. It is optimized for generalization across unseen generative sources and large-scale evaluation pipelines.

This repository contains the model weights (`clip_vith14_argus.pt`) and supporting configuration files used for inference.

---

# **Model Architecture**

### **Backbone**

* CLIP ViT-H/14 vision encoder
* Pretrained on LAION-2B
* Frozen or partially unfrozen depending on training configuration

### **Classifier Head**

* Two-layer MLP:
  * Input: CLIP image embedding (1024-d)
  * Hidden Layer: 512 with GELU activation
  * Output Layer: 1-unit sigmoid classifier producing probability of AI-generated content

### **Regularization and Calibration**

* Dropout: 0.1
* Weight decay: 1e-4
* Temperature calibration performed post-hoc using validation logits
* Optional threshold tuning using Eval metrics or Unknown-source analysis

### **Training Objective**

* Binary cross-entropy
* Oversampling and class-balancing for multi-source synthetic datasets

---

# **Datasets**

The training pipeline uses a mixture of curated datasets:

### **Real Data**

* FFHQ (70k)
* COCO (160k)
* ImageNet (90k+)
* AFHQ v1/v2 (cats, dogs, wildlife)
* DIV2K
* OpenImages

### **Fake Data**

* Stable Diffusion (v1.x, v2.x)
* Latent Diffusion Models
* StyleGAN3
* CIPS
* BigGAN
* GANformer
* CycleGAN (horse2zebra, monet2photo)
* DDPM and DDGAN
* Face Synthetics
* Glide
* Generative Inpainting (partial and full)

Labels are binary: `0 = real`, `1 = fake`.
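
The oversampling and class balancing listed under Training Objective are not specified in detail here. One common way to do it is to weight each sample inversely to the frequency of its class, then draw training samples in proportion to those weights. The sketch below illustrates that idea with the label convention above (`0 = real`, `1 = fake`); the function name `balanced_sample_weights` is hypothetical and is not part of this repository.

```python
import random
from collections import Counter


def balanced_sample_weights(keys):
    """Return one weight per sample, inversely proportional to its group size.

    `keys` can be the binary labels (0 = real, 1 = fake) or any hashable
    group identifier, such as a source name. After weighting, every group
    has the same total weight. Hypothetical helper for illustration only.
    """
    counts = Counter(keys)
    n_groups = len(counts)
    total = len(keys)
    return [total / (n_groups * counts[k]) for k in keys]


# Toy example: three real images and one fake image.
labels = [0, 0, 0, 1]
weights = balanced_sample_weights(labels)

# Draw an oversampled batch of indices; the single fake sample is picked
# about as often as the three real samples combined.
rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=8)
print(weights, batch)
```

In a PyTorch pipeline the same weights could be passed to `torch.utils.data.WeightedRandomSampler`. Passing source names instead of labels balances across generators rather than across the two classes.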
---

# **Performance Summary**

Evaluated on 850k+ mixed-source images:

* ROC-AUC: 0.764
* PR-AUC (AI class): 0.612
* Global FPR (real images): 0.0073
* Accuracy: 0.693
* Precision (AI): 0.853
* Recall (AI): 0.086

At this operating point the detector is conservative: it rarely flags real images (FPR 0.0073), but it recovers only 8.6% of AI-generated images overall. Performance is dataset-dependent: confidence is high on many synthetic sources, and recall is lower on advanced diffusion models with strong photorealism.

---

# **Intended Use**

### **Primary**

* Detect whether an image is AI-generated
* Large-scale offline evaluation of generative models
* Data filtering for dataset curation
* Quality and authenticity control in multimedia pipelines

### **Secondary**

* Research on generative model detection
* Cross-model robustness evaluation

### **Not Intended For**

* Legal or forensic verification
* High-stakes decision systems
* Per-pixel or localized artifact detection

---

# **Limitations**

* Lower recall on highly realistic diffusion models.
* Model can produce false positives on:
  * Overprocessed images
  * Heavy JPEG compression
  * Artistic filters
* Not calibrated for forensic authenticity analysis.

---

# **How to Use**

## In Python

```python
from src.model import AIImageDetector
from PIL import Image
import torch

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device=device,
    dropout=0.1,
)
# Load the checkpoint on CPU; load_state_dict copies it into the model's parameters.
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()  # disable dropout for inference

img = Image.open("your_image.jpg").convert("RGB")  # normalize grayscale/RGBA inputs
prob = model.predict(img)  # returns probability of AI generation
print(prob)
```
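
The probability returned by `predict` still has to be turned into a real/fake decision, and the threshold tuning mentioned under Regularization and Calibration is not spelled out. A minimal sketch of one approach follows, assuming you have detector scores for a held-out set of known-real images: pick the threshold that keeps the false positive rate on those images at or below a target. The helpers `threshold_for_fpr` and `classify` are hypothetical and are not part of this repository.

```python
def threshold_for_fpr(real_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of real images score above it.

    `real_scores` are detector probabilities for images known to be real.
    An image is flagged as AI-generated when its score is strictly greater
    than the returned threshold. Hypothetical helper for illustration only.
    """
    if not real_scores:
        raise ValueError("real_scores must not be empty")
    ranked = sorted(real_scores, reverse=True)
    # Number of real images we are allowed to misclassify.
    max_fp = min(int(target_fpr * len(ranked)), len(ranked) - 1)
    # At most `max_fp` scores are strictly above the score at index max_fp.
    return ranked[max_fp]


def classify(prob, threshold):
    """Return 1 (fake) if prob exceeds the threshold, else 0 (real)."""
    return 1 if prob > threshold else 0


# Toy validation scores for ten real images.
real_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
threshold = threshold_for_fpr(real_scores, target_fpr=0.2)
flagged = sum(classify(s, threshold) for s in real_scores)
print(threshold, flagged / len(real_scores))
```

Lowering the target FPR raises the threshold and reduces recall on AI-generated images, so the target should be chosen with the recall figures in the Performance Summary in mind.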