metadata
title: DeForge AI
emoji: 📊
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
short_description: AI image detection benchmark including DeForge-AI.

Is Artificial Intelligence Generated Image Detection a Solved Problem?

Ziqiang Li1, Jiazhen Yan1, Ziwen He1, Kai Zeng2, Weiwei Jiang1, Lizhi Xiong1, Zhangjie Fu1‑

‑Corresponding author

1Nanjing University of Information Science and Technology 2University of Siena

🔥 News

  • [2025-09-19] 🎉🎉🎉 AIGIBench has been accepted to the NeurIPS 2025 Datasets and Benchmarks track.

This repository is the official repository of AIGIBench.

This is a modified version of the original AIGIBench repository. In addition to the original dataset and methods, it includes my custom detection solutions: DeForge-AI and C2P-DINOv2 (intermediary solution).

This repository contains the AIGIBench dataset and the evaluated methods.

The AIGIBench dataset contains two training settings and 25 test subsets. It has the following advantages:

  • Comprehensive generation types: GAN-based Noise-to-Image Generation, Diffusion for Text-to-Image Generation, GANs for Deepfake, Diffusion for Personalized Generation, and Open-source Platforms.
  • State-of-the-art generators: MidjourneyV6, Stable Diffusion 3, Imagen, DALLE3, InstantID, FaceSwap, StyleGAN-XL, and so on.
  • Completely unknown generation methods: images crawled from communities and social media form the CommunityAI and SocialRF datasets, which makes detection more challenging.

If this project helps you, please fork, watch, and give a star to this repository.

📚 Dataset

The training and test sets used in the paper can be downloaded from Hugging Face or Baidu Netdisk.

Each folder contains compressed files. After unzipping them, organize the files under the data root directory as follows.

Train

AIGIBench introduces two training settings: (i) Setting-I trains on 144K images generated by ProGAN across four object categories (car, cat, chair, and horse); (ii) Setting-II trains on 144K images generated by both SD-v1.4 and ProGAN, covering the same four object categories. The ProGAN data comes from ForenSynths, and the SD-v1.4 data comes from GenImage. To keep the training data balanced, we randomly select SD-v1.4 training images from GenImage so that their number matches the number of ProGAN images, and then merge the two sets. The file directory is as follows:

├── train
│   ├── car
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── cat
│   │   ├── ...
│   ├── chair
│   │   ├── ...
│   ├── horse
│   │   ├── ...
│   ├── sdv1.4
│   │   ├── 0_real
│   │   ├── 1_fake
├── val
│   ├── ...
│   │   ├── 0_real
│   │   ├── 1_fake
│   │   ...
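
The balancing step described above (randomly selecting SD-v1.4 images to match the ProGAN count, then merging) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `balance_and_merge`, the fixed seed, and the use of plain lists of file paths are all assumptions made for this example.

```python
import random


def balance_and_merge(progan_paths, sd_paths, seed=0):
    """Subsample sd_paths to the size of progan_paths, then merge both lists.

    Hypothetical helper: the name, signature, and seed are illustrative
    assumptions and do not come from the AIGIBench code.
    """
    if len(sd_paths) < len(progan_paths):
        raise ValueError("not enough SD-v1.4 images to match the ProGAN count")
    rng = random.Random(seed)  # local RNG so the selection is reproducible
    sd_subset = rng.sample(sd_paths, k=len(progan_paths))
    return list(progan_paths) + sd_subset


# Toy usage with made-up file names.
progan = [f"car/1_fake/{i:05d}.png" for i in range(4)]
sdv14 = [f"sdv1.4/1_fake/{i:05d}.png" for i in range(10)]
merged = balance_and_merge(progan, sdv14)
print(len(merged))  # 8: four ProGAN paths plus four sampled SD-v1.4 paths
```

In practice the same call would presumably be applied separately to the `0_real` and `1_fake` file lists so that both classes stay balanced.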

Test

AIGIBench comprehensively tests the performance of the detector and builds a test dataset from five perspectives: GAN-based Noise-to-Image Generation, Diffusion for Text-to-Image Generation, GANs for Deepfake, Diffusion for Personalized Generation, and Open-source Platforms. The file directory is as follows:

├── test
│   ├── ProGAN
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── R3GAN
│   │   ├── ...
│   │   ...
│   ├── BlendFace
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── InSwap
│   │   ├── ...
│   │   ...
│   ├── FLUX1-dev
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── Midjourney-V6
│   │   ├── ...
│   │   ...
│   ├── BLIP
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── Infinite-ID
│   │   ├── ...
│   │   ...
│   ├── CommunityAI
│   │   ├── 0_real
│   │   ├── 1_fake
│   ├── SocialRF
│   │   ├── ...
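
Given this layout, the labeled samples of every test subset can be collected with a short directory walk. The sketch below is not part of the official code: the helper names `list_samples` and `list_test_set` and the set of accepted image extensions are assumptions, and only the `0_real` / `1_fake` folder convention comes from the tree above.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed extensions


def list_samples(subset_dir):
    """Return sorted (path, label) pairs for one subset; 0 = real, 1 = fake."""
    samples = []
    for folder, label in (("0_real", 0), ("1_fake", 1)):
        class_dir = Path(subset_dir) / folder
        if not class_dir.is_dir():
            continue  # tolerate subsets that are missing one class folder
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, label))
    return samples


def list_test_set(test_root):
    """Map each subset name (e.g. "ProGAN") to its labeled samples."""
    return {
        d.name: list_samples(d)
        for d in sorted(Path(test_root).iterdir())
        if d.is_dir()
    }
```

For example, `list_test_set("test")["ProGAN"]` would return the labeled image paths of the ProGAN subset, ready to be fed to a detector one subset at a time.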

Note: The test set counts reported in the paper contained some errors, which we correct here. Each subset contains the same number of real and generated images, so only the number of generated images is listed below.

| Generator | Number |
| --- | --- |
| CommunityAI | 6000 |
| SocialRF | 3000 |
| FaceSwap | 4000 |
| ImSwap | 4000 |
| WFIR | 1000 |
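
After downloading, the corrected counts can be checked against the files on disk. The following is a minimal sketch under stated assumptions: the helper names `count_files` and `check_counts` are hypothetical, every regular file in a class folder is counted as an image, and the `test` root path is a placeholder. Only two subsets from the table are filled in as an example.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside folder (0 if it does not exist).

    Assumption: every file in the folder is an image; hidden or stray files
    would be counted too.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file())


def check_counts(test_root, expected_fake):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for subset, expected in expected_fake.items():
        n_real = count_files(Path(test_root) / subset / "0_real")
        n_fake = count_files(Path(test_root) / subset / "1_fake")
        if n_fake != expected:
            problems.append(
                f"{subset}: expected {expected} generated images, found {n_fake}"
            )
        if n_real != n_fake:
            problems.append(f"{subset}: {n_real} real vs {n_fake} generated images")
    return problems


# Counts copied from the table above; extend with the other subsets as needed.
EXPECTED_FAKE = {"CommunityAI": 6000, "SocialRF": 3000}

if __name__ == "__main__":
    for line in check_counts("test", EXPECTED_FAKE):  # "test" is a placeholder path
        print(line)
```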

πŸ”Detection Methods

We use the official code of every detection method and apply unified modifications to its input and output. The code we use for training in Setting-II is publicly available above, and the corresponding pre-trained checkpoints are publicly available on Hugging Face. If you need the code from the original papers, the corresponding detection methods are listed below:

  • ResNet-50: Deep Residual Learning for Image Recognition
  • CNNDetection: CNN-generated images are surprisingly easy to spot... for now
  • GramNet: Global Texture Enhancement for Fake Face Detection in the Wild
  • LGrad: Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection
  • CLIPDetection: Towards Universal Fake Image Detectors that Generalize Across Generative Models
  • FreqNet: Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning
  • NPR: Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
  • DFFreq: Dual Frequency Branch Framework with Reconstructed Sliding Windows Attention for AI-Generated Image Detection
  • LaDeDa: Real-Time Deepfake Detection in the Real-World
  • AIDE: A Sanity Check for AI-generated Image Detection
  • SAFE: Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective
  • Effort: Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

⏳ Detection Results (continuously updated)

To ensure a fair comparison, we retrain all baseline methods on Setting-II of AIGIBench.

If your retrained results differ significantly from those shown, please contact us.

| Method | Paper | Ref | R.Acc. | F.Acc. | Acc. | A.P. |
| --- | --- | --- | --- | --- | --- | --- |
| CNNDetection | CNN-generated images are surprisingly easy to spot... for now | CVPR 2020 | 98.2 | 11.6 | 54.9 | 67.0 |
| Gram-Net | Global Texture Enhancement for Fake Face Detection in the Wild | CVPR 2020 | 90.5 | 26.6 | 58.6 | 62.4 |
| LGrad | Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection | CVPR 2023 | 85.8 | 39.6 | 62.9 | 66.6 |
| UniFD | Towards Universal Fake Image Detectors that Generalize Across Generative Models | CVPR 2023 | 73.3 | 71.5 | 72.5 | 75.6 |
| FreqNet | Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning | AAAI 2024 | 65.9 | 66.4 | 66.2 | 70.1 |
| NPR | Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection | CVPR 2024 | 93.8 | 41.9 | 67.9 | 73.9 |
| LaDeDa | Real-Time Deepfake Detection in the Real-World | arXiv 2024 | 91.7 | 54.9 | 73.4 | 79.3 |
| DFFreq | Dual Frequency Branch Framework with Reconstructed Sliding Windows Attention for AI-Generated Image Detection | TIFS 2026 | 91.8 | 58.0 | 75.1 | 82.2 |
| C2P-CLIP* | C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection | AAAI 2025 | 93.8 | 49.8 | 71.8 | 82.2 |
| AIDE | A Sanity Check for AI-generated Image Detection | ICLR 2025 | 88.1 | 67.0 | 77.6 | 82.7 |
| SAFE | Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective | KDD 2025 | 89.0 | 66.6 | 78.1 | 83.6 |
| VIB-Net | Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network | CVPR 2025 | 60.6 | 78.1 | 69.3 | 70.9 |
| $D^3$ | $D^3$: Scaling Up Deepfake Detection by Learning from Discrepancy | CVPR 2025 | 81.0 | 46.4 | 63.7 | 68.9 |
| Effort | Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection | ICML 2025 | 96.9 | 57.1 | 77.1 | 87.2 |
| FerretNet | FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies | NeurIPS 2025 | 96.6 | 61.8 | 79.4 | 85.8 |
| LOTA | LOTA: Bit-Planes Guided AI-Generated Image Detection | ICCV 2025 | 89.3 | 65.1 | 77.4 | 83.1 |
| BSF | Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection | AAAI 2026 | 91.5 | 65.6 | 78.8 | 81.1 |
| LTD | Layer Consistency Matters: Elegant Latent Transition Discrepancy for Generalizable Synthetic Image Detection | CVPR 2026 | 82.0 | 67.7 | 74.9 | 77.6 |

For method-specific reasons, the following method is evaluated using its official pre-trained weights directly for inference, without retraining.

| Method | Paper | Ref | R.Acc. | F.Acc. | Acc. | A.P. |
| --- | --- | --- | --- | --- | --- | --- |
| DDA | Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable | NeurIPS 2025 | 93.9 | 69.3 | 81.6 | 90.2 |
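
For readers reproducing these columns, the sketch below shows one common way to compute them for a single test subset. It assumes that R.Acc. and F.Acc. are the accuracies on real and generated images, that Acc. is the overall accuracy, and that A.P. is average precision; this README does not define them explicitly. The function name `detection_metrics`, the 0.5 decision threshold, and the tie handling are likewise assumptions, not the official evaluation code.

```python
def detection_metrics(labels, scores, threshold=0.5):
    """Compute R.Acc., F.Acc., Acc., and A.P. for one test subset.

    labels: 0 for real, 1 for fake. scores: predicted probability of "fake".
    The 0.5 threshold is an assumption, not a value stated in this README.
    """
    real = [s for y, s in zip(labels, scores) if y == 0]
    fake = [s for y, s in zip(labels, scores) if y == 1]
    if not real or not fake:
        raise ValueError("need at least one real and one fake sample")
    r_acc = sum(s <= threshold for s in real) / len(real)  # real predicted as real
    f_acc = sum(s > threshold for s in fake) / len(fake)   # fake predicted as fake
    acc = (r_acc * len(real) + f_acc * len(fake)) / len(labels)

    # Average precision: mean of the precision measured at the rank of each
    # fake sample, ranking by descending score (ties keep input order here).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    ap = precision_sum / len(fake)
    return {"R.Acc.": r_acc, "F.Acc.": f_acc, "Acc.": acc, "A.P.": ap}


# Toy example: two real and two fake images.
metrics = detection_metrics([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])
print(metrics)
```

On a single subset with equal numbers of real and generated images, Acc. computed this way equals the mean of R.Acc. and F.Acc.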

Citation

@inproceedings{li2025artificial,
  title={Is Artificial Intelligence Generated Image Detection a Solved Problem?},
  author={Li, Ziqiang and Yan, Jiazhen and He, Ziwen and Zeng, Kai and Jiang, Weiwei and Xiong, Lizhi and Fu, Zhangjie},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}

Contact

If you have any questions about this project, please feel free to contact 247918horizon@gmail.com.