---
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- pytorch
library_name: pytorch
---
Official repository for the paper ["Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models"](https://arxiv.org/pdf/2602.01738).
If you have any questions, please feel free to open a discussion in the Community tab. For direct inquiries, you can also reach out to us via email at 2450042008@mails.szu.edu.cn.
# VFM Baselines Release
This directory contains the 7 vision foundation model baselines used in the paper:
- `MetaCLIP-Linear`
- `MetaCLIP2-Linear`
- `SigLIP-Linear`
- `SigLIP2-Linear`
- `PE-CLIP-Linear`
- `DINOv2-Linear`
- `DINOv3-Linear`
## Contents
- `models.py`: unified model-loading code for all 7 baselines
- `test_vfm_baselines.py`: unified evaluation script
- `weights/`: released checkpoints
- `core/vision_encoder/`: vendored PE vision encoder code required by `PE-CLIP-Linear`
## Model Names
The unified loader and test script accept these names:
- `metacliplin`
- `metaclip2lin`
- `sigliplin`
- `siglip2lin`
- `pelin`
- `dinov2lin`
- `dinov3lin`
The paper names such as `MetaCLIP-Linear` and `DINOv3-Linear` are also accepted.
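As an illustration of how both naming schemes can be accepted, here is a minimal sketch of a name-normalization helper. The actual mapping lives in `models.py`; the function name `normalize_model_name` and the fallback lowercasing are assumptions for this example.

```python
# Hypothetical sketch: map paper-style names to loader names.
# The real mapping is defined in models.py.
ALIASES = {
    "MetaCLIP-Linear": "metacliplin",
    "MetaCLIP2-Linear": "metaclip2lin",
    "SigLIP-Linear": "sigliplin",
    "SigLIP2-Linear": "siglip2lin",
    "PE-CLIP-Linear": "pelin",
    "DINOv2-Linear": "dinov2lin",
    "DINOv3-Linear": "dinov3lin",
}

def normalize_model_name(name: str) -> str:
    """Accept either a loader name (e.g. 'sigliplin') or a paper name."""
    if name in ALIASES:
        return ALIASES[name]
    return name.lower()
```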
## Usage
Evaluate a single model:
```bash
python test_vfm_baselines.py \
--model sigliplin \
--real-dir /path/to/0_real \
--fake-dir /path/to/1_fake \
--max-samples 100
```
Evaluate all 7 models:
```bash
python test_vfm_baselines.py \
--model all \
--real-dir /path/to/0_real \
--fake-dir /path/to/1_fake \
--max-samples 100
```
Optional arguments:
- `--checkpoint`: override the default checkpoint for single-model evaluation
- `--batch-size`: batch size for evaluation
- `--num-workers`: dataloader workers
- `--device`: explicit device such as `cuda:0` or `cpu`
- `--save-json`: save results to a JSON file
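For reference, the kind of binary real-vs-fake metrics such an evaluation typically reports can be computed with `scikit-learn` (already a listed dependency). This is an illustrative sketch with toy scores, not the release code; the label convention (0 = real, 1 = fake, matching the `0_real`/`1_fake` directory names) and the 0.5 threshold are assumptions.

```python
# Illustrative metrics sketch with toy data (not the release code).
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

labels = np.array([0, 0, 1, 1])            # 0 = real, 1 = fake (assumed convention)
scores = np.array([0.1, 0.4, 0.35, 0.9])   # model "fake" probabilities

auc = roc_auc_score(labels, scores)                         # threshold-free ranking metric
acc = accuracy_score(labels, (scores >= 0.5).astype(int))   # accuracy at an assumed 0.5 cut
```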
## Dependencies
The release code expects these Python packages:
- `torch`
- `torchvision`
- `transformers`
- `scikit-learn`
- `Pillow`
- `timm`
- `einops`
- `ftfy`
- `regex`
- `huggingface_hub`
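The packages above can be installed in one step; no pinned versions are given in this release, so recent versions are assumed:

```bash
pip install torch torchvision transformers scikit-learn Pillow timm einops ftfy regex huggingface_hub
```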
## Notes
- The CLIP-family and DINO-family baselines instantiate the backbone from Hugging Face model configs and then load the released checkpoint.
- `PE-CLIP-Linear` uses the vendored `core/vision_encoder` code in this directory.
- The checkpoints in `weights/` are arranged locally for packaging convenience. For public release, they can be uploaded under the same filenames.
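The `-Linear` suffix in each baseline name refers to a linear probe trained on frozen backbone features. A minimal, self-contained sketch of such a head (the class name `LinearProbe` and the 768-dim feature size are assumptions; the real feature dimension depends on each backbone, and the actual heads are defined in `models.py`):

```python
# Hypothetical sketch of a linear-probe head over frozen backbone features.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Binary real/fake head on top of pooled backbone features."""
    def __init__(self, feat_dim: int = 768):  # assumed feature size
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)  # one logit: > 0 means "fake"

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats).squeeze(-1)

probe = LinearProbe()
logits = probe(torch.randn(4, 768))  # a batch of 4 pooled feature vectors
```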