---
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- pytorch
library_name: pytorch
---

Official repository for the paper ["Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models"](https://arxiv.org/pdf/2602.01738).

If you have any questions, please feel free to open a discussion in the Community tab. For direct inquiries, you can also reach out to us via email at 2450042008@mails.szu.edu.cn.

# VFM Baselines Release

This directory contains the 7 vision foundation model (VFM) baselines used in the paper:

- `MetaCLIP-Linear`
- `MetaCLIP2-Linear`
- `SigLIP-Linear`
- `SigLIP2-Linear`
- `PE-CLIP-Linear`
- `DINOv2-Linear`
- `DINOv3-Linear`

## Contents

- `models.py`: unified model-loading code for all 7 baselines (a usage sketch follows this list)
- `test_vfm_baselines.py`: unified evaluation script
- `weights/`: released checkpoints
- `core/vision_encoder/`: vendored PE vision encoder code required by `PE-CLIP-Linear`
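
Below is a minimal loading sketch. The `build_model` helper and its `(model, preprocess)` return convention are assumptions for illustration, not a documented API; check `models.py` for the actual entry point and signature.

```python
# A minimal sketch, assuming models.py exposes a build_model() helper that
# returns a (model, preprocess) pair -- hypothetical names; see models.py
# for the real entry point.
import torch
from PIL import Image

from models import build_model  # hypothetical helper

model, preprocess = build_model("sigliplin")  # loader key or paper name
model.eval()

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logit = model(image)
print(torch.sigmoid(logit).item())  # assumes a single real-vs-fake logit
```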

## Model Names

The unified loader and test script accept these names:

- `metacliplin`
- `metaclip2lin`
- `sigliplin`
- `siglip2lin`
- `pelin`
- `dinov2lin`
- `dinov3lin`

Paper names such as `MetaCLIP-Linear` and `DINOv3-Linear` are also accepted.
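
The two naming schemes line up one-to-one. For reference, an alias table equivalent to what the scripts accept could be written as follows (illustrative only; the actual alias handling lives in `models.py`):

```python
# Paper name -> loader key (both spellings are accepted by the loader and
# the test script; this table just makes the correspondence explicit).
PAPER_TO_LOADER = {
    "MetaCLIP-Linear": "metacliplin",
    "MetaCLIP2-Linear": "metaclip2lin",
    "SigLIP-Linear": "sigliplin",
    "SigLIP2-Linear": "siglip2lin",
    "PE-CLIP-Linear": "pelin",
    "DINOv2-Linear": "dinov2lin",
    "DINOv3-Linear": "dinov3lin",
}

def normalize_model_name(name: str) -> str:
    """Map a paper name to its loader key; loader keys pass through."""
    return PAPER_TO_LOADER.get(name, name)
```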

## Usage

Evaluate a single model:

```bash
python test_vfm_baselines.py \
    --model sigliplin \
    --real-dir /path/to/0_real \
    --fake-dir /path/to/1_fake \
    --max-samples 100
```

Evaluate all 7 models:

```bash
python test_vfm_baselines.py \
    --model all \
    --real-dir /path/to/0_real \
    --fake-dir /path/to/1_fake \
    --max-samples 100
```

Optional arguments:

- `--checkpoint`: override the default checkpoint for single-model evaluation
- `--batch-size`: batch size for evaluation
- `--num-workers`: number of dataloader workers
- `--device`: explicit device such as `cuda:0` or `cpu`
- `--save-json`: save results to a JSON file
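
For example, appending `--checkpoint /path/to/custom.pth --save-json results.json` to the single-model command above evaluates a custom checkpoint and writes the metrics to `results.json` (both paths here are placeholders).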

## Dependencies

The release code expects these Python packages:

- `torch`
- `torchvision`
- `transformers`
- `scikit-learn`
- `Pillow`
- `timm`
- `einops`
- `ftfy`
- `regex`
- `huggingface_hub`
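
All of these are standard PyPI packages, so `pip install torch torchvision transformers scikit-learn Pillow timm einops ftfy regex huggingface_hub` should cover them (exact versions are not pinned here).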

## Notes

- The CLIP-family and DINO-family baselines instantiate the backbone from Hugging Face model configs and then load the released checkpoint (see the sketch after this list).
- `PE-CLIP-Linear` uses the vendored `core/vision_encoder` code in this directory.
- The checkpoints in `weights/` are arranged locally for packaging convenience; for public release, they can be uploaded under the same filenames.
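
As a rough illustration of the first note (not the actual implementation, which lives in `models.py`), the config-then-checkpoint pattern looks like the sketch below; the backbone id, checkpoint filename, and state-dict layout are assumptions.

```python
# Sketch: instantiate a backbone from a Hugging Face config, attach a
# linear head, then load the released checkpoint. The backbone id, file
# name, and state-dict keys are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("facebook/dinov2-base")  # config only
backbone = AutoModel.from_config(config)  # random init, no Hub weights

class LinearProbe(nn.Module):
    """Backbone features followed by a linear real-vs-fake head."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(pixel_values).pooler_output  # pooled features
        return self.head(feats)

model = LinearProbe(backbone, config.hidden_size)
state = torch.load("weights/dinov2lin.pth", map_location="cpu")
model.load_state_dict(state)  # assumes keys: backbone.*, head.*
```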