---
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- pytorch
library_name: pytorch
---
Official repository for the paper ["Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models"](https://arxiv.org/pdf/2602.01738).
If you have any questions, please feel free to open a discussion in the Community tab. For direct inquiries, you can also reach out to us via email at 2450042008@mails.szu.edu.cn.
# VFM Baselines Release
This directory contains the seven vision foundation model (VFM) baselines used in the paper:
- `MetaCLIP-Linear`
- `MetaCLIP2-Linear`
- `SigLIP-Linear`
- `SigLIP2-Linear`
- `PE-CLIP-Linear`
- `DINOv2-Linear`
- `DINOv3-Linear`
## Contents
- `models.py`: unified model-loading code for all 7 baselines
- `test_vfm_baselines.py`: unified evaluation script
- `weights/`: released checkpoints
- `core/vision_encoder/`: vendored PE vision encoder code required by `PE-CLIP-Linear`
## Model Names
The unified loader and test script accept these names:
- `metacliplin`
- `metaclip2lin`
- `sigliplin`
- `siglip2lin`
- `pelin`
- `dinov2lin`
- `dinov3lin`
The names used in the paper, such as `MetaCLIP-Linear` and `DINOv3-Linear`, are also accepted.
## Usage
Evaluate a single model:
```bash
python test_vfm_baselines.py \
--model sigliplin \
--real-dir /path/to/0_real \
--fake-dir /path/to/1_fake \
--max-samples 100
```
Evaluate all seven models:
```bash
python test_vfm_baselines.py \
--model all \
--real-dir /path/to/0_real \
--fake-dir /path/to/1_fake \
--max-samples 100
```
Optional arguments:
- `--checkpoint`: override the default checkpoint for single-model evaluation
- `--batch-size`: batch size for evaluation
- `--num-workers`: dataloader workers
- `--device`: explicit device such as `cuda:0` or `cpu`
- `--save-json`: save results to a JSON file
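The optional flags can be combined with the required ones in a single invocation. A sketch (all paths and the output filename are placeholders, not values mandated by the release):

```bash
python test_vfm_baselines.py \
  --model dinov2lin \
  --real-dir /path/to/0_real \
  --fake-dir /path/to/1_fake \
  --checkpoint weights/my_override.pth \
  --batch-size 64 \
  --num-workers 4 \
  --device cuda:0 \
  --save-json results.json
```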
## Dependencies
The release code expects the following Python packages:
- `torch`
- `torchvision`
- `transformers`
- `scikit-learn`
- `Pillow`
- `timm`
- `einops`
- `ftfy`
- `regex`
- `huggingface_hub`
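The list above can be installed in one step with pip (package names as listed; the release does not pin versions, so exact versions are left unspecified):

```bash
pip install torch torchvision transformers scikit-learn Pillow timm einops ftfy regex huggingface_hub
```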
## Notes
- The CLIP-family and DINO-family baselines instantiate the backbone from Hugging Face model configs and then load the released checkpoint.
- `PE-CLIP-Linear` uses the vendored `core/vision_encoder` code in this directory.
- The checkpoints in `weights/` are arranged locally for packaging convenience. For public release, they can be uploaded under the same filenames.