Aloukik21 committed · Commit 0709b87 · verified · 1 parent: ab05071

Upload image/Bombek1-siglip-dinov2/README.md with huggingface_hub

Files changed (1): image/Bombek1-siglip-dinov2/README.md (+188 lines)
---
license: mit
tags:
- image-classification
- ai-detection
- deepfake-detection
- siglip
- dinov2
- lora
- pytorch
- quality-agnostic
datasets:
- nebula-9000/OpenFake
metrics:
- accuracy
- roc_auc
pipeline_tag: image-classification
---

# AI Image Detector (SigLIP2 + DINOv2 Ensemble)

A high-accuracy, **quality-agnostic** model for detecting AI-generated images, achieving **0.9997 AUC** on validation and strong cross-dataset generalization.

## Key Features

- **Quality-agnostic**: performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
- **Dual-encoder architecture**: combines SigLIP2's semantic understanding with DINOv2's self-supervised features
- **Efficient fine-tuning**: uses LoRA adapters (~8M trainable parameters out of ~740M total)
- **Production-ready**: tested on 10+ external datasets

## Performance

### Validation Results (OpenFake, 5K images)

| Metric   | Clean Images | Degraded Images | Average    |
|----------|--------------|-----------------|------------|
| AUC      | 0.9998       | 0.9995          | **0.9997** |
| Accuracy | 99.24%       | 98.96%          | 99.10%     |

**Quality-agnostic verification**: the AUC gap between clean and degraded images is only **0.0003**, confirming robust performance across image quality levels.

### Cross-Dataset Generalization

#### Real Image Datasets (Target: Classify as Real)

| Dataset        | Samples | Accuracy    | Mean P(AI) |
|----------------|---------|-------------|------------|
| Food-101       | 300     | **100.00%** | 0.032      |
| COCO 2017      | 300     | 90.67%      | 0.135      |
| Cats vs Dogs   | 300     | **99.67%**  | 0.036      |
| Stanford Cars  | 300     | 94.67%      | 0.110      |
| Oxford Flowers | 300     | 95.67%      | 0.115      |
| **Average**    | —       | **96.13%**  | —          |

#### AI-Generated Image Datasets (Target: Classify as AI)

| Dataset       | Generator  | Samples | Accuracy    | Mean P(AI) |
|---------------|------------|---------|-------------|------------|
| DALL-E 3      | OpenAI     | 300     | **100.00%** | 0.993      |
| Midjourney V6 | Midjourney | 300     | 96.33%      | 0.936      |
| **Average**   | —          | —       | **98.17%**  | —          |

#### Mixed Benchmark Datasets

| Dataset   | Samples | Accuracy   | AUC        | F1     |
|-----------|---------|------------|------------|--------|
| AI-or-Not | 500     | **96.80%** | **0.9986** | 97.04% |

**Overall cross-dataset accuracy: 97.15%**

### Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:

- **Diffusion models**: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
- **GANs**: StyleGAN, ProGAN, BigGAN
- **Other**: GPT-Image-1, Firefly, Ideogram, and more

## Usage

### Installation

```bash
pip install torch torchvision transformers timm peft pillow
```

### Quick Start

```python
from huggingface_hub import hf_hub_download

# model.py (which defines AIImageDetector) ships in this repo alongside the weights
from model import AIImageDetector

# Download the checkpoint
model_path = hf_hub_download(
    repo_id="Bombek1/ai-image-detector-siglip-dinov2",
    filename="pytorch_model.pt",
)

# Initialize the detector
detector = AIImageDetector(model_path)

# Predict a single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```

### Batch Processing

```python
from pathlib import Path

images = list(Path("./images").glob("*.jpg"))
for img_path in images:
    result = detector.predict(img_path)
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```
119
+
120
+ ## Model Architecture
121
+
122
+ ```
123
+ EnsembleAIDetector (~740M parameters, ~8M trainable)
124
+ β”œβ”€β”€ SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
125
+ β”‚ └── Output: 1152-dim features
126
+ β”œβ”€β”€ DINOv2-Large-patch14 (with LoRA r=32 on qkv)
127
+ β”‚ └── Output: 1024-dim features
128
+ └── ClassificationHead
129
+ β”œβ”€β”€ LayerNorm(2176)
130
+ β”œβ”€β”€ Linear(2176 β†’ 512) + GELU + Dropout(0.3)
131
+ β”œβ”€β”€ Linear(512 β†’ 256) + GELU + Dropout(0.3)
132
+ └── Linear(256 β†’ 1) β†’ Sigmoid
133
+ ```
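
The fusion head above can be sketched in a few lines of PyTorch. This is a minimal illustration of the architecture diagram, not the shipped `model.py`; the two encoder outputs are stubbed with random tensors of the documented sizes:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Fuses SigLIP2 (1152-d) and DINOv2 (1024-d) features into one AI-vs-real probability."""
    def __init__(self, siglip_dim=1152, dino_dim=1024, dropout=0.3):
        super().__init__()
        fused = siglip_dim + dino_dim  # 2176, matching LayerNorm(2176) in the diagram
        self.net = nn.Sequential(
            nn.LayerNorm(fused),
            nn.Linear(fused, 512), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 1),
        )

    def forward(self, siglip_feats, dino_feats):
        # Concatenate the two encoder embeddings, then map to P(AI) per image
        x = torch.cat([siglip_feats, dino_feats], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

# Stand-in encoder outputs for a batch of 2 images
head = ClassificationHead()
p_ai = head(torch.randn(2, 1152), torch.randn(2, 1024))
print(p_ai.shape)
```

In the full model, `siglip_feats` and `dino_feats` would come from the two LoRA-adapted encoders; only the adapters and this head are trained.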

## Training Details

| Parameter     | Value |
|---------------|-------|
| Dataset       | OpenFake (~95K train, 5K val) |
| Image Size    | 392×392 |
| Epochs        | 5 |
| Batch Size    | 16 (effective: 144 with grad accum) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA) |
| Scheduler     | Cosine with warmup |
| LoRA Rank     | 32 |
| LoRA Alpha    | 64 |
| Loss          | Focal Loss (γ=2, α=0.25) |
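
The focal loss in the table (γ=2, α=0.25) down-weights easy examples so training focuses on hard ones. A short PyTorch sketch consistent with those hyperparameters (not the exact training code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales BCE by alpha_t * (1 - p_t)^gamma."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])   # raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0])   # 1 = AI-generated, 0 = real
print(focal_loss(logits, targets))
```

With γ=2, a confidently correct example contributes almost nothing, so the resulting loss is well below plain BCE on the same batch.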

### Quality-Agnostic Augmentations

The model is trained with aggressive image degradation to ensure robustness:

- JPEG compression (quality 30-95)
- Gaussian blur (σ up to 2.0)
- Gaussian noise (σ up to 0.05)
- Resize artifacts (downscale to 50%, then back up)
- Color jitter, random crops, flips
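
The degradations above can be reproduced to sanity-check quality robustness on your own images. A minimal sketch using PIL and NumPy; the parameter ranges mirror the list, but the `degrade` helper is ours, not the training pipeline:

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Apply JPEG, blur, noise, and resize degradations like those listed above."""
    # JPEG compression at a random quality in [30, 95]
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(30, 95))
    buf.seek(0)
    img = Image.open(buf).convert("RGB")
    # Gaussian blur with radius up to 2.0
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 2.0)))
    # Additive Gaussian noise with sigma up to 0.05 (pixels scaled to [0, 1])
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr += np.random.normal(0.0, random.uniform(0.0, 0.05), arr.shape)
    img = Image.fromarray((np.clip(arr, 0.0, 1.0) * 255).astype(np.uint8))
    # Resize artifacts: downscale to 50%, then back up to the original size
    w, h = img.size
    img = img.resize((w // 2, h // 2)).resize((w, h))
    return img

example = Image.new("RGB", (392, 392), color=(120, 180, 90))
out = degrade(example)
print(out.size, out.mode)
```

A detector that is truly quality-agnostic should give similar P(AI) for an image before and after `degrade`.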

## Limitations

| Limitation | Details |
|------------|---------|
| **Low-resolution images** | Performance degrades on images <128×128 (e.g., ~50% accuracy on the 32×32 CIFAKE dataset) |
| **COCO-style images** | ~9% false-positive rate on casual/cluttered real photos |
| **Artistic macro photography** | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| **Non-photographic content** | Designed for photographs; screenshots, graphics, and illustrations may not work well |
168
+ ## Files
169
+
170
+ - `pytorch_model.pt` β€” Full checkpoint with LoRA weights
171
+ - `model.py` β€” Inference code with `AIImageDetector` class
172
+ - `config.json` β€” Model configuration
173
+
174
+ ## Citation
175
+
176
+ ```bibtex
177
+ @misc{ai-image-detector-2025,
178
+ author = {Bombek1},
179
+ title = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
180
+ year = {2025},
181
+ publisher = {Hugging Face},
182
+ url = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
183
+ }
184
+ ```
185
+
186
+ ## License
187
+
188
+ MIT License