---
license: mit
tags:
- image-classification
- ai-detection
- deepfake-detection
- siglip
- dinov2
- lora
- pytorch
- quality-agnostic
datasets:
- nebula-9000/OpenFake
metrics:
- accuracy
- roc_auc
pipeline_tag: image-classification
---

# AI Image Detector (SigLIP2 + DINOv2 Ensemble)

A high-accuracy, **quality-agnostic** model for detecting AI-generated images, achieving **0.9997 AUC** on validation and strong cross-dataset generalization.

## Key Features

- **Quality-agnostic**: Performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
- **Dual-encoder architecture**: Combines SigLIP2's semantic understanding with DINOv2's self-supervised features
- **Efficient fine-tuning**: Uses LoRA adapters (~8M trainable parameters out of ~740M total)
- **Production-ready**: Tested on 10+ external datasets

## Performance

### Validation Results (OpenFake, 5K images)

| Metric | Clean Images | Degraded Images | Average |
|--------|--------------|-----------------|---------|
| AUC | 0.9998 | 0.9995 | **0.9997** |
| Accuracy | 99.24% | 98.96% | 99.10% |

**Quality-agnostic verification**: the AUC gap between clean and degraded images is only **0.0003**, confirming robust performance across image quality levels.
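
That gap can be recomputed with scikit-learn once per-image scores are collected on the clean and degraded splits. A minimal sketch; the repo does not ship an evaluation script, and the array values below are dummies to make the snippet runnable:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: 1 = AI-generated, 0 = real; scores are the detector's P(AI) per image.
# Dummy values for illustration; substitute real per-image scores.
y_true = np.array([1, 0, 1, 0, 1, 0])
p_ai_clean = np.array([0.99, 0.02, 0.95, 0.10, 0.97, 0.05])
p_ai_degraded = np.array([0.97, 0.04, 0.90, 0.15, 0.93, 0.08])

auc_clean = roc_auc_score(y_true, p_ai_clean)
auc_degraded = roc_auc_score(y_true, p_ai_degraded)
print(f"clean: {auc_clean:.4f}  degraded: {auc_degraded:.4f}  "
      f"gap: {abs(auc_clean - auc_degraded):.4f}")
```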

### Cross-Dataset Generalization

#### Real Image Datasets (Target: Classify as Real)

| Dataset | Samples | Accuracy | Mean P(AI) |
|---------|---------|----------|------------|
| Food-101 | 300 | **100.00%** | 0.032 |
| COCO 2017 | 300 | 90.67% | 0.135 |
| Cats vs Dogs | 300 | **99.67%** | 0.036 |
| Stanford Cars | 300 | 94.67% | 0.110 |
| Oxford Flowers | 300 | 95.67% | 0.115 |
| **Average** | – | **96.13%** | – |

#### AI-Generated Image Datasets (Target: Classify as AI)

| Dataset | Generator | Samples | Accuracy | Mean P(AI) |
|---------|-----------|---------|----------|------------|
| DALL-E 3 | OpenAI | 300 | **100.00%** | 0.993 |
| Midjourney V6 | Midjourney | 300 | 96.33% | 0.936 |
| **Average** | – | – | **98.17%** | – |

#### Mixed Benchmark Datasets

| Dataset | Samples | Accuracy | AUC | F1 |
|---------|---------|----------|-----|-----|
| AI-or-Not | 500 | **96.80%** | **0.9986** | 97.04% |

**Overall cross-dataset accuracy: 97.15%**

### Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:

- **Diffusion models**: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
- **GANs**: StyleGAN, ProGAN, BigGAN
- **Other**: GPT-Image-1, Firefly, Ideogram, and more

## Usage

### Installation

```bash
pip install torch torchvision transformers timm peft pillow
```

### Quick Start

```python
from huggingface_hub import hf_hub_download

# model.py ships in this repo; download or clone it alongside the checkpoint
from model import AIImageDetector

# Download model
model_path = hf_hub_download(
    repo_id="Bombek1/ai-image-detector-siglip-dinov2",
    filename="pytorch_model.pt"
)

# Initialize detector
detector = AIImageDetector(model_path)

# Predict single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```

### Batch Processing

```python
from pathlib import Path

images = list(Path("./images").glob("*.jpg"))
for img_path in images:
    result = detector.predict(img_path)
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```

## Model Architecture

```
EnsembleAIDetector (~740M parameters, ~8M trainable)
├── SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
│   └── Output: 1152-dim features
├── DINOv2-Large-patch14 (with LoRA r=32 on qkv)
│   └── Output: 1024-dim features
└── ClassificationHead
    ├── LayerNorm(2176)
    ├── Linear(2176 → 512) + GELU + Dropout(0.3)
    ├── Linear(512 → 256) + GELU + Dropout(0.3)
    └── Linear(256 → 1) → Sigmoid
```
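
For orientation, here is a minimal PyTorch sketch of the fusion step described by the tree above. The class is an illustrative stand-in; the actual implementation, including feature extraction and preprocessing, lives in `model.py`:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """MLP over concatenated SigLIP2 (1152-dim) and DINOv2 (1024-dim) embeddings."""
    def __init__(self, in_dim: int = 1152 + 1024, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(in_dim),
            nn.Linear(in_dim, 512), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 1),
        )

    def forward(self, siglip_feats: torch.Tensor, dino_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate the two encoder embeddings: 1152 + 1024 = 2176
        fused = torch.cat([siglip_feats, dino_feats], dim=-1)
        # Single logit -> sigmoid gives P(AI) per image
        return torch.sigmoid(self.net(fused)).squeeze(-1)

# Shape check with dummy embeddings for a batch of 4 images
head = ClassificationHead()
print(head(torch.randn(4, 1152), torch.randn(4, 1024)).shape)  # torch.Size([4])
```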

## Training Details

| Parameter | Value |
|-----------|-------|
| Dataset | OpenFake (~95K train, 5K val) |
| Image Size | 392×392 |
| Epochs | 5 |
| Batch Size | 16 (effective 144 with gradient accumulation) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA) |
| Scheduler | Cosine with warmup |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Loss | Focal Loss (γ=2, α=0.25) |
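
Two rows of this table are easy to misread on their own: the LoRA setup and the loss. The sketch below shows both with the listed hyperparameters, assuming the `peft` library for the adapter configs; it mirrors the configuration, not the exact training script:

```python
import torch
import torch.nn.functional as F
from peft import LoraConfig

# LoRA configs matching the table (r=32, alpha=64) and the target modules
# named in the architecture section
siglip_lora = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"])
dinov2_lora = LoraConfig(r=32, lora_alpha=64, target_modules=["qkv"])

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Binary focal loss: down-weights well-classified examples by (1 - p_t)^gamma."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Dummy batch: labels 1 = AI-generated, 0 = real
print(focal_loss(torch.randn(4), torch.tensor([1.0, 0.0, 1.0, 0.0])))
```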

### Quality-Agnostic Augmentations

The model is trained with aggressive image degradation to ensure robustness (a sketch of such a pipeline follows the list):

- JPEG compression (quality 30-95)
- Gaussian blur (σ up to 2.0)
- Gaussian noise (σ up to 0.05)
- Resize artifacts (down to 50%, then back up)
- Color jitter, random crops, flips
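
A minimal degradation pipeline along these lines, using PIL and torchvision transforms; the probabilities and parameter ranges mirror the list above but are illustrative, not the repo's exact training transforms:

```python
import io
import random
import torch
from PIL import Image, ImageFilter
import torchvision.transforms as T

def degrade(img: Image.Image) -> Image.Image:
    """Randomly apply JPEG round-trips, blur, and resize artifacts."""
    if random.random() < 0.5:  # JPEG compression, quality 30-95
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    if random.random() < 0.3:  # Gaussian blur (PIL radius ~ sigma), up to 2.0
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.1, 2.0)))
    if random.random() < 0.3:  # resize artifacts: halve, then scale back up
        w, h = img.size
        img = img.resize((max(1, w // 2), max(1, h // 2))).resize((w, h))
    return img

train_tf = T.Compose([
    T.Lambda(degrade),
    T.RandomResizedCrop(392, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
    # Additive Gaussian noise, sigma up to 0.05
    T.Lambda(lambda t: (t + torch.randn_like(t) * random.uniform(0.0, 0.05)).clamp(0, 1)),
])
```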

## Limitations

| Limitation | Details |
|------------|---------|
| **Low-resolution images** | Performance degrades on images <128×128 (e.g., the 32×32 CIFAKE dataset shows ~50% accuracy) |
| **COCO-style images** | ~9% false positive rate on casual/cluttered real photos |
| **Artistic macro photography** | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| **Non-photographic content** | Designed for photographs; screenshots, graphics, and illustrations may not work well |

## Files

- `pytorch_model.pt` – Full checkpoint with LoRA weights
- `model.py` – Inference code with the `AIImageDetector` class
- `config.json` – Model configuration

## Citation

```bibtex
@misc{ai-image-detector-2025,
  author    = {Bombek1},
  title     = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
}
```

## License

MIT License