ash12321 committed
Commit 76c46cc · verified · 1 Parent(s): 58770e3

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+training_curves.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,262 @@
---
language: en
license: apache-2.0
tags:
- image-classification
- ai-detection
- sdxl
- vision-transformer
- fake-detection
datasets:
- huggan/wikiart
- ash12321/sdxl-generated-10k
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: SDXL Detector ViT
  results:
  - task:
      type: image-classification
      name: AI Image Detection
    metrics:
    - type: accuracy
      value: 0.9960
      name: Test Accuracy
    - type: f1
      value: 0.9960
      name: F1 Score
    - type: precision
      value: 0.9930
      name: Precision
    - type: recall
      value: 0.9990
      name: Recall
---

# SDXL Detector - Vision Transformer

## Model Description

This model is a **specialized binary classifier** trained to detect images generated by **Stable Diffusion XL (SDXL)**. It achieves **99.60% accuracy** on held-out test data.

### Key Features

- 🎯 **Specialist Detector**: Optimized specifically for SDXL-generated images
- 🚀 **High Accuracy**: 99.60% test accuracy
- ⚡ **Fast Inference**: ~10 ms per image on GPU
- 🛡️ **Robust**: Trained with six overfitting-prevention techniques
- 📊 **Well-Validated**: Separate train/val/test splits with no overlap

### Model Details

- **Base Model**: google/vit-base-patch16-224 (Vision Transformer)
- **Task**: Binary image classification (Real vs. SDXL-Fake)
- **Input**: 224×224 RGB images
- **Output**: 2 classes (0: Real, 1: SDXL-Fake)
- **Parameters**: 85.8M total

## Performance

### Test Set Results

```
Accuracy:  0.9960
Precision: 0.9930
Recall:    0.9990
F1 Score:  0.9960
AUC-ROC:   0.9999

False Positive Rate: 0.0070
False Negative Rate: 0.0010
```

### Confusion Matrix

```
                  Predicted
                Real    Fake
Actual Real      993       7
Actual Fake        1     999
```

**Interpretation:**
- Out of 1,000 real images: 993 correctly identified (99.3%)
- Out of 1,000 SDXL images: 999 correctly identified (99.9%)

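The headline metrics follow directly from these counts (treating SDXL-Fake as the positive class); a quick sanity check in plain Python:

```python
# Confusion-matrix counts from the table above
tn, fp = 993, 7    # real images: correct / wrongly flagged as fake
fn, tp = 1, 999    # SDXL images: missed / correctly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
# → 0.996 0.993 0.999 0.996
```

These reproduce the reported test-set figures to four decimal places.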
## Training Details

### Dataset

**Training Data:**
- Real Images: 8,000 (WikiArt paintings)
- SDXL Images: 8,000 (generated with the SDXL base model)
- Total: 16,000 images

**Validation & Test:**
- 2,000 images each (1,000 real + 1,000 SDXL)
- Completely separate from the training data

### Training Configuration

```
Model: Vision Transformer (ViT-base-patch16-224)
Optimizer: AdamW
Learning Rate: 2e-5
Batch Size: 32
Epochs: 3 (early stopping from max 20)
Training Time: 21.7 minutes

Overfitting Prevention:
- Early Stopping (patience=5)
- Data Augmentation (random crops, flips, rotations, color jitter)
- Dropout (0.1)
- Label Smoothing (0.1)
- Weight Decay (0.01)
- Learning Rate Scheduling
```

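The early-stopping rule above can be sketched as a small helper. This is a minimal illustration of patience-based stopping on validation accuracy, not the actual training script; the class name and interface are assumptions:

```python
class EarlyStopping:
    """Stop training once validation accuracy hasn't improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.counter = 0

    def step(self, val_acc, epoch):
        """Record this epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.best_epoch = epoch
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

# Synthetic demo: accuracy improves twice, then plateaus for five epochs
stopper = EarlyStopping(patience=5)
for epoch, acc in enumerate([0.90, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95], start=1):
    if stopper.step(acc, epoch):
        print(f"stopping at epoch {epoch}; best was epoch {stopper.best_epoch}")
        break
```

In a real loop you would checkpoint the model whenever `best_epoch` updates and restore that checkpoint after stopping.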
## Usage

### Installation

```bash
pip install transformers torch pillow
```

### Quick Start

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Load model and processor
model = ViTForImageClassification.from_pretrained(
    "ash12321/sdxl-detector-vit"
)
processor = ViTImageProcessor.from_pretrained(
    "google/vit-base-patch16-224"
)

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get prediction
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=1)
    prediction = logits.argmax(dim=1).item()

# Interpret results
if prediction == 1:
    confidence = probs[0][1].item()
    print(f"SDXL-Generated (confidence: {confidence:.2%})")
else:
    confidence = probs[0][0].item()
    print(f"Real Image (confidence: {confidence:.2%})")
```

### Advanced Usage with Threshold

```python
def detect_sdxl(image_path, threshold=0.5):
    """
    Detect whether an image is SDXL-generated.

    Args:
        image_path: Path to the image file
        threshold: Classification threshold (default 0.5)

    Returns:
        dict: {is_sdxl: bool, confidence: float, label: str, ...}
    """
    image = Image.open(image_path).convert('RGB')
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)
        sdxl_prob = probs[0][1].item()

    is_sdxl = sdxl_prob > threshold

    return {
        'is_sdxl': is_sdxl,
        'confidence': sdxl_prob if is_sdxl else (1 - sdxl_prob),
        'label': 'SDXL-Generated' if is_sdxl else 'Real Image',
        'sdxl_probability': sdxl_prob,
        'real_probability': 1 - sdxl_prob
    }

# Example (reuses the model and processor loaded in Quick Start)
result = detect_sdxl("test_image.jpg")
print(f"{result['label']} ({result['confidence']:.2%} confident)")
```

## Limitations

### What This Model Detects

✅ **SDXL-generated images** (Stable Diffusion XL)

### What This Model Does NOT Detect

❌ Other AI generators (FLUX, Midjourney, DALL-E, etc.)
❌ Edited or manipulated real images

Accuracy may also degrade on heavily compressed or low-quality images.

**Recommendation**: Use as part of an ensemble with other specialized detectors for comprehensive AI detection.

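One simple way to combine this detector with other specialists, as recommended above, is soft voting: average each detector's "fake" probability. A minimal sketch — the two detector lambdas below are placeholders, not real models:

```python
def ensemble_fake_probability(image, detectors, weights=None):
    """Soft-voting ensemble: weighted average of each detector's
    probability that `image` is AI-generated."""
    if weights is None:
        weights = [1.0] * len(detectors)
    total = sum(weights)
    return sum(w * d(image) for w, d in zip(weights, detectors)) / total

# Toy stand-ins for specialist detectors (hypothetical; in practice each
# would wrap a classifier like detect_sdxl above):
sdxl_detector = lambda img: 0.98   # e.g. this model's SDXL probability
flux_detector = lambda img: 0.10   # e.g. a hypothetical FLUX specialist

score = ensemble_fake_probability(None, [sdxl_detector, flux_detector])
print(round(score, 2))  # 0.54
```

Weights let you favor detectors that are stronger on the generator family you expect to see most often.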
## Intended Use

### Primary Use Cases

- Content moderation platforms
- Academic research on AI-generated content
- Watermarking and provenance systems
- Educational tools for AI literacy

### Out-of-Scope Uses

- Sole basis for legal decisions
- Detection of non-SDXL generators without validation
- Processing of illegal or harmful content

## Ethical Considerations

- This model should be used responsibly as part of broader content-verification systems
- Performance may degrade on images outside the training distribution
- Always combine automated detection with human review for critical decisions
- Be transparent about using AI detection systems

## Citation

```bibtex
@misc{sdxl-detector-vit,
  author       = {ash12321},
  title        = {SDXL Detector - Vision Transformer},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/sdxl-detector-vit}},
}
```

## Model Card Authors

ash12321

## Model Card Contact

For questions or feedback, please open an issue on the model repository.

---

**Created**: 2025-12-31
**Framework**: PyTorch + Transformers
**License**: Apache 2.0
config.json ADDED
@@ -0,0 +1,24 @@
{
  "architectures": [
    "ViTForImageClassification"
  ],
  "attention_probs_dropout_prob": 0.0,
  "dtype": "float32",
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "pooler_act": "tanh",
  "pooler_output_size": 768,
  "qkv_bias": true,
  "transformers_version": "4.57.3"
}
confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 2000270e8ff56256b1873396f29493e69b2011dd0851a8ea079b38fd978d1f1b
  • Pointer size: 131 Bytes
  • Size of remote file: 114 kB
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:309145ceb0de80a47ff158557d032f6d3946bb3c255b2b829942b87a59d473a4
size 343223968
training_curves.png ADDED

Git LFS Details

  • SHA256: 3a0c4dc9327dc143cfb37602dee197000806ae2f76ed477f84048a518590e713
  • Pointer size: 131 Bytes
  • Size of remote file: 331 kB
training_results.json ADDED
@@ -0,0 +1,160 @@
{
  "detector_name": "SDXL",
  "random_seed": 42,
  "best_epoch": 3,
  "best_val_acc": 0.9975,
  "training_time_seconds": 1299.7091298103333,
  "test_metrics": {
    "accuracy": 0.996,
    "precision": 0.9930417495029821,
    "recall": 0.999,
    "f1": 0.996011964107677,
    "auc": 0.999919,
    "fpr": 0.007,
    "fnr": 0.001
  },
  "confusion_matrix": [[993, 7], [1, 999]],
  "training_history": {
    "train_loss": [0.237593307107687, 0.20456762167811393, 0.2024222546517849, 0.2007896138727665, 0.20122809171676637, 0.2011393434405327, 0.2014675495028496, 0.1998926806151867, 0.20160003173351287, 0.20079970782995224, 0.20000070515275, 0.20020837017893792],
    "train_acc": [0.979125, 0.9974375, 0.9981875, 0.999125, 0.9989375, 0.998875, 0.99875, 0.9995, 0.998625, 0.9988125, 0.9994375, 0.99925],
    "val_loss": [0.21416716017420329, 0.20845685946562933, 0.20591867182935988, 0.20520150283026317, 0.2066837553940122, 0.20536757390650492, 0.20409096872049665, 0.20484876656343068, 0.20928849303533162, 0.20644679760176038, 0.20396651351262654, 0.205487027527794],
    "val_acc": [0.996, 0.997, 0.9975, 0.9975, 0.9955, 0.997, 0.9975, 0.9965, 0.995, 0.995, 0.997, 0.9965],
    "val_precision": [0.9940239043824701, 0.9960079840319361, 0.9950248756218906, 0.996011964107677, 0.9979899497487437, 0.998995983935743, 0.9950248756218906, 0.996996996996997, 0.9969879518072289, 0.9969879518072289, 0.9950199203187251, 0.996996996996997],
    "val_recall": [0.998, 0.998, 1.0, 0.999, 0.993, 0.995, 1.0, 0.996, 0.993, 0.993, 0.999, 0.996],
    "val_f1": [0.9960079840319361, 0.997002997002997, 0.9975062344139651, 0.9975037443834248, 0.9954887218045113, 0.996993987975952, 0.9975062344139651, 0.9964982491245623, 0.9949899799599199, 0.9949899799599199, 0.9970059880239521, 0.9964982491245623],
    "val_auc": [0.9995489999999999, 0.999897, 0.999853, 0.99961, 0.999957, 0.999959, 0.99996, 0.999961, 0.999934, 0.999937, 0.999919, 0.9999610000000001]
  },
  "config": {
    "model_name": "google/vit-base-patch16-224",
    "image_size": 224,
    "num_classes": 2,
    "batch_size": 32,
    "learning_rate": 2e-05,
    "num_epochs": 20,
    "early_stopping_patience": 5,
    "dropout_rate": 0.1,
    "label_smoothing": 0.1,
    "weight_decay": 0.01
  },
  "dataset_info": {
    "train_real": 8000,
    "train_fake": 8000,
    "val_real": 1000,
    "val_fake": 1000,
    "test_real": 1000,
    "test_fake": 1000
  }
}