ArtifactDetect: Forensic Pixel Detector & VLM Reasoner
Overview
ArtifactDetect is a dual-stream forensic pipeline designed to detect sophisticated Generative AI manipulations in high-stakes media, specifically focusing on Real Estate & Commercial Integrity (Track B).
Developed for the MenaML Winter School 2026 GenAI Detection Challenge, this system moves beyond black-box detection by grounding its analysis in physical signal processing and semantic reasoning.
Key Performance
- Detection Accuracy: 99.13% on the blind test set (229/231 images correctly classified).
- Calibration: Optimized decision threshold (0.20) for high-sensitivity detection of "in-the-wild" web-scraped content.
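As a rough illustration of how the reported numbers fit together, the sketch below applies the 0.20 threshold to a score and recomputes the accuracy from the 229/231 count. It assumes that a higher score means "more likely manipulated", which the figures above do not state explicitly; the function names are hypothetical and not taken from predict.py.

```python
THRESHOLD = 0.20  # decision threshold reported above


def flag_manipulated(score: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the score meets or exceeds the threshold.

    Assumption: a higher score means 'more likely manipulated'.
    A low threshold therefore trades precision for sensitivity.
    """
    return score >= threshold


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


if __name__ == "__main__":
    print(flag_manipulated(0.35))
    print(round(accuracy_percent(229, 231), 2))
```

Under this assumption, lowering the threshold flags more borderline images, which is the usual way to favor sensitivity on noisy web-scraped content.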
System Architecture
Our solution employs a Dual-Stream Fusion Strategy to ensure both technical accuracy and logical explainability:
Module 1: Forensic Signal Detector (Pixel-Level)
- Backbone: EfficientNet-B0 (pretrained) for robust feature extraction.
- Forensic Extractors: Custom layers that analyze:
  - Frequency Domain: FFT fingerprinting to detect GAN grid artifacts.
  - Noise Residuals: High-pass filtering to identify splicing anomalies.
  - Texture Consistency: Gradient analysis for unnatural "waxy" smoothing.
- Output: Binary Authenticity Score + Specific Manipulation Technique (12 classes).
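The three forensic cues listed above can be approximated with a few lines of NumPy. This is a minimal sketch, not the project's custom layers: the function names are hypothetical, the high-pass filter is a simple "image minus 3x3 box blur" stand-in, and the input is assumed to be a 2D grayscale float array.

```python
import numpy as np


def fft_log_spectrum(gray: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum.

    Periodic upsampling (grid) artifacts tend to appear as
    isolated peaks away from the center of this map.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum))


def noise_residual(gray: np.ndarray) -> np.ndarray:
    """High-pass residual: the image minus its 3x3 box-blurred copy."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="reflect")
    blurred = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return gray - blurred


def gradient_energy(gray: np.ndarray) -> float:
    """Mean gradient magnitude; very low values suggest over-smoothed texture."""
    gy, gx = np.gradient(gray)
    return float(np.mean(np.hypot(gx, gy)))


if __name__ == "__main__":
    demo = np.zeros((32, 32))
    demo[:, ::4] = 1.0  # synthetic period-4 grid pattern
    print(fft_log_spectrum(demo).shape, gradient_energy(demo))
```

In a trained detector, maps like these would typically be fed to the network as extra input channels or auxiliary features rather than thresholded by hand.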
Module 2: VLM Logic Reasoner (Semantic-Level)
- Model: llava-hf/llava-1.5-7b-hf (Large Language-and-Vision Assistant).
- Optimization: 4-bit quantization (BitsAndBytes) for efficient inference on standard GPUs.
- Mechanism: Uses Prompt-Guided Injection. The specific technique detected by Module 1 (e.g., "Shadow Mismatch") is injected into the VLM prompt to generate grounded, human-readable explanations.
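A minimal sketch of the prompt injection and 4-bit loading described above. The question wording and both function names are illustrative assumptions, not the project's actual prompt or code; the `USER: <image>\n... ASSISTANT:` layout is the chat format commonly used with LLaVA-1.5 checkpoints. The heavy imports are kept inside the loader so the prompt builder can be used without a GPU.

```python
def build_vlm_prompt(technique: str) -> str:
    """Embed the technique predicted by Module 1 into the VLM prompt.

    The question text is a hypothetical example.
    """
    question = (
        f"A forensic detector flagged this image for possible '{technique}'. "
        "Describe any visible evidence of this in the image, "
        "or state that none is visible."
    )
    return f"USER: <image>\n{question} ASSISTANT:"


def load_quantized_llava(model_id: str = "llava-hf/llava-1.5-7b-hf"):
    """Load the VLM with 4-bit weights (requires a CUDA GPU and bitsandbytes)."""
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        LlavaForConditionalGeneration,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor


if __name__ == "__main__":
    print(build_vlm_prompt("Shadow Mismatch"))
```

Naming the suspected technique in the prompt narrows what the VLM is asked to look for, which is the point of grounding its explanation in the pixel-level prediction.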
Data Engineering: "Mathematical Injection"
Unlike standard approaches relying on generative models, we engineered a deterministic mathematical pipeline to create our training data. This ensures precise ground truth without hallucination.
- Techniques: We rigorously injected specific artifacts (e.g., quantization noise, affine geometric shifts) into 2,000 authentic images.
- Datasets:
  - Training Set: genai-manipulation-detection-interior
  - Test Set: Forensic-Manipulation-Test-Set
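To make the idea of deterministic artifact injection concrete, here is a small NumPy sketch of the two example artifacts mentioned above. It is an illustration under assumptions, not the project's data pipeline: the function names and parameters are hypothetical, the "affine shift" is reduced to a pure translation of one rectangular region (the simplest affine case), and the caller is expected to keep the shifted box inside the image.

```python
import numpy as np


def inject_quantization_noise(img: np.ndarray, levels: int = 16) -> np.ndarray:
    """Requantize an 8-bit image to `levels` values per channel."""
    step = 256 // levels
    quantized = (img.astype(np.int32) // step) * step + step // 2
    return quantized.astype(np.uint8)


def inject_region_shift(img: np.ndarray, box, dx: int, dy: int):
    """Copy the region box=(top, left, height, width) to an offset position.

    Returns the edited image and a boolean mask of the pasted area,
    which serves as exact ground truth for the manipulation.
    """
    top, left, h, w = box
    out = img.copy()
    out[top + dy:top + dy + h, left + dx:left + dx + w] = (
        img[top:top + h, left:left + w]
    )
    mask = np.zeros(img.shape[:2], dtype=bool)
    mask[top + dy:top + dy + h, left + dx:left + dx + w] = True
    return out, mask


if __name__ == "__main__":
    image = np.arange(64, dtype=np.uint8).reshape(8, 8)
    edited, mask = inject_region_shift(image, (0, 0, 2, 2), dx=3, dy=3)
    print(int(mask.sum()), np.unique(inject_quantization_noise(image)))
```

Because both operations are pure functions of their inputs, the same source image and parameters always yield the same manipulated image and label.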
Installation & Usage
1. Requirements
pip install -r requirements.txt
Key dependencies: torch, transformers, timm, opencv-python, huggingface_hub, bitsandbytes
2. Run Inference
Our predict.py script is fully self-contained. It automatically downloads our custom weights and the test dataset from Hugging Face.
python predict.py
- Output: A submission.json file containing, for each image:
  - authenticity_score: Score between 0.0 and 1.0
  - manipulation_type: Specific technique (e.g., "Shadow Mismatch")
  - vlm_reasoning: Explanation (e.g., "Shadows from the chair point left, while window light suggests they should point right.")
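For illustration, one entry with these three fields could be built and serialized as follows. The overall file layout (a JSON list of per-image records) and the `image_id` key are assumptions made for this sketch; only the three field names come from the description above.

```python
import json


def make_record(image_id: str, score: float, technique: str, reasoning: str) -> dict:
    """Build one per-image entry; `image_id` is a hypothetical key."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("authenticity_score must lie in [0.0, 1.0]")
    return {
        "image_id": image_id,
        "authenticity_score": float(score),
        "manipulation_type": technique,
        "vlm_reasoning": reasoning,
    }


def dump_submission(records: list) -> str:
    """Serialize the records as pretty-printed JSON text."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    record = make_record(
        "example.jpg",  # placeholder file name
        0.87,
        "Shadow Mismatch",
        "Shadows from the chair point left, while window light "
        "suggests they should point right.",
    )
    print(dump_submission([record]))
```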
Resources
All assets are publicly hosted on Hugging Face for reproducibility:
- Model Weights: FatimahEmadEldin/Forensic-Pixel-Detector
- Combined Weights File: best_combined_model.pth
Team Details
Team Name: ArtifactDetect
Track: B (Real Estate & Commercial Integrity)
- Fatimah Emad Eldin (Fatemah.it@gmail.com)
- Mohammed Mustafa Bremoo (mohabremoo@gmail.com)
- Abdulellah Mojalled (abdulellah.mazen@gmail.com)