ArtifactDetect: Forensic Pixel Detector & VLM Reasoner 🏆

Overview

ArtifactDetect is a dual-stream forensic pipeline designed to detect sophisticated Generative AI manipulations in high-stakes media, specifically focusing on Real Estate & Commercial Integrity (Track B).

Developed for the MenaML Winter School 2026 GenAI Detection Challenge, this system moves beyond black-box detection by grounding its analysis in physical signal processing and semantic reasoning.

Key Performance

  • Detection Accuracy: 99.13% on the blind test set (229/231 images correctly classified).
  • Calibration: Optimized decision threshold (0.20) for high-sensitivity detection of "in-the-wild" web-scraped content.
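To make the two figures above concrete, here is a minimal sketch of how the reported threshold might be applied and how the accuracy figure follows from the counts. It assumes that a higher score means "more likely manipulated"; the source does not state the score's direction, and the function names are hypothetical.

```python
THRESHOLD = 0.20  # decision threshold reported above


def is_flagged(score: float, threshold: float = THRESHOLD) -> bool:
    """Flag an image when its score reaches the threshold.

    Assumption: higher scores mean "more likely manipulated".
    """
    return score >= threshold


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage of correctly classified images."""
    return 100.0 * correct / total


print(is_flagged(0.25))                       # True
print(is_flagged(0.10))                       # False
print(round(accuracy_percent(229, 231), 2))   # 99.13
```

Under that assumption, a threshold this low trades extra false alarms for fewer missed manipulations, which matches the "high-sensitivity" goal stated above.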

🧠 System Architecture

Our solution employs a Dual-Stream Fusion Strategy to ensure both technical accuracy and logical explainability:

Module 1: Forensic Signal Detector (Pixel-Level)

  • Backbone: EfficientNet-B0 (Pretrained) for robust feature extraction.
  • Forensic Extractors: Custom layers that analyze:
    • Frequency Domain: FFT fingerprinting to detect GAN grid artifacts.
    • Noise Residuals: High-pass filtering to identify splicing anomalies.
    • Texture Consistency: Gradient analysis for unnatural "waxy" smoothing.
  • Output: Binary Authenticity Score + Specific Manipulation Technique (12 classes).
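The three forensic cues listed above can be illustrated with plain NumPy. This is a simplified sketch, not the custom layers used in the model: the function names, the 3x3 box blur used as the low-pass reference, and the grayscale float input are all assumptions made for the example.

```python
import numpy as np


def fft_log_spectrum(gray: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum.

    Periodic grid artifacts tend to show up as isolated off-center peaks.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum))


def highpass_residual(gray: np.ndarray) -> np.ndarray:
    """Noise residual: the image minus a 3x3 box blur (a simple high-pass)."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # Average the nine shifted copies of the image to get the box blur.
    blurred = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return gray - blurred


def mean_gradient_magnitude(gray: np.ndarray) -> float:
    """Average gradient magnitude; very low values suggest over-smoothing."""
    gy, gx = np.gradient(gray)
    return float(np.mean(np.hypot(gx, gy)))


# Usage on a synthetic grayscale image with values in [0, 1].
image = np.random.default_rng(0).random((64, 64))
print(fft_log_spectrum(image).shape)
print(highpass_residual(image).std())
print(mean_gradient_magnitude(image))
```

In a learned pipeline, maps like these would typically be fed to the network alongside the RGB input rather than thresholded by hand.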

Module 2: VLM Logic Reasoner (Semantic-Level)

  • Model: llava-hf/llava-1.5-7b-hf (Large Language-and-Vision Assistant).
  • Optimization: 4-bit quantization (BitsAndBytes) for efficient inference on standard GPUs.
  • Mechanism: Uses Prompt-Guided Injection. The specific technique detected by Module 1 (e.g., "Shadow Mismatch") is injected into the VLM prompt to generate grounded, human-readable explanations.
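Prompt-Guided Injection can be sketched as simple string templating. The example below assumes the "USER: <image>\n... ASSISTANT:" chat layout commonly used with llava-hf/llava-1.5-7b-hf; the question wording and the function name build_vlm_prompt are hypothetical and not taken from the project's code.

```python
def build_vlm_prompt(technique: str) -> str:
    """Insert the technique predicted by Module 1 into the VLM prompt."""
    # Hypothetical wording; the project's actual prompt is not shown in this README.
    question = (
        f"A forensic detector flagged this image for '{technique}'. "
        "Describe the visible evidence that supports or contradicts this finding."
    )
    # <image> marks where the processor places the image tokens.
    return f"USER: <image>\n{question} ASSISTANT:"


print(build_vlm_prompt("Shadow Mismatch"))
```

Naming the suspected technique gives the VLM something specific to verify, which is what keeps its explanation tied to the detector's output.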

🧪 Data Engineering: "Mathematical Injection"

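Rather than using generative models to synthesize manipulated examples, we engineered a deterministic mathematical pipeline to create our training data. Because every manipulation is applied by a known transformation, the ground-truth labels are exact and free of generative-model hallucination.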
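As an illustration of what a deterministic injection could look like, the sketch below adds a periodic pattern to a chosen region and returns the exact mask of modified pixels. The specific manipulation (a cosine grid), the function name, and its parameters are hypothetical; this README does not list the operations the real pipeline applies.

```python
import numpy as np


def inject_grid_artifact(gray, box, period=4, amplitude=0.05):
    """Add a deterministic periodic pattern inside box = (top, left, height, width).

    Returns the manipulated image and a boolean ground-truth mask.
    Hypothetical example of a "mathematical injection".
    """
    top, left, h, w = box
    out = gray.astype(np.float64).copy()
    mask = np.zeros(gray.shape, dtype=bool)

    # Separable cosine grid: no randomness, so the result is reproducible.
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = amplitude * np.cos(2 * np.pi * yy / period) * np.cos(2 * np.pi * xx / period)

    out[top:top + h, left:left + w] += pattern
    mask[top:top + h, left:left + w] = True
    return np.clip(out, 0.0, 1.0), mask


clean = np.full((32, 32), 0.5)
fake, truth = inject_grid_artifact(clean, box=(8, 8, 16, 16))
print(int(truth.sum()))  # number of pixels labeled as manipulated
```

Because the transformation and its location are chosen by the pipeline, the label and the mask are known exactly.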

🛠️ Installation & Usage

1. Requirements

pip install -r requirements.txt

Key dependencies: torch, transformers, timm, opencv-python, huggingface_hub, bitsandbytes

2. Run Inference

Our predict.py script is fully self-contained. It automatically downloads our custom weights and the test dataset from Hugging Face.

python predict.py

  • Output: A submission.json file containing:
    • authenticity_score: (0.0 - 1.0)
    • manipulation_type: Specific technique (e.g., "Shadow Mismatch")
    • vlm_reasoning: Explanation (e.g., "Shadows from the chair point left, while window light suggests they should point right.")
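A quick way to sanity-check one entry of this output is sketched below. The three field names come from the list above; treating each entry as a flat JSON object, and the helper name validate_record, are assumptions, since the overall layout of submission.json is not described here.

```python
import json

# Field names from the list above; the expected types are assumptions.
REQUIRED_FIELDS = {
    "authenticity_score": (int, float),
    "manipulation_type": str,
    "vlm_reasoning": str,
}


def validate_record(record: dict) -> None:
    """Raise ValueError if a record is missing a field or has a bad value."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected_type):
            raise ValueError(f"missing or invalid field: {name}")
    if not 0.0 <= record["authenticity_score"] <= 1.0:
        raise ValueError("authenticity_score must be within [0.0, 1.0]")


# Illustrative values only, not real model output.
example = json.loads(
    '{"authenticity_score": 0.12,'
    ' "manipulation_type": "Shadow Mismatch",'
    ' "vlm_reasoning": "Shadows from the chair point left."}'
)
validate_record(example)
print("record is well-formed")
```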

🔗 Resources

All assets are publicly hosted on Hugging Face for reproducibility:

  • Model Weights: FatimahEmadEldin/Forensic-Pixel-Detector
  • Combined Weights File: best_combined_model.pth

👥 Team Details

Team Name: ArtifactDetect

Track: B (Real Estate & Commercial Integrity)
