---
title: Digital Integrity
emoji: π
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Elm Challenge 2 - Computer Vision
---

# Detecting GenAI & Sophisticated Manipulation in Public Media

## Context & Motivation

As generative AI becomes mainstream, the line between reality and synthetic media is blurring. On social media, "perfect" AI influencers are indistinguishable from humans, and in real estate, "virtual staging" can mislead buyers by hiding structural flaws.

Existing content moderation tools often check for "Community Guidelines" violations (violence, hate speech) but fail to assess authenticity. This hackathon challenges you to build a two-module system that identifies GenAI-generated or heavily manipulated images in two high-stakes public domains: social media and real estate.

## System Design Requirements

Participants must design a dual-path pipeline:

**Module 1: The Forensic Signal Detector (Pixel-Level)**

- Objective: Identify low-level technical anomalies.
- Target: Frequency analysis, noise residuals, and GAN/diffusion artifacts.
- Expected signals:
  - Texture consistency: detecting "unnatural smoothness" in skin or wall textures.
  - Compression discrepancies: identifying whether an object (e.g., a piece of furniture or a person) was digitally "spliced" into a scene.
  - Frequency-domain analysis: using the FFT to find the mathematical "fingerprint" left by upscalers or generators.
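
As an illustration of the frequency-domain signal above, the sketch below (assuming only numpy; the function and feature names are our own, not a prescribed API) computes an azimuthally averaged log-magnitude spectrum and a simple high-frequency energy ratio. Upscalers and some generators leave periodic peaks or an unusually fast high-frequency roll-off in this profile:

```python
import numpy as np

def fft_spectrum_features(gray: np.ndarray) -> dict:
    """Summarize the log-magnitude FFT spectrum of a grayscale image."""
    # Centered 2D FFT and log-magnitude spectrum
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    mag = np.log1p(np.abs(f))

    # Radius of each frequency bin from the spectrum center
    h, w = mag.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r_bins = np.hypot(yy - h / 2, xx - w / 2).astype(int)

    # Azimuthally averaged spectrum: mean magnitude per radius bin
    counts = np.bincount(r_bins.ravel())
    radial = np.bincount(r_bins.ravel(), weights=mag.ravel()) / counts

    # Ratio of high- to low-frequency energy as one simple feature
    mid = len(radial) // 2
    hf_ratio = radial[mid:].mean() / (radial[:mid].mean() + 1e-8)
    return {"hf_ratio": float(hf_ratio), "radial_profile": radial}

# Example on synthetic noise (stands in for a real image patch)
rng = np.random.default_rng(0)
feats = fft_spectrum_features(rng.random((128, 128)))
```

In practice you would compare `hf_ratio` (or the whole radial profile) between known-real and known-synthetic training images rather than thresholding it in isolation.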

**Module 2: The VLM Logic Reasoner (Semantic-Level)**

- Objective: Use a Vision-Language Model (VLM) to provide "human-in-the-loop" style reasoning.
- Target: Detecting the "uncanny valley" and physical impossibilities.
- Expected reasoning:
  - Physics check: Do the shadows of the AI-generated model match the sun's direction in the background?
  - Structural integrity: Does the "renovated" real estate kitchen have impossible geometry (e.g., cabinets merging into walls)?
  - Explanation: A natural-language output explaining why the image is flagged (e.g., "The reflection in the window shows a different room layout than the one pictured.").
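
One way to structure Module 2 is to prompt the VLM for a JSON verdict and defensively parse the reply. The sketch below is only illustrative: the prompt wording and JSON keys are our own, and the actual VLM call is omitted because the client is team-specific:

```python
import json

# Illustrative prompt for the semantic-level check; send this along
# with the image to whichever VLM client your team uses.
VLM_PROMPT = """You are an image-forensics analyst. Inspect the image for:
1. Physics: do shadows match the apparent light source?
2. Structure: is the geometry physically possible (walls, reflections)?
3. Texture: is skin or surface texture unnaturally smooth?
Respond as JSON with keys "suspicious" (true/false) and
"reasoning" (at most 2 sentences)."""

def parse_vlm_response(raw: str) -> dict:
    """Normalize the VLM's reply, falling back to a neutral verdict
    when the model does not return valid JSON."""
    try:
        out = json.loads(raw)
        return {"suspicious": bool(out.get("suspicious", False)),
                "reasoning": str(out.get("reasoning", ""))}
    except json.JSONDecodeError:
        return {"suspicious": False, "reasoning": raw.strip()[:300]}
```

The fallback branch matters in practice: VLMs frequently wrap JSON in prose, so a robust system should degrade gracefully rather than crash on a malformed reply.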

## Challenge Tracks (Choose One)

**Track A: Social Media & Influencer Authenticity**

- Problem: Detecting "AI-wash" filters and fully synthetic personas.
- Data focus: Portraits, lifestyle photography, and high-fashion edits.
- Goal: Differentiate between "touch-ups" (acceptable) and "identity fabrication" (adversarial).

**Track B: Real Estate & Commercial Integrity**

- Problem: Detecting deceptive virtual staging or AI-generated property photos.
- Data focus: Interior/exterior architectural photos.
- Goal: Identify where AI has been used to remove power lines, hide cracks in walls, or completely replace furniture in a misleading way.

## Datasets Specification

This challenge is dataset-flexible: you may train on any public datasets and/or self-generated manipulations (inpainting, splicing, virtual staging). Examples include ArtiFact and Celeb-DF for Track A, Places365 and SUN RGB-D for Track B, and the Columbia Splicing dataset for general manipulation forensics.
## Submission Deliverables

1. **Inference Script**: A clean "predict.py" script that processes a folder of images, plus a "requirements.txt" listing all dependencies needed to run it.
2. **The "Audit Report"**: For every flagged image, the system must produce a JSON output containing:

   - authenticity_score: a float from 0.0 to 1.0 (0.0 = authentic, 1.0 = manipulated)
   - manipulation_type: e.g., "In-painting", "Full Synthesis", "Filter"
   - vlm_reasoning: a two-sentence explanation of the red flags

   Expected "predictions.json" format:
   ```json
   {
     "image_name": "000001.jpg",
     "authenticity_score": 0.91,
     "manipulation_type": "inpainting",
     "vlm_reasoning": "The window reflection is inconsistent with the room layout. Shadow direction on the sofa does not match the light source."
   }
   ```

3. **Technical Report**: A three-page summary of the architecture and the "fusion strategy" used to combine Module 1 and Module 2.
4. We expect to run your code as follows:

   ```shell
   pip install -r requirements.txt
   python predict.py --input_dir /test_images --output_file predictions.json
   ```
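
To match that command line, a "predict.py" skeleton might look like the following. The two scoring functions are placeholders for your Module 1 and Module 2 implementations, and the simple average in `fuse` is just one possible fusion strategy, not a requirement:

```python
import argparse
import json
import os

def forensic_score(path: str) -> float:
    """Module 1 placeholder: pixel-level manipulation probability."""
    return 0.5  # replace with your forensic detector

def vlm_verdict(path: str) -> tuple:
    """Module 2 placeholder: (score, manipulation_type, reasoning)."""
    return 0.5, "unknown", "VLM not wired in yet."

def fuse(pixel: float, semantic: float) -> float:
    """Naive fusion strategy: average the two module scores."""
    return 0.5 * (pixel + semantic)

def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--input_dir", required=True)
    ap.add_argument("--output_file", required=True)
    args = ap.parse_args()

    results = []
    for name in sorted(os.listdir(args.input_dir)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(args.input_dir, name)
        sem_score, mtype, reasoning = vlm_verdict(path)
        results.append({
            "image_name": name,
            "authenticity_score": fuse(forensic_score(path), sem_score),
            "manipulation_type": mtype,
            "vlm_reasoning": reasoning,
        })
    with open(args.output_file, "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    main()
```

This writes one JSON object per image into a list; check with the organizers whether a list or one object per line is expected, since the brief shows only a single object.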
## Evaluation Criteria

- Detection Accuracy (40%): Performance on a hidden test set containing a 50/50 split of real and manipulated images.
- Explainability (30%): How logical and accurate are the VLM's explanations? (Evaluated by human judges.)
- Generalization (20%): Does the model work across different lighting conditions, resolutions, and "unseen" AI generators (e.g., Midjourney v6 vs. Flux)?
- Efficiency (10%): Inference speed and model size.

## Recommended Submission Method

Please publish your solution as a Hugging Face model repository containing predict.py, requirements.txt, and the technical report. Model weights should either be included in the repo (if lightweight) or be downloadable from a separate Hugging Face model repo.

Once your repo is ready, submit your solution using the official online submission form:

https://forms.office.com/r/864ac0pUAC

Please ensure all required fields for this challenge are completed in the form, and that any links or uploaded materials included in the form are accessible (e.g., public or view-enabled as required).
## Team Requirements & Eligibility

- Each team must consist of 2–3 participants.
- All participants must be enrolled in MenaML Winter School 2026.
## Submission Deadline

Wednesday, 28/01/2026 at 2:00 PM Riyadh time.