---
title: Digital Integrity
emoji: 🔍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Elm Challenge 2 - Computer Vision
---

# Detecting GenAI & Sophisticated Manipulation in Public Media 🏆🏆🏆

## Context & Motivation

As Generative AI becomes mainstream, the line between reality and synthetic media is blurring. On social media, "perfect" AI influencers are indistinguishable from humans, and in real estate, "virtual staging" can mislead buyers by hiding structural flaws. Existing content moderation tools often screen for "Community Guidelines" violations (violence, hate speech) but fail to assess authenticity.

This hackathon challenges you to build a two-module system that identifies GenAI-generated or heavily manipulated images in high-stakes public domains (social media and real estate).

## System Design Requirements

Participants must design a dual-path pipeline.

**Module 1: The Forensic Signal Detector (Pixel-Level)**

- Objective: Identify low-level technical anomalies.
- Target: Frequency analysis, noise residuals, and GAN/Diffusion artifacts.
- Expected signals:
  - Texture consistency: detecting "unnatural smoothness" in skin or wall textures.
  - Compression discrepancies: identifying whether an object (e.g., a piece of furniture or a person) was digitally "spliced" into a scene.
  - Frequency-domain analysis: using the FFT to find the mathematical "fingerprint" left by upscalers or generators.

**Module 2: The VLM Logic Reasoner (Semantic-Level)**

- Objective: Use a Vision-Language Model (VLM) to provide "human-in-the-loop"-style reasoning.
- Target: Detecting the "uncanny valley" and physical impossibilities.
- Expected reasoning:
  - Physics check: Do the shadows of the AI-generated model match the sun's direction in the background?
  - Structural integrity: Does the "renovated" real estate kitchen have impossible geometry (e.g., cabinets merging into walls)?
  - Explanation: a natural-language output explaining why the image is flagged (e.g., "The reflection in the window shows a different room layout than the one pictured.")

## Challenge Tracks (Choose One)

**Track A: Social Media & Influencer Authenticity**

- Problem: Detection of "AI-wash" filters and fully synthetic personas.
- Data focus: Portraits, lifestyle photography, and high-fashion edits.
- Goal: Differentiate between "touch-ups" (acceptable) and "identity fabrication" (adversarial).

**Track B: Real Estate & Commercial Integrity**

- Problem: Detecting deceptive virtual staging or AI-generated property photos.
- Data focus: Interior/exterior architectural photos.
- Goal: Identify where AI has been used to remove power lines, hide cracks in walls, or completely replace furniture in a misleading way.

## Datasets Specification

This challenge is dataset-flexible: you may train on any public datasets and/or self-generated manipulations (inpainting, splicing, virtual staging). Examples include ArtiFact / Celeb-DF for Track A, Places365 / SUN RGB-D for Track B, and Columbia Splicing for general manipulation forensics.

## Submission Deliverables

1. **Inference Script**: A clean `predict.py` script that processes a folder of images, plus a `requirements.txt` listing all dependencies needed to run `predict.py`.
2. **The "Audit Report"**: For every flagged image, the system must produce a JSON output containing:
   - `authenticity_score`: 0.0 to 1.0 (0.0 = authentic, 1.0 = manipulated)
   - `manipulation_type`: e.g., "In-painting", "Full Synthesis", "Filter"
   - `vlm_reasoning`: a two-sentence explanation of the red flags.

   Expected `predictions.json` format:

   ```json
   {
     "image_name": "000001.jpg",
     "authenticity_score": 0.91,
     "manipulation_type": "inpainting",
     "vlm_reasoning": "The window reflection is inconsistent with the room layout. Shadow direction on the sofa does not match the light source."
   }
   ```
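As an illustration only, an audit-report record in the format above could be assembled by fusing the two module outputs. Everything below is a sketch: `forensic_score`, `vlm_assessment`, the weighted-average fusion, and the fixed `manipulation_type` are hypothetical stand-ins, not part of the challenge specification.

```python
import json

def forensic_score(image_path):
    """Module 1: pixel-level manipulation score in [0, 1] (stub for illustration)."""
    return 0.85

def vlm_assessment(image_path):
    """Module 2: semantic score plus a short rationale (stub for illustration)."""
    return 0.95, ("The window reflection is inconsistent with the room layout. "
                  "Shadow direction on the sofa does not match the light source.")

def audit_image(image_path, alpha=0.5):
    """Build one predictions.json record.

    `alpha` weights a simple late fusion (weighted average) of the two module
    scores; the actual fusion strategy is up to each team.
    """
    s1 = forensic_score(image_path)
    s2, reasoning = vlm_assessment(image_path)
    return {
        "image_name": image_path.split("/")[-1],
        "authenticity_score": round(alpha * s1 + (1 - alpha) * s2, 2),
        "manipulation_type": "inpainting",  # would be predicted per image
        "vlm_reasoning": reasoning,
    }

if __name__ == "__main__":
    print(json.dumps(audit_image("/test_images/000001.jpg"), indent=2))
```

A real `predict.py` would loop this over every file in `--input_dir` and write the list of records to `--output_file`.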
3. **Technical Report**: A three-page summary of the architecture and the "fusion strategy" used to combine Module 1 and Module 2.

We expect to run your code as follows:

```
pip install -r requirements.txt
python predict.py --input_dir /test_images --output_file predictions.json
```

## Evaluation Criteria

- Detection accuracy (40%): Performance on a hidden test set containing a 50/50 split of real and manipulated images.
- Explainability (30%): How logical and accurate are the VLM's explanations? (Evaluated by human judges.)
- Generalization (20%): Does the model work across different lighting conditions, resolutions, and "unseen" AI generators (e.g., Midjourney v6 vs. Flux)?
- Efficiency (10%): Inference speed and model size.

## Recommended Submission Method

Please publish your solution as a Hugging Face model repository containing `predict.py`, `requirements.txt`, and the technical report. Model weights should either be included in the repo (if lightweight) or downloadable from a Hugging Face model repo.

Once your repo is ready, submit your solution using the official online submission form: https://forms.office.com/r/864ac0pUAC

Please ensure that all required fields for this challenge are completed in the form, and that any links or uploaded materials included in the form are accessible (e.g., public or view-enabled as required).

## Team Requirements & Eligibility

- Each team must consist of 2–3 participants.
- All participants must be enrolled in MenaML Winter School 2026.

## Submission Deadline

Wednesday, 28/01/2026 at 2:00 PM Riyadh time.
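To make Module 1's frequency-domain analysis concrete, here is a minimal NumPy sketch of one possible forensic signal: the fraction of spectral energy outside a low-frequency disc. The function name, the `cutoff` value, and the idea that synthetic or over-smoothed images carry unusually little high-frequency energy are illustrative assumptions; any real detector would need calibration on labeled data.

```python
import numpy as np

def high_frequency_energy_ratio(gray, cutoff=0.25):
    """Fraction of 2-D FFT energy beyond a normalized radial cutoff.

    A crude pixel-level signal: "unnaturally smooth" regions (a hallmark of
    some generators and beauty filters) concentrate energy near DC, while
    natural sensor noise spreads energy across the spectrum.
    """
    gray = np.asarray(gray, dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    # Normalized radial distance from the centre of the shifted spectrum
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    return spectrum[r > cutoff].sum() / spectrum.sum()

rng = np.random.default_rng(0)
noisy = rng.standard_normal((64, 64))   # natural-noise-like texture
smooth = np.ones((64, 64))              # "unnaturally smooth" patch
print(high_frequency_energy_ratio(noisy) > high_frequency_energy_ratio(smooth))
```

In practice this scalar would be one feature among many (noise residuals, JPEG-grid inconsistencies, learned artifact detectors) feeding Module 1's score.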