yingzhi committed (verified) · Commit 06b3dd8 · Parent: 804973c

Update README.md

Files changed (1): README.md (+86 −3)
@@ -1,8 +1,8 @@
 ---
 title: Digital Integrity
 emoji: 🐠
-colorFrom: pink
-colorTo: red
 sdk: gradio
 sdk_version: 6.3.0
 app_file: app.py
@@ -11,4 +11,87 @@ license: apache-2.0
 short_description: Elm Challenge 2 - Computer Vision
 ---

-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Digital Integrity
emoji: 🐠
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
license: apache-2.0
short_description: Elm Challenge 2 - Computer Vision
---
# Theme: Detecting GenAI & Sophisticated Manipulation in Public Media 🏆🏆🏆

## Context & Motivation

As generative AI becomes mainstream, the line between reality and synthetic media is blurring. On social media, "perfect" AI influencers are indistinguishable from humans, and in real estate, "virtual staging" can mislead buyers by hiding structural flaws.

Existing content moderation tools check for "Community Guidelines" violations (violence, hate speech) but fail to verify authenticity. This hackathon challenges you to build a two-module system that identifies GenAI-generated or heavily manipulated images in high-stakes public domains (social media and real estate).
## System Design Requirements

Participants must design a dual-path pipeline:

**Module 1: The Forensic Signal Detector (Pixel-Level)**

- Objective: Identify low-level technical anomalies.
- Target: Frequency analysis, noise residuals, and GAN/diffusion artifacts.
- Expected signals:
  - Texture consistency: detecting "unnatural smoothness" in skin or wall textures.
  - Compression discrepancies: identifying whether an object (e.g., a piece of furniture or a person) was digitally "spliced" into a scene.
  - Frequency-domain analysis: using the FFT to find the mathematical "fingerprint" left by upscalers or generators.
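The frequency-domain signal above is often summarized as a radial (azimuthally averaged) power spectrum, in which upscalers and generators leave telltale high-frequency peaks. Here is a minimal sketch using NumPy; `radial_power_spectrum` is an illustrative name, not a prescribed interface.

```python
# Minimal sketch of a frequency-domain forensic signal (illustrative, not
# a required design). Input: a grayscale image as a 2-D NumPy array.
import numpy as np

def radial_power_spectrum(img: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged power spectrum, normalized to sum to 1."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)          # radius from the DC bin
    bins = (r / r.max() * (n_bins - 1)).astype(int)
    # Mean power per radial bin.
    totals = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    spectrum = totals / np.maximum(counts, 1)
    return spectrum / spectrum.sum()

# Pure noise spreads energy across all frequencies; a smooth gradient
# concentrates it near the DC (low-frequency) bins.
rng = np.random.default_rng(0)
noise = rng.standard_normal((128, 128))
spec = radial_power_spectrum(noise)
```

A detector would compare such spectra against statistics of known-real images, flagging anomalous high-frequency peaks.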
**Module 2: The VLM Logic Reasoner (Semantic-Level)**

- Objective: Use a Vision-Language Model (VLM) to provide "human-in-the-loop"-style reasoning.
- Target: Detecting the "uncanny valley" and physical impossibilities.
- Expected reasoning:
  - Physics check: Do the shadows of the AI-generated model match the sun's direction in the background?
  - Structural integrity: Does the "renovated" real estate kitchen have impossible geometry (e.g., cabinets merging into walls)?
  - Explanation: A natural-language output explaining why the image was flagged (e.g., "The reflection in the window shows a different room layout than the one pictured.").
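One way to frame Module 2 is a structured prompt that forces the VLM to answer the physics and geometry checks and reply in the audit schema. A sketch, where the prompt wording, `parse_vlm_reply`, and the canned reply are all illustrative and no particular VLM API is assumed:

```python
# Illustrative framing of the VLM reasoning step; no real VLM is called here.
import json

AUDIT_PROMPT = """You are a forensic image analyst. Inspect the image and answer:
1. Physics check: do shadows and reflections agree with the scene lighting?
2. Structural integrity: is the geometry physically possible?
Reply with JSON only, using keys:
  "authenticity_score" (float, 0.0 to 1.0),
  "manipulation_type" (string),
  "vlm_reasoning" (two sentences)."""

def parse_vlm_reply(reply: str) -> dict:
    """Validate that a VLM reply matches the audit-report schema."""
    report = json.loads(reply)
    assert {"authenticity_score", "manipulation_type", "vlm_reasoning"} <= report.keys()
    assert 0.0 <= report["authenticity_score"] <= 1.0
    return report

# Canned reply standing in for an actual model response:
fake_reply = json.dumps({
    "authenticity_score": 0.22,
    "manipulation_type": "Full Synthesis",
    "vlm_reasoning": "Shadows fall in two directions. Window reflections show a different room.",
})
report = parse_vlm_reply(fake_reply)
```

Constraining the VLM to a fixed JSON schema makes its output directly consumable by the fusion step and the audit report.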
## Challenge Tracks (Choose One)

**Track A: Social Media & Influencer Authenticity**

- Problem: Detection of "AI-wash" filters and fully synthetic personas.
- Data focus: Portraits, lifestyle photography, and high-fashion edits.
- Goal: Differentiate between "touch-ups" (acceptable) and "identity fabrication" (adversarial).

**Track B: Real Estate & Commercial Integrity**

- Problem: Detecting deceptive virtual staging or AI-generated property photos.
- Data focus: Interior and exterior architectural photos.
- Goal: Identify where AI has been used to remove power lines, hide cracks in walls, or completely replace furniture in a misleading way.
## Submission Deliverables

1. **Inference Script**: A clean Python script that processes a folder of images.
2. **The "Audit Report"**: For every flagged image, the system must produce a JSON output containing:
   - `authenticity_score`: a float from 0.0 to 1.0
   - `manipulation_type`: e.g., "In-painting", "Full Synthesis", "Filter"
   - `vlm_reasoning`: a two-sentence explanation of the red flags
3. **Technical Report**: A three-page summary of the architecture and the "fusion strategy" used to combine Module 1 and Module 2.
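The audit-report entry might be assembled as below. The score-averaging fusion is just one possible strategy, and every name here is illustrative rather than part of the required interface:

```python
# Sketch of one audit-report entry; the two input scores stand in for
# the outputs of Module 1 (forensic) and Module 2 (VLM).
import json

def make_audit_report(image_name, forensic_score, vlm_score,
                      manipulation_type, reasoning):
    # Naive fusion: average the two module scores (one possible strategy).
    return {
        "image": image_name,
        "authenticity_score": round((forensic_score + vlm_score) / 2, 3),
        "manipulation_type": manipulation_type,
        "vlm_reasoning": reasoning,
    }

report = make_audit_report(
    "listing_042.jpg", 0.2, 0.3, "In-painting",
    "Wall texture is unnaturally smooth. The window reflection does not match the room.",
)
print(json.dumps(report, indent=2))
```

A learned fusion (e.g., a small classifier over both modules' features) is a natural upgrade to document in the technical report.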
## Evaluation Criteria

- **Detection Accuracy (40%)**: Performance on a hidden test set containing a 50/50 split of real and manipulated images.
- **Explainability (30%)**: How logical and accurate are the VLM's explanations? (Evaluated by human judges.)
- **Generalization (20%)**: Does the model work across different lighting conditions, resolutions, and "unseen" AI generators (e.g., Midjourney v6 vs. Flux)?
- **Efficiency (10%)**: Inference speed and model size.
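If each criterion is graded in [0, 1], the weights above combine into a single total as a weighted sum; the organizers' exact formula is not specified, so this is only a plausible reading:

```python
# Weighted total under the stated 40/30/20/10 split (illustrative only).
WEIGHTS = {"accuracy": 0.40, "explainability": 0.30,
           "generalization": 0.20, "efficiency": 0.10}

def final_score(scores: dict) -> float:
    """Weighted sum of per-criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

example = {"accuracy": 0.9, "explainability": 0.8,
           "generalization": 0.7, "efficiency": 0.95}
# 0.36 + 0.24 + 0.14 + 0.095 = 0.835
```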