yingzhi committed (verified) · Commit 06b3dd8 · Parent: 804973c

Update README.md

Files changed (1): README.md (+86 −3)
@@ -1,8 +1,8 @@
 ---
 title: Digital Integrity
 emoji: 🐠
-colorFrom: pink
-colorTo: red
 sdk: gradio
 sdk_version: 6.3.0
 app_file: app.py
@@ -11,4 +11,87 @@ license: apache-2.0
 short_description: Elm Challenge 2 - Computer Vision
 ---

-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Digital Integrity
emoji: 🐠
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
license: apache-2.0
short_description: Elm Challenge 2 - Computer Vision
---
# Theme: Detecting GenAI & Sophisticated Manipulation in Public Media 🏆🏆🏆

## Context & Motivation

As generative AI becomes mainstream, the line between reality and synthetic media is blurring. On social media, "perfect" AI influencers are indistinguishable from humans, and in real estate, "virtual staging" can mislead buyers by hiding structural flaws.

Existing content moderation tools check for "Community Guidelines" violations (violence, hate speech) but fail to verify authenticity. This hackathon challenges you to build a two-module system that identifies GenAI-generated or heavily manipulated images in high-stakes public domains (social media and real estate).
## System Design Requirements

Participants must design a dual-path pipeline:

**Module 1: The Forensic Signal Detector (Pixel-Level)**

- Objective: Identify low-level technical anomalies.
- Target: Frequency analysis, noise residuals, and GAN/diffusion artifacts.
- Expected signals:
  - Texture consistency: detecting "unnatural smoothness" in skin or wall textures.
  - Compression discrepancies: identifying whether an object (e.g., a piece of furniture or a person) was digitally "spliced" into a scene.
  - Frequency-domain analysis: using the FFT to find the mathematical "fingerprint" left by upscalers or generators.
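The frequency-domain signal above is often summarized as a radial (azimuthally averaged) power spectrum, in which upscalers and generators leave telltale high-frequency peaks. Here is a minimal sketch using NumPy; `radial_power_spectrum` is an illustrative name, not a prescribed interface.

```python
# Minimal sketch of a frequency-domain forensic signal (illustrative, not
# a required design). Input: a grayscale image as a 2-D NumPy array.
import numpy as np

def radial_power_spectrum(img: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged power spectrum, normalized to sum to 1."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)          # radius from the DC bin
    bins = (r / r.max() * (n_bins - 1)).astype(int)
    # Mean power per radial bin.
    totals = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    spectrum = totals / np.maximum(counts, 1)
    return spectrum / spectrum.sum()

# Pure noise spreads energy across all frequencies; a smooth gradient
# concentrates it near the DC (low-frequency) bins.
rng = np.random.default_rng(0)
noise = rng.standard_normal((128, 128))
spec = radial_power_spectrum(noise)
```

A detector would compare such spectra against statistics of known-real images, flagging anomalous high-frequency peaks.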
**Module 2: The VLM Logic Reasoner (Semantic-Level)**

- Objective: Use a Vision-Language Model (VLM) to provide "human-in-the-loop"-style reasoning.
- Target: Detecting the "uncanny valley" and physical impossibilities.
- Expected reasoning:
  - Physics check: Do the shadows of the AI-generated model match the sun's direction in the background?
  - Structural integrity: Does the "renovated" real estate kitchen have impossible geometry (e.g., cabinets merging into walls)?
  - Explanation: A natural-language output explaining why the image was flagged (e.g., "The reflection in the window shows a different room layout than the one pictured.").
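One way to frame Module 2 is a structured prompt that forces the VLM to answer the physics and geometry checks and reply in the audit schema. A sketch, where the prompt wording, `parse_vlm_reply`, and the canned reply are all illustrative and no particular VLM API is assumed:

```python
# Illustrative framing of the VLM reasoning step; no real VLM is called here.
import json

AUDIT_PROMPT = """You are a forensic image analyst. Inspect the image and answer:
1. Physics check: do shadows and reflections agree with the scene lighting?
2. Structural integrity: is the geometry physically possible?
Reply with JSON only, using keys:
  "authenticity_score" (float, 0.0 to 1.0),
  "manipulation_type" (string),
  "vlm_reasoning" (two sentences)."""

def parse_vlm_reply(reply: str) -> dict:
    """Validate that a VLM reply matches the audit-report schema."""
    report = json.loads(reply)
    assert {"authenticity_score", "manipulation_type", "vlm_reasoning"} <= report.keys()
    assert 0.0 <= report["authenticity_score"] <= 1.0
    return report

# Canned reply standing in for an actual model response:
fake_reply = json.dumps({
    "authenticity_score": 0.22,
    "manipulation_type": "Full Synthesis",
    "vlm_reasoning": "Shadows fall in two directions. Window reflections show a different room.",
})
report = parse_vlm_reply(fake_reply)
```

Constraining the VLM to a fixed JSON schema makes its output directly consumable by the fusion step and the audit report.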
## Challenge Tracks (Choose One)

**Track A: Social Media & Influencer Authenticity**

- Problem: Detection of "AI-wash" filters and fully synthetic personas.
- Data focus: Portraits, lifestyle photography, and high-fashion edits.
- Goal: Differentiate between "touch-ups" (acceptable) and "identity fabrication" (adversarial).

**Track B: Real Estate & Commercial Integrity**

- Problem: Detecting deceptive virtual staging or AI-generated property photos.
- Data focus: Interior and exterior architectural photos.
- Goal: Identify where AI has been used to remove power lines, hide cracks in walls, or completely replace furniture in a misleading way.
## Submission Deliverables

1. **Inference Script**: A clean Python script that processes a folder of images.
2. **The "Audit Report"**: For every flagged image, the system must produce a JSON output containing:
   - `authenticity_score`: a float from 0.0 to 1.0
   - `manipulation_type`: e.g., "In-painting", "Full Synthesis", "Filter"
   - `vlm_reasoning`: a two-sentence explanation of the red flags
3. **Technical Report**: A three-page summary of the architecture and the "fusion strategy" used to combine Module 1 and Module 2.
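The audit-report entry might be assembled as below. The score-averaging fusion is just one possible strategy, and every name here is illustrative rather than part of the required interface:

```python
# Sketch of one audit-report entry; the two input scores stand in for
# the outputs of Module 1 (forensic) and Module 2 (VLM).
import json

def make_audit_report(image_name, forensic_score, vlm_score,
                      manipulation_type, reasoning):
    # Naive fusion: average the two module scores (one possible strategy).
    return {
        "image": image_name,
        "authenticity_score": round((forensic_score + vlm_score) / 2, 3),
        "manipulation_type": manipulation_type,
        "vlm_reasoning": reasoning,
    }

report = make_audit_report(
    "listing_042.jpg", 0.2, 0.3, "In-painting",
    "Wall texture is unnaturally smooth. The window reflection does not match the room.",
)
print(json.dumps(report, indent=2))
```

A learned fusion (e.g., a small classifier over both modules' features) is a natural upgrade to document in the technical report.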
## Evaluation Criteria

- **Detection Accuracy (40%)**: Performance on a hidden test set containing a 50/50 split of real and manipulated images.
- **Explainability (30%)**: How logical and accurate are the VLM's explanations? (Evaluated by human judges.)
- **Generalization (20%)**: Does the model work across different lighting conditions, resolutions, and "unseen" AI generators (e.g., Midjourney v6 vs. Flux)?
- **Efficiency (10%)**: Inference speed and model size.
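If each criterion is graded in [0, 1], the weights above combine into a single total as a weighted sum; the organizers' exact formula is not specified, so this is only a plausible reading:

```python
# Weighted total under the stated 40/30/20/10 split (illustrative only).
WEIGHTS = {"accuracy": 0.40, "explainability": 0.30,
           "generalization": 0.20, "efficiency": 0.10}

def final_score(scores: dict) -> float:
    """Weighted sum of per-criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

example = {"accuracy": 0.9, "explainability": 0.8,
           "generalization": 0.7, "efficiency": 0.95}
# 0.36 + 0.24 + 0.14 + 0.095 = 0.835
```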