StephenSAI committed on
Commit e50ced5 · verified · 1 Parent(s): df87edc

Initial org card

Files changed (1)
  1. README.md +90 -5
README.md CHANGED
@@ -1,10 +1,95 @@
  ---
- title: README
- emoji: 🐒
- colorFrom: purple
- colorTo: green
  sdk: static
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  ---
+ title: Scam.AI
+ emoji: 🛑️
+ colorFrom: blue
+ colorTo: indigo
  sdk: static
  pinned: false
  ---
 
+ # Scam.AI
+
+ **Detection systems for AI-driven fraud: deepfakes, document forgery, synthetic media, and adversarial attacks against identity verification.**
+
+ [![Website](https://img.shields.io/badge/scam.ai-Website-blue)](https://www.scam.ai)
+ [![Research](https://img.shields.io/badge/Research-Publications-orange)](https://www.scam.ai/en/research)
+ [![Datasets](https://img.shields.io/badge/Datasets-7%20open-green)](https://huggingface.co/Scam-AI)
+
+ ---
+
+ ## What We Do
+
+ Scam.AI builds detection systems that protect identity-verification pipelines, financial-document workflows, and digital media ecosystems from the next generation of AI-driven fraud. Our research portfolio spans **deepfake detection, document forgery forensics, AI-generated image attribution, age-estimation robustness, and behavioral-biometric verification**. This work is published at CVPR and on arXiv, and the datasets are released here as open benchmarks for the community.
+
+ ---
+
+ ## 🔬 Research Areas
+
+ | Area | Focus | Key Datasets |
+ |------|-------|--------------|
+ | **🎭 Deepfake Detection** | Real-world faceswap detection beyond academic benchmarks | [RWFS](./datasets/Scam-AI/RWFS) |
+ | **📄 Document Forgery** | AI-inpainted and fully AI-synthesized receipts, forms, and financial documents | [AIForge-Doc-v2](./datasets/Scam-AI/AIForge-Doc-v2) · [AIForge-Doc-v1](./datasets/Scam-AI/AIForge-Doc-v1) · [gpt4o-receipt](./datasets/Scam-AI/gpt4o-receipt) |
+ | **🖼️ AI-Generated Image Detection** | Self-reported AI-generated images in the wild | [gpt-image-2](./datasets/Scam-AI/gpt-image-2) |
+ | **🛑️ Age Estimation Robustness** | Cosmetic adversarial attacks against age verification | [age-adversarial-attack](./datasets/Scam-AI/age-adversarial-attack) |
+ | **👁️ Behavioral Biometrics** | Gaze-based liveness for video interview verification | [synthetic-gaze-reading](./datasets/Scam-AI/synthetic-gaze-reading) |
+
+ ---
+
+ ## 📚 Featured Datasets
+
+ All datasets are released for **academic research and non-commercial use** under CC-BY-NC-SA 4.0. Downloads are email-gated, and access requests are approved automatically.
+
+ ### 🎭 Deepfake Detection
+ - **[RWFS: Real-World Faceswap Dataset](./datasets/Scam-AI/RWFS)**: 847 deepfakes from 8 production faceswap tools (Pixlr, Magic Hour, Remaker, etc.) plus 900 authentic faces. It is the first dataset to reflect how deepfakes actually appear in the wild.
+ > *Ren et al., "Do Deepfake Detectors Work in Reality?", arXiv:2502.10920*
+
+ ### 📄 Document Forgery & Forensics
+ - **[AIForge-Doc v2](./datasets/Scam-AI/AIForge-Doc-v2)**: 3,066 document forgeries inpainted with GPT-Image-2. Each is paired with its authentic source and a pixel-precise tampering mask. DocTamper-compatible.
+ - **[AIForge-Doc v1](./datasets/Scam-AI/AIForge-Doc-v1)**: 4,061 forgeries generated with Gemini 2.5 and Ideogram v2. v1 follows the same pairing specification as v2, so the two versions together support cross-generator analysis of detectors.
+ - **[GPT4o-Receipt](./datasets/Scam-AI/gpt4o-receipt)**: 935 fully AI-synthesized receipts (GPT-4o + GPT-Image-1) across 159 merchant categories. It comes with a companion study comparing human and LLM forensic detection.
+
+ ### 🖼️ AI-Generated Image Detection
+ - **[GPT-Image-2 Twitter Dataset](./datasets/Scam-AI/gpt-image-2)**: 10,217 confirmed GPT-Image-2 outputs collected from Twitter/X during the first week after launch. Posts are multilingual: English (40%), Japanese (33%), Chinese (19%).
+
+ ### 🛑️ Identity Verification Robustness
+ - **[Age Adversarial Attack Dataset](./datasets/Scam-AI/age-adversarial-attack)**: 5,809 VLM-simulated cosmetic attacks (beard, gray hair, makeup, wrinkles) that achieve 29–65% attack-conversion rates against production age estimators.
+ > *Ren et al., CVPR 2026*
+ - **[Synthetic Eye Movement Dataset](./datasets/Scam-AI/synthetic-gaze-reading)**: 12 hours of synthetic eye-movement video (144 sessions × 5 min) for detecting script reading in video interviews.
+
+ ---
+
+ ## 📑 Publications
+
+ 13 papers across deepfake detection, AI-generated content detection, document forgery, age estimation, and interview technology. Browse the full list at **[scam.ai/research](https://www.scam.ai/en/research)**.
+
+ Selected work:
+ - **Do Deepfake Detectors Work in Reality?** · Ren, Patil, Zewde et al. (arXiv:2502.10920)
+ - **AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents** · Wu, Zhou, Xu et al. (arXiv:2602.20569)
+ - **GPT-Image-2 in the Wild** · Zewde, Ren, Shen et al. (arXiv:2604.25370)
+ - **Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems** · Shen, Duong, An et al. (arXiv:2602.19539, CVPR 2026)
+
+ ---
+
+ ## 💼 For Enterprise
+
+ The datasets above are released for the research community. For production needs, we offer:
+
+ - **Detection APIs**: deepfake, document forgery, AI-image, and age-verification endpoints with latency and accuracy SLAs
+ - **On-premise deployment**: private cloud or air-gapped installations for regulated industries (banking, government, healthcare)
+ - **Commercial licensing**: use our datasets and models in commercial pipelines
+ - **Custom models**: trained on your domain and evaluated against the threat models we've published
+
+ 📧 **sales@scam.ai** · 🌐 **[scam.ai](https://www.scam.ai)**
+
+ ---
+
+ ## 🤝 Get Involved
+
+ - ⭐ **Follow** this org to get notified of new dataset releases
+ - 📥 **Download** any dataset (free for non-commercial research; just provide your name and email)
+ - 📝 **Cite** our papers if you publish work building on these resources
+ - 🐛 **Open a discussion** on any dataset to report issues or share results
+
+ ---
+
+ *Building detection systems for an era when generative AI makes every digital artifact suspect.*
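The org card says the datasets are email-gated with automatic approval. The following is a minimal sketch for fetching one with `huggingface_hub.snapshot_download`. It assumes two things: you have already accepted the gate on the dataset page while logged in, and you have authenticated locally (for example with `huggingface-cli login` or an `HF_TOKEN` environment variable). The repo ids come from the `./datasets/Scam-AI/<name>` links in the card. The helper names `repo_id` and `download` are hypothetical and are not part of any official Scam.AI tooling.

```python
# Minimal sketch: fetch a Scam-AI dataset snapshot from the Hugging Face Hub.
# Assumptions: access to the gated dataset has already been granted, and you are
# authenticated (e.g. `huggingface-cli login` or the HF_TOKEN environment variable).
# The helper names below are hypothetical, not official Scam.AI tooling.

# Dataset names taken from the ./datasets/Scam-AI/<name> links in the org card.
SCAM_AI_DATASETS = [
    "RWFS",
    "AIForge-Doc-v2",
    "AIForge-Doc-v1",
    "gpt4o-receipt",
    "gpt-image-2",
    "age-adversarial-attack",
    "synthetic-gaze-reading",
]


def repo_id(name: str, org: str = "Scam-AI") -> str:
    """Build a Hub repo id such as 'Scam-AI/RWFS' for a known dataset name."""
    if name not in SCAM_AI_DATASETS:
        raise ValueError(f"unknown dataset: {name!r}")
    return f"{org}/{name}"


def download(name: str) -> str:
    """Download a full dataset snapshot and return its local directory."""
    # Imported lazily so repo_id() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # repo_type="dataset" is required: snapshot_download defaults to model repos.
    return snapshot_download(repo_id=repo_id(name), repo_type="dataset")
```

For example, `download("RWFS")` returns the path of the locally cached snapshot. Network access happens only inside `download`, so the name-to-id mapping can be checked offline.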