Shorten org card — drop research-areas table, get-involved, publications section
README.md
CHANGED
@@ -9,87 +9,41 @@ pinned: false
 
 # Scam.AI
 
-**Detection systems for AI-driven fraud
+**Detection systems for AI-driven fraud.**
 
-
-[](https://www.scam.ai/en/research)
-[](https://huggingface.co/Scam-AI)
+We build production-grade detectors for deepfakes, document forgery, AI-generated media, and adversarial attacks against identity verification — and release the underlying benchmarks for the research community.
 
-
-
-## What We Do
-
-Scam.AI builds detection systems that protect identity-verification pipelines, financial-document workflows, and digital media ecosystems from the next generation of AI-driven fraud. Our research portfolio spans **deepfake detection, document forgery forensics, AI-generated image attribution, age-estimation robustness, and behavioral-biometric verification** — published at top venues (CVPR, arXiv) and released here as open benchmarks for the community.
-
----
-
-## 🔬 Research Areas
-
-| Area | Focus | Key Datasets |
-|------|-------|--------------|
-| **🎭 Deepfake Detection** | Real-world faceswap detection beyond academic benchmarks | [RWFS](https://huggingface.co/datasets/Scam-AI/RWFS) |
-| **📄 Document Forgery** | AI-inpainted receipts, forms, and financial documents | [AIForge-Doc-v2](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v2) · [AIForge-Doc-v1](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v1) · [gpt4o-receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt) |
-| **🖼️ AI-Generated Image Detection** | Self-reported AI-generated images in the wild | [gpt-image-2](https://huggingface.co/datasets/Scam-AI/gpt-image-2) |
-| **🛡️ Age Estimation Robustness** | Cosmetic adversarial attacks against age verification | [age-adversarial-attack](https://huggingface.co/datasets/Scam-AI/age-adversarial-attack) |
-| **👁️ Behavioral Biometrics** | Gaze-based liveness for video interview verification | [synthetic-gaze-reading](https://huggingface.co/datasets/Scam-AI/synthetic-gaze-reading) |
+🌐 [scam.ai](https://www.scam.ai) · 📑 [Research](https://www.scam.ai/en/research) · 💼 sales@scam.ai
 
 ---
 
-## 📚
-
-All datasets are released for **academic research and non-commercial use** under CC-BY-NC-SA 4.0. Email-gated download with automatic approval.
-
-### 🎭 Deepfake Detection
-- **[RWFS — Real-World Faceswap Dataset](https://huggingface.co/datasets/Scam-AI/RWFS)** — 847 deepfakes from 8 production faceswap tools (Pixlr, Magic Hour, Remaker, etc) + 900 authentic faces. The first dataset reflecting how deepfakes actually appear in the wild.
-> *Ren et al., "Do Deepfake Detectors Work in Reality?" — arXiv:2502.10920*
-
-### 📄 Document Forgery & Forensics
-- **[AIForge-Doc v2](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v2)** — 3,066 GPT-Image-2 inpainted document forgeries paired with authentic source + pixel-precise tampering masks. DocTamper-compatible.
-- **[AIForge-Doc v1](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v1)** — 4,061 forgeries via Gemini 2.5 / Ideogram v2. Same-spec pairing with v2 enables cross-generator detector analysis.
-- **[GPT4o-Receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt)** — 935 fully AI-synthesized receipts (GPT-4o + GPT-Image-1) across 159 merchant categories. Companion human-vs-LLM forensic detection study.
+## 📚 Open Datasets
 
-
-- **[GPT-Image-2 Twitter Dataset](https://huggingface.co/datasets/Scam-AI/gpt-image-2)** — 10,217 confirmed GPT-Image-2 outputs scraped from Twitter/X in the first week post-launch. Multi-language: EN (40%), JA (33%), ZH (19%).
+7 datasets · email-gated · CC-BY-NC-SA 4.0 · auto-approved
 
-
--
-
-
-
----
-
-
-
-13 papers across deepfake detection, AI-generated detection, document forgery, age estimation, and interview technology. Browse the full list at **[scam.ai/research](https://www.scam.ai/en/research)**.
-
-Selected work:
-- **Do Deepfake Detectors Work in Reality?** — Ren, Patil, Zewde et al.
-- **AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents** — Wu, Zhou, Xu et al. (arXiv:2602.20569)
-- **GPT-Image-2 in the Wild** — Zewde, Ren, Shen et al. (arXiv:2604.25370)
-- **Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems** — Shen, Duong, An et al. (arXiv:2602.19539, CVPR 2026)
+| Dataset | What it is |
+|---|---|
+| [**RWFS**](https://huggingface.co/datasets/Scam-AI/RWFS) 🎭 | 847 real-world deepfakes from 8 production faceswap tools. Reveals a 30+ pt AUC gap between academic and real-world performance. |
+| [**AIForge-Doc v2**](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v2) 📄 | 3,066 GPT-Image-2 inpainted document forgeries with pixel-precise masks. |
+| [**AIForge-Doc v1**](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v1) 📄 | 4,061 forgeries via Gemini 2.5 / Ideogram v2. Cross-generator pairing with v2. |
+| [**GPT4o-Receipt**](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt) 📄 | 935 fully AI-synthesized receipts across 159 merchant categories. |
+| [**GPT-Image-2 Twitter**](https://huggingface.co/datasets/Scam-AI/gpt-image-2) 🖼️ | 10,217 confirmed GPT-Image-2 outputs scraped in the first week post-launch. |
+| [**Age Adversarial Attack**](https://huggingface.co/datasets/Scam-AI/age-adversarial-attack) 🛡️ | 5,809 cosmetic attacks fooling production age estimators 69% of the time. *(CVPR 2026)* |
+| [**Synthetic Gaze Reading**](https://huggingface.co/datasets/Scam-AI/synthetic-gaze-reading) 👁️ | 12 hours of synthetic eye-movement video for interview liveness. |
 
 ---
 
 ## 💼 For Enterprise
 
-
-
-- **Detection APIs** — Deepfake, document forgery, AI-image, and age-verification endpoints with latency and accuracy SLAs
-- **On-premise deployment** — Private cloud or air-gapped installations for regulated industries (banking, government, healthcare)
-- **Commercial licensing** — Use our datasets and models in commercial pipelines
-- **Custom models** — Trained on your domain, evaluated against the threat models we've published
-
-📧 **sales@scam.ai** · 🌐 **[scam.ai](https://www.scam.ai)**
-
----
+Need production-grade detection?
 
-
+- **Detection APIs** with latency / accuracy SLAs
+- **On-premise deployment** for regulated industries
+- **Commercial licensing** of our datasets and models
+- **Custom models** trained on your domain
 
-
-- 📥 **Download** any dataset (free for non-commercial research, just provide name + email)
-- 📝 **Cite** our papers if you publish work building on these resources
-- 🐛 **Open a discussion** on any dataset to report issues or share results
+📧 **sales@scam.ai**
 
 ---
 
-*Building detection
+*Building detection for an era when every digital artifact is suspect.*