scam / Docs /Doc.md
Gankit12's picture
Upload 129 files
31f0e50 verified

**ScamShield AI: National-Scale Real-Time Fraud Call Defense System**
**India AI Impact Buildathon 2026 Submission Document**

### Executive Summary
ScamShield AI is a complete, AI-powered prototype addressing India's escalating fraud call crisis (500,000+ scam calls daily, ₹60+ crore daily losses, 3+ spam calls per citizen). It delivers real-time audio call analysis for spam/fraud detection and alerting, augmented by an agentic honeypot for counteraction. Advanced features include **voice fingerprinting** (banning voice prints instead of spoofed numbers), repeat scammer identification via tone/pattern/style, **voice fraud detection** (covering both AI-generated synthetic/deepfake voices and cloned real-person voices), scammer location tracing, and automated report generation/sending to nearest police station or cybercrime portals. During honeypot activation, the system disables mute/unmute and call end buttons to ensure uninterrupted AI engagement, with a user-accessible "unlock" option for override.

Key differentiators: Multilingual support (focus on English, Hindi, Gujarati + broad Indic coverage), live call processing, responsible AI compliance, and scalability for 1.4 billion users. Built in Python for rapid prototyping and demo feasibility. This hybrid solution exceeds baseline requirements (Challenge 1: real-time detection/alerting) and incorporates Challenge 2 (honeypot), positioning it for #1 ranking through deep problem understanding, innovative design, strong responsible AI, and clear pitch/demo readiness.

AI usage: ~80-90% of core functionality (detection, voice biometrics/fraud analysis, honeypot conversation, behavior analysis). Non-AI components (~10-20%): audio I/O, UI/demo wrappers, report formatting.

### Problem Statement
India faces a massive and evolving fraud crisis:
- 500,000+ scam calls flood daily.
- ₹60+ crore lost daily to fraudulent calls.
- 3+ spam calls per citizen per day.
- Scams predominantly via calls (UPI frauds, fake loans, police/bank impersonation, **voice cloning/deepfake scams**).
- Challenges: Spoofed numbers, multilingual/regional accents (Hindi, Gujarati, Tamil, etc.), emotional manipulation, repeat offenders, rural/elderly vulnerability, lack of real-time defense and law enforcement linkage. In 2025-2026, AI-powered voice fraud has surged dramatically—scammers clone voices from just seconds of audio (e.g., from social media/WhatsApp) to impersonate family members, officials, or trusted contacts in "emergency" scams (fake kidnappings, accidents, arrests). Recent cases include teachers/individuals losing ₹1 lakh+ to cloned cousin/friend voices, with India showing high vulnerability (47% surveyed affected or knew victims, far above global averages). Synthetic AI voices (deepfakes) further enable scalable, artifact-reduced impersonation.

"Solve Problem of Your Choice" flexibility is leveraged for a proactive, full-cycle solution: detect → alert → engage honeypot → trace → report → enable police action.

### Solution Overview
**ScamShield AI** – A real-time, edge/cloud-capable system that:
1. Processes live or recorded audio calls.
2. Detects scams via multimodal analysis (audio patterns, keywords, sentiment/behavior).
3. Uses voice fingerprinting to identify known/repeat scammers instantly.
4. Includes **Voice Fraud Detector** for both synthetic (AI-generated/deepfake) voices and cloned real-person voices.
5. Triggers honeypot agent on detection/match, with secure handover: Disables mute/unmute and call end buttons during AI engagement to prevent interruptions, ensuring safe stalling; users can override via an on-screen "unlock" option.
6. Traces scammer location (where possible).
7. Generates detailed reports (transcript, embeddings match, voice fraud flags, location, extracted data).
8. Simulates secure sending to nearest police station/cybercrime portal (with consent).
9. Supports multiple Indian languages for inclusive coverage.

### Multilingual Support
- Primary focus: English, Hindi, Gujarati (high accuracy for user location relevance in Gujarat/India-wide).
- Core models:
- ai4bharat/indicwav2vec-v1-hindi (fine-tuned for 9+ Indic languages including Hindi, Gujarati).
- facebook/wav2vec2-xls-r-300m (pretrained on 128 languages, strong generalization to Indic + global).
- Handles code-switching, regional accents/dialects.
- Honeypot responses generated/adapted in detected language.
- Voice Fraud Detector operates language-agnostically (focuses on audio artifacts/biometrics).
- Limitations: Less common dialects require additional fine-tuning; accuracy highest on major languages (85-95% WER benchmarks).

### System Architecture
1. **Input Layer**: Live audio (PyAudio mic/stream for calls) or file (.wav). Supports mobile integration (Android permissions for call recording/TelephonyManager for number).
2. **Preprocessing**: Normalize, resample to 16kHz, chunk into 5-10s segments for real-time.
3. **Detection Layer**:
- Audio features: Librosa (MFCCs, spectrograms).
- Speech-to-Text: Wav2Vec2 / IndicWav2Vec.
- Keyword/Behavior: IndicBERT or multilingual sentiment (urgency, fear tactics); India-specific scam keywords (UPI, OTP, bank transfer, police/arrest).
- Voice Fingerprinting (core innovation): Wav2Vec2 embeddings → Vector DB query.
- **Voice Fraud Detector** (covers AI synthetic/deepfake and real-voice cloning):
- Synthetic/deepfake detection: Specialized models (e.g., Resemblyzer, Hugging Face audio-classification pipelines, or fine-tuned on ASVspoof/FakeAVCeleb datasets) to flag AI artifacts (unnatural harmonics, prosody inconsistencies, lack of breath/micro-variations).
- Cloned real voice detection: Compare incoming embedding against user-enrolled "trusted voice gallery" (consented family/friend samples) + cross-check known scam embeddings in vector DB. If high match to trusted voice + scam context → flag as cloning risk.
- Unified scoring: Synthetic probability >70% OR trusted voice match + scam signals → high-risk voice fraud flag (triggers alert/honeypot).
4. **Voice Fingerprinting & Vector DB**:
- Embeddings generated from processed audio (mean-pooled hidden states, dimension 768).
- Vector DB: Pinecone (cloud, free tier, cosine metric) or ChromaDB (local prototype).
- Store: Anonymized embeddings + metadata (scammer_id, tone/style patterns, voice fraud type flag: synthetic/cloned).
- Query: Cosine distance threshold (e.g., <0.2 = match) → instant honeypot trigger on repeat.
- Repeat identification: Captures tone, pitch variation, rhythm, pauses, accent/style implicitly via embeddings; enhanced by voice fraud checks.
5. **Honeypot Layer**: LLM (gpt2 baseline; optional OpenAI/Groq for naturalness) simulates gullible user, stalls scammer, extracts UPI/links/phishing safely (regex + NLP), logs for enforcement. Adapts to voice fraud flags (e.g., probes for verification if cloned). Upon activation (user consent via prompt), the system disables device mute/unmute and call end buttons to ensure uninterrupted AI-scammer interaction (preventing accidental unmute or call cut that could alert the scammer). Users can re-enable via an on-screen "unlock" button (e.g., requiring PIN or tap confirmation) for emergency override.
6. **Tracing & Reporting**:
- Location: phonenumbers lib + free geolocation APIs (ipapi.co or similar; note spoofing limitations ~50-70% accuracy).
- Nearest Police: Google Maps Places API query for "police station near [location]".
- Report: fpdf-generated PDF including transcription, voice match details, voice fraud flags (e.g., "Synthetic AI voice: 85%" or "Potential cloned real voice"), location, extracted info.
- Auto-send: Simulated (email/smtplib or mock POST to cybercrime.gov.in); real deployment needs API partnerships.
7. **Alert/Output**: Instant user notification (popup/vibrate in app demo), honeypot activation (with button disable for safety), report trigger (with consent). During honeypot, UI shows real-time transcript and "unlock" option to re-enable buttons if needed (e.g., for legitimate calls misdetected).
8. **Backend/Demo**: Flask/Streamlit web interface for upload/live demo; exportable to mobile (Flutter API calls). Mobile prototype simulates button disable via app overlays or Android system hooks.

### Technical Implementation Details
- **Language**: Python 3.
- **Core Libraries** (all free/open-source):
- Audio: librosa, soundfile, pyaudio.
- AI/ML: torch, transformers (Hugging Face), numpy.
- Vector DB: chromadb (local) / pinecone-client.
- Parsing/Geo: phonenumbers, requests.
- Reports: fpdf.
- Voice Fraud Detection: Additional pipelines (e.g., resemblyzer or open-source deepfake tools like those from ASVspoof benchmarks).
- Demo/UI: flask or streamlit; for button disable simulation in prototype, use UI flags or mock Android APIs.
- **APIs**:
- Free: Hugging Face models (offline downloadable), phonenumbers.
- Optional paid: OpenAI (honeypot quality), Google Maps API (police stations), Pinecone (production scale).
- **Live Calls**: Fully supported via PyAudio chunked processing. Demo: Role-play WhatsApp/actual call recording (permissions required on device). Processes in real-time (<2s latency target), including voice fraud checks. Honeypot handover includes simulated button lock (e.g., via app UI elements).
- **Pre-trained vs Custom**: Starts with pre-trained multilingual models; fine-tune on labeled scam/normal Indic audio datasets (Common Voice, synthetic TTS scams, ASVspoof for deepfakes) for 90%+ accuracy.
- **Metrics (Target/Demo)**: Detection accuracy 90-95% F1 on test data, voice match 95% on repeats, voice fraud detection 85-95% (benchmarks), latency <2s, low false positives via thresholds.
- **Deployment Clarification**: The full production system is envisioned as a mobile app (e.g., Android/iOS) with native OS-level integrations for features like call interception, button disabling, and seamless audio routing. However, for this hackathon prototype (due to timeline constraints), we implement a web/laptop-based demo using Python tools (Flask/Streamlit/PyAudio) to simulate the experience—role-playing live calls, detection, honeypot, and UI elements like button locks. This avoids confusion: The demo proves core feasibility, while the mobile vision is outlined for real-world scaling.

### Responsible AI & Ethics
- Privacy: On-device/edge processing priority; no raw audio storage; hashed embeddings only; user consent for DB storage/report sending/voice fraud flagging/trusted gallery enrollment/honeypot activation (including button disable).
- Bias/Fairness: Train/audit on diverse Indic accents (North/South, rural/urban), gender, age; use fairness tools (e.g., AIF360 equivalent). Voice fraud models audited for bias in voice types/synthesis.
- Safety: Honeypot avoids escalation/provocation; human review option for reports; no auto-action without consent. Voice fraud false positives mitigated (e.g., multi-factor confirmation). Button disable enhances safety by preventing user interruptions during scammer engagement, reducing risk of alerting fraudsters; "unlock" ensures user control (e.g., for misdetections or emergencies).
- Inclusivity: Multilingual design empowers non-English users; focuses on vulnerable groups (elderly, rural) hit hardest by cloning scams.
- Compliance: Aligned with IT Act 2000, DPDP Act 2023, TRAI guidelines.
- Limitations Acknowledged: Spoofing reduces tracing accuracy; false positives mitigated by thresholds/multi-factor detection; real police integration requires official partnerships (e.g., DoT, cybercrime.gov.in API); advanced 2026 voice fraud may require ongoing model updates; button disable is mobile-dependent (simulated in prototype); prototype uses web/laptop simulation for demo, with full features targeted for mobile app deployment.

### Demo & Submission Deliverables
- GitHub repo with: Full code, README (this document), requirements.txt, sample audio (scam/normal, Indic languages, synthetic AI voices, cloned simulations), mock DB seeds.
- Demo Video (5 min max): Live call simulation → detection (incl. voice fraud flags for synthetic/cloned) → honeypot activation (show button disable/unlock UI) → tracing → PDF report → simulated send.
- Pitch Structure: Problem stats → Solution flow (diagrams) → Innovation (voice fingerprinting + comprehensive voice fraud detection + secure honeypot handover) → Responsible AI → Impact metrics → Team.
- Evaluation Alignment: Deep problem understanding (India-specific stats, 2025-2026 voice fraud trends), Feasible design (prototype runs), Responsible AI (explicit section), Clear pitch (video + slides).

### Impact & Scalability
- Prevents portion of ₹60 crore daily losses via real-time alerts, including countermeasures against surging voice cloning/deepfake scams.
- Enables faster police response (e.g., raids on scam hubs).
- Voice bans + fraud detection counter evolving threats (spoofing, cloning, synthetic voices); secure handover (button disable) maximizes honeypot effectiveness.
- Scalable: Cloud deployment (AWS/GCP Lambda), crowdsourced (consented) scammer embeddings for national database.
- Post-hackathon: Potential integration with Truecaller-like apps, govt portals, HCL GUVI network.

### Future Work
- Fine-tune custom models on larger Indic scam datasets, including advanced synthetic/cloned voice samples.
- Full mobile app (Flutter + on-device TensorFlow Lite), with native button disable via system APIs.
- Official APIs: TRAI CNAP, cybercrime.gov.in, DoT fraud reporting.
- Advanced biometrics: Add prosody analysis, speaker diarization, improved voice fraud resilience.
- Community model training for continuous improvement.

This document captures every discussed element comprehensively. Implement, test thoroughly on live Indic audio demos (including handover simulation), and submit by February 5 deadline. This positions the entry for national stage presentation and top prize. Execute with precision for #1 win.