# Phase 2: Voice Implementation - Quick Start Guide ## What is Phase 2? Phase 2 adds **live two-way voice conversation** to the ScamShield AI honeypot: - **You speak** (as scammer) → AI transcribes → processes → **AI speaks back** - Completely isolated from Phase 1 (text honeypot) - Optional feature (enabled via `PHASE_2_ENABLED=true`) ## Architecture ``` Voice Input (You) → ASR (Whisper) → Text ↓ Phase 1 Honeypot (Unchanged) ↓ Voice Output (AI) ← TTS (gTTS) ← Text Reply ``` **Key Point:** Phase 1 text honeypot is **not modified**. Voice is just input/output wrapper. ## Quick Setup ### 1. Install Dependencies ```bash # Install Phase 2 dependencies pip install -r requirements-phase2.txt # Note: PyAudio may need system packages # Windows: pip install pipwin && pipwin install pyaudio # Linux: sudo apt-get install portaudio19-dev # Mac: brew install portaudio ``` ### 2. Configure Environment ```bash # Add to your .env file PHASE_2_ENABLED=true WHISPER_MODEL=base TTS_ENGINE=gtts VOICE_FRAUD_DETECTION=false ``` ### 3. Start Server ```bash # Start FastAPI server (same as Phase 1) python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 ``` ### 4. Open Voice UI ``` Open in browser: http://localhost:8000/ui/voice.html ``` ## Testing the Voice Feature ### Option 1: Record Live 1. Click **"Start Recording"** 2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.") 3. Click **"Stop Recording"** 4. Wait for AI to: - Transcribe your voice - Process through honeypot - Reply with voice ### Option 2: Upload Audio File 1. Click **"Upload Audio File"** 2. Select a `.wav`, `.mp3`, or `.m4a` file 3. AI processes and replies ## API Endpoint ### POST `/api/v1/voice/engage` **Request:** ```bash curl -X POST "http://localhost:8000/api/v1/voice/engage" \ -H "x-api-key: dev-key-12345" \ -F "audio_file=@recording.wav" \ -F "session_id=voice-test-001" \ -F "language=auto" ``` **Response:** ```json { "session_id": "voice-test-001", "scam_detected": true, "scam_confidence": 0.92, "scam_type": "financial_fraud", "turn_count": 1, "ai_reply_text": "Oh no! What should I do? Can you help me?", "ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3", "transcription": { "text": "Your account is blocked. Send OTP immediately.", "language": "en", "confidence": 0.95 }, "voice_fraud": null, "extracted_intelligence": { "upi_ids": [], "bank_accounts": [], "phone_numbers": [], "urls": [] }, "processing_time_ms": 3450 } ``` ## File Structure ``` app/ ├── voice/ # NEW: Phase 2 voice modules │ ├── __init__.py │ ├── asr.py # Whisper ASR │ ├── tts.py # gTTS text-to-speech │ └── fraud_detector.py # Optional voice fraud detection ├── api/ │ ├── voice_endpoints.py # NEW: Voice API endpoints │ └── voice_schemas.py # NEW: Voice API schemas └── ... (Phase 1 unchanged) ui/ ├── voice.html # NEW: Voice UI ├── voice.js # NEW: Voice UI logic ├── voice.css # NEW: Voice UI styles └── ... (Phase 1 unchanged) PHASE_2_VOICE_IMPLEMENTATION_PLAN.md # Full implementation plan requirements-phase2.txt # Phase 2 dependencies .env.phase2.example # Phase 2 config example ``` ## Impact on Phase 1 **ZERO IMPACT:** - ✅ Phase 1 text honeypot unchanged - ✅ All existing tests pass - ✅ Existing API endpoints unchanged - ✅ Existing UI unchanged - ✅ Phase 2 is opt-in (disabled by default) ## Performance | Metric | Target | Notes | |--------|--------|-------| | ASR Latency | <2s | Whisper base model | | TTS Latency | <1s | gTTS | | Total Loop | <5s | Voice in → Voice out | | Accuracy | >85% | Transcription WER | ## Troubleshooting ### "Voice API unavailable" - Check `PHASE_2_ENABLED=true` in `.env` - Verify dependencies installed: `pip list | grep whisper` - Check logs: `tail -f logs/app.log` ### "Microphone access denied" - Browser needs microphone permission - Check browser settings → Privacy → Microphone - Use HTTPS or localhost (required for `getUserMedia`) ### "PyAudio installation failed" ```bash # Windows pip install pipwin pipwin install pyaudio # Linux sudo apt-get install portaudio19-dev python3-pyaudio pip install pyaudio # Mac brew install portaudio pip install pyaudio ``` ### "Whisper model download slow" - First run downloads model (~150MB for base) - Models cached in `~/.cache/whisper/` - Use smaller model: `WHISPER_MODEL=tiny` ## Advanced Features ### Voice Fraud Detection (Optional) Detect synthetic/deepfake voices: ```bash # Enable in .env VOICE_FRAUD_DETECTION=true # Install additional dependency pip install resemblyzer ``` Response includes: ```json "voice_fraud": { "is_synthetic": false, "confidence": 0.85, "risk_level": "low" } ``` ### Custom TTS Voice Future: Replace gTTS with IndicTTS for better Indic language support. ### Streaming Audio Future: Real-time audio streaming instead of record-then-send. ## Testing Checklist - [ ] Install Phase 2 dependencies - [ ] Set `PHASE_2_ENABLED=true` - [ ] Start server - [ ] Open voice UI - [ ] Record voice message - [ ] Verify transcription - [ ] Verify AI reply (text) - [ ] Verify AI reply (audio) - [ ] Check metadata (language, confidence) - [ ] Verify Phase 1 tests still pass ## Next Steps 1. **Review:** Read `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` for full details 2. **Install:** Run `pip install -r requirements-phase2.txt` 3. **Configure:** Copy settings from `.env.phase2.example` to `.env` 4. **Test:** Open `ui/voice.html` and try recording 5. **Deploy:** Set `PHASE_2_ENABLED=true` in production ## Support - Full plan: `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` - Issues: Check logs in `logs/app.log` - Questions: Review implementation plan sections --- **Phase 2 Status:** ✅ Planned, 🚧 Ready to Implement **Estimated Implementation Time:** 17-21 hours **Priority:** Optional (Phase 1 is complete and sufficient for competition)