# Phase 2: Voice Implementation - Quick Start Guide
## What is Phase 2?
Phase 2 adds **live two-way voice conversation** to the ScamShield AI honeypot:
- **You speak** (as scammer) → AI transcribes → processes → **AI speaks back**
- Completely isolated from Phase 1 (text honeypot)
- Optional feature (enabled via `PHASE_2_ENABLED=true`)
## Architecture
```
Voice Input (You) → ASR (Whisper) → Text
                                      ↓
                        Phase 1 Honeypot (Unchanged)
                                      ↓
Voice Output (AI) ← TTS (gTTS) ← Text Reply
```
**Key Point:** The Phase 1 text honeypot is **not modified**. Voice is just an input/output wrapper around it.
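The wrapper relationship can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: `transcribe`, `honeypot_reply`, and `synthesize` are hypothetical stand-ins for the Whisper, Phase 1, and gTTS calls, passed in as plain callables so the flow runs without any audio libraries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str     # text produced by ASR
    reply_text: str     # text produced by the Phase 1 honeypot
    reply_audio: bytes  # audio produced by TTS


def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],    # hypothetical ASR hook (e.g. Whisper)
    honeypot_reply: Callable[[str], str],  # hypothetical Phase 1 entry point
    synthesize: Callable[[str], bytes],    # hypothetical TTS hook (e.g. gTTS)
) -> VoiceTurn:
    """Run one voice turn: audio in -> text -> honeypot -> text -> audio out."""
    transcript = transcribe(audio)
    reply_text = honeypot_reply(transcript)  # Phase 1 only ever sees text
    reply_audio = synthesize(reply_text)
    return VoiceTurn(transcript, reply_text, reply_audio)


if __name__ == "__main__":
    # Stub implementations, for illustration only.
    turn = voice_turn(
        b"fake-audio",
        transcribe=lambda _audio: "Your account is blocked. Send OTP immediately.",
        honeypot_reply=lambda _text: "Oh no! What should I do?",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Because the honeypot is called with text and returns text, swapping the ASR or TTS engine would not touch Phase 1 in this design.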
## Quick Setup
### 1. Install Dependencies
```bash
# Install Phase 2 dependencies
pip install -r requirements-phase2.txt
# Note: PyAudio may need system packages
# Windows: pip install pipwin && pipwin install pyaudio
# Linux: sudo apt-get install portaudio19-dev
# Mac: brew install portaudio
```
### 2. Configure Environment
```bash
# Add to your .env file
PHASE_2_ENABLED=true
WHISPER_MODEL=base
TTS_ENGINE=gtts
VOICE_FRAUD_DETECTION=false
```
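As a rough sketch of how these variables might be read on the Python side: `Phase2Settings` and `load_phase2_settings` are hypothetical names, and the fallback values other than `PHASE_2_ENABLED=false` are assumptions taken from the example above rather than confirmed defaults. The function reads `os.environ`, so it assumes the `.env` file has already been loaded into the process environment.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings from an env file."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Phase2Settings:
    enabled: bool
    whisper_model: str
    tts_engine: str
    voice_fraud_detection: bool


def load_phase2_settings(env=None) -> Phase2Settings:
    """Read the four Phase 2 variables; Phase 2 stays off unless enabled."""
    env = os.environ if env is None else env
    return Phase2Settings(
        enabled=_as_bool(env.get("PHASE_2_ENABLED", "false")),
        whisper_model=env.get("WHISPER_MODEL", "base"),   # assumed fallback
        tts_engine=env.get("TTS_ENGINE", "gtts"),         # assumed fallback
        voice_fraud_detection=_as_bool(env.get("VOICE_FRAUD_DETECTION", "false")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation.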
### 3. Start Server
```bash
# Start FastAPI server (same as Phase 1)
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### 4. Open Voice UI
```
Open in browser: http://localhost:8000/ui/voice.html
```
## Testing the Voice Feature
### Option 1: Record Live
1. Click **"Start Recording"**
2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.")
3. Click **"Stop Recording"**
4. Wait for AI to:
   - Transcribe your voice
   - Process through honeypot
   - Reply with voice
### Option 2: Upload Audio File
1. Click **"Upload Audio File"**
2. Select a `.wav`, `.mp3`, or `.m4a` file
3. AI processes and replies
## API Endpoint
### POST `/api/v1/voice/engage`
**Request:**
```bash
curl -X POST "http://localhost:8000/api/v1/voice/engage" \
  -H "x-api-key: dev-key-12345" \
  -F "audio_file=@recording.wav" \
  -F "session_id=voice-test-001" \
  -F "language=auto"
```
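The same request can be sent from Python. A minimal sketch, assuming the third-party `requests` package is installed and the server is running locally; `engage_voice` and `build_form_fields` are illustrative names and the 30-second timeout is an arbitrary choice, none of which come from the project.

```python
from pathlib import Path

API_URL = "http://localhost:8000/api/v1/voice/engage"


def build_form_fields(session_id: str, language: str = "auto") -> dict:
    """Form fields matching the curl example's non-file -F arguments."""
    return {"session_id": session_id, "language": language}


def engage_voice(audio_path: str, session_id: str, api_key: str,
                 language: str = "auto", timeout: float = 30.0) -> dict:
    """POST an audio file to the voice endpoint and return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above stays importable

    path = Path(audio_path)
    with path.open("rb") as audio:
        response = requests.post(
            API_URL,
            headers={"x-api-key": api_key},
            files={"audio_file": (path.name, audio)},  # multipart file part
            data=build_form_fields(session_id, language),
            timeout=timeout,
        )
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return response.json()
```

Calling `engage_voice("recording.wav", "voice-test-001", api_key="dev-key-12345")` mirrors the curl command above.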
**Response:**
```json
{
  "session_id": "voice-test-001",
  "scam_detected": true,
  "scam_confidence": 0.92,
  "scam_type": "financial_fraud",
  "turn_count": 1,
  "ai_reply_text": "Oh no! What should I do? Can you help me?",
  "ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3",
  "transcription": {
    "text": "Your account is blocked. Send OTP immediately.",
    "language": "en",
    "confidence": 0.95
  },
  "voice_fraud": null,
  "extracted_intelligence": {
    "upi_ids": [],
    "bank_accounts": [],
    "phone_numbers": [],
    "urls": []
  },
  "processing_time_ms": 3450
}
```
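A client might condense this response as in the sketch below. `summarize_engage_response` is a hypothetical helper. It assumes the field names shown in the example response, and that `voice_fraud` is `null` when voice fraud detection is off, as the sample suggests.

```python
def summarize_engage_response(payload: dict) -> str:
    """Build a one-line summary from a voice/engage response body."""
    verdict = "scam" if payload["scam_detected"] else "no scam"
    confidence = payload["scam_confidence"]
    heard = payload["transcription"]["text"]
    # voice_fraud appears to be null unless voice fraud detection is enabled.
    fraud = payload.get("voice_fraud")
    fraud_note = "n/a" if fraud is None else fraud["risk_level"]
    # Count every extracted item across all intelligence categories.
    intel_count = sum(len(items) for items in payload["extracted_intelligence"].values())
    return (
        f"{verdict} ({confidence:.0%}) | heard: {heard!r} | "
        f"voice fraud: {fraud_note} | intel items: {intel_count}"
    )
```

For the example response above, the summary begins with `scam (92%)` and reports zero extracted intelligence items.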
## File Structure
```
app/
├── voice/                    # NEW: Phase 2 voice modules
│   ├── __init__.py
│   ├── asr.py                # Whisper ASR
│   ├── tts.py                # gTTS text-to-speech
│   └── fraud_detector.py     # Optional voice fraud detection
├── api/
│   ├── voice_endpoints.py    # NEW: Voice API endpoints
│   └── voice_schemas.py      # NEW: Voice API schemas
└── ... (Phase 1 unchanged)
ui/
├── voice.html                # NEW: Voice UI
├── voice.js                  # NEW: Voice UI logic
├── voice.css                 # NEW: Voice UI styles
└── ... (Phase 1 unchanged)
PHASE_2_VOICE_IMPLEMENTATION_PLAN.md # Full implementation plan
requirements-phase2.txt # Phase 2 dependencies
.env.phase2.example # Phase 2 config example
```
## Impact on Phase 1
**ZERO IMPACT:**
- ✅ Phase 1 text honeypot unchanged
- ✅ All existing tests pass
- ✅ Existing API endpoints unchanged
- ✅ Existing UI unchanged
- ✅ Phase 2 is opt-in (disabled by default)
## Performance
| Metric | Target | Notes |
|--------|--------|-------|
| ASR Latency | <2s | Whisper base model |
| TTS Latency | <1s | gTTS |
| Total Loop | <5s | Voice in → Voice out |
| Accuracy | >85% | Transcription word accuracy (WER <15%) |
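The accuracy row is measured through word error rate (WER). A minimal sketch of the standard word-level WER computation follows. It assumes lowercase whitespace tokenization with no punctuation stripping, and `word_error_rate` is an illustrative helper rather than project code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Keep only the previous row of the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A transcript with one wrong word and one dropped word against a seven-word reference scores 2/7, about 0.29. That is roughly 71% word accuracy, which would miss the >85% target.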
## Troubleshooting
### "Voice API unavailable"
- Check `PHASE_2_ENABLED=true` in `.env`
- Verify dependencies are installed: `pip list | grep whisper` (on Windows: `pip list | findstr whisper`)
- Check logs: `tail -f logs/app.log`
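The first two checks can also be run from Python, which avoids depending on `grep`. This is a sketch under assumptions: the module list reflects the usual import names (`whisper` for openai-whisper, `gtts` for gTTS, `pyaudio` for PyAudio) and may not match `requirements-phase2.txt` exactly. `diagnose` is a hypothetical helper that inspects the process environment, not the `.env` file itself.

```python
import importlib.util
import os

# Assumed import names; adjust to match requirements-phase2.txt.
REQUIRED_MODULES = ("whisper", "gtts", "pyaudio")


def missing_modules(modules=REQUIRED_MODULES) -> list:
    """Return the modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def diagnose(env=None) -> list:
    """Collect human-readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    if env.get("PHASE_2_ENABLED", "").strip().lower() != "true":
        problems.append("PHASE_2_ENABLED is not set to true")
    for name in missing_modules():
        problems.append(f"Python module not importable: {name}")
    return problems


if __name__ == "__main__":
    for problem in diagnose():
        print(problem)
```

`find_spec` only locates a module without importing it, so the check stays fast even though importing Whisper itself is slow.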
### "Microphone access denied"
- Browser needs microphone permission
- Check browser settings → Privacy → Microphone
- Use HTTPS or localhost (required for `getUserMedia`)
### "PyAudio installation failed"
```bash
# Windows
pip install pipwin
pipwin install pyaudio
# Linux
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
# Mac
brew install portaudio
pip install pyaudio
```
### "Whisper model download slow"
- First run downloads model (~150MB for base)
- Models cached in `~/.cache/whisper/`
- Use smaller model: `WHISPER_MODEL=tiny`
## Advanced Features
### Voice Fraud Detection (Optional)
Detect synthetic/deepfake voices:
```bash
# Enable in .env
VOICE_FRAUD_DETECTION=true
# Install additional dependency
pip install resemblyzer
```
Response includes:
```json
"voice_fraud": {
  "is_synthetic": false,
  "confidence": 0.85,
  "risk_level": "low"
}
```
### Custom TTS Voice
Future: Replace gTTS with IndicTTS for better Indic language support.
### Streaming Audio
Future: Real-time audio streaming instead of record-then-send.
## Testing Checklist
- [ ] Install Phase 2 dependencies
- [ ] Set `PHASE_2_ENABLED=true`
- [ ] Start server
- [ ] Open voice UI
- [ ] Record voice message
- [ ] Verify transcription
- [ ] Verify AI reply (text)
- [ ] Verify AI reply (audio)
- [ ] Check metadata (language, confidence)
- [ ] Verify Phase 1 tests still pass
## Next Steps
1. **Review:** Read `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` for full details
2. **Install:** Run `pip install -r requirements-phase2.txt`
3. **Configure:** Copy settings from `.env.phase2.example` to `.env`
4. **Test:** Open `ui/voice.html` and try recording
5. **Deploy:** Set `PHASE_2_ENABLED=true` in production
## Support
- Full plan: `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
- Issues: Check logs in `logs/app.log`
- Questions: Review implementation plan sections
---
**Phase 2 Status:** ✅ Planned, 🚧 Ready to Implement
**Estimated Implementation Time:** 17-21 hours
**Priority:** Optional (Phase 1 is complete and sufficient for competition)