# Phase 2: Voice Implementation - Quick Start Guide
## What is Phase 2?
Phase 2 adds **live two-way voice conversation** to the ScamShield AI honeypot:
- **You speak** (as scammer) → AI transcribes → processes → **AI speaks back**
- Completely isolated from Phase 1 (text honeypot)
- Optional feature (enabled via `PHASE_2_ENABLED=true`)
## Architecture
```
Voice Input (You) → ASR (Whisper) → Text
                                     ↓
                       Phase 1 Honeypot (Unchanged)
                                     ↓
Voice Output (AI) ← TTS (gTTS) ← Text Reply
```
**Key Point:** The Phase 1 text honeypot is **not modified**. Voice is just an input/output wrapper.
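The wrapper idea above can be sketched as a thin orchestration layer. The function names (`transcribe`, `honeypot_reply`, `synthesize`) are hypothetical stand-ins for the `app/voice` modules, passed in as callables so the Phase 1 honeypot itself stays untouched:

```python
from dataclasses import dataclass

@dataclass
class VoiceTurn:
    transcription: str   # what the ASR heard
    reply_text: str      # Phase 1 honeypot's text reply
    reply_audio: bytes   # TTS rendering of that reply

def engage_voice(audio: bytes, transcribe, honeypot_reply, synthesize) -> VoiceTurn:
    """Voice is only an I/O wrapper: ASR in, unchanged Phase 1 honeypot, TTS out."""
    text = transcribe(audio)        # Whisper ASR: audio -> text
    reply = honeypot_reply(text)    # existing Phase 1 text honeypot: text -> text
    speech = synthesize(reply)      # gTTS: text -> audio
    return VoiceTurn(text, reply, speech)
```

Because the honeypot is injected as a plain text-to-text callable, the voice layer never needs to know anything about its internals.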
## Quick Setup
### 1. Install Dependencies
```bash
# Install Phase 2 dependencies
pip install -r requirements-phase2.txt
# Note: PyAudio may need system packages
# Windows: pip install pipwin && pipwin install pyaudio
# Linux: sudo apt-get install portaudio19-dev
# Mac: brew install portaudio
```
### 2. Configure Environment
```bash
# Add to your .env file
PHASE_2_ENABLED=true
WHISPER_MODEL=base
TTS_ENGINE=gtts
VOICE_FRAUD_DETECTION=false
```
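How the server might read these flags is sketched below; `env_bool` is a hypothetical helper (the actual settings module may use pydantic or similar), but the variable names and defaults match the `.env` entries above:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var such as PHASE_2_ENABLED=true."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes", "on"}

# Defaults mirror the example .env above; Phase 2 stays off unless opted in.
PHASE_2_ENABLED = env_bool("PHASE_2_ENABLED")
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
TTS_ENGINE = os.getenv("TTS_ENGINE", "gtts")
VOICE_FRAUD_DETECTION = env_bool("VOICE_FRAUD_DETECTION")
```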
### 3. Start Server
```bash
# Start FastAPI server (same as Phase 1)
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### 4. Open Voice UI
```
Open in browser: http://localhost:8000/ui/voice.html
```
## Testing the Voice Feature
### Option 1: Record Live
1. Click **"Start Recording"**
2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.")
3. Click **"Stop Recording"**
4. Wait for AI to:
- Transcribe your voice
- Process through honeypot
- Reply with voice
### Option 2: Upload Audio File
1. Click **"Upload Audio File"**
2. Select a `.wav`, `.mp3`, or `.m4a` file
3. AI processes and replies
## API Endpoint
### POST `/api/v1/voice/engage`
**Request:**
```bash
curl -X POST "http://localhost:8000/api/v1/voice/engage" \
-H "x-api-key: dev-key-12345" \
-F "audio_file=@recording.wav" \
-F "session_id=voice-test-001" \
-F "language=auto"
```
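The same call from Python might look like the sketch below. `build_engage_request` is a hypothetical helper that assembles the multipart form exactly as the curl command does; sending it requires the `requests` package and a running server, so the actual POST is left commented out:

```python
def build_engage_request(audio_bytes: bytes, filename: str, session_id: str,
                         api_key: str, base_url: str = "http://localhost:8000",
                         language: str = "auto") -> dict:
    """Mirror the curl call: API key header plus multipart form fields."""
    return {
        "url": f"{base_url}/api/v1/voice/engage",
        "headers": {"x-api-key": api_key},
        "files": {"audio_file": (filename, audio_bytes)},
        "data": {"session_id": session_id, "language": language},
    }

# Usage (assumes `pip install requests` and a running server):
# import requests
# kwargs = build_engage_request(open("recording.wav", "rb").read(),
#                               "recording.wav", "voice-test-001", "dev-key-12345")
# print(requests.post(**kwargs).json())
```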
**Response:**
```json
{
"session_id": "voice-test-001",
"scam_detected": true,
"scam_confidence": 0.92,
"scam_type": "financial_fraud",
"turn_count": 1,
"ai_reply_text": "Oh no! What should I do? Can you help me?",
"ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3",
"transcription": {
"text": "Your account is blocked. Send OTP immediately.",
"language": "en",
"confidence": 0.95
},
"voice_fraud": null,
"extracted_intelligence": {
"upi_ids": [],
"bank_accounts": [],
"phone_numbers": [],
"urls": []
},
"processing_time_ms": 3450
}
```
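A caller can condense a response like the one above into a single log line. The helper below is a hypothetical sketch; it only touches fields shown in the sample response:

```python
def summarize_engagement(payload: dict) -> str:
    """One-line summary of a /voice/engage response, e.g. for session logs."""
    t = payload.get("transcription") or {}
    return (f"[{payload['session_id']}] scam={payload['scam_detected']} "
            f"({payload.get('scam_type')}, conf={payload.get('scam_confidence')}) "
            f"heard={t.get('text')!r} reply={payload.get('ai_reply_text')!r}")
```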
## File Structure
```
app/
├── voice/                    # NEW: Phase 2 voice modules
│   ├── __init__.py
│   ├── asr.py                # Whisper ASR
│   ├── tts.py                # gTTS text-to-speech
│   └── fraud_detector.py     # Optional voice fraud detection
├── api/
│   ├── voice_endpoints.py    # NEW: Voice API endpoints
│   └── voice_schemas.py      # NEW: Voice API schemas
└── ... (Phase 1 unchanged)
ui/
├── voice.html                # NEW: Voice UI
├── voice.js                  # NEW: Voice UI logic
├── voice.css                 # NEW: Voice UI styles
└── ... (Phase 1 unchanged)
PHASE_2_VOICE_IMPLEMENTATION_PLAN.md # Full implementation plan
requirements-phase2.txt # Phase 2 dependencies
.env.phase2.example # Phase 2 config example
```
## Impact on Phase 1
**ZERO IMPACT:**
- ✅ Phase 1 text honeypot unchanged
- ✅ All existing tests pass
- ✅ Existing API endpoints unchanged
- ✅ Existing UI unchanged
- ✅ Phase 2 is opt-in (disabled by default)
## Performance
| Metric | Target | Notes |
|--------|--------|-------|
| ASR Latency | <2s | Whisper base model |
| TTS Latency | <1s | gTTS |
| Total Loop | <5s | Voice in → Voice out |
| Accuracy | >85% | Transcription WER |
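The `processing_time_ms` field reported by the API can be captured against these targets with a simple context manager; this is an illustrative sketch, not the server's actual implementation:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(result: dict, key: str = "processing_time_ms"):
    """Record elapsed wall-clock milliseconds into result[key], as in the API response."""
    start = time.perf_counter()
    try:
        yield
    finally:
        result[key] = int((time.perf_counter() - start) * 1000)

# Usage: wrap the full voice loop and compare against the <5s target.
# response = {}
# with timed(response):
#     ...  # ASR -> honeypot -> TTS
# assert response["processing_time_ms"] < 5000
```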
## Troubleshooting
### "Voice API unavailable"
- Check `PHASE_2_ENABLED=true` in `.env`
- Verify dependencies installed: `pip list | grep whisper`
- Check logs: `tail -f logs/app.log`
### "Microphone access denied"
- Browser needs microphone permission
- Check browser settings → Privacy → Microphone
- Use HTTPS or localhost (required for `getUserMedia`)
### "PyAudio installation failed"
```bash
# Windows
pip install pipwin
pipwin install pyaudio
# Linux
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
# Mac
brew install portaudio
pip install pyaudio
```
### "Whisper model download slow"
- First run downloads the model (~150MB for `base`)
- Models are cached in `~/.cache/whisper/`
- Use a smaller model: `WHISPER_MODEL=tiny`
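You can check whether the download already happened before starting the server. The sketch below assumes the default cache location above; the `transcribe` helper defers the `whisper` import (from the `openai-whisper` package) so the rest of the module loads even where the dependency is missing:

```python
from pathlib import Path

WHISPER_CACHE = Path.home() / ".cache" / "whisper"

def cached_models(cache_dir: Path = WHISPER_CACHE) -> list:
    """List Whisper model files already downloaded (empty on a fresh install)."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.glob("*.pt"))

def transcribe(audio_path: str, model_name: str = "base") -> str:
    """First call downloads the model (~150MB for base) into the cache dir."""
    import whisper  # deferred: requires `pip install openai-whisper`
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]
```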
## Advanced Features
### Voice Fraud Detection (Optional)
Detect synthetic/deepfake voices:
```bash
# Enable in .env
VOICE_FRAUD_DETECTION=true
# Install additional dependency
pip install resemblyzer
```
Response includes:
```json
"voice_fraud": {
"is_synthetic": false,
"confidence": 0.85,
"risk_level": "low"
}
```
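One plausible way to derive the `risk_level` field from a detector's output is sketched below; the thresholds are illustrative, not the project's actual values. The `embed_voice` helper shows a resemblyzer speaker-embedding call with the heavy import deferred:

```python
def risk_level(is_synthetic: bool, confidence: float) -> str:
    """Map detector output to the risk_level response field (illustrative thresholds)."""
    if not is_synthetic:
        return "low"
    return "high" if confidence >= 0.8 else "medium"

def embed_voice(wav_path: str):
    """Speaker embedding via resemblyzer, a building block for synthetic-voice checks."""
    from resemblyzer import VoiceEncoder, preprocess_wav  # `pip install resemblyzer`
    return VoiceEncoder().embed_utterance(preprocess_wav(wav_path))
```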
### Custom TTS Voice
Future: Replace gTTS with IndicTTS for better Indic language support.
### Streaming Audio
Future: Real-time audio streaming instead of record-then-send.
## Testing Checklist
- [ ] Install Phase 2 dependencies
- [ ] Set `PHASE_2_ENABLED=true`
- [ ] Start server
- [ ] Open voice UI
- [ ] Record voice message
- [ ] Verify transcription
- [ ] Verify AI reply (text)
- [ ] Verify AI reply (audio)
- [ ] Check metadata (language, confidence)
- [ ] Verify Phase 1 tests still pass
## Next Steps
1. **Review:** Read `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` for full details
2. **Install:** Run `pip install -r requirements-phase2.txt`
3. **Configure:** Copy settings from `.env.phase2.example` to `.env`
4. **Test:** Open `ui/voice.html` and try recording
5. **Deploy:** Set `PHASE_2_ENABLED=true` in production
## Support
- Full plan: `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
- Issues: Check logs in `logs/app.log`
- Questions: Review implementation plan sections
---
**Phase 2 Status:** ✅ Planned, 🚧 Ready to Implement
**Estimated Implementation Time:** 17-21 hours
**Priority:** Optional (Phase 1 is complete and sufficient for competition)