# Phase 2: Voice Implementation - Quick Start Guide
## What is Phase 2?
Phase 2 adds **live two-way voice conversation** to the ScamShield AI honeypot:
- **You speak** (as scammer) → AI transcribes → processes → **AI speaks back**
- Completely isolated from Phase 1 (text honeypot)
- Optional feature (enabled via `PHASE_2_ENABLED=true`)
## Architecture
```
Voice Input (You) → ASR (Whisper) → Text
                                     ↓
                       Phase 1 Honeypot (Unchanged)
                                     ↓
Voice Output (AI) ← TTS (gTTS) ← Text Reply
```
**Key Point:** The Phase 1 text honeypot is **not modified**. Voice is just an input/output wrapper.
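The wrapper idea above can be sketched as a thin orchestration layer. The function names (`transcribe`, `honeypot_reply`, `synthesize`) are hypothetical stand-ins for the `app/voice` modules, passed in as callables so the Phase 1 honeypot itself stays untouched:

```python
from dataclasses import dataclass

@dataclass
class VoiceTurn:
    transcription: str   # what the ASR heard
    reply_text: str      # Phase 1 honeypot's text reply
    reply_audio: bytes   # TTS rendering of that reply

def engage_voice(audio: bytes, transcribe, honeypot_reply, synthesize) -> VoiceTurn:
    """Voice is only an I/O wrapper: ASR in, unchanged Phase 1 honeypot, TTS out."""
    text = transcribe(audio)        # Whisper ASR: audio -> text
    reply = honeypot_reply(text)    # existing Phase 1 text honeypot: text -> text
    speech = synthesize(reply)      # gTTS: text -> audio
    return VoiceTurn(text, reply, speech)
```

Because the honeypot is injected as a plain text-to-text callable, the voice layer never needs to know anything about its internals.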
## Quick Setup
### 1. Install Dependencies
```bash
# Install Phase 2 dependencies
pip install -r requirements-phase2.txt
# Note: PyAudio may need system packages
# Windows: pip install pipwin && pipwin install pyaudio
# Linux: sudo apt-get install portaudio19-dev
# Mac: brew install portaudio
```
### 2. Configure Environment
```bash
# Add to your .env file
PHASE_2_ENABLED=true
WHISPER_MODEL=base
TTS_ENGINE=gtts
VOICE_FRAUD_DETECTION=false
```
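How the server might read these flags is sketched below; `env_bool` is a hypothetical helper (the actual settings module may use pydantic or similar), but the variable names and defaults match the `.env` entries above:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var such as PHASE_2_ENABLED=true."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes", "on"}

# Defaults mirror the example .env above; Phase 2 stays off unless opted in.
PHASE_2_ENABLED = env_bool("PHASE_2_ENABLED")
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
TTS_ENGINE = os.getenv("TTS_ENGINE", "gtts")
VOICE_FRAUD_DETECTION = env_bool("VOICE_FRAUD_DETECTION")
```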
### 3. Start Server
```bash
# Start FastAPI server (same as Phase 1)
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### 4. Open Voice UI
```
Open in browser: http://localhost:8000/ui/voice.html
```
## Testing the Voice Feature
### Option 1: Record Live
1. Click **"Start Recording"**
2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.")
3. Click **"Stop Recording"**
4. Wait for AI to:
- Transcribe your voice
- Process through honeypot
- Reply with voice
### Option 2: Upload Audio File
1. Click **"Upload Audio File"**
2. Select a `.wav`, `.mp3`, or `.m4a` file
3. AI processes and replies
## API Endpoint
### POST `/api/v1/voice/engage`
**Request:**
```bash
curl -X POST "http://localhost:8000/api/v1/voice/engage" \
-H "x-api-key: dev-key-12345" \
-F "audio_file=@recording.wav" \
-F "session_id=voice-test-001" \
-F "language=auto"
```
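The same call from Python might look like the sketch below. `build_engage_request` is a hypothetical helper that assembles the multipart form exactly as the curl command does; sending it requires the `requests` package and a running server, so the actual POST is left commented out:

```python
def build_engage_request(audio_bytes: bytes, filename: str, session_id: str,
                         api_key: str, base_url: str = "http://localhost:8000",
                         language: str = "auto") -> dict:
    """Mirror the curl call: API key header plus multipart form fields."""
    return {
        "url": f"{base_url}/api/v1/voice/engage",
        "headers": {"x-api-key": api_key},
        "files": {"audio_file": (filename, audio_bytes)},
        "data": {"session_id": session_id, "language": language},
    }

# Usage (assumes `pip install requests` and a running server):
# import requests
# kwargs = build_engage_request(open("recording.wav", "rb").read(),
#                               "recording.wav", "voice-test-001", "dev-key-12345")
# print(requests.post(**kwargs).json())
```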
**Response:**
```json
{
"session_id": "voice-test-001",
"scam_detected": true,
"scam_confidence": 0.92,
"scam_type": "financial_fraud",
"turn_count": 1,
"ai_reply_text": "Oh no! What should I do? Can you help me?",
"ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3",
"transcription": {
"text": "Your account is blocked. Send OTP immediately.",
"language": "en",
"confidence": 0.95
},
"voice_fraud": null,
"extracted_intelligence": {
"upi_ids": [],
"bank_accounts": [],
"phone_numbers": [],
"urls": []
},
"processing_time_ms": 3450
}
```
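A caller can condense a response like the one above into a single log line. The helper below is a hypothetical sketch; it only touches fields shown in the sample response:

```python
def summarize_engagement(payload: dict) -> str:
    """One-line summary of a /voice/engage response, e.g. for session logs."""
    t = payload.get("transcription") or {}
    return (f"[{payload['session_id']}] scam={payload['scam_detected']} "
            f"({payload.get('scam_type')}, conf={payload.get('scam_confidence')}) "
            f"heard={t.get('text')!r} reply={payload.get('ai_reply_text')!r}")
```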
## File Structure
```
app/
├── voice/                    # NEW: Phase 2 voice modules
│   ├── __init__.py
│   ├── asr.py                # Whisper ASR
│   ├── tts.py                # gTTS text-to-speech
│   └── fraud_detector.py     # Optional voice fraud detection
├── api/
│   ├── voice_endpoints.py    # NEW: Voice API endpoints
│   └── voice_schemas.py      # NEW: Voice API schemas
└── ... (Phase 1 unchanged)
ui/
├── voice.html                # NEW: Voice UI
├── voice.js                  # NEW: Voice UI logic
├── voice.css                 # NEW: Voice UI styles
└── ... (Phase 1 unchanged)
PHASE_2_VOICE_IMPLEMENTATION_PLAN.md # Full implementation plan
requirements-phase2.txt # Phase 2 dependencies
.env.phase2.example # Phase 2 config example
```
## Impact on Phase 1
**ZERO IMPACT:**
- ✅ Phase 1 text honeypot unchanged
- ✅ All existing tests pass
- ✅ Existing API endpoints unchanged
- ✅ Existing UI unchanged
- ✅ Phase 2 is opt-in (disabled by default)
## Performance
| Metric | Target | Notes |
|--------|--------|-------|
| ASR Latency | <2s | Whisper base model |
| TTS Latency | <1s | gTTS |
| Total Loop | <5s | Voice in → Voice out |
| Accuracy | >85% | Transcription WER |
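The `processing_time_ms` field reported by the API can be captured against these targets with a simple context manager; this is an illustrative sketch, not the server's actual implementation:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(result: dict, key: str = "processing_time_ms"):
    """Record elapsed wall-clock milliseconds into result[key], as in the API response."""
    start = time.perf_counter()
    try:
        yield
    finally:
        result[key] = int((time.perf_counter() - start) * 1000)

# Usage: wrap the full voice loop and compare against the <5s target.
# response = {}
# with timed(response):
#     ...  # ASR -> honeypot -> TTS
# assert response["processing_time_ms"] < 5000
```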
## Troubleshooting
### "Voice API unavailable"
- Check `PHASE_2_ENABLED=true` in `.env`
- Verify dependencies installed: `pip list | grep whisper`
- Check logs: `tail -f logs/app.log`
### "Microphone access denied"
- Browser needs microphone permission
- Check browser settings → Privacy → Microphone
- Use HTTPS or localhost (required for `getUserMedia`)
### "PyAudio installation failed"
```bash
# Windows
pip install pipwin
pipwin install pyaudio
# Linux
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
# Mac
brew install portaudio
pip install pyaudio
```
### "Whisper model download slow"
- First run downloads the model (~150MB for `base`)
- Models are cached in `~/.cache/whisper/`
- Use a smaller model: `WHISPER_MODEL=tiny`
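You can check whether the download already happened before starting the server. The sketch below assumes the default cache location above; the `transcribe` helper defers the `whisper` import (from the `openai-whisper` package) so the rest of the module loads even where the dependency is missing:

```python
from pathlib import Path

WHISPER_CACHE = Path.home() / ".cache" / "whisper"

def cached_models(cache_dir: Path = WHISPER_CACHE) -> list:
    """List Whisper model files already downloaded (empty on a fresh install)."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.glob("*.pt"))

def transcribe(audio_path: str, model_name: str = "base") -> str:
    """First call downloads the model (~150MB for base) into the cache dir."""
    import whisper  # deferred: requires `pip install openai-whisper`
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]
```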
## Advanced Features
### Voice Fraud Detection (Optional)
Detect synthetic/deepfake voices:
```bash
# Enable in .env
VOICE_FRAUD_DETECTION=true
# Install additional dependency
pip install resemblyzer
```
Response includes:
```json
"voice_fraud": {
"is_synthetic": false,
"confidence": 0.85,
"risk_level": "low"
}
```
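One plausible way to derive the `risk_level` field from a detector's output is sketched below; the thresholds are illustrative, not the project's actual values. The `embed_voice` helper shows a resemblyzer speaker-embedding call with the heavy import deferred:

```python
def risk_level(is_synthetic: bool, confidence: float) -> str:
    """Map detector output to the risk_level response field (illustrative thresholds)."""
    if not is_synthetic:
        return "low"
    return "high" if confidence >= 0.8 else "medium"

def embed_voice(wav_path: str):
    """Speaker embedding via resemblyzer, a building block for synthetic-voice checks."""
    from resemblyzer import VoiceEncoder, preprocess_wav  # `pip install resemblyzer`
    return VoiceEncoder().embed_utterance(preprocess_wav(wav_path))
```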
### Custom TTS Voice
Future: Replace gTTS with IndicTTS for better Indic language support.
### Streaming Audio
Future: Real-time audio streaming instead of record-then-send.
## Testing Checklist
- [ ] Install Phase 2 dependencies
- [ ] Set `PHASE_2_ENABLED=true`
- [ ] Start server
- [ ] Open voice UI
- [ ] Record voice message
- [ ] Verify transcription
- [ ] Verify AI reply (text)
- [ ] Verify AI reply (audio)
- [ ] Check metadata (language, confidence)
- [ ] Verify Phase 1 tests still pass
## Next Steps
1. **Review:** Read `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` for full details
2. **Install:** Run `pip install -r requirements-phase2.txt`
3. **Configure:** Copy settings from `.env.phase2.example` to `.env`
4. **Test:** Open `ui/voice.html` and try recording
5. **Deploy:** Set `PHASE_2_ENABLED=true` in production
## Support
- Full plan: `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
- Issues: Check logs in `logs/app.log`
- Questions: Review implementation plan sections
---
**Phase 2 Status:** ✅ Planned, 🚧 Ready to Implement
**Estimated Implementation Time:** 17-21 hours
**Priority:** Optional (Phase 1 is complete and sufficient for competition)