Spaces:

Gankit12
/

scam

Sleeping

App Files Files Community

scam / PHASE_2_README.md

Gankit12

Relative API URLs, docker-compose port fix, Phase 2 voice, HF deploy guide

6a4a552 about 1 month ago

preview code

raw

history blame contribute delete

6.27 kB

	# Phase 2: Voice Implementation - Quick Start Guide

	## What is Phase 2?

	Phase 2 adds live two-way voice conversation to the ScamShield AI honeypot:

	- You speak (as scammer) → AI transcribes → processes → AI speaks back
	- Completely isolated from Phase 1 (text honeypot)
	- Optional feature (enabled via `PHASE_2_ENABLED=true`)

	## Architecture

	```
	Voice Input (You) → ASR (Whisper) → Text
	↓
	Phase 1 Honeypot (Unchanged)
	↓
	Voice Output (AI) ← TTS (gTTS) ← Text Reply
	```

	Key Point: Phase 1 text honeypot is not modified. Voice is just input/output wrapper.

	## Quick Setup

	### 1. Install Dependencies

	```bash
	# Install Phase 2 dependencies
	pip install -r requirements-phase2.txt

	# Note: PyAudio may need system packages
	# Windows: pip install pipwin && pipwin install pyaudio
	# Linux: sudo apt-get install portaudio19-dev
	# Mac: brew install portaudio
	```

	### 2. Configure Environment

	```bash
	# Add to your .env file
	PHASE_2_ENABLED=true
	WHISPER_MODEL=base
	TTS_ENGINE=gtts
	VOICE_FRAUD_DETECTION=false
	```

	### 3. Start Server

	```bash
	# Start FastAPI server (same as Phase 1)
	python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
	```

	### 4. Open Voice UI

	```
	Open in browser: http://localhost:8000/ui/voice.html
	```

	## Testing the Voice Feature

	### Option 1: Record Live

	1. Click "Start Recording"
	2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.")
	3. Click "Stop Recording"
	4. Wait for AI to:
	- Transcribe your voice
	- Process through honeypot
	- Reply with voice

	### Option 2: Upload Audio File

	1. Click "Upload Audio File"
	2. Select a `.wav`, `.mp3`, or `.m4a` file
	3. AI processes and replies

	## API Endpoint

	### POST `/api/v1/voice/engage`

	Request:
	```bash
	curl -X POST "http://localhost:8000/api/v1/voice/engage" \
	-H "x-api-key: dev-key-12345" \
	-F "audio_file=@recording.wav" \
	-F "session_id=voice-test-001" \
	-F "language=auto"
	```

	Response:
	```json
	{
	"session_id": "voice-test-001",
	"scam_detected": true,
	"scam_confidence": 0.92,
	"scam_type": "financial_fraud",
	"turn_count": 1,
	"ai_reply_text": "Oh no! What should I do? Can you help me?",
	"ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3",
	"transcription": {
	"text": "Your account is blocked. Send OTP immediately.",
	"language": "en",
	"confidence": 0.95
	},
	"voice_fraud": null,
	"extracted_intelligence": {
	"upi_ids": [],
	"bank_accounts": [],
	"phone_numbers": [],
	"urls": []
	},
	"processing_time_ms": 3450
	}
	```

	## File Structure

	```
	app/
	├── voice/ # NEW: Phase 2 voice modules
	│ ├── __init__.py
	│ ├── asr.py # Whisper ASR
	│ ├── tts.py # gTTS text-to-speech
	│ └── fraud_detector.py # Optional voice fraud detection
	├── api/
	│ ├── voice_endpoints.py # NEW: Voice API endpoints
	│ └── voice_schemas.py # NEW: Voice API schemas
	└── ... (Phase 1 unchanged)

	ui/
	├── voice.html # NEW: Voice UI
	├── voice.js # NEW: Voice UI logic
	├── voice.css # NEW: Voice UI styles
	└── ... (Phase 1 unchanged)

	PHASE_2_VOICE_IMPLEMENTATION_PLAN.md # Full implementation plan
	requirements-phase2.txt # Phase 2 dependencies
	.env.phase2.example # Phase 2 config example
	```

	## Impact on Phase 1

	ZERO IMPACT:

	- ✅ Phase 1 text honeypot unchanged
	- ✅ All existing tests pass
	- ✅ Existing API endpoints unchanged
	- ✅ Existing UI unchanged
	- ✅ Phase 2 is opt-in (disabled by default)

	## Performance

	\| Metric \| Target \| Notes \|
	\|--------\|--------\|-------\|
	\| ASR Latency \| <2s \| Whisper base model \|
	\| TTS Latency \| <1s \| gTTS \|
	\| Total Loop \| <5s \| Voice in → Voice out \|
	\| Accuracy \| >85% \| Transcription WER \|

	## Troubleshooting

	### "Voice API unavailable"

	- Check `PHASE_2_ENABLED=true` in `.env`
	- Verify dependencies installed: `pip list \| grep whisper`
	- Check logs: `tail -f logs/app.log`

	### "Microphone access denied"

	- Browser needs microphone permission
	- Check browser settings → Privacy → Microphone
	- Use HTTPS or localhost (required for `getUserMedia`)

	### "PyAudio installation failed"

	```bash
	# Windows
	pip install pipwin
	pipwin install pyaudio

	# Linux
	sudo apt-get install portaudio19-dev python3-pyaudio
	pip install pyaudio

	# Mac
	brew install portaudio
	pip install pyaudio
	```

	### "Whisper model download slow"

	- First run downloads model (~150MB for base)
	- Models cached in `~/.cache/whisper/`
	- Use smaller model: `WHISPER_MODEL=tiny`

	## Advanced Features

	### Voice Fraud Detection (Optional)

	Detect synthetic/deepfake voices:

	```bash
	# Enable in .env
	VOICE_FRAUD_DETECTION=true

	# Install additional dependency
	pip install resemblyzer
	```

	Response includes:
	```json
	"voice_fraud": {
	"is_synthetic": false,
	"confidence": 0.85,
	"risk_level": "low"
	}
	```

	### Custom TTS Voice

	Future: Replace gTTS with IndicTTS for better Indic language support.

	### Streaming Audio

	Future: Real-time audio streaming instead of record-then-send.

	## Testing Checklist

	- [ ] Install Phase 2 dependencies
	- [ ] Set `PHASE_2_ENABLED=true`
	- [ ] Start server
	- [ ] Open voice UI
	- [ ] Record voice message
	- [ ] Verify transcription
	- [ ] Verify AI reply (text)
	- [ ] Verify AI reply (audio)
	- [ ] Check metadata (language, confidence)
	- [ ] Verify Phase 1 tests still pass

	## Next Steps

	1. Review: Read `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` for full details
	2. Install: Run `pip install -r requirements-phase2.txt`
	3. Configure: Copy settings from `.env.phase2.example` to `.env`
	4. Test: Open `ui/voice.html` and try recording
	5. Deploy: Set `PHASE_2_ENABLED=true` in production

	## Support

	- Full plan: `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
	- Issues: Check logs in `logs/app.log`
	- Questions: Review implementation plan sections

	---

	Phase 2 Status: ✅ Planned, 🚧 Ready to Implement

	Estimated Implementation Time: 17-21 hours

	Priority: Optional (Phase 1 is complete and sufficient for competition)