scam / PHASE_2_CHECKLIST.md
Gankit12's picture
Relative API URLs, docker-compose port fix, Phase 2 voice, HF deploy guide
6a4a552
# Phase 2 Implementation Checklist
Track your progress implementing Phase 2 voice features.
## Setup & Dependencies
- [ ] Review `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
- [ ] Review `PHASE_2_README.md`
- [ ] Install system dependencies (portaudio, ffmpeg)
- [ ] Install Python dependencies: `pip install -r requirements-phase2.txt`
- [ ] Copy Phase 2 settings from `.env.phase2.example` to `.env`
- [ ] Set `PHASE_2_ENABLED=true` in `.env`
- [ ] Verify Whisper model downloads successfully
## Core Modules
### ASR Module (`app/voice/asr.py`)
- [ ] Create `app/voice/asr.py`
- [ ] Implement `ASREngine` class
- [ ] Implement `transcribe()` method
- [ ] Add confidence calculation
- [ ] Add language detection
- [ ] Test with sample audio files
- [ ] Test with Hindi audio
- [ ] Test with English audio
- [ ] Test with Gujarati audio
- [ ] Verify latency <2s
### TTS Module (`app/voice/tts.py`)
- [ ] Create `app/voice/tts.py`
- [ ] Implement `TTSEngine` class
- [ ] Implement `synthesize()` method
- [ ] Add language mapping (en, hi, gu, etc.)
- [ ] Test with English text
- [ ] Test with Hindi text
- [ ] Test with Gujarati text
- [ ] Verify audio quality
- [ ] Verify latency <1s
### Voice Fraud Detector (Optional) (`app/voice/fraud_detector.py`)
- [ ] Create `app/voice/fraud_detector.py`
- [ ] Implement `VoiceFraudDetector` class
- [ ] Implement `detect_synthetic_voice()` method
- [ ] Add resemblyzer integration (if enabled)
- [ ] Test with synthetic audio
- [ ] Test with real audio
- [ ] Verify detection accuracy
## API Layer
### Voice Endpoints (`app/api/voice_endpoints.py`)
- [ ] Create `app/api/voice_endpoints.py`
- [ ] Implement `POST /api/v1/voice/engage`
- [ ] Add file upload handling
- [ ] Add ASR integration
- [ ] Add Phase 1 pipeline integration
- [ ] Add TTS integration
- [ ] Add voice fraud integration (optional)
- [ ] Implement `GET /api/v1/voice/audio/{filename}`
- [ ] Implement `GET /api/v1/voice/health`
- [ ] Add error handling
- [ ] Add logging
- [ ] Test with curl
- [ ] Test with Postman
### Voice Schemas (`app/api/voice_schemas.py`)
- [ ] Create `app/api/voice_schemas.py`
- [ ] Define `VoiceEngageRequest`
- [ ] Define `VoiceEngageResponse`
- [ ] Define `TranscriptionMetadata`
- [ ] Define `VoiceFraudMetadata`
- [ ] Add validation rules
- [ ] Test schema validation
## UI Layer
### Voice HTML (`ui/voice.html`)
- [ ] Create `ui/voice.html`
- [ ] Add header and title
- [ ] Add recording controls section
- [ ] Add recording status indicator
- [ ] Add start/stop buttons
- [ ] Add upload button
- [ ] Add session ID display
- [ ] Add conversation section
- [ ] Add message display area
- [ ] Add metadata section
- [ ] Add transcription display
- [ ] Add detection display
- [ ] Add voice fraud display (optional)
- [ ] Add intelligence section
- [ ] Test in Chrome
- [ ] Test in Firefox
- [ ] Test in Safari
### Voice JavaScript (`ui/voice.js`)
- [ ] Create `ui/voice.js`
- [ ] Implement `startRecording()`
- [ ] Implement `stopRecording()`
- [ ] Implement `uploadAudio()`
- [ ] Implement `sendAudioToAPI()`
- [ ] Implement `handleAPIResponse()`
- [ ] Implement `addMessage()`
- [ ] Implement `updateMetadata()`
- [ ] Implement `updateIntelligence()`
- [ ] Add error handling
- [ ] Test microphone access
- [ ] Test file upload
- [ ] Test API integration
- [ ] Test audio playback
### Voice CSS (`ui/voice.css`)
- [ ] Create `ui/voice.css`
- [ ] Style header
- [ ] Style recording controls
- [ ] Style recording status
- [ ] Style buttons
- [ ] Style conversation area
- [ ] Style messages (user/ai/system)
- [ ] Style metadata cards
- [ ] Style intelligence display
- [ ] Add responsive design
- [ ] Test on desktop
- [ ] Test on tablet
- [ ] Test on mobile
## Integration
### Main App Integration
- [ ] Update `app/main.py` to include voice router
- [ ] Add conditional import (only if `PHASE_2_ENABLED=true`)
- [ ] Add error handling for missing dependencies
- [ ] Test server startup with Phase 2 enabled
- [ ] Test server startup with Phase 2 disabled
- [ ] Verify Phase 1 endpoints still work
### Config Integration
- [ ] Update `app/config.py` with Phase 2 settings
- [ ] Add `PHASE_2_ENABLED` field
- [ ] Add `WHISPER_MODEL` field
- [ ] Add `TTS_ENGINE` field
- [ ] Add `VOICE_FRAUD_DETECTION` field
- [ ] Add `AUDIO_SAMPLE_RATE` field
- [ ] Add `AUDIO_CHUNK_DURATION` field
- [ ] Test config loading
### Environment Variables
- [ ] Update `.env.example` with Phase 2 variables
- [ ] Create `.env.phase2.example`
- [ ] Document all Phase 2 settings
- [ ] Test with different configurations
## Testing
### Unit Tests
- [ ] Create `tests/unit/test_voice_asr.py`
- [ ] Test ASR transcription
- [ ] Test language detection
- [ ] Test confidence calculation
- [ ] Create `tests/unit/test_voice_tts.py`
- [ ] Test TTS synthesis
- [ ] Test language mapping
- [ ] Create `tests/unit/test_voice_fraud.py` (optional)
- [ ] Test fraud detection
- [ ] Run all unit tests: `pytest tests/unit/test_voice_*.py`
### Integration Tests
- [ ] Create `tests/integration/test_voice_api.py`
- [ ] Test voice engage endpoint
- [ ] Test audio file upload
- [ ] Test transcription flow
- [ ] Test Phase 1 integration
- [ ] Test TTS flow
- [ ] Test audio download
- [ ] Test health endpoint
- [ ] Run integration tests: `pytest tests/integration/test_voice_api.py`
### End-to-End Tests
- [ ] Test full voice loop (record → transcribe → process → TTS → play)
- [ ] Test with English scam message
- [ ] Test with Hindi scam message
- [ ] Test with Gujarati scam message
- [ ] Test multi-turn conversation
- [ ] Test intelligence extraction from voice
- [ ] Test session persistence
- [ ] Verify latency <5s for full loop
### Regression Tests
- [ ] Run all Phase 1 tests: `pytest tests/`
- [ ] Verify Phase 1 text endpoints work
- [ ] Verify Phase 1 UI works
- [ ] Verify no breaking changes
## Performance
- [ ] Measure ASR latency
- [ ] Measure TTS latency
- [ ] Measure total loop latency
- [ ] Test with concurrent requests
- [ ] Test with large audio files
- [ ] Optimize if needed
- [ ] Document performance metrics
## Documentation
- [ ] Review `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md`
- [ ] Review `PHASE_2_README.md`
- [ ] Add inline code comments
- [ ] Add docstrings to all functions
- [ ] Update main `README.md` with Phase 2 info
- [ ] Create API documentation for voice endpoints
- [ ] Add troubleshooting guide
- [ ] Add examples
## Deployment
### Docker
- [ ] Update `Dockerfile` with Phase 2 dependencies
- [ ] Add conditional installation
- [ ] Test Docker build
- [ ] Test Docker run with Phase 2 enabled
- [ ] Test Docker run with Phase 2 disabled
### Environment Setup
- [ ] Document system dependencies
- [ ] Document Python dependencies
- [ ] Create setup script (optional)
- [ ] Test on clean environment
- [ ] Test on Windows
- [ ] Test on Linux
- [ ] Test on Mac
### Production Readiness
- [ ] Add monitoring for voice endpoints
- [ ] Add logging for voice operations
- [ ] Add error tracking
- [ ] Add rate limiting
- [ ] Add audio file cleanup
- [ ] Add security headers
- [ ] Test with production settings
## Quality Assurance
### Code Quality
- [ ] Run linter: `flake8 app/voice/`
- [ ] Run type checker: `mypy app/voice/`
- [ ] Run formatter: `black app/voice/`
- [ ] Fix all linting errors
- [ ] Fix all type errors
- [ ] Review code for best practices
### Security
- [ ] Validate audio file uploads
- [ ] Add file size limits
- [ ] Add file type validation
- [ ] Sanitize file names
- [ ] Add rate limiting
- [ ] Test with malicious files
- [ ] Review security best practices
### Accessibility
- [ ] Test keyboard navigation
- [ ] Test screen reader compatibility
- [ ] Add ARIA labels
- [ ] Test with assistive technologies
## Final Checks
- [ ] All tests passing
- [ ] No linting errors
- [ ] Documentation complete
- [ ] Performance acceptable
- [ ] Security reviewed
- [ ] Phase 1 unaffected
- [ ] Ready for deployment
## Post-Implementation
- [ ] Demo video recorded
- [ ] User guide created
- [ ] Training materials prepared
- [ ] Feedback collected
- [ ] Issues documented
- [ ] Future improvements planned
---
## Progress Summary
**Total Tasks:** 200+
**Completed:** _____ / 200+
**In Progress:** _____
**Blocked:** _____
**Estimated Time Remaining:** _____ hours
---
## Notes
Use this space to track issues, blockers, or important decisions:
```
[Date] [Note]
-
-
-
```
---
**Last Updated:** [Date]
**Status:** 🚧 Not Started | 🟡 In Progress | ✅ Complete