Phase 2 Implementation - Quick Reference Card
6 Prompts to Implement Phase 2
Copy these prompts one at a time to your AI assistant (Claude, ChatGPT, etc.)
PROMPT 1: ASR Module (2 hours)
What: Create Whisper-based speech-to-text module
Output: app/voice/asr.py
Dependencies: pip install openai-whisper torchaudio
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Create app/voice/asr.py with ASREngine class using Whisper. Requirements: transcribe(audio_path, language) -> {text, language, confidence}, support multiple model sizes (tiny/base/small/medium/large), GPU if available, singleton pattern, type hints, docstrings, logging, error handling. Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Step 2.1.
Test:
python -c "from app.voice.asr import get_asr_engine; print('ASR OK')"
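For orientation, here is a minimal sketch of the shape PROMPT 1 asks for. It assumes the openai-whisper calls `whisper.load_model` and `model.transcribe`, and it derives `confidence` from Whisper's per-segment `avg_logprob`, which is a heuristic chosen for this sketch and not something the plan specifies. The helper name `confidence_from_segments` is hypothetical, and the module your assistant generates may look different.

```python
import logging
import math
import os
import threading
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

VALID_MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def confidence_from_segments(segments: List[Dict[str, Any]]) -> float:
    """Map mean avg_logprob to a rough 0..1 score (heuristic, not calibrated)."""
    if not segments:
        return 0.0
    mean_logprob = sum(s["avg_logprob"] for s in segments) / len(segments)
    return max(0.0, min(1.0, math.exp(mean_logprob)))


class ASREngine:
    """Thin wrapper around a Whisper model that is loaded on first use."""

    def __init__(self, model_size: str = "base") -> None:
        if model_size not in VALID_MODEL_SIZES:
            raise ValueError(f"model_size must be one of {VALID_MODEL_SIZES}")
        self.model_size = model_size
        self._model: Any = None

    def _load(self) -> Any:
        if self._model is None:
            # Heavy imports are deferred so importing this module stays cheap.
            import torch
            import whisper

            device = "cuda" if torch.cuda.is_available() else "cpu"
            logger.info("Loading Whisper '%s' on %s", self.model_size, device)
            self._model = whisper.load_model(self.model_size, device=device)
        return self._model

    def transcribe(self, audio_path: str, language: Optional[str] = None) -> Dict[str, Any]:
        """Return {text, language, confidence}; language=None lets Whisper detect it."""
        if not os.path.isfile(audio_path):
            raise FileNotFoundError(audio_path)
        result = self._load().transcribe(audio_path, language=language)
        return {
            "text": result["text"].strip(),
            "language": result.get("language", language),
            "confidence": confidence_from_segments(result.get("segments", [])),
        }


_engine: Optional[ASREngine] = None
_engine_lock = threading.Lock()


def get_asr_engine(model_size: str = "base") -> ASREngine:
    """Return the process-wide engine, creating it on the first call."""
    global _engine
    with _engine_lock:
        if _engine is None:
            _engine = ASREngine(model_size)
    return _engine
```

Because the model is loaded lazily, the import test above only proves the module imports; the "Basic transcription works" checklist item still needs a real audio file.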
PROMPT 2: TTS Module (2 hours)
What: Create gTTS-based text-to-speech module
Output: app/voice/tts.py
Dependencies: pip install gTTS
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Create app/voice/tts.py with TTSEngine class using gTTS. Requirements: synthesize(text, language, output_path) -> audio_file_path, support English and Indic languages (en, hi, gu, ta, te, bn, mr), auto-generate temp files, singleton pattern, type hints, docstrings, logging, error handling. Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Step 2.2.
Test:
python -c "from app.voice.tts import get_tts_engine; print('TTS OK')"
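As a rough guide to what PROMPT 2 should produce, the sketch below wraps gTTS behind the `synthesize` signature named in the prompt. It assumes the `gTTS(text=..., lang=...).save(path)` call from the gTTS package, writes MP3 output, and uses `functools.lru_cache` as one possible way to get a singleton; the `tts_` temp-file prefix is an arbitrary choice for illustration. Note that gTTS needs network access at synthesis time.

```python
import logging
import os
import tempfile
from functools import lru_cache
from typing import Optional

logger = logging.getLogger(__name__)

# Language codes listed in the prompt above.
SUPPORTED_LANGUAGES = ("en", "hi", "gu", "ta", "te", "bn", "mr")


class TTSEngine:
    """Text-to-speech wrapper that writes an MP3 file and returns its path."""

    def synthesize(
        self,
        text: str,
        language: str = "en",
        output_path: Optional[str] = None,
    ) -> str:
        # Validate inputs before touching the filesystem or the network.
        if not text or not text.strip():
            raise ValueError("text must not be empty")
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")

        if output_path is None:
            # Auto-generate a temp file; the caller is responsible for cleanup.
            fd, output_path = tempfile.mkstemp(prefix="tts_", suffix=".mp3")
            os.close(fd)

        from gtts import gTTS  # deferred so the module imports without gTTS

        try:
            gTTS(text=text, lang=language).save(output_path)
        except Exception:
            logger.exception("TTS synthesis failed for language %s", language)
            raise
        return output_path


@lru_cache(maxsize=1)
def get_tts_engine() -> TTSEngine:
    """Return the single shared TTSEngine instance."""
    return TTSEngine()
```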
PROMPT 3: Voice API (3 hours)
What: Create voice API endpoints
Output: app/api/voice_endpoints.py, app/api/voice_schemas.py
Dependencies: FastAPI (already installed)
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Create app/api/voice_schemas.py (VoiceEngageResponse, TranscriptionMetadata, VoiceFraudMetadata) and app/api/voice_endpoints.py with: POST /api/v1/voice/engage (audio_file upload → ASR → Phase 1 pipeline → TTS → audio response), GET /api/v1/voice/audio/{filename} (serve audio), GET /api/v1/voice/health. CRITICAL: Reuse Phase 1 code (detector, honeypot, extractor), do NOT modify Phase 1. Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Step 3.
Test:
python -c "from app.api.voice_endpoints import router; print('API OK')"
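The engage flow in the prompt (audio upload, then ASR, then the Phase 1 pipeline, then TTS) can be sketched as a plain function that receives the three stages as callables, which keeps Phase 1 untouched and makes the flow easy to test with stubs. Everything here is hypothetical naming for illustration: `run_voice_pipeline`, `VoiceEngageResult`, and the callable signatures are assumptions, not the schemas from the plan, and the real endpoint would wrap this in a FastAPI route.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class VoiceEngageResult:
    """Illustrative result container; the real schema lives in voice_schemas.py."""

    transcript: str
    language: str
    reply_text: str
    reply_audio_path: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def run_voice_pipeline(
    audio_path: str,
    transcribe: Callable[[str], Dict[str, Any]],
    engage: Callable[[str], str],
    synthesize: Callable[[str, str], str],
) -> VoiceEngageResult:
    """Run ASR, hand the text to the existing Phase 1 logic, then voice the reply."""
    asr = transcribe(audio_path)                   # speech to text
    reply_text = engage(asr["text"])               # Phase 1 pipeline, reused as-is
    reply_audio = synthesize(reply_text, asr["language"])  # text to speech
    return VoiceEngageResult(
        transcript=asr["text"],
        language=asr["language"],
        reply_text=reply_text,
        reply_audio_path=reply_audio,
        metadata={"asr_confidence": asr.get("confidence")},
    )


if __name__ == "__main__":
    # Stub stages stand in for the real ASR, Phase 1, and TTS components.
    result = run_voice_pipeline(
        "call.wav",
        transcribe=lambda path: {"text": "hello", "language": "en", "confidence": 0.9},
        engage=lambda text: f"You said: {text}",
        synthesize=lambda text, lang: f"/tmp/reply_{lang}.mp3",
    )
    print(result.reply_text)
```

Passing the stages in as arguments is one design option; importing the Phase 1 functions directly inside the endpoint works too, as long as Phase 1 code is not modified.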
PROMPT 4: Voice UI (4 hours)
What: Create voice chat interface
Output: ui/voice.html, ui/voice.js, ui/voice.css
Dependencies: None (vanilla JS)
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Create ui/voice.html (recording controls, conversation display, metadata, intelligence), ui/voice.js (MediaRecorder API, sendAudioToAPI, handleAPIResponse, updateMetadata), ui/voice.css (dark theme, recording status colors, message bubbles). Features: record audio, upload files, display transcription, play AI voice, show metadata. API: POST /api/v1/voice/engage with FormData. Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Step 4.
Test:
Open ui/voice.html directly in a browser (the server is not running yet, so just check that the UI renders).
Once the server is running, use http://localhost:8000/ui/voice.html instead.
PROMPT 5: Integration (3 hours)
What: Integrate Phase 2 into main app
Output: Updated app/config.py, app/main.py, .env.example
Dependencies: None
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Update app/config.py (add PHASE_2_ENABLED, WHISPER_MODEL, TTS_ENGINE, VOICE_FRAUD_DETECTION, AUDIO_SAMPLE_RATE, AUDIO_CHUNK_DURATION), app/main.py (conditionally include voice router if PHASE_2_ENABLED=true with try/except), .env.example (add Phase 2 config section). CRITICAL: Phase 1 must work if Phase 2 disabled or fails to load. Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Step 5.
Test:
# Set in .env
PHASE_2_ENABLED=true
# Start server
python -m uvicorn app.main:app --reload
# Check logs for "Phase 2 voice endpoints enabled"
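The "Phase 1 must work if Phase 2 is disabled or fails to load" requirement comes down to a guarded include in app/main.py. The sketch below shows one way to do it, assuming the voice module exposes a `router` object (as the PROMPT 3 import test implies) and that `app` has FastAPI's `include_router` method. The helper names `env_flag` and `include_voice_router` are hypothetical, and reading the flag straight from the environment is a simplification of whatever app/config.py ends up doing.

```python
import importlib
import logging
import os
from typing import Any

logger = logging.getLogger(__name__)


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def include_voice_router(app: Any, module_path: str = "app.api.voice_endpoints") -> bool:
    """Attach the voice router if Phase 2 is enabled; never raise on failure."""
    if not env_flag("PHASE_2_ENABLED"):
        logger.info("Phase 2 disabled; serving Phase 1 endpoints only")
        return False
    try:
        module = importlib.import_module(module_path)
        app.include_router(module.router)
    except Exception:
        # Missing dependencies or a broken module must not take Phase 1 down.
        logger.exception("Phase 2 failed to load; continuing with Phase 1 only")
        return False
    logger.info("Phase 2 voice endpoints enabled")
    return True
```

In app/main.py this would be called once after the Phase 1 routers are registered, for example `include_voice_router(app)`.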
PROMPT 6: Testing (3 hours)
What: Create comprehensive tests
Output: tests/unit/test_voice_asr.py, tests/unit/test_voice_tts.py, tests/integration/test_voice_api.py
Dependencies: pytest (already installed)
Copy this:
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. Create tests/unit/test_voice_asr.py (test ASREngine: initialization, transcription, confidence, error handling, singleton), tests/unit/test_voice_tts.py (test TTSEngine: initialization, synthesis, language mapping, temp files, error handling, singleton), tests/integration/test_voice_api.py (test voice endpoints: full flow, audio download, health check, auth, error handling, Phase 1 unaffected). Reference: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md Testing Plan.
Test:
# Run Phase 2 tests
pytest tests/unit/test_voice_*.py -v
pytest tests/integration/test_voice_api.py -v
# Run ALL tests (verify Phase 1 still works)
pytest tests/ -v
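Unit tests for the ASR module should not download or load a real Whisper model. One common approach, sketched here, is to place a stub module in `sys.modules` so that `import whisper` inside the code under test picks up the fake. The function `transcribe_text` is a hypothetical stand-in for the real code under test, and `make_fake_whisper` is an illustrative helper; the generated tests may use a different mocking style.

```python
import sys
import types
from unittest import mock


def transcribe_text(audio_path: str) -> str:
    """Stand-in for the code under test; it imports whisper lazily."""
    import whisper

    model = whisper.load_model("tiny")
    return model.transcribe(audio_path)["text"].strip()


def make_fake_whisper(text: str) -> types.ModuleType:
    """Build a fake 'whisper' module whose model returns a canned transcript."""
    fake_model = mock.Mock()
    fake_model.transcribe.return_value = {"text": text, "language": "en", "segments": []}
    fake = types.ModuleType("whisper")
    fake.load_model = mock.Mock(return_value=fake_model)
    return fake


def test_transcribe_uses_stubbed_model() -> None:
    fake = make_fake_whisper(" hello ")
    # patch.dict restores sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"whisper": fake}):
        assert transcribe_text("sample.wav") == "hello"
    fake.load_model.assert_called_once_with("tiny")
```

pytest collects `test_transcribe_uses_stubbed_model` by name, so the same pattern drops into tests/unit/test_voice_asr.py with the real import swapped in.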
Implementation Checklist
- [ ] PROMPT 1: ASR Module
  - [ ] Code generated
  - [ ] Import test passes
  - [ ] Basic transcription works
- [ ] PROMPT 2: TTS Module
  - [ ] Code generated
  - [ ] Import test passes
  - [ ] Basic synthesis works
- [ ] PROMPT 3: Voice API
  - [ ] Schemas generated
  - [ ] Endpoints generated
  - [ ] Import test passes
- [ ] PROMPT 4: Voice UI
  - [ ] HTML generated
  - [ ] JavaScript generated
  - [ ] CSS generated
  - [ ] UI renders in browser
- [ ] PROMPT 5: Integration
  - [ ] Config updated
  - [ ] Main app updated
  - [ ] .env.example updated
  - [ ] Server starts successfully
  - [ ] Voice endpoints accessible
- [ ] PROMPT 6: Testing
  - [ ] Unit tests generated
  - [ ] Integration tests generated
  - [ ] All tests pass
  - [ ] Phase 1 tests still pass
- [ ] FINAL VALIDATION
  - [ ] Can record voice in UI
  - [ ] Can upload audio file
  - [ ] AI responds with voice
  - [ ] Metadata displays correctly
  - [ ] Intelligence extracted
  - [ ] Phase 1 text chat still works
Quick Start
Before You Begin
# 1. Backup your code
git add .
git commit -m "Backup before Phase 2"
# 2. Install dependencies
pip install -r requirements-phase2.txt
# 3. Read the plan (optional but recommended)
cat PHASE_2_VOICE_IMPLEMENTATION_PLAN.md
Implementation Flow
Start → PROMPT 1 → Test → PROMPT 2 → Test → PROMPT 3 → Test →
PROMPT 4 → Test → PROMPT 5 → Test → PROMPT 6 → Test → Done!
After Each Prompt
- Review the generated code
- Test it works (see test commands above)
- Commit your changes
- Move to next prompt
Expected Results
After PROMPT 1 + 2 (Modules)
- app/voice/asr.py exists
- app/voice/tts.py exists
- Can import both modules
- Basic transcription/synthesis works
After PROMPT 3 (API)
- app/api/voice_endpoints.py exists
- app/api/voice_schemas.py exists
- Can import endpoints
- Ready for integration
After PROMPT 4 (UI)
- ui/voice.html exists
- ui/voice.js exists
- ui/voice.css exists
- UI renders in browser
After PROMPT 5 (Integration)
- Server starts with Phase 2 enabled
- Voice endpoints accessible
- Phase 1 still works
- Logs show "Phase 2 voice endpoints enabled"
After PROMPT 6 (Testing)
- All Phase 2 tests pass
- All Phase 1 tests pass
- No breaking changes
- Ready for production
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| ImportError: whisper | Run pip install openai-whisper |
| ImportError: gTTS | Run pip install gTTS |
| PyAudio install fails | See PHASE_2_README.md → Troubleshooting |
| Server won't start | Check logs: tail -f logs/app.log |
| Voice endpoint 404 | Verify PHASE_2_ENABLED=true in .env |
| Phase 1 tests fail | Phase 2 broke something - review changes |
| Audio not playing | Check audio URL in response |
| Microphone not working | Browser needs HTTPS or localhost |
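The two ImportError rows can be checked up front with a small preflight script. This sketch assumes the import names `whisper` (installed by openai-whisper) and `gtts` (installed by gTTS); the function name `find_missing` and the mapping `REQUIRED_MODULES` are illustrative, so extend the mapping to match your requirements-phase2.txt.

```python
import importlib.util
from typing import Dict

# Import name -> pip package that provides it.
REQUIRED_MODULES: Dict[str, str] = {
    "whisper": "openai-whisper",
    "gtts": "gTTS",
}


def find_missing(modules: Dict[str, str]) -> Dict[str, str]:
    """Return the subset of modules that cannot be located, without importing them."""
    return {
        name: package
        for name, package in modules.items()
        if importlib.util.find_spec(name) is None
    }


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    for name, package in missing.items():
        print(f"Missing module '{name}': run pip install {package}")
    if not missing:
        print("All Phase 2 modules found")
```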
Quick Fixes
# Reset if something breaks (WARNING: discards all uncommitted changes to tracked files)
git reset --hard HEAD
# Reinstall dependencies
pip install -r requirements-phase2.txt --force-reinstall
# Clear Python cache
find . -type d -name __pycache__ -exec rm -rf {} +
# Restart server
pkill -f uvicorn
python -m uvicorn app.main:app --reload
Time Estimates
| Prompt | Component | Time |
|---|---|---|
| 1 | ASR Module | 2h |
| 2 | TTS Module | 2h |
| 3 | Voice API | 3h |
| 4 | Voice UI | 4h |
| 5 | Integration | 3h |
| 6 | Testing | 3h |
| Total | | 17h |
Add a 20% buffer: 17h × 1.2 = 20.4h, so plan for roughly 21 hours in total.
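The totals can be reproduced with a few lines of arithmetic, which is handy if you adjust any of the per-prompt estimates. The hours come from the table above; rounding the buffered figure up to a whole hour is an assumption about how the "~21 hours" figure was reached.

```python
import math

# Per-prompt estimates in hours, taken from the Time Estimates table.
estimates_hours = {
    "ASR Module": 2,
    "TTS Module": 2,
    "Voice API": 3,
    "Voice UI": 4,
    "Integration": 3,
    "Testing": 3,
}

total_hours = sum(estimates_hours.values())
buffered_hours = total_hours * 120 / 100      # add a 20% buffer
planned_hours = math.ceil(buffered_hours)     # round up to a whole hour

print(f"Base estimate: {total_hours}h")
print(f"With 20% buffer: {buffered_hours:.1f}h, plan for {planned_hours}h")
```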
Pro Tips
- Test after each prompt - Don't skip testing!
- Commit frequently - Easy to roll back if needed
- Read error messages - They usually tell you what's wrong
- Check logs - tail -f logs/app.log is your friend
- Ask for help - Provide error messages to the AI assistant
- Take breaks - 17 hours is a lot, so spread the work over 2-3 days
Need Help?
Full Documentation
- Complete Plan: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md
- Architecture: PHASE_2_ARCHITECTURE.md
- Quick Start: PHASE_2_README.md
- Checklist: PHASE_2_CHECKLIST.md
- Prompts: PHASE_2_IMPLEMENTATION_PROMPTS.md (detailed versions)
Ask AI for Help
Template:
I'm implementing Phase 2 (PROMPT [number]) for ScamShield AI.
Error: [paste error]
Current code: [paste relevant code]
How do I fix this? Reference PHASE_2_VOICE_IMPLEMENTATION_PLAN.md if needed.
Success!
When all 6 prompts are complete:
- Voice recording works
- AI responds with voice
- Transcription displays
- Metadata shows
- Intelligence extracted
- Phase 1 still works
- All tests pass
You've successfully implemented Phase 2!
Quick Reference for: PHASE_2_IMPLEMENTATION_PROMPTS.md
Start with: PROMPT 1 (ASR Module)
Estimated Time: 17-21 hours