# Phase 2 Voice Implementation - AI Assistant Prompts

## Overview

This file contains 6 carefully crafted prompts to guide AI assistants (like Claude, ChatGPT, etc.) through implementing Phase 2 voice features step by step.

**How to Use:**

1. Copy each prompt one at a time
2. Paste it into your AI assistant
3. Review the generated code
4. Test before moving to the next prompt
5. Track progress in `PHASE_2_CHECKLIST.md`

**Context Files to Attach:**

- `PHASE_2_VOICE_IMPLEMENTATION_PLAN.md` (master plan)
- `PHASE_2_ARCHITECTURE.md` (architecture)
- `app/agent/honeypot.py` (existing honeypot)
- `app/config.py` (existing config)

---

## 📋 PROMPT 1: ASR Module (Whisper Transcription)

**Estimated Time:** 2 hours
**Dependencies:** `pip install openai-whisper torchaudio`
**Output:** `app/voice/asr.py`

### Prompt

```
I'm implementing Phase 2 voice features for my ScamShield AI honeypot project. I need to create an ASR (Automatic Speech Recognition) module using Whisper.

CONTEXT:
- This is Phase 2, which wraps around an existing Phase 1 text honeypot
- Phase 1 must remain unchanged
- The ASR module will transcribe audio to text, which then feeds into Phase 1

REQUIREMENTS:
1. Create file: app/voice/asr.py
2. Implement ASREngine class with:
   - __init__(model_size: str = "base") - Initialize Whisper model
   - _load_model() - Load Whisper model (tiny/base/small/medium/large)
   - transcribe(audio_path: str, language: Optional[str] = None) -> Dict
     Returns: {"text": str, "language": str, "confidence": float}
   - _calculate_confidence(result: Dict) -> float - Calculate confidence from Whisper output
3. Features:
   - Support multiple Whisper model sizes (configurable)
   - Auto-detect language or accept a language hint
   - GPU support if available (cuda), else CPU
   - Return transcription with a confidence score
   - Handle errors gracefully (return empty text with 0.0 confidence)
4. Singleton pattern:
   - get_asr_engine() -> ASREngine (global instance)
5.
   Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except
6. Configuration from settings:
   - settings.WHISPER_MODEL (default: "base")
   - Auto-detect device (cuda/cpu)

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 2.1 in the plan]

ACCEPTANCE CRITERIA:
- [ ] ASREngine class created
- [ ] Whisper model loads successfully
- [ ] transcribe() returns correct format
- [ ] Language detection works
- [ ] Confidence calculation implemented
- [ ] Singleton pattern works
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ] Logging added

Please generate the complete app/voice/asr.py file with production-ready code.
```

---

## 📋 PROMPT 2: TTS Module (Text-to-Speech)

**Estimated Time:** 2 hours
**Dependencies:** `pip install gTTS`
**Output:** `app/voice/tts.py`

### Prompt

```
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create a TTS (Text-to-Speech) module using gTTS.

CONTEXT:
- This is Phase 2, which wraps around an existing Phase 1 text honeypot
- Phase 1 generates text replies; TTS converts them to speech
- The TTS module will convert AI text replies to audio files

REQUIREMENTS:
1. Create file: app/voice/tts.py
2. Implement TTSEngine class with:
   - __init__() - Initialize TTS engine
   - synthesize(text: str, language: str = "en", output_path: Optional[str] = None) -> str
     Returns: Path to generated audio file
   - Language mapping for Indic languages (en, hi, gu, ta, te, bn, mr)
3. Features:
   - Support multiple languages (English, Hindi, Gujarati, Tamil, Telugu, Bengali, Marathi)
   - Auto-generate an output path if not provided (use tempfile)
   - Return the path to the generated .mp3 file
   - Handle errors gracefully (raise an exception with a clear message)
4. Singleton pattern:
   - get_tts_engine() -> TTSEngine (global instance)
5.
   Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except
6. Configuration from settings:
   - settings.TTS_ENGINE (default: "gtts")

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 2.2 in the plan]

LANGUAGE MAPPING:
- "english" -> "en"
- "hindi" -> "hi"
- "gujarati" -> "gu"
- "tamil" -> "ta"
- "telugu" -> "te"
- "bengali" -> "bn"
- "marathi" -> "mr"

ACCEPTANCE CRITERIA:
- [ ] TTSEngine class created
- [ ] synthesize() generates audio files
- [ ] Language mapping works for Indic languages
- [ ] Temp file generation works
- [ ] Singleton pattern works
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ] Logging added

Please generate the complete app/voice/tts.py file with production-ready code.
```

---

## 📋 PROMPT 3: Voice API Endpoints

**Estimated Time:** 3 hours
**Dependencies:** FastAPI (already installed)
**Output:** `app/api/voice_endpoints.py`, `app/api/voice_schemas.py`

### Prompt

```
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create voice API endpoints that integrate with the existing Phase 1 text honeypot.

CONTEXT:
- Phase 1 has /api/v1/honeypot/engage (text endpoint) - DO NOT MODIFY
- Phase 2 needs /api/v1/voice/engage (voice endpoint) - NEW
- Voice endpoint: Audio in → ASR → Phase 1 pipeline → TTS → Audio out
- Must reuse existing Phase 1 logic (detector, honeypot, extractor)

REQUIREMENTS:
1. Create file: app/api/voice_schemas.py
   Implement Pydantic schemas:
   - TranscriptionMetadata (text, language, confidence)
   - VoiceFraudMetadata (is_synthetic, confidence, risk_level) - Optional
   - VoiceEngageResponse (session_id, scam_detected, scam_confidence, scam_type, turn_count, ai_reply_text, ai_reply_audio_url, transcription, voice_fraud, extracted_intelligence, processing_time_ms)
2. Create file: app/api/voice_endpoints.py
   Implement endpoints:
   A.
   POST /api/v1/voice/engage
      - Accept: multipart/form-data (audio_file, session_id, language)
      - Flow:
        1. Save uploaded audio temporarily
        2. Transcribe with ASR (app.voice.asr.get_asr_engine())
        3. Process through Phase 1 (REUSE existing code):
           - app.models.detector.get_detector().detect()
           - app.agent.honeypot.HoneypotAgent().engage()
           - app.models.extractor.extract_intelligence()
        4. Convert reply to speech with TTS (app.voice.tts.get_tts_engine())
        5. Return VoiceEngageResponse with audio URL
      - Auth: x-api-key header (use existing verify_api_key)
      - Error handling: HTTPException with clear messages
   B. GET /api/v1/voice/audio/{filename}
      - Serve generated audio files from the temp directory
      - Return FileResponse with audio/mpeg media type
      - 404 if file not found
   C. GET /api/v1/voice/health
      - Check ASR and TTS engine status
      - Return health info (model, device, engine type)
3. Router setup:
   - APIRouter with prefix="/api/v1/voice", tags=["voice"]
   - Export router for inclusion in main app
4. Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except
   - Clean up temp files after processing

CRITICAL: DO NOT MODIFY PHASE 1 CODE
- Import and reuse: app.models.detector, app.agent.honeypot, app.models.extractor
- Import and reuse: app.database.redis_client (session state)
- Import and reuse: app.api.auth.verify_api_key

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 3.1 and 3.2 in the plan]

ACCEPTANCE CRITERIA:
- [ ] voice_schemas.py created with all schemas
- [ ] voice_endpoints.py created with all endpoints
- [ ] POST /voice/engage works end-to-end
- [ ] Audio upload handling works
- [ ] ASR integration works
- [ ] Phase 1 integration works (no modifications to Phase 1)
- [ ] TTS integration works
- [ ] GET /voice/audio/{filename} serves files
- [ ] GET /voice/health returns status
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ]
   Logging added
- [ ] Auth (x-api-key) works

Please generate both files (voice_schemas.py and voice_endpoints.py) with production-ready code.
```

---

## 📋 PROMPT 4: Voice UI (HTML + JavaScript + CSS)

**Estimated Time:** 4 hours
**Dependencies:** None (vanilla JS)
**Output:** `ui/voice.html`, `ui/voice.js`, `ui/voice.css`

### Prompt

```
I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create a voice UI that allows users to record audio, send it to the API, and hear AI voice replies.

CONTEXT:
- Phase 1 has ui/index.html (text chat) - DO NOT MODIFY
- Phase 2 needs ui/voice.html (voice chat) - NEW, SEPARATE
- Voice UI: Record → Send to /api/v1/voice/engage → Display transcription + Play AI audio

REQUIREMENTS:
1. Create file: ui/voice.html
   Features:
   - Header: "🎤 ScamShield AI - Voice Honeypot (Phase 2)"
   - Recording controls:
     - Status indicator (Ready/Recording/Processing)
     - "Start Recording" button
     - "Stop Recording" button
     - "Upload Audio File" button
   - Session ID display (read-only)
   - Conversation area:
     - Display user messages (transcription)
     - Display AI messages (text + audio player)
     - System messages (status updates)
   - Metadata section:
     - Transcription (text, language, confidence)
     - Detection (scam_detected, confidence, type)
     - Voice fraud (optional, if enabled)
   - Intelligence section:
     - Display extracted UPI IDs, bank accounts, phone numbers, URLs
2. Create file: ui/voice.js
   Features:
   - startRecording(): Use the MediaRecorder API to capture audio
   - stopRecording(): Stop recording and send to API
   - uploadAudio(): Allow file upload
   - sendAudioToAPI(): POST to /api/v1/voice/engage with FormData
   - handleAPIResponse(): Update UI with response
   - addMessage(): Add user/ai/system messages
   - updateMetadata(): Update transcription, detection, fraud info
   - updateIntelligence(): Display extracted intelligence
   - Audio playback:
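REFERENCE SKETCH for the voice.js record-and-send flow described above. This is a minimal sketch, not the final implementation: only the form field names (audio_file, session_id, language) and the x-api-key header come from the voice endpoint prompt; the helper name buildEngageRequest and all other identifiers are illustrative.

```javascript
// Pure helper: package a recorded audio Blob into the multipart request
// for POST /api/v1/voice/engage. Field names follow the endpoint prompt;
// the helper name itself is hypothetical.
function buildEngageRequest(audioBlob, sessionId, language, apiKey) {
  const formData = new FormData();
  formData.append("audio_file", audioBlob, "recording.webm");
  if (sessionId) formData.append("session_id", sessionId);
  if (language) formData.append("language", language);
  return {
    url: "/api/v1/voice/engage",
    options: {
      method: "POST",
      headers: { "x-api-key": apiKey }, // same auth header as Phase 1
      body: formData, // browser sets the multipart boundary automatically
    },
  };
}

// Browser-side usage (MediaRecorder and getUserMedia are browser APIs):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new MediaRecorder(stream);
//   const chunks = [];
//   recorder.ondataavailable = (e) => chunks.push(e.data);
//   recorder.onstop = async () => {
//     const { url, options } = buildEngageRequest(
//       new Blob(chunks, { type: "audio/webm" }), sessionId, "en", API_KEY);
//     const response = await fetch(url, options);
//     const data = await response.json(); // VoiceEngageResponse
//     new Audio(data.ai_reply_audio_url).play();
//   };
```

Keeping the request-building logic in a pure helper like this makes it testable without a microphone or a running backend.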