
# Phase 2 Voice Implementation - AI Assistant Prompts

## Overview

This file contains 6 carefully crafted prompts to guide AI assistants (like Claude, ChatGPT, etc.) through implementing Phase 2 voice features step-by-step.

### How to Use

  1. Copy each prompt one at a time
  2. Paste into your AI assistant
  3. Review the generated code
  4. Test before moving to the next prompt
  5. Track progress in PHASE_2_CHECKLIST.md

### Context Files to Attach

  • PHASE_2_VOICE_IMPLEMENTATION_PLAN.md (master plan)
  • PHASE_2_ARCHITECTURE.md (architecture)
  • app/agent/honeypot.py (existing honeypot)
  • app/config.py (existing config)

πŸ“‹ PROMPT 1: ASR Module (Whisper Transcription)

Estimated Time: 2 hours
Dependencies: pip install openai-whisper torchaudio
Output: app/voice/asr.py

Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot project. I need to create an ASR (Automatic Speech Recognition) module using Whisper.

CONTEXT:
- This is Phase 2, which wraps around an existing Phase 1 text honeypot
- Phase 1 must remain unchanged
- The ASR module will transcribe audio to text, which then feeds into Phase 1

REQUIREMENTS:

1. Create file: app/voice/asr.py

2. Implement ASREngine class with:
   - __init__(model_size: str = "base") - Initialize Whisper model
   - _load_model() - Load Whisper model (tiny/base/small/medium/large)
   - transcribe(audio_path: str, language: Optional[str] = None) -> Dict
     Returns: {"text": str, "language": str, "confidence": float}
   - _calculate_confidence(result: Dict) -> float - Calculate confidence from Whisper output

3. Features:
   - Support multiple Whisper model sizes (configurable)
   - Auto-detect language or accept language hint
   - GPU support if available (cuda), else CPU
   - Return transcription with confidence score
   - Handle errors gracefully (return empty text with 0.0 confidence)

4. Singleton pattern:
   - get_asr_engine() -> ASREngine (global instance)

5. Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except

6. Configuration from settings:
   - settings.WHISPER_MODEL (default: "base")
   - Auto-detect device (cuda/cpu)

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 2.1 in the plan]

ACCEPTANCE CRITERIA:
- [ ] ASREngine class created
- [ ] Whisper model loads successfully
- [ ] transcribe() returns correct format
- [ ] Language detection works
- [ ] Confidence calculation implemented
- [ ] Singleton pattern works
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ] Logging added

Please generate the complete app/voice/asr.py file with production-ready code.
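For orientation before running the prompt, here is a minimal sketch of the shape the generated module might take. It is not the project's actual code: the logprob-to-confidence heuristic and the lazy model import are illustrative choices, and the plan's `app.utils.logger` logging is omitted to keep the snippet self-contained.

```python
import math
import threading
from typing import Any, Dict, Optional


class ASREngine:
    """Whisper-based speech-to-text engine (illustrative sketch)."""

    def __init__(self, model_size: str = "base") -> None:
        self.model_size = model_size
        self._model = None  # loaded lazily so importing this module stays cheap

    def _load_model(self) -> Any:
        # Imported here so the module still imports when openai-whisper
        # is not installed (graceful degradation, as the prompt requires).
        import torch
        import whisper
        device = "cuda" if torch.cuda.is_available() else "cpu"
        return whisper.load_model(self.model_size, device=device)

    @staticmethod
    def _calculate_confidence(result: Dict[str, Any]) -> float:
        """Map Whisper's per-segment avg_logprob to a rough 0..1 score."""
        segments = result.get("segments") or []
        if not segments:
            return 0.0
        avg = sum(s.get("avg_logprob", -10.0) for s in segments) / len(segments)
        return max(0.0, min(1.0, math.exp(avg)))  # log-prob -> probability-ish

    def transcribe(self, audio_path: str, language: Optional[str] = None) -> Dict[str, Any]:
        """Transcribe audio; on any failure return empty text with 0.0 confidence."""
        try:
            if self._model is None:
                self._model = self._load_model()
            result = self._model.transcribe(audio_path, language=language)
            return {
                "text": result.get("text", "").strip(),
                "language": result.get("language", language or "unknown"),
                "confidence": self._calculate_confidence(result),
            }
        except Exception:
            return {"text": "", "language": language or "unknown", "confidence": 0.0}


_engine: Optional[ASREngine] = None
_lock = threading.Lock()


def get_asr_engine() -> ASREngine:
    """Return the process-wide ASREngine instance (singleton pattern)."""
    global _engine
    with _lock:
        if _engine is None:
            _engine = ASREngine()
    return _engine
```

The confidence heuristic relies on `avg_logprob` being a mean token log-probability, so `exp()` of it lands in (0, 1]; Whisper itself does not report a single confidence score, which is why the prompt asks for `_calculate_confidence` at all.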

πŸ“‹ PROMPT 2: TTS Module (Text-to-Speech)

Estimated Time: 2 hours
Dependencies: pip install gTTS
Output: app/voice/tts.py

Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create a TTS (Text-to-Speech) module using gTTS.

CONTEXT:
- This is Phase 2, which wraps around an existing Phase 1 text honeypot
- Phase 1 generates text replies, TTS converts them to speech
- The TTS module will convert AI text replies to audio files

REQUIREMENTS:

1. Create file: app/voice/tts.py

2. Implement TTSEngine class with:
   - __init__() - Initialize TTS engine
   - synthesize(text: str, language: str = "en", output_path: Optional[str] = None) -> str
     Returns: Path to generated audio file
   - Language mapping for Indic languages (en, hi, gu, ta, te, bn, mr)

3. Features:
   - Support multiple languages (English, Hindi, Gujarati, Tamil, Telugu, Bengali, Marathi)
   - Auto-generate output path if not provided (use tempfile)
   - Return path to generated .mp3 file
   - Handle errors gracefully (raise exception with clear message)

4. Singleton pattern:
   - get_tts_engine() -> TTSEngine (global instance)

5. Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except

6. Configuration from settings:
   - settings.TTS_ENGINE (default: "gtts")

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 2.2 in the plan]

LANGUAGE MAPPING:
- "english" -> "en"
- "hindi" -> "hi"
- "gujarati" -> "gu"
- "tamil" -> "ta"
- "telugu" -> "te"
- "bengali" -> "bn"
- "marathi" -> "mr"

ACCEPTANCE CRITERIA:
- [ ] TTSEngine class created
- [ ] synthesize() generates audio files
- [ ] Language mapping works for Indic languages
- [ ] Temp file generation works
- [ ] Singleton pattern works
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ] Logging added

Please generate the complete app/voice/tts.py file with production-ready code.
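As with the ASR prompt, a minimal sketch of the expected shape can help you review what the assistant generates. The lazy gTTS import and the name-or-code `resolve_language` helper are illustrative assumptions, and logging is again omitted:

```python
import os
import tempfile
from typing import Dict, Optional

# Language names used in the prompt mapped to gTTS codes.
LANGUAGE_MAP: Dict[str, str] = {
    "english": "en", "hindi": "hi", "gujarati": "gu", "tamil": "ta",
    "telugu": "te", "bengali": "bn", "marathi": "mr",
}


class TTSEngine:
    """gTTS-backed text-to-speech engine (illustrative sketch)."""

    def resolve_language(self, language: str) -> str:
        """Accept either a full name ("hindi") or a gTTS code ("hi")."""
        lang = language.strip().lower()
        return LANGUAGE_MAP.get(lang, lang)

    def synthesize(self, text: str, language: str = "en",
                   output_path: Optional[str] = None) -> str:
        """Render text to an .mp3 file and return its path."""
        if not text.strip():
            raise ValueError("Cannot synthesize empty text")
        from gtts import gTTS  # lazy import: module loads without gTTS installed
        if output_path is None:
            fd, output_path = tempfile.mkstemp(suffix=".mp3")
            os.close(fd)  # gTTS reopens the file by path
        gTTS(text=text, lang=self.resolve_language(language)).save(output_path)
        return output_path


_engine: Optional[TTSEngine] = None


def get_tts_engine() -> TTSEngine:
    """Return the process-wide TTSEngine instance (singleton pattern)."""
    global _engine
    if _engine is None:
        _engine = TTSEngine()
    return _engine
```

Note that gTTS calls Google's TTS endpoint at `save()` time, so synthesis needs network access; unit tests should mock that call rather than hit the network.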

πŸ“‹ PROMPT 3: Voice API Endpoints

Estimated Time: 3 hours
Dependencies: FastAPI (already installed)
Output: app/api/voice_endpoints.py, app/api/voice_schemas.py

Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create voice API endpoints that integrate with the existing Phase 1 text honeypot.

CONTEXT:
- Phase 1 has /api/v1/honeypot/engage (text endpoint) - DO NOT MODIFY
- Phase 2 needs /api/v1/voice/engage (voice endpoint) - NEW
- Voice endpoint: Audio in β†’ ASR β†’ Phase 1 pipeline β†’ TTS β†’ Audio out
- Must reuse existing Phase 1 logic (detector, honeypot, extractor)

REQUIREMENTS:

1. Create file: app/api/voice_schemas.py

Implement Pydantic schemas:
- TranscriptionMetadata (text, language, confidence)
- VoiceFraudMetadata (is_synthetic, confidence, risk_level) - Optional
- VoiceEngageResponse (session_id, scam_detected, scam_confidence, scam_type, turn_count, ai_reply_text, ai_reply_audio_url, transcription, voice_fraud, extracted_intelligence, processing_time_ms)

2. Create file: app/api/voice_endpoints.py

Implement endpoints:

A. POST /api/v1/voice/engage
   - Accept: multipart/form-data (audio_file, session_id, language)
   - Flow:
     1. Save uploaded audio temporarily
     2. Transcribe with ASR (app.voice.asr.get_asr_engine())
     3. Process through Phase 1 (REUSE existing code):
        - app.models.detector.get_detector().detect()
        - app.agent.honeypot.HoneypotAgent().engage()
        - app.models.extractor.extract_intelligence()
     4. Convert reply to speech with TTS (app.voice.tts.get_tts_engine())
     5. Return VoiceEngageResponse with audio URL
   - Auth: x-api-key header (use existing verify_api_key)
   - Error handling: HTTPException with clear messages

B. GET /api/v1/voice/audio/{filename}
   - Serve generated audio files from temp directory
   - Return FileResponse with audio/mpeg media type
   - 404 if file not found

C. GET /api/v1/voice/health
   - Check ASR and TTS engine status
   - Return health info (model, device, engine type)

3. Router setup:
   - APIRouter with prefix="/api/v1/voice", tags=["voice"]
   - Export router for inclusion in main app

4. Code quality:
   - Type hints for all functions
   - Docstrings (Google style)
   - Logging using app.utils.logger.get_logger(__name__)
   - Error handling with try/except
   - Clean up temp files after processing

CRITICAL: DO NOT MODIFY PHASE 1 CODE
- Import and reuse: app.models.detector, app.agent.honeypot, app.models.extractor
- Import and reuse: app.database.redis_client (session state)
- Import and reuse: app.api.auth.verify_api_key

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 3.1 and 3.2 in the plan]

ACCEPTANCE CRITERIA:
- [ ] voice_schemas.py created with all schemas
- [ ] voice_endpoints.py created with all endpoints
- [ ] POST /voice/engage works end-to-end
- [ ] Audio upload handling works
- [ ] ASR integration works
- [ ] Phase 1 integration works (no modifications to Phase 1)
- [ ] TTS integration works
- [ ] GET /voice/audio/{filename} serves files
- [ ] GET /voice/health returns status
- [ ] Error handling present
- [ ] Type hints and docstrings complete
- [ ] Logging added
- [ ] Auth (x-api-key) works

Please generate both files (voice_schemas.py and voice_endpoints.py) with production-ready code.
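Before generating the endpoints, it can help to see the orchestration in isolation. The sketch below collapses the Phase 1 steps (detector, HoneypotAgent, extractor) into a single injected `engage_text` callable whose signature is an assumption; the FastAPI layer, auth, and temp-file cleanup are deliberately left out:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class VoicePipeline:
    """Audio in -> ASR -> Phase 1 text pipeline -> TTS -> audio out.

    Dependencies are injected as callables so Phase 1 code is reused,
    never modified -- the voice layer only wraps it.
    """

    transcribe: Callable[[str], Dict[str, Any]]        # e.g. asr_engine.transcribe
    engage_text: Callable[[str, str], Dict[str, Any]]  # hypothetical Phase 1 entry point
    synthesize: Callable[[str, str], str]              # e.g. tts_engine.synthesize

    def engage(self, audio_path: str, session_id: str) -> Dict[str, Any]:
        transcription = self.transcribe(audio_path)
        if not transcription["text"]:
            raise ValueError("Transcription failed or audio was empty")
        phase1 = self.engage_text(session_id, transcription["text"])
        reply_audio = self.synthesize(phase1["reply"], transcription["language"])
        return {
            "session_id": session_id,
            "transcription": transcription,
            "ai_reply_text": phase1["reply"],
            "ai_reply_audio_path": reply_audio,
            "scam_detected": phase1.get("scam_detected", False),
        }
```

With the flow factored out like this, the FastAPI handler only has to save the upload, call `engage()`, and map the result onto `VoiceEngageResponse`.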

πŸ“‹ PROMPT 4: Voice UI (HTML + JavaScript + CSS)

Estimated Time: 4 hours
Dependencies: None (vanilla JS)
Output: ui/voice.html, ui/voice.js, ui/voice.css

Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create a voice UI that allows users to record audio, send it to the API, and hear AI voice replies.

CONTEXT:
- Phase 1 has ui/index.html (text chat) - DO NOT MODIFY
- Phase 2 needs ui/voice.html (voice chat) - NEW, SEPARATE
- Voice UI: Record β†’ Send to /api/v1/voice/engage β†’ Display transcription + Play AI audio

REQUIREMENTS:

1. Create file: ui/voice.html

Features:
- Header: "🎀 ScamShield AI - Voice Honeypot (Phase 2)"
- Recording controls:
  - Status indicator (Ready/Recording/Processing)
  - "Start Recording" button
  - "Stop Recording" button
  - "Upload Audio File" button
  - Session ID display (read-only)
- Conversation area:
  - Display user messages (transcription)
  - Display AI messages (text + audio player)
  - System messages (status updates)
- Metadata section:
  - Transcription (text, language, confidence)
  - Detection (scam_detected, confidence, type)
  - Voice fraud (optional, if enabled)
- Intelligence section:
  - Display extracted UPI, bank accounts, phone numbers, URLs

2. Create file: ui/voice.js

Features:
- startRecording(): Use MediaRecorder API to capture audio
- stopRecording(): Stop recording and send to API
- uploadAudio(): Allow file upload
- sendAudioToAPI(): POST to /api/v1/voice/engage with FormData
- handleAPIResponse(): Update UI with response
- addMessage(): Add user/ai/system messages
- updateMetadata(): Update transcription, detection, fraud info
- updateIntelligence(): Display extracted intelligence
- Audio playback: <audio controls> for AI replies

3. Create file: ui/voice.css

Features:
- Dark theme (consistent with Phase 1)
- Recording status indicator (colors: ready=white, recording=red, processing=yellow)
- Button styles (primary, secondary, tertiary)
- Message bubbles (user=right, ai=left, system=center)
- Metadata cards with labels and values
- Responsive design

4. Code quality:
- Vanilla JavaScript (no frameworks)
- Clean, readable code
- Error handling (microphone access, API errors)
- Console logging for debugging

API INTEGRATION:
- Endpoint: POST /api/v1/voice/engage
- Headers: x-api-key: "dev-key-12345"
- FormData: audio_file (blob), session_id (string), language (string)
- Response: VoiceEngageResponse (see voice_schemas.py)

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md):
[See Step 4.1, 4.2, 4.3 in the plan]

ACCEPTANCE CRITERIA:
- [ ] voice.html created with all sections
- [ ] voice.js created with all functions
- [ ] voice.css created with all styles
- [ ] Recording works (MediaRecorder API)
- [ ] File upload works
- [ ] API integration works
- [ ] Transcription displays correctly
- [ ] AI audio plays correctly
- [ ] Metadata updates correctly
- [ ] Intelligence displays correctly
- [ ] Error handling present
- [ ] UI looks professional (dark theme)
- [ ] Responsive design works

Please generate all three files (voice.html, voice.js, voice.css) with production-ready code.

πŸ“‹ PROMPT 5: Integration & Configuration

Estimated Time: 3 hours
Dependencies: None
Output: Updated app/main.py, app/config.py, .env.example

Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to integrate the voice module into the main app without breaking Phase 1.

CONTEXT:
- Phase 1 is working perfectly - MUST NOT BREAK
- Phase 2 voice module is ready (ASR, TTS, endpoints, UI)
- Need to conditionally load Phase 2 only if PHASE_2_ENABLED=true
- If Phase 2 fails to load, Phase 1 should still work

REQUIREMENTS:

1. Update file: app/config.py

Add Phase 2 settings to Settings class:
- PHASE_2_ENABLED: bool = Field(default=False, description="Enable Phase 2 voice features")
- WHISPER_MODEL: str = Field(default="base", description="Whisper model size (tiny, base, small, medium, large)")
- TTS_ENGINE: str = Field(default="gtts", description="TTS engine (gtts, indic_tts)")
- VOICE_FRAUD_DETECTION: bool = Field(default=False, description="Enable voice fraud detection")
- AUDIO_SAMPLE_RATE: int = Field(default=16000, description="Audio sample rate in Hz")
- AUDIO_CHUNK_DURATION: int = Field(default=5, description="Audio chunk duration in seconds")

2. Update file: app/main.py

Add conditional Phase 2 router inclusion:
```python
# After existing router inclusions
if getattr(settings, "PHASE_2_ENABLED", False):
    try:
        from app.api.voice_endpoints import router as voice_router
        app.include_router(voice_router)
        logger.info("Phase 2 voice endpoints enabled")
    except ImportError as e:
        logger.warning(f"Phase 2 voice endpoints unavailable: {e}")
    except Exception as e:
        logger.error(f"Failed to load Phase 2: {e}")
```

3. Update file: .env.example

Add Phase 2 configuration section:

```bash
# ========================================
# PHASE 2: VOICE FEATURES (OPTIONAL)
# ========================================
# Enable Phase 2 voice features (default: false)
PHASE_2_ENABLED=false

# Whisper ASR Configuration
WHISPER_MODEL=base
# Options: tiny, base, small, medium, large
# Larger models = better accuracy but slower

# TTS Configuration
TTS_ENGINE=gtts
# Options: gtts (Google TTS - free)

# Voice Fraud Detection (Optional)
VOICE_FRAUD_DETECTION=false
# Set to true to enable synthetic voice detection

# Audio Settings
AUDIO_SAMPLE_RATE=16000
AUDIO_CHUNK_DURATION=5
```

4. Code quality:
  • Minimal changes to existing code
  • Graceful degradation (Phase 1 works if Phase 2 fails)
  • Clear logging messages
  • No breaking changes

CRITICAL REQUIREMENTS:

  • DO NOT modify any Phase 1 code beyond adding the router
  • Phase 2 must be opt-in (default: disabled)
  • If Phase 2 fails to load, log warning but continue
  • Phase 1 must work even if Phase 2 dependencies are missing

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md): [See Step 5.1 and 5.2 in the plan]

ACCEPTANCE CRITERIA:

  • app/config.py updated with Phase 2 settings
  • app/main.py updated with conditional router inclusion
  • .env.example updated with Phase 2 config
  • Phase 1 still works with PHASE_2_ENABLED=false
  • Phase 2 loads with PHASE_2_ENABLED=true
  • Graceful degradation if Phase 2 fails
  • Logging messages clear
  • No breaking changes to Phase 1

Please provide the exact changes needed for each file (show before/after or provide complete updated sections).
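To sanity-check the defaults listed above without pulling in pydantic, the same fields can be modeled with a plain dataclass. This is only an equivalent sketch of the behavior the `Field(...)` declarations encode, not the project's config code; `_env_bool` is a hypothetical helper:

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable ('1', 'true', 'yes' are truthy)."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Phase2Settings:
    """Framework-free equivalent of the pydantic fields listed above."""

    PHASE_2_ENABLED: bool = field(
        default_factory=lambda: _env_bool("PHASE_2_ENABLED", False))
    WHISPER_MODEL: str = field(
        default_factory=lambda: os.getenv("WHISPER_MODEL", "base"))
    TTS_ENGINE: str = field(
        default_factory=lambda: os.getenv("TTS_ENGINE", "gtts"))
    VOICE_FRAUD_DETECTION: bool = field(
        default_factory=lambda: _env_bool("VOICE_FRAUD_DETECTION", False))
    AUDIO_SAMPLE_RATE: int = field(
        default_factory=lambda: int(os.getenv("AUDIO_SAMPLE_RATE", "16000")))
    AUDIO_CHUNK_DURATION: int = field(
        default_factory=lambda: int(os.getenv("AUDIO_CHUNK_DURATION", "5")))
```

The key property to preserve in the real `Settings` class is that every field has a safe default, so a deployment that never heard of Phase 2 keeps working unchanged.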


---

## πŸ“‹ PROMPT 6: Testing & Validation

**Estimated Time:** 3 hours  
**Dependencies:** pytest (already installed)  
**Output:** `tests/unit/test_voice_asr.py`, `tests/unit/test_voice_tts.py`, `tests/integration/test_voice_api.py`

### Prompt

I'm implementing Phase 2 voice features for my ScamShield AI honeypot. I need to create comprehensive tests to ensure everything works correctly.

CONTEXT:

  • Phase 2 is implemented (ASR, TTS, endpoints, UI, integration)
  • Need unit tests for ASR and TTS modules
  • Need integration tests for voice API endpoints
  • Need to verify Phase 1 is not affected

REQUIREMENTS:

  1. Create file: tests/unit/test_voice_asr.py

Test ASREngine:

  • test_asr_engine_initialization() - Verify model loads
  • test_asr_transcribe_english() - Test English transcription
  • test_asr_transcribe_hindi() - Test Hindi transcription (if sample available)
  • test_asr_confidence_calculation() - Test confidence scoring
  • test_asr_error_handling() - Test with invalid audio
  • test_asr_singleton() - Verify singleton pattern
  2. Create file: tests/unit/test_voice_tts.py

Test TTSEngine:

  • test_tts_engine_initialization() - Verify engine initializes
  • test_tts_synthesize_english() - Test English synthesis
  • test_tts_synthesize_hindi() - Test Hindi synthesis
  • test_tts_language_mapping() - Test language code mapping
  • test_tts_temp_file_generation() - Test auto file path
  • test_tts_error_handling() - Test with invalid input
  • test_tts_singleton() - Verify singleton pattern
  3. Create file: tests/integration/test_voice_api.py

Test Voice API:

  • test_voice_engage_endpoint() - Test full voice flow
    • Upload sample audio
    • Verify transcription in response
    • Verify AI reply text in response
    • Verify audio URL in response
    • Verify metadata (scam_detected, confidence, etc.)
  • test_voice_audio_download() - Test audio file serving
  • test_voice_health_endpoint() - Test health check
  • test_voice_auth_required() - Test x-api-key authentication
  • test_voice_invalid_audio() - Test error handling
  • test_phase_1_unaffected() - Verify Phase 1 endpoints still work
  4. Test fixtures:
  • Create sample audio files (tests/fixtures/audio/):
    • sample_scam_en.wav (English scam message)
    • sample_scam_hi.wav (Hindi scam message, if available)
    • invalid_audio.txt (non-audio file for error testing)
  5. Code quality:
  • Use pytest fixtures
  • Mock external dependencies where appropriate
  • Clear test names and docstrings
  • Assertions with descriptive messages
  • Test both success and failure cases

CRITICAL: Test Phase 1 Isolation

  • Run all existing Phase 1 tests
  • Verify they still pass
  • Verify Phase 1 endpoints work with PHASE_2_ENABLED=false

REFERENCE IMPLEMENTATION (from PHASE_2_VOICE_IMPLEMENTATION_PLAN.md): [See Testing Plan section]

ACCEPTANCE CRITERIA:

  • test_voice_asr.py created with all tests
  • test_voice_tts.py created with all tests
  • test_voice_api.py created with all tests
  • All ASR tests pass
  • All TTS tests pass
  • All voice API tests pass
  • Phase 1 tests still pass
  • Test fixtures created
  • Code coverage >80%
  • Clear test documentation

Please generate all three test files with production-ready test code. Include instructions for creating sample audio fixtures.
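Real recorded speech is needed for meaningful transcription-accuracy assertions, but for exercising upload handling, file serving, and error paths a generated tone is enough (expect near-empty transcriptions with low confidence on it). A stdlib-only fixture generator might look like this; `write_sine_wav` is a hypothetical helper name:

```python
import math
import struct
import wave


def write_sine_wav(path: str, seconds: float = 1.0, freq: float = 440.0,
                   rate: int = 16000) -> None:
    """Write a mono 16-bit PCM sine tone -- a stand-in for a speech fixture."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # matches the AUDIO_SAMPLE_RATE default
        w.writeframes(frames)
```

A conftest fixture can call this once per session to populate `tests/fixtures/audio/`; the `sample_scam_*.wav` files with actual speech still have to be recorded or synthesized by hand.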


---

## 🎯 Implementation Workflow

### Step-by-Step Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 1: ASR Module β”‚ β”‚ β”œβ”€ Generate app/voice/asr.py β”‚ β”‚ β”œβ”€ Test: python -c "from app.voice.asr import get_asr_engine; print('OK')" β”‚ └─ βœ“ Checkpoint: ASR module works β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 2: TTS Module β”‚ β”‚ β”œβ”€ Generate app/voice/tts.py β”‚ β”‚ β”œβ”€ Test: python -c "from app.voice.tts import get_tts_engine; print('OK')" β”‚ └─ βœ“ Checkpoint: TTS module works β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 3: Voice API β”‚ β”‚ β”œβ”€ Generate app/api/voice_schemas.py β”‚ β”‚ β”œβ”€ Generate app/api/voice_endpoints.py β”‚ β”‚ β”œβ”€ Test: Check imports work β”‚ β”‚ └─ βœ“ Checkpoint: API code ready (not integrated yet) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 4: Voice 
UI β”‚ β”‚ β”œβ”€ Generate ui/voice.html β”‚ β”‚ β”œβ”€ Generate ui/voice.js β”‚ β”‚ β”œβ”€ Generate ui/voice.css β”‚ β”‚ β”œβ”€ Test: Open voice.html in browser β”‚ β”‚ └─ βœ“ Checkpoint: UI renders (API not connected yet) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 5: Integration β”‚ β”‚ β”œβ”€ Update app/config.py β”‚ β”‚ β”œβ”€ Update app/main.py β”‚ β”‚ β”œβ”€ Update .env.example β”‚ β”‚ β”œβ”€ Set PHASE_2_ENABLED=true in .env β”‚ β”‚ β”œβ”€ Test: Start server, check logs β”‚ β”‚ └─ βœ“ Checkpoint: Phase 2 integrated, server starts β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ PROMPT 6: Testing β”‚ β”‚ β”œβ”€ Generate tests/unit/test_voice_asr.py β”‚ β”‚ β”œβ”€ Generate tests/unit/test_voice_tts.py β”‚ β”‚ β”œβ”€ Generate tests/integration/test_voice_api.py β”‚ β”‚ β”œβ”€ Run: pytest tests/unit/test_voice_*.py β”‚ β”‚ β”œβ”€ Run: pytest tests/integration/test_voice_api.py β”‚ β”‚ β”œβ”€ Run: pytest tests/ (all tests, including Phase 1) β”‚ β”‚ └─ βœ“ Checkpoint: All tests pass β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό βœ… PHASE 2 COMPLETE!


---

## πŸ“Š Progress Tracking

### Checklist

Use this to track your progress:

- [ ] **PROMPT 1 COMPLETE** - ASR Module
  - [ ] app/voice/asr.py created
  - [ ] ASR module imports successfully
  - [ ] Basic transcription test works

- [ ] **PROMPT 2 COMPLETE** - TTS Module
  - [ ] app/voice/tts.py created
  - [ ] TTS module imports successfully
  - [ ] Basic synthesis test works

- [ ] **PROMPT 3 COMPLETE** - Voice API
  - [ ] app/api/voice_schemas.py created
  - [ ] app/api/voice_endpoints.py created
  - [ ] Imports work (no integration yet)

- [ ] **PROMPT 4 COMPLETE** - Voice UI
  - [ ] ui/voice.html created
  - [ ] ui/voice.js created
  - [ ] ui/voice.css created
  - [ ] UI renders in browser

- [ ] **PROMPT 5 COMPLETE** - Integration
  - [ ] app/config.py updated
  - [ ] app/main.py updated
  - [ ] .env.example updated
  - [ ] Server starts with Phase 2 enabled
  - [ ] Voice endpoints accessible

- [ ] **PROMPT 6 COMPLETE** - Testing
  - [ ] Unit tests created
  - [ ] Integration tests created
  - [ ] All tests pass
  - [ ] Phase 1 tests still pass

---

## 🚨 Important Notes

### Before Starting

1. **Backup your code:**
   ```bash
   git add .
   git commit -m "Backup before Phase 2 implementation"
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements-phase2.txt
   ```

3. **Read the plan:**
   - Review PHASE_2_VOICE_IMPLEMENTATION_PLAN.md
   - Understand the architecture in PHASE_2_ARCHITECTURE.md

### During Implementation

1. **Test after each prompt:**
   - Don't move to the next prompt until the current one works
   - Run basic tests to verify functionality
   - Check logs for errors

2. **Track progress:**
   - Update PHASE_2_CHECKLIST.md as you complete tasks
   - Mark prompts complete in this file

3. **Ask for help:**
   - If a prompt doesn't work, ask the AI to debug
   - Provide error messages and logs
   - Reference the implementation plan
### After Completion

1. **Full testing:**

   ```bash
   # Test Phase 2
   pytest tests/unit/test_voice_*.py
   pytest tests/integration/test_voice_api.py

   # Test Phase 1 (verify no breaking changes)
   pytest tests/
   ```

2. **Manual testing:**
   - Open http://localhost:8000/ui/voice.html
   - Record a voice message
   - Verify AI responds with voice

3. **Documentation:**
   - Update main README.md with Phase 2 info
   - Document any issues or deviations from plan

πŸŽ“ Tips for Success

Working with AI Assistants

  1. Provide context:

    • Attach relevant files (config, existing code)
    • Mention you're following a specific plan
    • Reference the implementation plan sections
  2. Be specific:

    • If code doesn't work, provide exact error messages
    • Ask for specific fixes, not rewrites
    • Request explanations for unclear parts
  3. Iterate:

    • Review generated code before using it
    • Test incrementally
    • Ask for improvements if needed

### Common Issues

| Issue | Solution | Prompt to Use |
|-------|----------|---------------|
| Import errors | Check that dependencies are installed | "I'm getting ImportError: [error]. How do I fix this?" |
| Whisper slow | Use a smaller model | "Change WHISPER_MODEL to 'tiny' in the code" |
| Audio not playing | Check the file path | "Debug audio file serving in voice_endpoints.py" |
| Phase 1 broken | Revert changes | "Show me how to make Phase 2 truly optional" |

πŸ“ž Support

If You Get Stuck

  1. Check the plan:

    • PHASE_2_VOICE_IMPLEMENTATION_PLAN.md has detailed explanations
    • PHASE_2_ARCHITECTURE.md shows how components fit together
  2. Check logs:

    tail -f logs/app.log
    
  3. Ask the AI:

    • "I'm stuck on [step]. Here's my error: [error]. How do I fix it?"
    • Provide context from the implementation plan

### Getting Help from AI

**Good prompt:**

> I'm implementing PROMPT 3 (Voice API) from PHASE_2_IMPLEMENTATION_PROMPTS.md.
>
> I'm getting this error:
> [paste error]
>
> Here's my current code:
> [paste relevant code]
>
> How do I fix this? Reference the implementation plan if needed.

**Bad prompt:**

> It doesn't work. Fix it.

βœ… Success Criteria

Phase 2 is complete when:

  • All 6 prompts executed successfully
  • All generated code works
  • Server starts with PHASE_2_ENABLED=true
  • Voice UI accessible at /ui/voice.html
  • Can record voice and get AI voice reply
  • All tests pass (Phase 1 + Phase 2)
  • No breaking changes to Phase 1
  • Documentation updated

πŸŽ‰ You're Ready!

Next Steps:

  1. Start with PROMPT 1 (ASR Module)
  2. Copy the prompt to your AI assistant
  3. Review the generated code
  4. Test it works
  5. Move to PROMPT 2

**Estimated Total Time:** 17-21 hours

You've got this! πŸš€


Created: 2026-02-10

For: ScamShield AI - Phase 2 Voice Implementation

Start with: PROMPT 1 (ASR Module)