# PRD.md - AI Subtitle Generator MVP

# Goal

Build a simple web app where users can:

1. Upload a video
2. Generate English subtitles using AI speech-to-text
3. Translate subtitles into:
   * Malayalam
   * Tamil
   * Hindi
4. Download `.srt` subtitle files

The MVP should be:

* Extremely simple
* Fast to build
* Vibecoding-friendly
* Localhost only

---

# Core Features

## 1. Upload Video

Support:

* `.mp4`
* `.mov`
* `.mkv`
* `.webm`

---

## 2. Extract Audio

Use FFmpeg to extract audio from the uploaded video.

Example:

```bash
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
```

---

## 3. Speech to Text

Use local:

```text
faster-whisper
```

Generate:

* English transcript
* English `.srt`
* Timestamps

### MVP Decision

The MVP will use local Faster-Whisper instead of cloud APIs.

Why?

* Free
* Fast enough for short videos
* Better privacy
* Works offline
* Easy localhost setup
* Easy to vibecode

### Suggested Model

Start with:

```text
base
```

Upgrade later if needed:

* `small`
* `medium`

---

### Example

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")
segments, info = model.transcribe("audio.wav")
```

---

## 4. Translate Subtitles

Use a small translation adapter layer.

The app should NOT directly depend on one translation provider.

This makes it easy to:

* start simple
* swap providers later
* experiment with better translation models

---

## MVP Translation Provider

Start with:

```text
deep-translator
```

Translate English subtitles into:

* Malayalam (`ml`)
* Tamil (`ta`)
* Hindi (`hi`)

---

## Future Translation Provider

Later we can swap in:

* IndicTrans2
* LibreTranslate
* OpenAI models
* Other local translation models

without changing the main application flow.

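## Sketch: Swappable Provider Registry

One way to keep the main flow independent of any single provider is a small registry keyed by provider name. The block below is a sketch under assumptions, not a settled design: `TranslatorAdapter`, `EchoAdapter`, `PROVIDERS`, `get_translator`, and `translate_lines` are hypothetical names, and using `GoogleTranslator` as the `deep-translator` backend is an assumption, since this document only names the package. The `deep_translator` import is deferred so the module loads even when the package is not installed, and `EchoAdapter` gives an offline stand-in for trying the pipeline without network access.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class TranslatorAdapter(Protocol):
    """Anything exposing translate(text, target_lang) (hypothetical interface)."""

    def translate(self, text: str, target_lang: str) -> str: ...


class DeepTranslatorAdapter:
    """MVP provider. Assumption: deep-translator's GoogleTranslator backend."""

    def translate(self, text: str, target_lang: str) -> str:
        # Imported lazily so the app can start without the package installed.
        from deep_translator import GoogleTranslator

        return GoogleTranslator(source="en", target=target_lang).translate(text)


class EchoAdapter:
    """Offline stand-in: tags the text instead of translating it."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"


# Provider name -> factory. A future provider only needs one new entry here.
PROVIDERS: Dict[str, Callable[[], TranslatorAdapter]] = {
    "deep-translator": DeepTranslatorAdapter,
    "echo": EchoAdapter,
}

SUPPORTED_LANGS = {"ml", "ta", "hi"}  # Malayalam, Tamil, Hindi


def get_translator(provider: str = "deep-translator") -> TranslatorAdapter:
    """Return a translator for the named provider, or raise ValueError."""
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown translation provider: {provider!r}")
    return PROVIDERS[provider]()


def translate_lines(
    lines: Iterable[str], target_lang: str, translator: TranslatorAdapter
) -> List[str]:
    """Translate subtitle lines one by one, preserving their order."""
    if target_lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported target language: {target_lang!r}")
    return [translator.translate(line, target_lang) for line in lines]
```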

---

## Suggested Adapter Design

```text
services/
└── translators/
    ├── base.py
    ├── deep_translator_adapter.py
    └── indictrans_adapter.py
```

---

## Example Interface

```python
class Translator:
    def translate(self, text: str, target_lang: str) -> str:
        pass
```

---

## Example MVP Usage

```python
translator = DeepTranslatorAdapter()
translated = translator.translate(text, "ml")
```

---

## 5. Generate `.srt`

Generate downloadable subtitle files.

Example:

```srt
1
00:00:01,000 --> 00:00:03,000
Hello everyone
```

---

# Tech Stack

## Backend

* FastAPI

## Frontend

* HTML
* CSS
* Minimal JavaScript
* Jinja2 Templates

## AI/Processing

* Faster-Whisper
* FFmpeg
* deep-translator
* pysrt

---

# Simple Architecture

```text
Upload Video
    ↓
Extract Audio
    ↓
Whisper Transcription
    ↓
Translate Text
    ↓
Generate .srt
    ↓
Download File
```

---

# Suggested Folder Structure

```text
app/
├── main.py
├── templates/
│   └── index.html
├── static/
│   └── styles.css
├── uploads/
├── subtitles/
└── services/
    ├── transcribe.py
    ├── translate.py
    └── srt_generator.py
```

---

# Main UI

Single page with:

* Upload input
* Language dropdown
* Generate button
* Loading spinner
* Download links

---

# Main API

## Generate Subtitles

```http
POST /generate-subtitles
```

Inputs:

* video file
* target language

Outputs:

* English `.srt`
* Translated `.srt`

---

# Suggested Dependencies

```txt
fastapi
uvicorn
jinja2
python-multipart
faster-whisper
ffmpeg-python
deep-translator
pysrt
```

---

# Run Locally

```bash
uvicorn app.main:app --reload
```

---

# MVP Rules

* Keep everything in ONE FastAPI app
* Store files locally
* Use sync processing
* No authentication
* No database
* No React
* No Docker initially
* No microservices
* No overengineering

---

# Build Order

1. Upload video
2. Extract audio
3. Generate English transcript
4. Generate English `.srt`
5. Add translation
6. Generate translated `.srt`
7. Improve UI later

---

# Success Criteria

The MVP is successful if:

* Video upload works
* English subtitles are generated
* Translation works
* `.srt` download works
* End-to-end pipeline works locally
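# Sketch: Building `.srt` Text

The `.srt` layout from section 5 (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) can be produced with plain Python. This is a minimal sketch, not the required implementation: `Cue`, `format_timestamp`, and `build_srt` are hypothetical names, and the stack's `pysrt` could be used instead.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """One subtitle entry; times are in seconds (hypothetical helper type)."""

    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))  # clamp negatives to zero
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues: Iterable[Cue]) -> str:
    """Render cues as SRT text, numbering entries from 1."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n"
            f"{cue.text.strip()}\n"
        )
    # A blank line separates consecutive entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([Cue(1.0, 3.0, "Hello everyone")]))
```

Faster-Whisper segments expose `start`, `end`, and `text`, so the output of `model.transcribe(...)` could be mapped with `Cue(s.start, s.end, s.text) for s in segments`. For a translated file, the same cues would be reused with each `text` replaced by its translation, which keeps the timestamps identical across languages.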