| # PRD.md - AI Subtitle Generator MVP |
|
|
| # Goal |
|
|
| Build a simple web app where users can: |
|
|
| 1. Upload a video |
| 2. Generate English subtitles using AI speech-to-text |
| 3. Translate subtitles into: |
|    * Malayalam |
|    * Tamil |
|    * Hindi |
| 4. Download `.srt` subtitle files |
|
|
| The MVP should be: |
|
|
| * Extremely simple |
| * Fast to build |
| * Vibecoding-friendly |
| * Localhost only |
|
|
| --- |
|
|
| # Core Features |
|
|
| ## 1. Upload Video |
|
|
| Support: |
|
|
| * `.mp4` |
| * `.mov` |
| * `.mkv` |
| * `.webm` |
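
A minimal sketch of how the upload step might check the file type. The helper name `is_supported_video` and the case-insensitive matching are assumptions made for illustration; the spec only lists the four extensions.

```python
from pathlib import Path

# Extensions listed in the spec above.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_supported_video(filename: str) -> bool:
    """Return True if the filename ends in one of the supported extensions.

    Matching is case-insensitive (an assumption), so "clip.MP4" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name, not the file contents, which is probably enough for a localhost MVP.
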
|
|
| --- |
|
|
| ## 2. Extract Audio |
|
|
| Use FFmpeg to extract audio from the uploaded video. |
|
|
| Example: |
|
|
| ```bash |
| ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav |
| ``` |
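
The command above could be wrapped in Python roughly as follows. This is a sketch: the function names are hypothetical, and the `-y` (overwrite) and `-vn` (skip video) flags are additions that the original command does not include. The 16 kHz mono settings match the sample format Whisper models work with.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build the FFmpeg argument list for 16 kHz mono WAV extraction."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # ignore the video stream
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run FFmpeg and return the path of the extracted WAV file."""
    # check=True raises CalledProcessError if FFmpeg exits with an error.
    subprocess.run(
        build_ffmpeg_command(video_path, audio_path),
        check=True,
        capture_output=True,
    )
    return Path(audio_path)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with uploaded file names that contain spaces.
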
|
|
| --- |
|
|
| ## 3. Speech to Text |
|
|
| Use a local speech-to-text library: |
|
|
| ```text |
| faster-whisper |
| ``` |
|
|
| Generate: |
|
|
| * English transcript |
| * English `.srt` |
| * Timestamps |
|
|
| ### MVP Decision |
|
|
| The MVP will use local Faster-Whisper instead of cloud APIs. |
|
|
| Why? |
|
|
| * Free |
| * Fast enough for short videos |
| * Better privacy |
| * Works offline |
| * Easy localhost setup |
| * Easy to vibecode |
|
|
| ### Suggested Model |
|
|
| Start with: |
|
|
| ```text |
| base |
| ``` |
|
|
| Upgrade later if needed: |
|
|
| * `small` |
| * `medium` |
|
|
| --- |
|
|
| ### Example |
|
|
| ```python |
| from faster_whisper import WhisperModel |
| |
| model = WhisperModel("base") |
| segments, info = model.transcribe("audio.wav") |
| ``` |
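
In faster-whisper, `segments` is a generator, so transcription only runs as it is iterated, and each segment exposes `start`, `end`, and `text`. A small sketch of collecting the results into plain records for the later translation and `.srt` steps; `SubtitleSegment` and `collect_segments` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SubtitleSegment:
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def collect_segments(segments: Iterable) -> List[SubtitleSegment]:
    """Consume the segment iterator and keep only timing and text."""
    collected = []
    for seg in segments:
        text = seg.text.strip()  # segment text may carry leading whitespace
        if text:                 # skip segments with no visible text
            collected.append(SubtitleSegment(seg.start, seg.end, text))
    return collected
```

With the example above, this would be called as `collect_segments(segments)` right after `model.transcribe("audio.wav")`.
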
|
|
| --- |
|
|
| ## 4. Translate Subtitles |
|
|
| Use a small translation adapter layer. |
|
|
| The app should NOT directly depend on one translation provider. |
|
|
| This makes it easy to: |
|
|
| * start simple |
| * swap providers later |
| * experiment with better translation models |
|
|
| --- |
|
|
| ### MVP Translation Provider |
|
|
| Start with: |
|
|
| ```text |
| deep-translator |
| ``` |
|
|
| Translate English subtitles into: |
|
|
| * Malayalam (`ml`) |
| * Tamil (`ta`) |
| * Hindi (`hi`) |
|
|
| --- |
|
|
| ### Future Translation Provider |
|
|
| Later we can swap in: |
|
|
| * IndicTrans2 |
| * LibreTranslate |
| * OpenAI models |
| * Other local translation models |
|
|
| without changing the main application flow. |
|
|
| --- |
|
|
| ### Suggested Adapter Design |
|
|
| ```text |
| services/ |
| └── translators/ |
|     ├── base.py |
|     ├── deep_translator_adapter.py |
|     └── indictrans_adapter.py |
| ``` |
|
|
| --- |
|
|
| ### Example Interface |
|
|
| ```python |
| class Translator: |
|     def translate(self, text: str, target_lang: str) -> str: |
|         """Return text translated into target_lang (e.g. "ml").""" |
|         raise NotImplementedError |
| ``` |
|
|
| --- |
|
|
| ### Example MVP Usage |
|
|
| ```python |
| translator = DeepTranslatorAdapter() |
| translated = translator.translate(text, "ml") |
| ``` |
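
One possible shape for `DeepTranslatorAdapter`, sketched under assumptions: the PRD names only the `deep-translator` package, so the choice of its `GoogleTranslator` backend, the fixed `"en"` source language, and the `SUPPORTED` set are illustrative choices rather than decisions from this document.

```python
class Translator:
    """Provider-agnostic interface, as in the example interface above."""

    def translate(self, text: str, target_lang: str) -> str:
        raise NotImplementedError


class DeepTranslatorAdapter(Translator):
    """Adapter that delegates to deep-translator's GoogleTranslator."""

    # Target languages in scope for the MVP.
    SUPPORTED = {"ml", "ta", "hi"}

    def translate(self, text: str, target_lang: str) -> str:
        if target_lang not in self.SUPPORTED:
            raise ValueError(f"Unsupported target language: {target_lang}")
        if not text.strip():
            return text  # nothing to translate
        # Imported lazily so the rest of the app loads without the package.
        from deep_translator import GoogleTranslator

        return GoogleTranslator(source="en", target=target_lang).translate(text)
```

Note that `GoogleTranslator` sends text to an online service, so with this backend the translation step needs network access even though transcription runs locally.
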
|
|
| --- |
|
|
| ## 5. Generate `.srt` |
|
|
| Generate downloadable subtitle files. |
|
|
| Example: |
|
|
| ```srt |
| 1 |
| 00:00:01,000 --> 00:00:03,000 |
| Hello everyone |
| ``` |
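
The format above can be produced with plain string formatting. The sketch below assumes cues arrive as `(start_seconds, end_seconds, text)` tuples; `format_timestamp` and `build_srt` are hypothetical helper names. `pysrt`, listed in the tech stack, could be used instead; plain strings are used here only to keep the example dependency-free.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)
```

When saving the result, writing the file as UTF-8 matters, since Malayalam, Tamil, and Hindi text is outside ASCII.
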
|
|
| --- |
|
|
| # Tech Stack |
|
|
| ## Backend |
|
|
| * FastAPI |
|
|
| ## Frontend |
|
|
| * HTML |
| * CSS |
| * Minimal JavaScript |
| * Jinja2 Templates |
|
|
| ## AI/Processing |
|
|
| * Faster-Whisper |
| * FFmpeg |
| * deep-translator |
| * pysrt |
|
|
| --- |
|
|
| # Simple Architecture |
|
|
| ```text |
| Upload Video |
| ↓ |
| Extract Audio |
| ↓ |
| Whisper Transcription |
| ↓ |
| Translate Text |
| ↓ |
| Generate .srt |
| ↓ |
| Download File |
| ``` |
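
In this flow the translation step sits between transcription and `.srt` generation, so it has to change the text while leaving the timestamps alone. A minimal sketch, assuming cues are `(start_seconds, end_seconds, text)` tuples and that `translate` is any callable with the same arguments as the adapter's `translate` method; `translate_cues` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def translate_cues(
    cues: List[Cue],
    translate: Callable[[str, str], str],
    target_lang: str,
) -> List[Cue]:
    """Translate each cue's text while keeping its timestamps unchanged."""
    return [(start, end, translate(text, target_lang)) for start, end, text in cues]
```

Translating cue by cue keeps the English and translated `.srt` files aligned, at the cost of one translation call per subtitle line.
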
|
|
| --- |
|
|
| # Suggested Folder Structure |
|
|
| ```text |
| app/ |
| ├── main.py |
| ├── templates/ |
| │   └── index.html |
| ├── static/ |
| │   └── styles.css |
| ├── uploads/ |
| ├── subtitles/ |
| └── services/ |
|     ├── transcribe.py |
|     ├── translate.py |
|     └── srt_generator.py |
| ``` |
|
|
| --- |
|
|
| # Main UI |
|
|
| Single page with: |
|
|
| * Upload input |
| * Language dropdown |
| * Generate button |
| * Loading spinner |
| * Download links |
|
|
| --- |
|
|
| # Main API |
|
|
| ## Generate Subtitles |
|
|
| ```http |
| POST /generate-subtitles |
| ``` |
|
|
| Inputs: |
|
|
| * video file |
| * target language |
|
|
| Outputs: |
|
|
| * English `.srt` |
| * Translated `.srt` |
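
A rough sketch of how this endpoint might look in FastAPI. The form field names (`video`, `target_lang`), the JSON response shape, the output file naming, and the helper names `srt_names` and `create_app` are all assumptions; the PRD only fixes the route, the inputs, and the outputs. The processing itself is left as a placeholder, and file type validation is omitted for brevity.

```python
from pathlib import Path

# Language codes from the translation section of this PRD.
LANGUAGE_NAMES = {"ml": "Malayalam", "ta": "Tamil", "hi": "Hindi"}


def srt_names(video_filename: str, target_lang: str) -> dict:
    """Return output file names for the English and translated subtitles."""
    if target_lang not in LANGUAGE_NAMES:
        raise ValueError(f"Unsupported target language: {target_lang!r}")
    stem = Path(video_filename).stem  # drops directories and the extension
    return {
        "english_srt": f"{stem}.en.srt",
        "translated_srt": f"{stem}.{target_lang}.srt",
    }


def create_app():
    # Imported here so the helper above works without FastAPI installed.
    from fastapi import FastAPI, File, Form, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/generate-subtitles")
    def generate_subtitles(
        video: UploadFile = File(...),
        target_lang: str = Form(...),
    ):
        try:
            names = srt_names(video.filename or "", target_lang)
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc)) from exc
        # Placeholder: extract audio, transcribe, translate, and write the
        # two .srt files here before responding.
        return names

    return app
```

With this layout, `app/main.py` would need a module-level `app = create_app()` for the `uvicorn app.main:app` command to find the application. The plain `def` handler matches the synchronous-processing rule of the MVP.
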
|
|
| --- |
|
|
| # Suggested Dependencies |
|
|
| ```txt |
| fastapi |
| uvicorn |
| jinja2 |
| python-multipart |
| faster-whisper |
| ffmpeg-python |
| deep-translator |
| pysrt |
| ``` |
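
The packages above do not include the FFmpeg binary itself: `ffmpeg-python` is a wrapper around a separately installed `ffmpeg` executable. A small startup check along these lines might save debugging time; `ffmpeg_available` is a hypothetical helper name.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None
```

The app could call this once at startup and show a clear error message instead of failing on the first upload.
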
|
|
| --- |
|
|
| # Run Locally |
|
|
| ```bash |
| uvicorn app.main:app --reload |
| ``` |
|
|
| --- |
|
|
| # MVP Rules |
|
|
| * Keep everything in ONE FastAPI app |
| * Store files locally |
| * Use synchronous processing |
| * No authentication |
| * No database |
| * No React |
| * No Docker initially |
| * No microservices |
| * No overengineering |
|
|
| --- |
|
|
| # Build Order |
|
|
| 1. Upload video |
| 2. Extract audio |
| 3. Generate English transcript |
| 4. Generate English `.srt` |
| 5. Add translation |
| 6. Generate translated `.srt` |
| 7. Improve UI later |
|
|
| --- |
|
|
| # Success Criteria |
|
|
| The MVP is successful if: |
|
|
| * Video upload works |
| * English subtitles are generated |
| * Translation works |
| * `.srt` download works |
| * End-to-end pipeline works locally |
|
|