PRD.md: AI Subtitle Generator MVP
Goal
Build a simple web app where users can:
- Upload a video
- Generate English subtitles using AI speech-to-text
- Translate subtitles into:
  - Malayalam
  - Tamil
  - Hindi
- Download `.srt` subtitle files
The MVP should be:
- Extremely simple
- Fast to build
- Vibecoding-friendly
- Localhost only
Core Features
1. Upload Video
Supported formats: `.mp4`, `.mov`, `.mkv`, `.webm`
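A minimal sketch of the upload check, assuming uploads are saved under `app/uploads/` with a random name. The helper names `is_supported_video` and `upload_path` and the `uuid`-based naming are hypothetical, not part of the spec:

```python
import uuid
from pathlib import Path

# The four container formats listed above.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_supported_video(filename: str) -> bool:
    """Return True if the filename ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def upload_path(filename: str, upload_dir: Path = Path("app/uploads")) -> Path:
    """Pick a collision-free destination for an upload (hypothetical helper).

    Only the extension of the client-supplied name is kept, so a name like
    "../../etc/clip.mp4" cannot escape upload_dir.
    """
    if not is_supported_video(filename):
        raise ValueError(f"Unsupported video format: {filename!r}")
    return upload_dir / f"{uuid.uuid4().hex}{Path(filename).suffix.lower()}"
```

Keeping only the extension also means two users uploading `video.mp4` never overwrite each other's files.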
2. Extract Audio
Use FFmpeg to extract audio from the uploaded video.
Example, producing a 16 kHz mono WAV file (Whisper models operate on 16 kHz audio):
```shell
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
```
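The same command can be run from Python. This sketch calls the `ffmpeg` binary through the standard-library `subprocess` module instead of the `ffmpeg-python` package listed under dependencies. The `-y` (overwrite output) and `-vn` (ignore video) flags are additions to the command above, and the function names are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: Path, wav_path: Path) -> list[str]:
    """Build the ffmpeg argument list for 16 kHz mono WAV extraction."""
    return [
        "ffmpeg",
        "-y",             # overwrite wav_path if it already exists
        "-i", str(video_path),
        "-vn",            # ignore the video stream
        "-ar", "16000",   # resample audio to 16 kHz
        "-ac", "1",       # downmix to a single (mono) channel
        str(wav_path),
    ]


def extract_audio(video_path: Path, wav_path: Path) -> Path:
    """Run ffmpeg synchronously; raises CalledProcessError if it fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,  # keep ffmpeg's verbose log out of the server console
    )
    return wav_path
```

Passing an argument list rather than a shell string avoids quoting problems with filenames that contain spaces.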
3. Speech to Text
Use local `faster-whisper`.
Generate:
- English transcript
- English `.srt`
- Timestamps
MVP Decision
The MVP will use local Faster-Whisper instead of cloud APIs.
Why?
- Free
- Fast enough for short videos
- Better privacy
- Works offline
- Easy localhost setup
- Easy to vibecode
Suggested Model
Start with: `base`
Upgrade later if needed: `small` or `medium`
Example
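```python
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.wav", language="en")
# segments is a lazy generator: transcription runs as you iterate over it
for segment in segments:
    print(segment.start, segment.end, segment.text)
```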
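A possible shape for `services/transcribe.py`, assuming the rest of the app works with plain dicts rather than faster-whisper segment objects. The model is loaded once and cached, since loading it on every request is slow. The helper names, the dict format, and dropping empty segments are choices made for this sketch:

```python
from functools import lru_cache
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    """Anything with the start/end/text attributes faster-whisper segments expose."""
    start: float
    end: float
    text: str


def segments_to_dicts(segments: Iterable[SegmentLike]) -> list[dict]:
    """Consume the (lazy) segment iterable into plain, JSON-friendly dicts."""
    cues = []
    for seg in segments:
        text = seg.text.strip()  # segment text may carry leading/trailing spaces
        if text:                 # skip segments with no text
            cues.append({"start": seg.start, "end": seg.end, "text": text})
    return cues


@lru_cache(maxsize=None)
def load_model(model_size: str = "base"):
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device="cpu", compute_type="int8")


def transcribe(audio_path: str, model_size: str = "base") -> list[dict]:
    segments, _info = load_model(model_size).transcribe(audio_path, language="en")
    return segments_to_dicts(segments)
```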
4. Translate Subtitles
Use a small translation adapter layer.
The app should NOT directly depend on one translation provider.
This makes it easy to:
- start simple
- swap providers later
- experiment with better translation models
MVP Translation Provider
Start with:
deep-translator
Translate English subtitles into:
- Malayalam (`ml`)
- Tamil (`ta`)
- Hindi (`hi`)
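A small lookup table could drive both the language dropdown and server-side validation of the submitted form value. The constant and function names are illustrative; the codes are the ones listed above:

```python
# Language codes passed to the translator, mapped to dropdown labels.
SUPPORTED_LANGUAGES: dict[str, str] = {
    "ml": "Malayalam",
    "ta": "Tamil",
    "hi": "Hindi",
}


def validate_target_lang(code: str) -> str:
    """Normalize a submitted language code and reject unsupported ones."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported target language: {code!r}")
    return normalized
```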
Future Translation Provider
Later we can swap in:
- IndicTrans2
- LibreTranslate
- OpenAI models
- Other local translation models
without changing the main application flow.
Suggested Adapter Design
```text
services/
└── translators/
    ├── base.py
    ├── deep_translator_adapter.py
    └── indictrans_adapter.py
```
Example Interface
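```python
from abc import ABC, abstractmethod


class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        """Translate English text into target_lang ("ml", "ta", or "hi")."""
```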
Example MVP Usage
```python
translator = DeepTranslatorAdapter()
translated = translator.translate(text, "ml")  # "ml" = Malayalam
```
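One way to fill in the adapter layer, assuming deep-translator's `GoogleTranslator` class as the backing service (deep-translator wraps several services, so this choice is an assumption). The interface is restated so the block runs on its own; the `TRANSLATORS` registry and `get_translator` factory are hypothetical names showing how a provider swap stays a one-line change:

```python
from abc import ABC, abstractmethod


class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        """Translate English text into target_lang."""


class DeepTranslatorAdapter(Translator):
    """Adapter over deep-translator's GoogleTranslator (assumed provider choice)."""

    def translate(self, text: str, target_lang: str) -> str:
        # Imported lazily so the app starts even if this provider is not installed.
        from deep_translator import GoogleTranslator

        return GoogleTranslator(source="en", target=target_lang).translate(text)


# Hypothetical registry: adding IndicTrans2 later means adding one entry here.
TRANSLATORS: dict[str, type[Translator]] = {
    "deep-translator": DeepTranslatorAdapter,
}


def get_translator(name: str = "deep-translator") -> Translator:
    """Look up a provider by name; the rest of the app only sees Translator."""
    if name not in TRANSLATORS:
        raise ValueError(f"Unknown translation provider: {name!r}")
    return TRANSLATORS[name]()
```

Unlike the local Whisper step, this provider calls an online service, so translation needs network access.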
5. Generate .srt
Generate downloadable subtitle files.
Example:
```text
1
00:00:01,000 --> 00:00:03,000
Hello everyone
```
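The format above can be produced with the standard library alone; this sketch stands in for `pysrt` rather than showing its API. The `Cue` type and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry; times are in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before millis)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT: index, time range, text, blank line between entries."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n"
            f"{cue.text.strip()}\n"
        )
    return "\n".join(blocks)
```

When saving, write the file as UTF-8 so Malayalam, Tamil, and Hindi text survives, for example `Path(path).write_text(to_srt(cues), encoding="utf-8")`.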
Tech Stack
Backend
- FastAPI
Frontend
- HTML
- CSS
- Minimal JavaScript
- Jinja2 Templates
AI/Processing
- Faster-Whisper
- FFmpeg
- deep-translator
- pysrt
Simple Architecture
```text
Upload Video
     ↓
Extract Audio
     ↓
Whisper Transcription
     ↓
Translate Text
     ↓
Generate .srt
     ↓
Download File
```
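The flow above maps onto one synchronous function. In this sketch each step is passed in as a callable, so the flow does not depend on FFmpeg, Whisper, or a specific translation provider; the parameter names and the segment dict shape are assumptions:

```python
from pathlib import Path
from typing import Callable

# Hypothetical segment shape: {"start": float, "end": float, "text": str}
Segment = dict


def run_pipeline(
    video_path: Path,
    target_lang: str,
    extract_audio: Callable[[Path], Path],
    transcribe: Callable[[Path], list[Segment]],
    translate: Callable[[str, str], str],
    write_srt: Callable[[list[Segment], str], Path],
) -> dict[str, Path]:
    """Run every step synchronously and return {language code: .srt path}."""
    audio_path = extract_audio(video_path)      # Extract Audio
    english = transcribe(audio_path)            # Whisper Transcription
    english_srt = write_srt(english, "en")      # Generate English .srt

    # Translate Text: only the text changes; timestamps are reused as-is.
    translated = [
        {**seg, "text": translate(seg["text"], target_lang)} for seg in english
    ]
    translated_srt = write_srt(translated, target_lang)  # Generate translated .srt

    return {"en": english_srt, target_lang: translated_srt}
```

Translating one segment at a time is the simplest option, but it means one provider call per subtitle line, which may be slow for long videos.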
Suggested Folder Structure
```text
app/
├── main.py
├── templates/
│   └── index.html
├── static/
│   └── styles.css
├── uploads/
├── subtitles/
└── services/
    ├── transcribe.py
    ├── translate.py
    └── srt_generator.py
```
Main UI
Single page with:
- Upload input
- Language dropdown
- Generate button
- Loading spinner
- Download links
Main API
Generate Subtitles
`POST /generate-subtitles`
Inputs:
- video file
- target language
Outputs:
- English `.srt`
- Translated `.srt`
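The two outputs need distinct, predictable filenames. One hypothetical convention, assuming files are written to `app/subtitles/` and named after the uploaded video with a language-code suffix:

```python
from pathlib import Path


def subtitle_paths(
    video_filename: str,
    target_lang: str,
    subtitles_dir: Path = Path("app/subtitles"),
) -> dict[str, Path]:
    """Return output paths such as lecture.en.srt and lecture.ml.srt (hypothetical)."""
    stem = Path(video_filename).stem  # "lecture.mp4" -> "lecture"
    return {
        "en": subtitles_dir / f"{stem}.en.srt",
        target_lang: subtitles_dir / f"{stem}.{target_lang}.srt",
    }
```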
Suggested Dependencies
fastapi
uvicorn
jinja2
python-multipart
faster-whisper
ffmpeg-python
deep-translator
pysrt
Run Locally
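FFmpeg itself must be installed and on your `PATH`; the `ffmpeg-python` package is only a wrapper and does not bundle it. Then start the server:
```shell
uvicorn app.main:app --reload
```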
MVP Rules
- Keep everything in ONE FastAPI app
- Store files locally
- Use sync processing
- No authentication
- No database
- No React
- No Docker initially
- No microservices
- No overengineering
Build Order
- Upload video
- Extract audio
- Generate English transcript
- Generate English `.srt`
- Add translation
- Generate translated `.srt`
- Improve UI later
Success Criteria
The MVP is successful if:
- Video upload works
- English subtitles are generated
- Translation works
- `.srt` download works
- End-to-end pipeline works locally