# PRD.md: AI Subtitle Generator MVP
# Goal
Build a simple web app where users can:
1. Upload a video
2. Generate English subtitles using AI speech-to-text
3. Translate subtitles into:
   * Malayalam
   * Tamil
   * Hindi
4. Download `.srt` subtitle files
The MVP should be:
* Extremely simple
* Fast to build
* Vibecoding-friendly
* Localhost only
---
# Core Features
## 1. Upload Video
Support:
* `.mp4`
* `.mov`
* `.mkv`
* `.webm`
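A minimal sketch of how the upload step might reject unsupported files. The helper name `is_supported_video` and the constant `SUPPORTED_EXTENSIONS` are hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Extensions listed in this PRD; the constant name is a hypothetical choice.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_supported_video(filename: str) -> bool:
    """Return True if the filename ends with a supported video extension."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `clip.MP4` is accepted as well.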
---
## 2. Extract Audio
Use FFmpeg to extract audio from the uploaded video.
Example:
```bash
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
```
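From Python, the same command could be run through `subprocess`. This is a sketch under assumptions: the function names `build_ffmpeg_command` and `extract_audio` are hypothetical, the `ffmpeg` binary is assumed to be on `PATH`, and the `-y` flag (overwrite existing output) is an addition not present in the command above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build the FFmpeg argument list for 16 kHz mono WAV extraction."""
    return [
        "ffmpeg",
        "-y",            # assumption: overwrite the output file if it exists
        "-i", video_path,
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run FFmpeg and return the path of the extracted WAV file.

    Raises subprocess.CalledProcessError if FFmpeg exits with an error.
    """
    subprocess.run(
        build_ffmpeg_command(video_path, audio_path),
        check=True,
        capture_output=True,
    )
    return Path(audio_path)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.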
---
## 3. Speech to Text
Use local:
```text
faster-whisper
```
Generate:
* English transcript
* English `.srt`
* Timestamps
### MVP Decision
The MVP will use local Faster-Whisper instead of cloud APIs.
Why?
* Free
* Fast enough for short videos
* Better privacy
* Works offline once the model has been downloaded
* Easy localhost setup
* Easy to vibecode
### Suggested Model
Start with:
```text
base
```
Upgrade later if needed:
* `small`
* `medium`
---
### Example
```python
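from faster_whisper import WhisperModel
model = WhisperModel("base")
# `segments` is a lazy generator; transcription runs as it is iterated.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")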
```
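Each segment returned by Faster-Whisper exposes `start`, `end`, and `text`, which is what the later `.srt` step needs. One possible way to collect them is sketched below. The `Cue` dataclass and both function names are hypothetical, and `device="cpu"` with `compute_type="int8"` is an assumed configuration for a machine without a GPU.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cue:
    """One subtitle entry: start/end in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def segments_to_cues(segments: Iterable) -> List[Cue]:
    """Convert objects with .start, .end and .text into Cue records.

    Segments whose text is empty after stripping whitespace are skipped.
    """
    cues = []
    for seg in segments:
        text = seg.text.strip()
        if text:
            cues.append(Cue(start=seg.start, end=seg.end, text=text))
    return cues


def transcribe_to_cues(audio_path: str, model_size: str = "base") -> List[Cue]:
    """Transcribe an audio file to English cues with Faster-Whisper."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    # Assumed CPU-friendly settings; adjust for GPU machines.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language="en")
    return segments_to_cues(segments)
```

Keeping `segments_to_cues` separate from the model call makes the conversion easy to exercise with fake segments, without loading a model.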
---
## 4. Translate Subtitles
Use a small translation adapter layer.
The app should NOT directly depend on one translation provider.
This makes it easy to:
* start simple
* swap providers later
* experiment with better translation models
---
## MVP Translation Provider
Start with:
```text
deep-translator
```
Translate English subtitles into:
* Malayalam (`ml`)
* Tamil (`ta`)
* Hindi (`hi`)
---
## Future Translation Provider
Later we can swap in:
* IndicTrans2
* LibreTranslate
* OpenAI models
* Other local translation models
without changing the main application flow.
---
## Suggested Adapter Design
```text
services/
└── translators/
    ├── base.py
    ├── deep_translator_adapter.py
    └── indictrans_adapter.py
```
---
## Example Interface
```python
class Translator:
    """Common interface that every translation adapter implements."""

    def translate(self, text: str, target_lang: str) -> str:
        raise NotImplementedError
```
---
## Example MVP Usage
```python
translator = DeepTranslatorAdapter()
translated = translator.translate(text, "ml")
```
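The adapter itself could look roughly like the sketch below. It assumes the MVP provider is deep-translator's `GoogleTranslator` class, which needs network access. The abstract base class, the `source_lang` parameter, and the `translate_lines` helper are illustrative additions, not part of this PRD.

```python
from abc import ABC, abstractmethod
from typing import List


class Translator(ABC):
    """Provider-agnostic interface used by the rest of the app."""

    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        ...


class DeepTranslatorAdapter(Translator):
    """Adapter over deep-translator's GoogleTranslator (needs network access)."""

    def __init__(self, source_lang: str = "en") -> None:
        self.source_lang = source_lang

    def translate(self, text: str, target_lang: str) -> str:
        # Imported lazily so other adapters work without deep-translator installed.
        from deep_translator import GoogleTranslator

        translator = GoogleTranslator(source=self.source_lang, target=target_lang)
        return translator.translate(text)


def translate_lines(
    lines: List[str], translator: Translator, target_lang: str
) -> List[str]:
    """Translate subtitle lines one by one, leaving blank lines untouched."""
    return [
        translator.translate(line, target_lang) if line.strip() else line
        for line in lines
    ]
```

Because `translate_lines` depends only on the `Translator` interface, swapping in another provider later would mean writing a new adapter class and nothing else.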
---
## 5. Generate `.srt`
Generate downloadable subtitle files.
Example:
```srt
1
00:00:01,000 --> 00:00:03,000
Hello everyone
```
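The format above can be produced with plain string formatting. This sketch uses only the standard library rather than `pysrt`, and the function names `format_timestamp` and `build_srt` are hypothetical. It assumes cues arrive as `(start_seconds, end_seconds, text)` tuples.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Entries are separated by a single blank line.
    return "\n".join(blocks)
```

For example, `build_srt([(1.0, 3.0, "Hello everyone")])` yields the single entry shown above. Files should be written as UTF-8 so Malayalam, Tamil, and Hindi text survives intact.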
---
# Tech Stack
## Backend
* FastAPI
## Frontend
* HTML
* CSS
* Minimal JavaScript
* Jinja2 Templates
## AI/Processing
* Faster-Whisper
* FFmpeg
* deep-translator
* pysrt
---
# Simple Architecture
```text
Upload Video
↓
Extract Audio
↓
Whisper Transcription
↓
Translate Text
↓
Generate .srt
↓
Download File
```
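The flow above can be wired together as one synchronous function. In this sketch every stage is passed in as a callable so the orchestration stays independent of FFmpeg, Whisper, and the translation provider. The name `run_pipeline`, its parameters, and the `<stem>.<lang>.srt` naming scheme are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def run_pipeline(
    video_path: str,
    target_lang: str,
    out_dir: str,
    extract_audio: Callable[[str, str], object],
    transcribe: Callable[[str], List[Cue]],
    translate: Callable[[str, str], str],
    render_srt: Callable[[List[Cue]], str],
) -> Dict[str, Path]:
    """Run the stages in order and return the paths of both .srt files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem

    # Extract Audio
    audio_path = str(out / f"{stem}.wav")
    extract_audio(video_path, audio_path)

    # Whisper Transcription
    cues = transcribe(audio_path)

    # Translate Text (timestamps are reused unchanged)
    translated = [(s, e, translate(t, target_lang)) for s, e, t in cues]

    # Generate .srt
    paths = {
        "en": out / f"{stem}.en.srt",
        target_lang: out / f"{stem}.{target_lang}.srt",
    }
    paths["en"].write_text(render_srt(cues), encoding="utf-8")
    paths[target_lang].write_text(render_srt(translated), encoding="utf-8")
    return paths
```

Injecting the stages also lets the whole flow be exercised with stand-in functions before any model is installed.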
---
# Suggested Folder Structure
```text
app/
├── main.py
├── templates/
│   └── index.html
├── static/
│   └── styles.css
├── uploads/
├── subtitles/
└── services/
    ├── transcribe.py
    ├── translate.py
    └── srt_generator.py
```
---
# Main UI
Single page with:
* Upload input
* Language dropdown
* Generate button
* Loading spinner
* Download links
---
# Main API
## Generate Subtitles
```http
POST /generate-subtitles
```
Inputs:
* video file
* target language
Outputs:
* English `.srt`
* Translated `.srt`
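A rough sketch of what this endpoint might look like in FastAPI. The form field names `video` and `target_lang`, the `create_app` factory, the `srt_filenames` helper, and the JSON response shape are all assumptions. The subtitle processing itself is left as a placeholder.

```python
import shutil
from pathlib import Path

UPLOAD_DIR = Path("app/uploads")      # follows the suggested folder structure
SUPPORTED_LANGS = {"ml", "ta", "hi"}  # Malayalam, Tamil, Hindi


def srt_filenames(video_filename: str, target_lang: str) -> dict:
    """Derive output names, e.g. talk.mp4 -> talk.en.srt and talk.ml.srt."""
    stem = Path(video_filename).stem
    return {
        "english": f"{stem}.en.srt",
        "translated": f"{stem}.{target_lang}.srt",
    }


def create_app():
    """Build the FastAPI app.

    FastAPI is imported here so the helpers above stay dependency-free.
    """
    from fastapi import FastAPI, File, Form, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/generate-subtitles")
    def generate_subtitles(
        video: UploadFile = File(...),
        target_lang: str = Form(...),
    ):
        if target_lang not in SUPPORTED_LANGS:
            raise HTTPException(status_code=400, detail="Unsupported target language")

        # Keep only the final path component to avoid writing outside UPLOAD_DIR.
        safe_name = Path(video.filename or "upload.mp4").name
        UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
        with open(UPLOAD_DIR / safe_name, "wb") as out:
            shutil.copyfileobj(video.file, out)

        # Placeholder: a real handler would run the subtitle pipeline here
        # and write both files before returning their names.
        return srt_filenames(safe_name, target_lang)

    return app
```

Adding `app = create_app()` to `app/main.py` would keep the `uvicorn app.main:app --reload` command working. The handler is a plain `def`, in line with the synchronous-processing rule for the MVP.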
---
# Suggested Dependencies
```txt
fastapi
uvicorn
jinja2
python-multipart
faster-whisper
ffmpeg-python
deep-translator
pysrt
```
---
# Run Locally
```bash
uvicorn app.main:app --reload
```
---
# MVP Rules
* Keep everything in ONE FastAPI app
* Store files locally
* Use sync processing
* No authentication
* No database
* No React
* No Docker initially
* No microservices
* No overengineering
---
# Build Order
1. Upload video
2. Extract audio
3. Generate English transcript
4. Generate English `.srt`
5. Add translation
6. Generate translated `.srt`
7. Improve UI later
---
# Success Criteria
The MVP is successful if:
* Video upload works
* English subtitles are generated
* Translation works
* `.srt` download works
* End-to-end pipeline works locally