# PRD.md — AI Subtitle Generator MVP

# Goal

Build a simple web app where users can:

1. Upload a video
2. Generate English subtitles using AI speech-to-text
3. Translate subtitles into:

   * Malayalam
   * Tamil
   * Hindi
4. Download `.srt` subtitle files

The MVP should be:

* Extremely simple
* Fast to build
* Vibecoding-friendly
* Localhost only

---

# Core Features

## 1. Upload Video

Support:

* `.mp4`
* `.mov`
* `.mkv`
* `.webm`

---

## 2. Extract Audio

Use FFmpeg to extract audio from the uploaded video.

Example:

```bash
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
```
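A minimal Python sketch of this step. It assumes the `ffmpeg` binary is on `PATH` and is called through `subprocess` rather than `ffmpeg-python`. The function names `build_extract_command` and `extract_audio` are hypothetical. It adds two flags to the command above: `-vn` drops the video stream, and `-y` overwrites an existing output file.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, wav_path: str) -> list:
    """Build the FFmpeg command that turns a video into 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",            # ignore the video stream
        "-ar", "16000",   # 16 kHz sample rate, as in the command above
        "-ac", "1",       # mono
        wav_path,
    ]


def extract_audio(video_path: str, output_dir: str = "uploads") -> str:
    """Run FFmpeg and return the path of the extracted WAV file."""
    wav_path = str(Path(output_dir) / (Path(video_path).stem + ".wav"))
    # check=True raises CalledProcessError if FFmpeg exits with an error
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```

Keeping the command builder separate from the `subprocess` call lets you check the flags without running FFmpeg.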

---

## 3. Speech to Text

Use the local `faster-whisper` library.

Generate:

* English transcript
* English `.srt`
* Timestamps

### MVP Decision

The MVP will use local Faster-Whisper instead of cloud APIs.

Why?

* Free
* Fast enough for short videos
* Better privacy
* Works offline
* Easy localhost setup
* Easy to vibecode

### Suggested Model

Start with the `base` model.

Upgrade later if needed:

* `small`
* `medium`
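One way to make that upgrade a config change rather than a code change is to read the model size from an environment variable. This is a sketch: the `WHISPER_MODEL` variable and the `resolve_model_size` helper are assumptions and are not part of faster-whisper. faster-whisper accepts other sizes too. This helper deliberately limits the choice to the three named here.

```python
import os
from typing import Mapping, Optional

# Model sizes named in this PRD; "base" is the starting point
ALLOWED_MODELS = ("base", "small", "medium")


def resolve_model_size(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model size from WHISPER_MODEL, defaulting to "base"."""
    source = os.environ if env is None else env
    size = source.get("WHISPER_MODEL", "base").strip().lower()
    if size not in ALLOWED_MODELS:
        raise ValueError(f"Unsupported WHISPER_MODEL {size!r}; use one of {ALLOWED_MODELS}")
    return size
```

With this in place, `WhisperModel(resolve_model_size())` picks up `WHISPER_MODEL=small` without editing code.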

---

### Example

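```python
from faster_whisper import WhisperModel

model = WhisperModel("base")
# transcribe() returns a lazy generator; decoding runs while iterating it
segments, info = model.transcribe("audio.wav", language="en")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```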
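`segments` is a lazy generator, so it is worth materializing it once into plain records that the translation and `.srt` steps can both reuse. The sketch below does this with a hypothetical `Cue` dataclass. It relies only on the `start`, `end`, and `text` attributes that faster-whisper segments expose.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cue:
    """One subtitle entry: start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def segments_to_cues(segments: Iterable) -> List[Cue]:
    """Consume Whisper segments and keep only non-empty, trimmed text."""
    cues = []
    for segment in segments:  # iterating triggers the actual transcription
        text = segment.text.strip()  # Whisper text usually starts with a space
        if text:
            cues.append(Cue(start=segment.start, end=segment.end, text=text))
    return cues
```

Call `cues = segments_to_cues(segments)` right after `model.transcribe(...)`. After that, no other step needs to know about faster-whisper.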

---

## 4. Translate Subtitles

Use a small translation adapter layer.

The app should NOT directly depend on one translation provider.

This makes it easy to:

* start simple
* swap providers later
* experiment with better translation models

---

### MVP Translation Provider

Start with the `deep-translator` library.

Translate English subtitles into:

* Malayalam (`ml`)
* Tamil (`ta`)
* Hindi (`hi`)
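Translation should change only the text of each cue. The Whisper timestamps are reused unchanged, so the translated `.srt` stays in sync with the video. The sketch below works under that assumption. `SUPPORTED_LANGUAGES` and `translate_cues` are hypothetical names, and `translate` stands in for whichever adapter's `translate` method is configured.

```python
from dataclasses import replace
from typing import Callable, List, TypeVar

# Language codes from this PRD, mapped to labels for the UI dropdown
SUPPORTED_LANGUAGES = {"ml": "Malayalam", "ta": "Tamil", "hi": "Hindi"}

T = TypeVar("T")  # any dataclass with a `text` field, e.g. a subtitle cue


def translate_cues(
    cues: List[T], target_lang: str, translate: Callable[[str, str], str]
) -> List[T]:
    """Return copies of the cues with translated text and identical timing."""
    if target_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {target_lang!r}")
    # dataclasses.replace copies each cue, swapping only its text
    return [replace(cue, text=translate(cue.text, target_lang)) for cue in cues]
```

Sending one request per cue keeps the mapping between timestamps and text trivial, at the cost of more calls to the provider. Batching can come later.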

---

### Future Translation Providers

Later we can swap in:

* IndicTrans2
* LibreTranslate
* OpenAI models
* Other local translation models

without changing the main application flow.
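A small registry is one way to make that swap without touching the main flow. The app asks for a provider by name, for example from an environment variable, and only the registry knows which adapter classes exist. The names below (`TRANSLATORS`, `register`, `get_translator`, `EchoTranslator`) are hypothetical.

```python
from typing import Callable, Dict, Type

# Maps a provider name to its adapter class
TRANSLATORS: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that adds an adapter to the registry under `name`."""
    def decorator(cls: Type) -> Type:
        TRANSLATORS[name] = cls
        return cls
    return decorator


def get_translator(name: str):
    """Instantiate the adapter registered under `name`."""
    cls = TRANSLATORS.get(name)
    if cls is None:
        raise ValueError(f"Unknown translation provider: {name!r}")
    return cls()


@register("echo")
class EchoTranslator:
    """Placeholder provider that returns text unchanged; handy for tests."""

    def translate(self, text: str, target_lang: str) -> str:
        return text
```

With this pattern, adding IndicTrans2 later means writing one decorated class in `indictrans_adapter.py`.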

---

### Suggested Adapter Design

```text
services/
└── translators/
    ├── base.py
    ├── deep_translator_adapter.py
    └── indictrans_adapter.py
```

---

### Example Interface

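```python
from abc import ABC, abstractmethod


class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        """Translate English text into target_lang ("ml", "ta", or "hi")."""
```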

---

### Example MVP Usage

```python
translator = DeepTranslatorAdapter()
translated = translator.translate(text, "ml")
```

---

## 5. Generate `.srt`

Generate downloadable subtitle files.

Example:

```srt
1
00:00:01,000 --> 00:00:03,000
Hello everyone
```
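Each SRT entry has four parts: a 1-based index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line before the next entry. The sketch below writes that format by hand so it stays visible; `pysrt` from the dependency list could do the same job. The helper names are hypothetical. A cue is any object with `start` and `end` (in seconds) and `text` attributes.

```python
from pathlib import Path
from typing import Iterable


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def cues_to_srt(cues: Iterable) -> str:
    """Render cues (objects with start, end, text) as an SRT document."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    # Each block ends with a newline; joining on "\n" adds the blank separator
    return "\n".join(blocks)


def write_srt(cues: Iterable, path: str) -> None:
    """Write UTF-8 so Malayalam, Tamil, and Hindi text is stored correctly."""
    Path(path).write_text(cues_to_srt(cues), encoding="utf-8")
```

A cue starting at `1.0` and ending at `3.0` with the text `Hello everyone` renders exactly as the example above.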

---

# Tech Stack

## Backend

* FastAPI

## Frontend

* HTML
* CSS
* Minimal JavaScript
* Jinja2 Templates

## AI/Processing

* Faster-Whisper
* FFmpeg
* deep-translator
* pysrt

---

# Simple Architecture

```text
Upload Video
   ↓
Extract Audio
   ↓
Whisper Transcription
   ↓
Translate Text
   ↓
Generate .srt
   ↓
Download File
```
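The MVP rules call for synchronous processing in one app, so the whole flow can be a single function that runs each step in order. In this sketch, the step functions are passed in as parameters so the flow can be exercised without FFmpeg or Whisper; the real app would pass in the functions from `services/`. The function name and the `.en.srt` / `.<lang>.srt` file naming are assumptions.

```python
from typing import Callable, Dict, List


def generate_subtitles(
    video_path: str,
    target_lang: str,
    output_stem: str,
    extract_audio: Callable[[str], str],
    transcribe: Callable[[str], List],
    translate_cues: Callable[[List, str], List],
    write_srt: Callable[[List, str], None],
) -> Dict[str, str]:
    """Run video -> audio -> transcript -> translation -> .srt, in order."""
    wav_path = extract_audio(video_path)
    english_cues = transcribe(wav_path)

    english_srt = f"{output_stem}.en.srt"
    write_srt(english_cues, english_srt)

    translated_srt = f"{output_stem}.{target_lang}.srt"
    write_srt(translate_cues(english_cues, target_lang), translated_srt)

    # Paths the page can turn into download links
    return {"english": english_srt, "translated": translated_srt}
```

The English cues are computed once and reused for the translation, so Whisper runs only once per upload.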

---

# Suggested Folder Structure

```text
app/
├── main.py
├── templates/
│   └── index.html
├── static/
│   └── styles.css
├── uploads/
├── subtitles/
└── services/
    ├── transcribe.py
    ├── translators/
    │   ├── base.py
    │   └── deep_translator_adapter.py
    └── srt_generator.py
```

---

# Main UI

Single page with:

* Upload input
* Language dropdown
* Generate button
* Loading spinner
* Download links

---

# Main API

## Generate Subtitles

```http
POST /generate-subtitles
```

Inputs:

* video file
* target language

Outputs:

* English `.srt`
* Translated `.srt`

---

# Suggested Dependencies

```txt
fastapi
uvicorn
jinja2
python-multipart
faster-whisper
ffmpeg-python
deep-translator
pysrt
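```

`ffmpeg-python` only wraps the FFmpeg command-line tool, so FFmpeg itself must also be installed and available on `PATH`.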

---

# Run Locally

```bash
# assumes the dependency list above is saved as requirements.txt
pip install -r requirements.txt
uvicorn app.main:app --reload
```

---

# MVP Rules

* Keep everything in ONE FastAPI app
* Store files locally
* Use sync processing
* No authentication
* No database
* No React
* No Docker initially
* No microservices
* No overengineering

---

# Build Order

1. Upload video
2. Extract audio
3. Generate English transcript
4. Generate English `.srt`
5. Add translation
6. Generate translated `.srt`
7. Improve UI later

---

# Success Criteria

The MVP is successful if:

* Video upload works
* English subtitles are generated
* Malayalam, Tamil, and Hindi translations are generated
* `.srt` download works
* End-to-end pipeline works locally