PRD.md: AI Subtitle Generator MVP

Goal

Build a simple web app where users can:

  1. Upload a video

  2. Generate English subtitles using AI speech-to-text

  3. Translate subtitles into:

    • Malayalam
    • Tamil
    • Hindi
  4. Download .srt subtitle files

The MVP should be:

  • Extremely simple
  • Fast to build
  • Vibecoding-friendly
  • Localhost only

Core Features

1. Upload Video

Support:

  • .mp4
  • .mov
  • .mkv
  • .webm
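
A minimal sketch of how the upload step might reject unsupported files, assuming the check is done on the file extension only. The names `ALLOWED_EXTENSIONS` and `is_supported_video` are hypothetical and not part of the original plan.

```python
from pathlib import Path

# Extensions taken from the "Support" list above.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_supported_video(filename: str) -> bool:
    """Return True if the filename ends in one of the supported extensions."""
    # Lower-case the suffix so "CLIP.MP4" is treated like "clip.mp4".
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

An extension check does not prove the file is a valid video; FFmpeg will still fail on a corrupt upload, so the extraction step should handle that error as well.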

2. Extract Audio

Use FFmpeg to extract audio from the uploaded video.

Example:

ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
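
From Python, the same command could be run through `subprocess`, as sketched below. This assumes the `ffmpeg` binary is on the PATH. The helper names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the `-y` and `-vn` flags are additions that the example command above does not include.

```python
import subprocess


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build the FFmpeg argument list for 16 kHz mono WAV output."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # ignore the video stream
        "-ar", "16000",  # resample audio to 16 kHz
        "-ac", "1",      # downmix to a single channel
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with uploaded filenames that contain spaces.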

3. Speech to Text

Use local:

faster-whisper

Generate:

  • English transcript
  • English .srt
  • Timestamps

MVP Decision

The MVP will use local Faster-Whisper instead of cloud APIs.

Why?

  • Free
  • Fast enough for short videos
  • Better privacy
  • Works offline
  • Easy localhost setup
  • Easy to vibecode

Suggested Model

Start with:

base

Upgrade later if needed:

  • small
  • medium

Example

from faster_whisper import WhisperModel

model = WhisperModel("base")
segments, info = model.transcribe("audio.wav")
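
In faster-whisper, `segments` is a lazy generator, so transcription only runs while it is being iterated, and each segment exposes `start`, `end`, and `text`. The sketch below shows one way to turn that iterator into a plain list for the later steps. The `Cue` class and `collect_cues` function are hypothetical names, and the code deliberately accepts any iterable with those three attributes so it can be tried without loading a model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cue:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def collect_cues(segments: Iterable) -> List[Cue]:
    """Consume the segment iterator and keep only timing and text."""
    cues = []
    for segment in segments:
        text = segment.text.strip()  # segment text often has leading spaces
        if text:  # skip segments that contain no words
            cues.append(Cue(segment.start, segment.end, text))
    return cues
```

Usage would be `cues = collect_cues(segments)` right after the `model.transcribe(...)` call above.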


4. Translate Subtitles

Use a small translation adapter layer.

The app should NOT directly depend on one translation provider.

This makes it easy to:

  • start simple
  • swap providers later
  • experiment with better translation models

MVP Translation Provider

Start with:

deep-translator

Translate English subtitles into:

  • Malayalam (ml)
  • Tamil (ta)
  • Hindi (hi)

Future Translation Provider

Later we can swap in:

  • IndicTrans2
  • LibreTranslate
  • OpenAI models
  • Other local translation models

without changing the main application flow.


Suggested Adapter Design

services/
└── translators/
    ├── base.py
    ├── deep_translator_adapter.py
    └── indictrans_adapter.py

Example Interface

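from abc import ABC, abstractmethod

class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        """Return text translated into target_lang, e.g. "ml"."""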

Example MVP Usage

translator = DeepTranslatorAdapter()
translated = translator.translate(text, "ml")
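
One possible shape for `DeepTranslatorAdapter` is sketched below. It assumes the adapter wraps deep-translator's `GoogleTranslator`, which is only one of the backends that library offers; the PRD does not say which one to use. The `backend_factory` parameter is a hypothetical addition that lets the adapter be exercised without network access.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class Translator(ABC):
    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        """Return text translated into target_lang."""


class DeepTranslatorAdapter(Translator):
    def __init__(self, source_lang: str = "en",
                 backend_factory: Optional[Callable] = None):
        self.source_lang = source_lang
        # backend_factory(source, target) must return an object with .translate(text)
        self._backend_factory = backend_factory or self._google_backend

    @staticmethod
    def _google_backend(source: str, target: str):
        # Imported lazily so the module loads even if deep-translator is missing.
        from deep_translator import GoogleTranslator
        return GoogleTranslator(source=source, target=target)

    def translate(self, text: str, target_lang: str) -> str:
        if not text.strip():
            return text  # nothing to translate; avoid a pointless request
        backend = self._backend_factory(self.source_lang, target_lang)
        return backend.translate(text)
```

Swapping in IndicTrans2 or another provider later would then mean writing a second `Translator` subclass, with no change to the calling code.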


5. Generate .srt

Generate downloadable subtitle files.

Example:

1
00:00:01,000 --> 00:00:03,000
Hello everyone
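
The format above can be produced with a few lines of plain Python, as in this sketch. It assumes cues arrive as `(start_seconds, end_seconds, text)` tuples. The function names are hypothetical, and pysrt from the tech stack could be used instead of hand-rolled formatting.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # work in integer milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Render (start, end, text) tuples as SRT text with 1-based indices."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Each block already ends with a newline, so joining with "\n"
    # leaves one blank line between consecutive cues.
    return "\n".join(blocks)
```

Calling `to_srt([(1.0, 3.0, "Hello everyone")])` yields the three-line example shown above. When writing the result to disk, use UTF-8 so Malayalam, Tamil, and Hindi text is preserved.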

Tech Stack

Backend

  • FastAPI

Frontend

  • HTML
  • CSS
  • Minimal JavaScript
  • Jinja2 Templates

AI/Processing

  • Faster-Whisper
  • FFmpeg
  • deep-translator
  • pysrt

Simple Architecture

Upload Video
   ↓
Extract Audio
   ↓
Whisper Transcription
   ↓
Translate Text
   ↓
Generate .srt
   ↓
Download File
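
The flow above could be wired together as one synchronous function, sketched here. Each stage is passed in as a callable, so the sketch makes no claim about how the real services are implemented. `run_pipeline` and its parameter names are hypothetical.

```python
from typing import Callable, List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def run_pipeline(
    video_path: str,
    target_lang: str,
    extract_audio: Callable[[str], str],
    transcribe: Callable[[str], List[Cue]],
    translate: Callable[[str, str], str],
    write_srt: Callable[[List[Cue], str], str],
) -> Tuple[str, str]:
    """Run the stages in order; return (english_srt_path, translated_srt_path)."""
    audio_path = extract_audio(video_path)
    english_cues = transcribe(audio_path)
    # Translate the text only; timings are reused so both files stay in sync.
    translated_cues = [
        (start, end, translate(text, target_lang))
        for start, end, text in english_cues
    ]
    english_srt = write_srt(english_cues, "en")
    translated_srt = write_srt(translated_cues, target_lang)
    return english_srt, translated_srt
```

Keeping the stages as plain functions fits the "sync processing, one FastAPI app" rules: the request handler can call this function directly and return both file paths as download links.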

Suggested Folder Structure

app/
├── main.py
├── templates/
│   └── index.html
├── static/
│   └── styles.css
├── uploads/
├── subtitles/
└── services/
    ├── transcribe.py
    ├── translate.py
    └── srt_generator.py

Main UI

Single page with:

  • Upload input
  • Language dropdown
  • Generate button
  • Loading spinner
  • Download links

Main API

Generate Subtitles

POST /generate-subtitles

Inputs:

  • video file
  • target language

Outputs:

  • English .srt
  • Translated .srt

Suggested Dependencies

fastapi
uvicorn
jinja2
python-multipart
faster-whisper
ffmpeg-python
deep-translator
pysrt

Note: ffmpeg-python is only a Python wrapper. The FFmpeg binary itself must be installed separately and be available on the PATH.

Run Locally

uvicorn app.main:app --reload

MVP Rules

  • Keep everything in ONE FastAPI app
  • Store files locally
  • Use sync processing
  • No authentication
  • No database
  • No React
  • No Docker initially
  • No microservices
  • No overengineering

Build Order

  1. Upload video
  2. Extract audio
  3. Generate English transcript
  4. Generate English .srt
  5. Add translation
  6. Generate translated .srt
  7. Improve UI later

Success Criteria

The MVP is successful if:

  • Video upload works
  • English subtitles are generated
  • Translation works
  • .srt download works
  • End-to-end pipeline works locally