arjun-ms

Initial commit: Subtrans Subtitle Pipeline

57bbccb 12 days ago

	# PRD.md — AI Subtitle Generator MVP

	# Goal

	Build a simple web app where users can:

	1. Upload a video
	2. Generate English subtitles using AI speech-to-text
	3. Translate subtitles into:
	   * Malayalam
	   * Tamil
	   * Hindi
	4. Download `.srt` subtitle files

	The MVP should be:

	* Extremely simple
	* Fast to build
	* Vibecoding-friendly
	* Localhost only

	---

	# Core Features

	## 1. Upload Video

	Support:

	* `.mp4`
	* `.mov`
	* `.mkv`
	* `.webm`
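
A minimal sketch of how the upload step might reject unsupported files before saving anything. The names `SUPPORTED_EXTENSIONS` and `is_supported_video` are hypothetical and not part of this spec, and the check looks only at the file name, not at the actual container format.

```python
from pathlib import Path

# Extensions listed under "Upload Video" (the constant name is illustrative).
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file name ends with one of the supported extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `CLIP.MKV` is accepted the same way as `clip.mkv`.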

	---

	## 2. Extract Audio

	Use FFmpeg to extract audio from the uploaded video.

	Example:

	```bash
	ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
	```
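
The same command could be driven from Python. This is a sketch under assumptions: it calls the `ffmpeg` binary through `subprocess` rather than the `ffmpeg-python` wrapper listed in the dependencies, the function names are hypothetical, and the `-y` flag (overwrite existing output) is an addition that the example command above does not include.

```python
import subprocess


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build the argument list for the extraction command shown above."""
    return [
        "ffmpeg",
        "-y",            # assumption: overwrite the output file if it exists
        "-i", video_path,
        "-ar", "16000",  # resample audio to 16 kHz
        "-ac", "1",      # downmix to a single (mono) channel
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with uploaded file names that contain spaces.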

	---

	## 3. Speech to Text

	Use local:

	```text
	faster-whisper
	```

	Generate:

	* English transcript
	* English `.srt`
	* Timestamps

	### MVP Decision

	The MVP will use local Faster-Whisper instead of cloud APIs.

	Why?

	* Free
	* Fast enough for short videos
	* Better privacy
	* Works offline
	* Easy localhost setup
	* Easy to vibecode

	### Suggested Model

	Start with:

	```text
	base
	```

	Upgrade later if needed:

	* `small`
	* `medium`

	---

	### Example

	```python
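	from faster_whisper import WhisperModel

	# Load the "base" model; weights are downloaded on first use.
	model = WhisperModel("base")
	# `segments` is a lazy generator: transcription runs as it is iterated.
	segments, info = model.transcribe("audio.wav")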
	```
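
Because the segments arrive as an iterator, it may help to collect them into plain records before translation and `.srt` generation. The sketch below assumes each segment exposes `start`, `end`, and `text` attributes, as faster-whisper segments do; `SubtitleLine` and `collect_segments` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SubtitleLine:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def collect_segments(segments: Iterable) -> List[SubtitleLine]:
    """Consume the segment iterator and keep only what the .srt needs."""
    lines = []
    for segment in segments:
        text = segment.text.strip()
        if text:  # skip segments that contain only whitespace
            lines.append(SubtitleLine(segment.start, segment.end, text))
    return lines
```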

	---

	## 4. Translate Subtitles

	Use a small translation adapter layer.

	The app should NOT directly depend on one translation provider.

	This makes it easy to:

	* start simple
	* swap providers later
	* experiment with better translation models

	---

	### MVP Translation Provider

	Start with:

	```text
	deep-translator
	```

	Translate English subtitles into:

	* Malayalam (`ml`)
	* Tamil (`ta`)
	* Hindi (`hi`)

	---

	### Future Translation Provider

	Later we can swap in:

	* IndicTrans2
	* LibreTranslate
	* OpenAI models
	* Other local translation models

	without changing the main application flow.

	---

	### Suggested Adapter Design

	```text
	services/
	└── translators/
	    ├── base.py
	    ├── deep_translator_adapter.py
	    └── indictrans_adapter.py
	```

	---

	### Example Interface

	```python
	class Translator:
	    """Interface that every translation adapter implements."""

	    def translate(self, text: str, target_lang: str) -> str:
	        raise NotImplementedError
	```

	---

	### Example MVP Usage

	```python
	translator = DeepTranslatorAdapter()
	translated = translator.translate(text, "ml")
	```
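
One possible shape for the MVP adapter, as a sketch rather than a final design. It assumes the adapter wraps `GoogleTranslator` from `deep-translator`, which calls an online service and therefore needs network access. `SUPPORTED_LANGS` and `translate_lines` are hypothetical names introduced for illustration.

```python
class Translator:
    """Interface from the section above."""

    def translate(self, text: str, target_lang: str) -> str:
        raise NotImplementedError


# Language codes from the MVP list: Malayalam, Tamil, Hindi.
SUPPORTED_LANGS = {"ml", "ta", "hi"}


class DeepTranslatorAdapter(Translator):
    """Hypothetical adapter that wraps deep-translator's GoogleTranslator."""

    def translate(self, text: str, target_lang: str) -> str:
        if target_lang not in SUPPORTED_LANGS:
            raise ValueError(f"Unsupported target language: {target_lang}")
        # Imported lazily so the rest of the app loads without the package.
        from deep_translator import GoogleTranslator

        return GoogleTranslator(source="en", target=target_lang).translate(text)


def translate_lines(
    translator: Translator, lines: list[str], target_lang: str
) -> list[str]:
    """Translate subtitle lines one by one through any Translator."""
    return [translator.translate(line, target_lang) for line in lines]
```

Since `translate_lines` depends only on the `Translator` interface, swapping in another provider later would mean writing a new adapter class and nothing else.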

	---

	## 5. Generate `.srt`

	Generate downloadable subtitle files.

	Example:

	```srt
	1
	00:00:01,000 --> 00:00:03,000
	Hello everyone
	```
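
The cue format above can be produced with plain string formatting. This sketch does not use `pysrt` from the tech stack; `format_timestamp` and `build_srt` are hypothetical helper names, and cues are assumed to arrive as `(start, end, text)` tuples with times in seconds.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(entries: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)
```

Calling `build_srt([(1.0, 3.0, "Hello everyone")])` reproduces the example cue shown above.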

	---

	# Tech Stack

	## Backend

	* FastAPI

	## Frontend

	* HTML
	* CSS
	* Minimal JavaScript
	* Jinja2 Templates

	## AI/Processing

	* Faster-Whisper
	* FFmpeg
	* deep-translator
	* pysrt

	---

	# Simple Architecture

	```text
	Upload Video
	↓
	Extract Audio
	↓
	Whisper Transcription
	↓
	Translate Text
	↓
	Generate .srt
	↓
	Download File
	```
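
The flow above could be wired together as one synchronous function. In this sketch each step is passed in as a callable so the order can be tested without FFmpeg or a model; `run_pipeline`, `Cue`, and the parameter names are hypothetical.

```python
from typing import Callable, List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def run_pipeline(
    video_path: str,
    target_lang: str,
    extract_audio: Callable[[str], str],
    transcribe: Callable[[str], List[Cue]],
    translate: Callable[[str, str], str],
    render_srt: Callable[[List[Cue]], str],
) -> Tuple[str, str]:
    """Run the steps in order and return (english_srt, translated_srt)."""
    audio_path = extract_audio(video_path)
    english_cues = transcribe(audio_path)
    # Timestamps are reused unchanged; only the text is translated.
    translated_cues = [
        (start, end, translate(text, target_lang))
        for start, end, text in english_cues
    ]
    return render_srt(english_cues), render_srt(translated_cues)
```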

	---

	# Suggested Folder Structure

	```text
	app/
	├── main.py
	├── templates/
	│   └── index.html
	├── static/
	│   └── styles.css
	├── uploads/
	├── subtitles/
	└── services/
	    ├── transcribe.py
	    ├── translate.py
	    └── srt_generator.py
	```

	---

	# Main UI

	Single page with:

	* Upload input
	* Language dropdown
	* Generate button
	* Loading spinner
	* Download links

	---

	# Main API

	## Generate Subtitles

	```http
	POST /generate-subtitles
	```

	Inputs:

	* video file
	* target language

	Outputs:

	* English `.srt`
	* Translated `.srt`
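
Since the endpoint returns two files, the handler needs a predictable place to write them. The sketch below is one possible naming scheme, not something this spec defines: `output_paths`, the `.en.srt` suffix, and the `.<lang>.srt` suffix are all assumptions.

```python
from pathlib import Path

SUBTITLES_DIR = Path("subtitles")  # matches the folder in the suggested layout


def output_paths(video_filename: str, target_lang: str) -> dict[str, Path]:
    """Return where the English and translated .srt files would be written."""
    # Keep only the final path component, then drop the video extension.
    stem = Path(Path(video_filename).name).stem
    return {
        "english": SUBTITLES_DIR / f"{stem}.en.srt",
        "translated": SUBTITLES_DIR / f"{stem}.{target_lang}.srt",
    }
```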

	---

	# Suggested Dependencies

	```txt
	fastapi
	uvicorn
	jinja2
	python-multipart
	faster-whisper
	ffmpeg-python
	deep-translator
	pysrt
	```

	---

	# Run Locally

	```bash
	uvicorn app.main:app --reload
	```

	---

	# MVP Rules

	* Keep everything in ONE FastAPI app
	* Store files locally
	* Use sync processing
	* No authentication
	* No database
	* No React
	* No Docker initially
	* No microservices
	* No overengineering

	---

	# Build Order

	1. Upload video
	2. Extract audio
	3. Generate English transcript
	4. Generate English `.srt`
	5. Add translation
	6. Generate translated `.srt`
	7. Improve UI later

	---

	# Success Criteria

	The MVP is successful if:

	* Video upload works
	* English subtitles are generated
	* Translation works
	* `.srt` download works
	* End-to-end pipeline works locally