title: Subtrans
emoji: π₯
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
Subtrans
A high-precision AI pipeline for automated subtitle generation and translation with context-aware self-correction.
π Key Features
- Offline Transcription: Uses local
faster-whisper(mediummodel) with Phonetic Bias to correctly recognize technical terms (Naukri, NotebookLM). - Precision Patching: A dedicated LLM pass (Gemini) that detects and corrects low-confidence entities (names/brands) in the English source.
- Multi-Engine Translation:
- Google Translate (
deep-translator): Fast, literal translation. - Groq Cloud LLM (
Llama 3.3 70B): Idiomatic, natural conversational translations. - Gemini 1.5/2.5 Pro & Flash: High-capacity translation using Full-Context Batching (entire file in one request) and Glossary Support.
- Google Translate (
- Content Isolation: Secure
<l>tag escrow for transcript content to prevent LLM instruction leakage. - Automated Self-Correction Pass: Post-translation quality audit using Gemini 3.1 Pro or Llama 3.3 70B.
π οΈ Setup & Installation
1. Prerequisites
Ensure you have Python 3.10+ and FFmpeg installed on your system.
- FFmpeg (Windows): Install via Scoop (
scoop install ffmpeg) or Chocolatey (choco install ffmpeg). - FFmpeg (macOS):
brew install ffmpeg - FFmpeg (Linux):
sudo apt install ffmpeg
2. Install Dependencies
Clone the repository and install the required dependencies:
pip install -r requirements.txt
3. Environment Configuration
Create a .env file in the root directory and add your Groq API Key:
GROQ_API_KEY=your_groq_api_key_here
π» How to Run
Start the Application Server
Run the local FastAPI server using uvicorn:
uvicorn app.main:app --reload
Once running, open your browser and navigate to: http://localhost:8000
π§ͺ Running Tests & Validation
All tests are placed under the app/tests/ directory and can be executed as follows:
Run the Entire Test Suite
Verify pipeline logic, translators, and validation engine:
pytest app/tests
Run Transcription & Model Accuracy Test
Verify transcription accuracy on a test clip using the Whisper medium model:
python app/tests/test_medium_accuracy.py
Run Automated Pipeline Tests
Run a full end-to-end batch test on multiple videos with built-in logging and transcription reuse:
python app/tests/run_batch_tests.py
Note: This script will prompt you to reuse previous transcriptions to save time and API costs.
Core Test Suite
Verify specific components (Translators, Precision Patch, Glossary):
pytest app/tests/test_precision_patch.py
pytest app/tests/test_glossary_and_context.py
π Project Structure
app/services/: Core logic (Transcribe, Patch, Validate).app/services/translators/: Plugin-based LLM adapters.app/tests/: Integration tests and therun_batch_tests.pyrunner.app/tests/experimental/: Archive for research and one-off debugging scripts.findings/: Detailed development logs and architectural research results.
