---
title: Global Video Localizer
emoji: 🌍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
tags:
  - building-mcp-track-consumer
  - building-mcp-track-enterprise
  - building-mcp-track-developer
  - mcp-in-action-track-consumer
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-developer
  - video
  - translation
  - tts
  - whisper
  - elevenlabs
  - gradio
  - mcp
  - ai-agents
  - multilingual
  - automation
license: mit
---
# 🌍 Global Video Localizer

**Break language barriers. Reach global audiences. One video, infinite possibilities.**
## What This Does

Global Video Localizer automates video dubbing. Upload a video, select a language, and get a professionally dubbed version in minutes. No studios. No voice actors. No waiting.

**It works completely free** using open-source AI models. You can use it right now without any API keys. If you want premium voice quality, you can optionally add your ElevenLabs API key in the UI.
## The Problem It Solves

Content creators, educators, and businesses face a massive challenge: reaching global audiences. Traditional video dubbing costs thousands of dollars per video and takes weeks. Most content never gets localized because it's simply too expensive and time-consuming.

This app changes that. It makes professional video localization accessible to everyone, instantly.
## Why It's Smart

This is a fully automated video localization system that works end-to-end with zero manual intervention. It combines state-of-the-art AI models in a seamless pipeline: your video becomes audio, audio becomes text, text gets translated, translation becomes voice, and voice syncs with your original video.

The intelligent fallback system ensures it always works. If one service is unavailable, it automatically uses the next best option. You never get stuck with a silent video.
## How It Works

1. **Extract & Transcribe**: AI listens to your video with local Whisper (runs on the Space/host)
2. **Translate**: Deep Translator (Google) by default, with optional NLLB via HF Inference if `HF_TOKEN` is set
3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
   - Primary: ElevenLabs (premium, optional; requires an API key and the package to be installed)
   - Fallback: EdgeTTS (high quality, free, networked)
   - Fallback: Coqui TTS (local neural TTS, if installed)
   - Fallback: gTTS (reliable backup, networked)
4. **Sync & Merge**: Time-stretching keeps the new audio aligned with your video

All of this happens automatically. You just upload and wait a few minutes.
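The fallback chain in step 3 can be sketched as follows. The lambdas are illustrative stand-ins, not the actual `app.py` API; the real app wires these slots to ElevenLabs, EdgeTTS, Coqui, and gTTS:

```python
# Minimal sketch of the TTS fallback idea: try providers in order,
# return the first one that succeeds.

def synthesize_with_fallback(text, providers):
    """Try each (name, synth) pair in order; return the first working result."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # provider unavailable: fall through to the next
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All TTS providers failed: " + "; ".join(errors))

def _fail(msg):
    raise ConnectionError(msg)

# Stand-in providers: the first two "fail", the third succeeds.
providers = [
    ("elevenlabs", lambda t: _fail("no API key")),
    ("edge-tts", lambda t: _fail("service unreachable")),
    ("gtts", lambda t: b"audio-bytes:" + t.encode("utf-8")),
]
engine, audio = synthesize_with_fallback("Hola, mundo", providers)
# engine == "gtts" — the first two providers raised and were skipped
```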
## Technical Capabilities

- **MCP Integration**: Full Model Context Protocol server implementation, allowing Claude and other AI agents to localize videos programmatically
- **Multi-Modal Pipeline**: Processes video → audio → text → translation → voice → video in a single automated workflow
- **Intelligent Fallback System**: Multiple TTS providers ensure reliability
- **Audio Processing**: Time-stretching and synchronization keep the dubbed audio aligned with the original timing
- **Privacy-Aware**: Transcription is local (Whisper); translation and most TTS fallbacks call external services unless Coqui is installed
- **Language Support**: 8 languages with native-quality voices for each
- **Open Source Foundation**: Built on open-source models; works completely free without any API keys
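Time-stretching of this kind is typically done with FFmpeg's `atempo` audio filter, which only accepts ratios between 0.5 and 2.0 and must be chained for anything outside that range. A small helper (illustrative; `app.py` may build this differently) that decomposes an arbitrary ratio into a valid chain:

```python
# Build a valid FFmpeg `atempo` filter string for an arbitrary tempo ratio.
# `atempo` itself only accepts 0.5-2.0, so larger/smaller ratios are chained.

def atempo_chain(ratio: float) -> str:
    if ratio <= 0:
        raise ValueError("tempo ratio must be positive")
    factors = []
    while ratio > 2.0:
        factors.append(2.0)
        ratio /= 2.0
    while ratio < 0.5:
        factors.append(0.5)
        ratio /= 0.5
    factors.append(round(ratio, 6))
    return ",".join(f"atempo={f}" for f in factors)

print(atempo_chain(2.5))   # atempo=2.0,atempo=1.25
```

The resulting string is what you would pass to FFmpeg as an audio filter (e.g. via `-filter:a`).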
## Supported Languages

- 🇪🇸 Spanish
- 🇫🇷 French
- 🇩🇪 German
- 🇮🇹 Italian
- 🇯🇵 Japanese
- 🇨🇳 Chinese
- 🇮🇳 Hindi
- 🇸🇦 Arabic
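Internally, language names map to the short codes that translation and TTS backends consume. The table below is an assumption about `app.py`'s internals, not its actual code; note that Google Translate expects a region code for Chinese:

```python
# Assumed language-name → language-code table for the eight supported targets.
SUPPORTED_LANGUAGES = {
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Japanese": "ja",
    "Chinese": "zh-CN",  # Google Translate uses a region code for Chinese
    "Hindi": "hi",
    "Arabic": "ar",
}
```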
## Outputs & Privacy

- ElevenLabs API keys are per-request only and never stored on the server; the `ELEVENLABS_API_KEY` env var is optional for private deployments.
- All generated videos are written to `outputs/` (oldest files auto-pruned to keep disk usage in check); temp workdirs are cleaned after each job.
- Jobs run through a small Gradio queue to avoid overlapping heavy runs on shared Spaces.
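The auto-pruning can be sketched like this (an illustrative helper; the actual cap and logic in `app.py` may differ):

```python
import os

def prune_oldest(directory: str, keep: int = 20) -> None:
    """Keep only the `keep` newest files in `directory`; delete the rest."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    paths = [p for p in paths if os.path.isfile(p)]
    paths.sort(key=os.path.getmtime, reverse=True)  # newest first
    for stale in paths[keep:]:
        os.remove(stale)
```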
## Tech Stack

- **MCP**: Model Context Protocol server for Claude integration
- **Whisper (Local)**: State-of-the-art speech recognition (offline, reliable)
- **Deep Translator**: Reliable multilingual translation
- **ElevenLabs**: Premium professional-grade voice synthesis (optional)
- **EdgeTTS**: High-quality neural voices (free, networked)
- **Coqui TTS**: Local neural TTS (fallback)
- **gTTS**: Reliable backup TTS
- **MoviePy/FFmpeg**: Video processing engine
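As a concrete example of the FFmpeg side of the stack, here is a helper that builds (but does not run) the audio-extraction command. This is illustrative; the app drives FFmpeg through MoviePy, but the flags below are standard FFmpeg options producing the 16 kHz mono WAV that Whisper works with:

```python
# Build the FFmpeg command that pulls a video's audio track as
# 16 kHz mono WAV — a convenient input format for Whisper.

def ffmpeg_extract_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    return [
        "ffmpeg", "-y",        # overwrite output without prompting
        "-i", video_path,
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        wav_path,
    ]

cmd = ffmpeg_extract_audio_cmd("input.mp4", "speech.wav")
# run with: subprocess.run(cmd, check=True)
```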