Spaces: sammoftah/Video-localizer

sammoftah committed on Jan 6

Commit 2ae4d9f (verified) · 1 Parent(s): 8e82e67

Clarify How It Works and Technical Capabilities to match current pipeline

Files changed (1)

README.md CHANGED

@@ -51,13 +51,13 @@ The intelligent fallback system ensures it always works. If one service is unava
 ## How It Works
-1. **Extract & Transcribe**: AI listens to your video and understands every word using local Whisper models
-2. **Translate**: Context-aware translation preserves meaning and nuance using Deep Translator and NLLB
 3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
-   - Primary: ElevenLabs (premium, optional)
-   - Fallback: EdgeTTS (high quality, free, open source)
-   - Fallback: Coqui TTS (local neural TTS)
-   - Fallback: gTTS (reliable backup)
 4. **Sync & Merge**: Perfect timing ensures the new audio matches your video frame-by-frame
 All of this happens automatically. You just upload and wait a few minutes.
@@ -68,7 +68,7 @@ All of this happens automatically. You just upload and wait a few minutes.
 - **Multi-Modal Pipeline**: Seamlessly processes video → audio → text → translation → voice → video in a single automated workflow
 - **Intelligent Fallback System**: Multiple TTS providers ensure reliability
 - **Audio Processing**: Advanced time-stretching and synchronization ensures perfect lip-sync and timing
-- **Privacy-First**: Local Whisper model runs on your device, keeping your content private
 - **Language Support**: 8 languages with native-quality voices for each
 - **Open Source Foundation**: Built on open source models, works completely free without any API keys

 ## How It Works
+1. **Extract & Transcribe**: AI listens to your video with local Whisper (runs on the Space/host)
+2. **Translate**: Deep Translator (Google) with optional NLLB via HF Inference if `HF_TOKEN` is set
 3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
+   - Primary: ElevenLabs (premium, optional; requires API key and package available)
+   - Fallback: EdgeTTS (high quality, free, networked)
+   - Fallback: Coqui TTS (local neural TTS, if installed)
+   - Fallback: gTTS (reliable backup, networked)
 4. **Sync & Merge**: Perfect timing ensures the new audio matches your video frame-by-frame
 All of this happens automatically. You just upload and wait a few minutes.
 - **Multi-Modal Pipeline**: Seamlessly processes video → audio → text → translation → voice → video in a single automated workflow
 - **Intelligent Fallback System**: Multiple TTS providers ensure reliability
 - **Audio Processing**: Advanced time-stretching and synchronization ensures perfect lip-sync and timing
+- **Privacy-Aware**: Transcription is local (Whisper); translation and most TTS fallbacks call external services unless Coqui is installed
 - **Language Support**: 8 languages with native-quality voices for each
 - **Open Source Foundation**: Built on open source models, works completely free without any API keys
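
Note on the TTS fallback order described in the updated README (ElevenLabs, then EdgeTTS, then Coqui TTS, then gTTS): the following is a minimal sketch of how such a chain could be wired, not the Space's actual code. The function name `synthesize_with_fallback` and the two stand-in providers are hypothetical; a real pipeline would wrap the actual TTS clients behind the same callable shape.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, synthesize) pair. The callable takes text and
# returns audio bytes, or raises if the service is unavailable.
Provider = Tuple[str, Callable[[str], bytes]]


def synthesize_with_fallback(text: str, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio) from the first that works."""
    errors = []
    for name, synth in providers:
        try:
            audio = synth(text)
            if audio:
                return name, audio
            errors.append(f"{name}: empty audio")
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real clients, for illustration only.
def _premium_unavailable(text: str) -> bytes:
    raise RuntimeError("API key not set")


def _stub_voice(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    chain = [("elevenlabs", _premium_unavailable), ("edge-tts", _stub_voice)]
    used, audio = synthesize_with_fallback("Hello", chain)
    print(used, len(audio))
```

Collecting the per-provider errors keeps the final failure message useful when every service is down, which matters here because three of the four listed providers depend on network access.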