Clarify How It Works and Technical Capabilities to match current pipeline
README.md
CHANGED
@@ -51,13 +51,13 @@ The intelligent fallback system ensures it always works. If one service is unava
  ## How It Works

- 1. **Extract & Transcribe**: AI listens to your video
- 2. **Translate**:
  3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
-    - Primary: ElevenLabs (premium, optional)
-    - Fallback: EdgeTTS (high quality, free,
-    - Fallback: Coqui TTS (local neural TTS)
-    - Fallback: gTTS (reliable backup)
  4. **Sync & Merge**: Perfect timing ensures the new audio matches your video frame-by-frame

  All of this happens automatically. You just upload and wait a few minutes.
@@ -68,7 +68,7 @@ All of this happens automatically. You just upload and wait a few minutes.
  - **Multi-Modal Pipeline**: Seamlessly processes video → audio → text → translation → voice → video in a single automated workflow
  - **Intelligent Fallback System**: Multiple TTS providers ensure reliability
  - **Audio Processing**: Advanced time-stretching and synchronization ensures perfect lip-sync and timing
- - **Privacy-
  - **Language Support**: 8 languages with native-quality voices for each
  - **Open Source Foundation**: Built on open source models, works completely free without any API keys
  ## How It Works

+ 1. **Extract & Transcribe**: AI listens to your video with local Whisper (runs on the Space/host)
+ 2. **Translate**: Deep Translator (Google) with optional NLLB via HF Inference if `HF_TOKEN` is set
  3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
+    - Primary: ElevenLabs (premium, optional; requires API key and package available)
+    - Fallback: EdgeTTS (high quality, free, networked)
+    - Fallback: Coqui TTS (local neural TTS, if installed)
+    - Fallback: gTTS (reliable backup, networked)
  4. **Sync & Merge**: Perfect timing ensures the new audio matches your video frame-by-frame
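
The fallback order listed under step 3 can be sketched as a try-in-order loop. This is a minimal illustration under stated assumptions, not the project's actual code: `synthesize_with_fallback`, `AllProvidersFailed`, and the `fake_*` functions are hypothetical stand-ins, and no real ElevenLabs, EdgeTTS, Coqui, or gTTS API is called.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, synthesize) pair; synthesize maps text to audio bytes.
Provider = Tuple[str, Callable[[str], bytes]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def synthesize_with_fallback(text: str, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try providers in order; return (name, audio) from the first that succeeds."""
    errors: List[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins that imitate the failure modes named in the README.
def fake_elevenlabs(text: str) -> bytes:
    raise RuntimeError("no API key configured")


def fake_edge_tts(text: str) -> bytes:
    raise ConnectionError("network unavailable")


def fake_coqui(text: str) -> bytes:
    return b"coqui:" + text.encode("utf-8")


def fake_gtts(text: str) -> bytes:
    return b"gtts:" + text.encode("utf-8")


CHAIN: List[Provider] = [
    ("elevenlabs", fake_elevenlabs),
    ("edge_tts", fake_edge_tts),
    ("coqui", fake_coqui),
    ("gtts", fake_gtts),
]
```

With these stand-ins, `synthesize_with_fallback("hola", CHAIN)` skips the two failing providers and returns the Coqui result. Collecting the error messages rather than discarding them makes it easier to see why a run ended up on a lower-priority voice.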

  All of this happens automatically. You just upload and wait a few minutes.
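
One plausible reading of step 2 is that the translation backend is chosen automatically from the environment. The sketch below assumes that a non-empty `HF_TOKEN` switches to NLLB and that Deep Translator is used otherwise; the function name and the returned labels are hypothetical, and the README does not say whether NLLB replaces or supplements Deep Translator.

```python
import os
from typing import Mapping, Optional


def choose_translation_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "nllb" when a non-empty HF_TOKEN is present, else "deep_translator".

    Assumption: the labels are illustrative only and do not correspond to
    identifiers in the project.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()  # whitespace-only counts as unset
    return "nllb" if token else "deep_translator"
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the choice easy to check without touching the real environment.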
  - **Multi-Modal Pipeline**: Seamlessly processes video → audio → text → translation → voice → video in a single automated workflow
  - **Intelligent Fallback System**: Multiple TTS providers ensure reliability
  - **Audio Processing**: Advanced time-stretching and synchronization ensures perfect lip-sync and timing
+ - **Privacy-Aware**: Transcription is local (Whisper); translation and most TTS fallbacks call external services unless Coqui is installed
  - **Language Support**: 8 languages with native-quality voices for each
  - **Open Source Foundation**: Built on open source models, works completely free without any API keys
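
The time-stretching mentioned under Audio Processing comes down to a ratio between the synthesized clip and the slot it must fill. The following is a minimal sketch of that arithmetic only; `tempo_factor` and its clamp limits of 0.5 and 2.0 are assumptions chosen for illustration, not values taken from the project.

```python
def tempo_factor(
    synth_seconds: float,
    target_seconds: float,
    min_factor: float = 0.5,
    max_factor: float = 2.0,
) -> float:
    """Return the playback-speed factor that fits synthesized audio into a slot.

    A factor above 1.0 speeds the audio up (the clip is too long); a factor
    below 1.0 slows it down. The result is clamped to [min_factor, max_factor]
    so extreme ratios do not make the speech unintelligible.
    """
    if synth_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    factor = synth_seconds / target_seconds
    return max(min_factor, min(max_factor, factor))
```

For example, a 6-second translated clip that must fit a 4-second original segment gives a factor of 1.5, while a 10-second clip for the same slot would be clamped to 2.0 and still overrun, which a real pipeline would need to handle separately (for instance by trimming or re-synthesizing).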