sammoftah committed
Commit 2ae4d9f · verified · 1 Parent(s): 8e82e67

Clarify How It Works and Technical Capabilities to match current pipeline

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -51,13 +51,13 @@ The intelligent fallback system ensures it always works. If one service is unava

  ## How It Works

- 1. **Extract & Transcribe**: AI listens to your video and understands every word using local Whisper models
- 2. **Translate**: Context-aware translation preserves meaning and nuance using Deep Translator and NLLB
+ 1. **Extract & Transcribe**: AI listens to your video with local Whisper (runs on the Space/host)
+ 2. **Translate**: Deep Translator (Google) with optional NLLB via HF Inference if `HF_TOKEN` is set
  3. **Generate Voice**: High-quality AI voices match the tone and emotion of the original
- - Primary: ElevenLabs (premium, optional)
- - Fallback: EdgeTTS (high quality, free, open source)
- - Fallback: Coqui TTS (local neural TTS)
- - Fallback: gTTS (reliable backup)
+ - Primary: ElevenLabs (premium, optional; requires API key and package available)
+ - Fallback: EdgeTTS (high quality, free, networked)
+ - Fallback: Coqui TTS (local neural TTS, if installed)
+ - Fallback: gTTS (reliable backup, networked)
  4. **Sync & Merge**: Perfect timing ensures the new audio matches your video frame-by-frame

  All of this happens automatically. You just upload and wait a few minutes.
@@ -68,7 +68,7 @@ All of this happens automatically. You just upload and wait a few minutes.
  - **Multi-Modal Pipeline**: Seamlessly processes video → audio → text → translation → voice → video in a single automated workflow
  - **Intelligent Fallback System**: Multiple TTS providers ensure reliability
  - **Audio Processing**: Advanced time-stretching and synchronization ensures perfect lip-sync and timing
- - **Privacy-First**: Local Whisper model runs on your device, keeping your content private
+ - **Privacy-Aware**: Transcription is local (Whisper); translation and most TTS fallbacks call external services unless Coqui is installed
  - **Language Support**: 8 languages with native-quality voices for each
  - **Open Source Foundation**: Built on open source models, works completely free without any API keys