Commit 04a936d (verified) by prasanacodes · 1 Parent(s): fa7dba3

Upload README.md

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ title: Audio/Video Translation Toolkit
+ emoji: 🚀
+ colorFrom: indigo
+ colorTo: purple
+ sdk: gradio
+ python_version: 3.10.0
+ app_file: app.py
+ tags:
+ - translation
+ - audio
+ - video
+ - speech-synthesis
+ - voice-cloning
+ - gradio
+ models:
+ - openai/whisper-large-v3
+ - JaesungHuh/voice-gender-classifier
+ - ai4bharat/IndicF5
+ preload_from_hub:
+ - openai/whisper-large-v3
+ - JaesungHuh/voice-gender-classifier
+ - ai4bharat/IndicF5
+ ---
+
+ # 🚀 Audio/Video Translation Toolkit
+
+ This application provides a complete pipeline for translating the audio of a video or audio file from English to various Indian languages. It handles everything from vocal separation and transcription to translation, speech synthesis, and voice cloning.
+
+ ---
+
+ ## Key Features 🛠️
+
+ * **🎬 Full Video Translation:** Upload a video, and the app will extract the audio, translate it, and merge it back into the original video.
+ * **🎵 Full Audio Translation:** Translate standalone audio files.
+ * **🗣️ Vocal Separation:** Isolate vocals from background music before processing.
+ * **✍️ Transcription & Pace Detection:** Uses Whisper to transcribe the audio and determine the original speaker's pace.
+ * **🌐 Multi-Lingual Translation:** Translate text to Tamil, Telugu, or Hindi using either local models or the Sarvam API.
+ * **🔊 Speech Synthesis:** Generate new speech in the target language using models from `ai4bharat`.
+ * **🧬 Voice Cloning:** Clone the voice from the original speaker onto the newly synthesized audio for a more natural result.
+
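As a rough illustration of the pace-detection feature, the sketch below estimates speaking pace in words per second from timestamped transcript segments. The segment layout (`start`, `end`, and `text` keys) and the function name `estimate_pace` are assumptions made for this example; the README does not show how the Space actually computes pace.

```python
from typing import Iterable, Mapping


def estimate_pace(segments: Iterable[Mapping[str, object]]) -> float:
    """Return the average pace in words per second (0.0 if there is no speech).

    Hypothetical helper: each segment is assumed to be a mapping with
    'start' and 'end' times in seconds and the transcribed 'text'.
    """
    total_words = 0
    total_seconds = 0.0
    for seg in segments:
        duration = float(seg["end"]) - float(seg["start"])
        if duration <= 0:
            # Skip empty or malformed segments rather than dividing by zero.
            continue
        total_words += len(str(seg["text"]).split())
        total_seconds += duration
    return total_words / total_seconds if total_seconds else 0.0


# Illustrative data only, not output from the Space.
example_segments = [
    {"start": 0.0, "end": 2.0, "text": "Hello and welcome back"},
    {"start": 2.5, "end": 4.5, "text": "to the channel everyone"},
]
print(estimate_pace(example_segments))
```

Only time spent inside segments is counted, so pauses between segments do not lower the estimate. Whether that matches the behaviour of the app is not stated in the README.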
+ ---
+
+ ## How to Use the Main Pipeline
+
+ 1. Navigate to the **Translate Video** or **Translate Audio** tab.
+ 2. Upload your file.
+ 3. Select the **Target Language**.
+ 4. Choose the **Translation Via** method (`local` or `api`).
+ 5. Click the **Translate** button and wait for the process to complete.
+
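The choices made in the steps above can be modelled as a small validated request object. This is only a sketch: the class name `TranslationRequest` and its field names are hypothetical, and the allowed values are taken from the options this README lists (Tamil, Telugu, or Hindi; `local` or `api`), not from the application code.

```python
from dataclasses import dataclass

# Values listed in this README; the app itself may accept others.
TARGET_LANGUAGES = ("Tamil", "Telugu", "Hindi")
TRANSLATION_METHODS = ("local", "api")


@dataclass(frozen=True)
class TranslationRequest:
    """Hypothetical container for the inputs chosen in the UI."""

    file_path: str
    target_language: str
    translation_via: str = "local"

    def __post_init__(self) -> None:
        # Reject values outside the documented options early.
        if self.target_language not in TARGET_LANGUAGES:
            raise ValueError(
                f"Unsupported target language: {self.target_language!r}"
            )
        if self.translation_via not in TRANSLATION_METHODS:
            raise ValueError(
                f"Unsupported translation method: {self.translation_via!r}"
            )


request = TranslationRequest("talk.mp4", "Tamil", "api")
print(request)
```

Validating up front means a bad selection is reported before any expensive separation or transcription work begins.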
+ ---
+
+ ## ‼️ Important Setup Note for Duplication
+
+ This Space relies on several local modules and data files that are not installed via `pip`. If you are duplicating this Space, you **must** manually upload the following directories to the root of your repository for the application to function correctly:
+
+ * `gender/` (Contains the gender prediction model code)
+ * `openvoice/` (Contains the voice cloning API and extractor code)
+ * `checkpoints_v2/` (Contains the pre-trained model checkpoints for voice cloning)
+ * `reference/` (Contains the reference audio and text files for speech synthesis)
+
+ Without these directories, the application will fail with `ImportError` or `FileNotFoundError`.
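A duplicated Space could check for these directories before starting, so a missing upload is reported clearly instead of surfacing later as an import or file error. The sketch below is one possible way to do that; the function name `missing_dirs` is hypothetical and is not part of the Space's code.

```python
from pathlib import Path

# Directory names taken from the list above.
REQUIRED_DIRS = ("gender", "openvoice", "checkpoints_v2", "reference")


def missing_dirs(root: str = ".") -> list[str]:
    """Return the required directories that are absent under `root`."""
    root_path = Path(root)
    return [name for name in REQUIRED_DIRS if not (root_path / name).is_dir()]


# Possible use at the top of app.py (an assumption, not the author's code):
#     missing = missing_dirs()
#     if missing:
#         raise SystemExit("Missing required directories: " + ", ".join(missing))
```

The check only confirms that each directory exists; it does not verify that the checkpoints or reference files inside are complete.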