STTR committed
Commit 40e1a06 · 1 Parent(s): 30d00e8

Add complete Gradio UI with voice translation

Files changed (2):
  1. README.md +70 -12
  2. app.py +196 -128
README.md CHANGED

@@ -1,8 +1,8 @@
 ---
-title: STTR - Speech Translation
+title: Instant Translat - AI Voice Translation
 emoji: 🌍
-colorFrom: blue
-colorTo: purple
+colorFrom: purple
+colorTo: blue
 sdk: gradio
 sdk_version: "4.44.0"
 app_file: app.py
@@ -11,14 +11,72 @@ license: mit
 hardware: t4-small
 ---
 
-# 🌍 STTR - Speech & Translation API
+# 🌍 Instant Translat - AI Voice Translation
 
-## Meta AI Models:
-- 🎤 **SeamlessM4T v2 Large** - STT (101 languages)
-- 🌍 **NLLB-200** - Translation (200 languages + Darija!)
-- 🎭 **SeamlessExpressive** - Expressive Speech Translation (preserves tone!)
+**Real-time voice translation with AI - 200+ languages including Moroccan Darija**
 
-## API Endpoints:
-- `/stt` - Speech-to-Text
-- `/translate` - Text Translation
-- `/expressive` - Expressive Speech-to-Speech Translation
+## Features
+
+- 🎤 **Speech-to-Text** - SeamlessM4T v2 Large (101 languages)
+- 🌍 **Translation** - NLLB-200 (200 languages + Moroccan Darija)
+- 🔊 **Text-to-Speech** - Fish Audio S1 (natural voice)
+- 🎭 **Voice Cloning** - Hear the translation in your own voice!
+- 🧠 **Smart Mode** - Automatic language detection
+
+## 🌍 Supported Languages
+
+- 🇲🇦 **Moroccan Arabic (Darija)** - الدارجة المغربية
+- 🇸🇦 Arabic (MSA)
+- 🇫🇷 French
+- 🇬🇧 English
+- 🇪🇸 Spanish
+- 🇩🇪 German
+- 🇮🇹 Italian
+- 🇵🇹 Portuguese
+- 🇨🇳 Chinese
+- 🇯🇵 Japanese
+- 🇰🇷 Korean
+- 🇷🇺 Russian
+- And 190+ more languages!
+
+## 🎯 How to Use
+
+1. **Select languages**: Choose your source and target languages
+2. **Record**: Click the microphone button and speak clearly
+3. **Translate**: Click the "Translate" button
+4. **Listen**: Hear the translation in a natural voice
+5. **Voice Clone**: Enable it to hear the translation in your own voice!
+
+## 🔧 Technology
+
+- **STT**: Meta's SeamlessM4T v2 Large
+- **Translation**: Meta's NLLB-200
+- **TTS**: Fish Audio S1
+- **Voice Cloning**: Fish Audio API
+- **Framework**: Gradio + PyTorch
+
+## 🔒 Privacy & Security
+
+- ✅ No data stored
+- ✅ Real-time processing
+- ✅ Secure API calls
+- ✅ Open source
+
+## 📱 Use Cases
+
+- 🗣️ Real-time conversations
+- 📚 Language learning
+- 🌐 Travel assistance
+- 💼 Business meetings
+- 🎓 Education
+
+## 🚀 Coming Soon
+
+- 💳 Premium features with Apple Pay & Google Pay
+- 📱 Mobile apps (iOS & Android)
+- 🎯 More languages
+- 🔊 More voice options
+
+---
+
+**Made with ❤️ using Meta AI models**
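The README's feature list describes a three-stage pipeline: speech-to-text, then text translation, then text-to-speech (optionally voice-cloned). As a minimal sketch of that composition, with stand-in stages - every name below is illustrative, not the app's actual API; the real stages are SeamlessM4T, NLLB-200, and Fish Audio:

```python
from typing import Callable, Optional, Tuple

def make_pipeline(
    stt: Callable[[bytes], str],
    translate: Callable[[str], str],
    tts: Callable[[str], Optional[bytes]],
) -> Callable[[bytes], Tuple[str, Optional[bytes]]]:
    """Chain the three stages described above. Each argument is a
    stand-in for a real model (SeamlessM4T / NLLB-200 / Fish Audio)."""
    def run(audio: bytes) -> Tuple[str, Optional[bytes]]:
        transcript = stt(audio)              # 1. speech -> source-language text
        translation = translate(transcript)  # 2. source text -> target text
        speech = tts(translation)            # 3. target text -> audio (or None)
        return translation, speech
    return run

# Toy stand-ins that only demonstrate the data flow:
pipeline = make_pipeline(
    stt=lambda audio: "hello",
    translate=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
)
text, speech = pipeline(b"\x00\x01")  # → ("HELLO", b"HELLO")
```

Keeping the stages as plain callables like this makes each one swappable (e.g. a different TTS backend) without touching the rest of the pipeline.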
app.py CHANGED

@@ -4,32 +4,31 @@ from transformers import (
     SeamlessM4Tv2ForSpeechToText,
     AutoModelForSeq2SeqLM,
     AutoTokenizer,
-    SeamlessM4Tv2Model,
 )
 import torch
 import numpy as np
-import torchaudio
+import requests
+import os
 
 # ============================================================
-# 🚀 Device Setup
+# Device Setup
 # ============================================================
-
 device = "cuda" if torch.cuda.is_available() else "cpu"
 print(f"🖥️ Device: {device}")
 
 # ============================================================
-# 📥 Load Models
+# Load Models
 # ============================================================
 
-# 1. SeamlessM4T v2 Large for STT
-print("📥 Loading SeamlessM4T v2 Large (STT)...")
+# SeamlessM4T v2 Large for STT
+print("📥 Loading SeamlessM4T v2 Large...")
 STT_MODEL = "facebook/seamless-m4t-v2-large"
 stt_processor = AutoProcessor.from_pretrained(STT_MODEL)
 stt_model = SeamlessM4Tv2ForSpeechToText.from_pretrained(STT_MODEL)
 stt_model = stt_model.to(device).eval()
 print("✅ SeamlessM4T v2 Large loaded!")
 
-# 2. NLLB-200 for Translation
+# NLLB-200 for Translation
 print("📥 Loading NLLB-200...")
 NLLB_MODEL = "facebook/nllb-200-distilled-600M"
 nllb_tokenizer = AutoTokenizer.from_pretrained(NLLB_MODEL)
@@ -37,60 +36,67 @@ nllb_model = AutoModelForSeq2SeqLM.from_pretrained(NLLB_MODEL)
 nllb_model = nllb_model.to(device).eval()
 print("✅ NLLB-200 loaded!")
 
-# 3. SeamlessExpressive for Expressive Speech Translation
-print("📥 Loading SeamlessExpressive...")
-EXPRESSIVE_MODEL = "facebook/seamless-expressive"
-try:
-    exp_processor = AutoProcessor.from_pretrained(EXPRESSIVE_MODEL)
-    exp_model = SeamlessM4Tv2Model.from_pretrained(EXPRESSIVE_MODEL)
-    exp_model = exp_model.to(device).eval()
-    EXPRESSIVE_AVAILABLE = True
-    print("✅ SeamlessExpressive loaded!")
-except Exception as e:
-    EXPRESSIVE_AVAILABLE = False
-    print(f"⚠️ SeamlessExpressive not available: {e}")
-
 print("🎉 All models ready!")
 
 # ============================================================
 # Language Codes
 # ============================================================
-
 NLLB_LANGS = {
-    "English": "eng_Latn", "French": "fra_Latn", "Arabic": "arb_Arab",
-    "Moroccan Arabic": "ary_Arab", "Spanish": "spa_Latn", "German": "deu_Latn",
-    "Italian": "ita_Latn", "Portuguese": "por_Latn", "Chinese": "zho_Hans",
-    "Japanese": "jpn_Jpan", "Korean": "kor_Hang", "Russian": "rus_Cyrl",
-    "Turkish": "tur_Latn", "Dutch": "nld_Latn", "Hindi": "hin_Deva",
+    "🇲🇦 Moroccan Arabic (Darija)": "ary_Arab",
+    "🇸🇦 Arabic": "arb_Arab",
+    "🇫🇷 French": "fra_Latn",
+    "🇬🇧 English": "eng_Latn",
+    "🇪🇸 Spanish": "spa_Latn",
+    "🇩🇪 German": "deu_Latn",
+    "🇮🇹 Italian": "ita_Latn",
+    "🇵🇹 Portuguese": "por_Latn",
+    "🇨🇳 Chinese": "zho_Hans",
+    "🇯🇵 Japanese": "jpn_Jpan",
+    "🇰🇷 Korean": "kor_Hang",
+    "🇷🇺 Russian": "rus_Cyrl",
+    "🇹🇷 Turkish": "tur_Latn",
+    "🇳🇱 Dutch": "nld_Latn",
+    "🇮🇳 Hindi": "hin_Deva",
 }
 
 STT_LANGS = {
-    "English": "eng", "French": "fra", "Arabic": "arb", "Spanish": "spa",
-    "German": "deu", "Italian": "ita", "Portuguese": "por", "Chinese": "cmn",
-    "Japanese": "jpn", "Korean": "kor", "Russian": "rus", "Turkish": "tur",
+    "🇲🇦 Moroccan Arabic (Darija)": "arb",
+    "🇸🇦 Arabic": "arb",
+    "🇫🇷 French": "fra",
+    "🇬🇧 English": "eng",
+    "🇪🇸 Spanish": "spa",
+    "🇩🇪 German": "deu",
+    "🇮🇹 Italian": "ita",
+    "🇵🇹 Portuguese": "por",
+    "🇨🇳 Chinese": "cmn",
+    "🇯🇵 Japanese": "jpn",
+    "🇰🇷 Korean": "kor",
+    "🇷🇺 Russian": "rus",
 }
 
-EXPRESSIVE_LANGS = ["English", "French", "German", "Spanish", "Italian", "Chinese"]
+# Fish Audio API
+FISH_AUDIO_API_KEY = os.environ.get('FISH_AUDIO_API_KEY', '')
 
 # ============================================================
-# STT Function (SeamlessM4T v2 Large)
+# Functions
 # ============================================================
 
-def stt(audio, src_lang):
-    """Speech-to-Text using SeamlessM4T v2 Large"""
+def translate_audio(audio, source_lang, target_lang, enable_voice_clone):
+    """Complete translation pipeline"""
     if audio is None:
-        return "No audio provided"
+        return None, "Please record audio first"
 
     try:
+        # 1. STT
         if isinstance(audio, tuple):
            sample_rate, audio_data = audio
            audio_data = audio_data.astype(np.float32)
            if np.abs(audio_data).max() > 1.0:
                audio_data = audio_data / 32768.0
        else:
-           return "Invalid audio format"
+           return None, "Invalid audio format"
 
-       src_code = STT_LANGS.get(src_lang, "eng")
+       src_code = STT_LANGS.get(source_lang, "eng")
 
        inputs = stt_processor(
            audios=audio_data,
@@ -105,28 +111,16 @@ def stt(audio, src_lang):
            generate_speech=False
        )
 
-       text = stt_processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
-       return text
-   except Exception as e:
-       return f"Error: {str(e)}"
-
-# ============================================================
-# Translation Function (NLLB-200)
-# ============================================================
-
-def translate(text, src_lang, tgt_lang):
-    """Translation using NLLB-200"""
-    if not text or not text.strip():
-        return ""
-
-    try:
-        src_code = NLLB_LANGS.get(src_lang, "eng_Latn")
-        tgt_code = NLLB_LANGS.get(tgt_lang, "fra_Latn")
+       transcript = stt_processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
 
-        nllb_tokenizer.src_lang = src_code
-        inputs = nllb_tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
+       # 2. Translation
+       src_nllb = NLLB_LANGS.get(source_lang, "eng_Latn")
+       tgt_nllb = NLLB_LANGS.get(target_lang, "fra_Latn")
 
-        forced_bos_token_id = nllb_tokenizer.convert_tokens_to_ids(tgt_code)
+       nllb_tokenizer.src_lang = src_nllb
+       inputs = nllb_tokenizer(transcript, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
+
+       forced_bos_token_id = nllb_tokenizer.convert_tokens_to_ids(tgt_nllb)
 
        with torch.no_grad():
            outputs = nllb_model.generate(
@@ -136,91 +130,165 @@ def translate(text, src_lang, tgt_lang):
                num_beams=5
            )
 
-       return nllb_tokenizer.decode(outputs[0], skip_special_tokens=True)
-   except Exception as e:
-       return f"Error: {str(e)}"
+       translation = nllb_tokenizer.decode(outputs[0], skip_special_tokens=True)
 
-# ============================================================
-# Expressive Speech Translation (SeamlessExpressive)
-# ============================================================
+       # 3. TTS with Fish Audio
+       tts_audio = None
+       if FISH_AUDIO_API_KEY:
+           tts_audio = generate_tts(translation, enable_voice_clone, audio if enable_voice_clone else None)
 
-def expressive_translate(audio, src_lang, tgt_lang):
-    """Expressive Speech-to-Speech Translation"""
-    if not EXPRESSIVE_AVAILABLE:
-        return None, "SeamlessExpressive not available"
-
-    if audio is None:
-        return None, "No audio provided"
+       result_text = f"""
+### 🎤 {source_lang}
+{transcript}
 
-    try:
-        if isinstance(audio, tuple):
-            sample_rate, audio_data = audio
-            audio_data = audio_data.astype(np.float32)
-            if np.abs(audio_data).max() > 1.0:
-                audio_data = audio_data / 32768.0
-        else:
-            return None, "Invalid audio format"
-
-        src_code = STT_LANGS.get(src_lang, "eng")
-        tgt_code = STT_LANGS.get(tgt_lang, "fra")
-
-        inputs = exp_processor(
-            audios=audio_data,
-            sampling_rate=sample_rate,
-            return_tensors="pt"
-        ).to(device)
+### 🌍 {target_lang}
+{translation}
+"""
 
-        with torch.no_grad():
-            output = exp_model.generate(
-                **inputs,
-                tgt_lang=tgt_code,
-                return_intermediate_token_ids=True
-            )
+       return tts_audio, result_text
 
-        # Get audio output
-        audio_output = output.audio_sequences[0].cpu().numpy()
-
-        # Get text
-        text = exp_processor.decode(output.sequences[0].tolist(), skip_special_tokens=True)
-
-        return (16000, audio_output), text
+   except Exception as e:
+       return None, f"❌ Error: {str(e)}"
 
-    except Exception as e:
-        return None, f"Error: {str(e)}"
+def generate_tts(text, clone_voice=False, reference_audio=None):
+    """Generate TTS using Fish Audio"""
+    if not FISH_AUDIO_API_KEY:
+        return None
+
+    try:
+        headers = {'Authorization': f'Bearer {FISH_AUDIO_API_KEY}'}
+
+        if clone_voice and reference_audio:
+            # Voice cloning
+            import tempfile
+            import scipy.io.wavfile as wavfile
+
+            with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
+                wavfile.write(f.name, reference_audio[0], reference_audio[1])
+                audio_path = f.name
+
+            with open(audio_path, 'rb') as f:
+                files = {'reference_audio': ('ref.wav', f.read(), 'audio/wav')}
+
+            data = {
+                'text': text,
+                'format': 'mp3',
+                'mp3_bitrate': '192',
+                'latency': 'balanced',
+                'normalize': 'true',
+            }
+
+            response = requests.post(
+                'https://api.fish.audio/v1/tts',
+                headers=headers,
+                files=files,
+                data=data,
+                timeout=120
+            )
+
+            os.remove(audio_path)
+        else:
+            # Standard TTS
+            payload = {
+                'text': text,
+                'format': 'mp3',
+                'mp3_bitrate': 192,
+            }
+
+            response = requests.post(
+                'https://api.fish.audio/v1/tts',
+                headers=headers,
+                json=payload,
+                timeout=60
+            )
+
+        if response.status_code == 200:
+            import tempfile
+            with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as f:
+                f.write(response.content)
+                return f.name
+
+        return None
+    except Exception:
+        return None
 
 # ============================================================
 # Gradio Interface
 # ============================================================
 
-with gr.Blocks(title="STTR API", theme=gr.themes.Soft()) as demo:
-    gr.Markdown("# 🌍 STTR - Speech & Translation API")
-    gr.Markdown("**Meta AI Models:** SeamlessM4T v2 Large + NLLB-200 + SeamlessExpressive")
+with gr.Blocks(theme=gr.themes.Soft(), title="Instant Translat") as demo:
+    gr.Markdown("""
+    # 🌍 Instant Translat - AI Voice Translation
+    **Real-time voice translation powered by Meta AI**
+
+    - 🎤 **STT**: SeamlessM4T v2 Large (101 languages)
+    - 🌍 **Translation**: NLLB-200 (200 languages + Darija)
+    - 🔊 **TTS**: Fish Audio S1 (natural voice)
+    - 🎭 **Voice Cloning**: Your voice in any language
+    """)
 
-    with gr.Tab("🎤 Speech-to-Text"):
-        stt_audio = gr.Audio(label="Audio", type="numpy")
-        stt_lang = gr.Dropdown(list(STT_LANGS.keys()), label="Language", value="English")
-        stt_output = gr.Textbox(label="Transcription", lines=3)
-        stt_btn = gr.Button("🎤 Transcribe", variant="primary")
-        stt_btn.click(stt, [stt_audio, stt_lang], stt_output, api_name="stt")
+    with gr.Row():
+        with gr.Column(scale=1):
+            audio_input = gr.Audio(
+                label="🎤 Record Your Voice",
+                type="numpy",
+                sources=["microphone"]
+            )
 
-    with gr.Tab("🌍 Translation"):
-        trans_text = gr.Textbox(label="Text", lines=3)
-        with gr.Row():
-            trans_src = gr.Dropdown(list(NLLB_LANGS.keys()), label="From", value="English")
-            trans_tgt = gr.Dropdown(list(NLLB_LANGS.keys()), label="To", value="French")
-        trans_output = gr.Textbox(label="Translation", lines=3)
-        trans_btn = gr.Button("🌍 Translate", variant="primary")
-        trans_btn.click(translate, [trans_text, trans_src, trans_tgt], trans_output, api_name="translate")
+            source_lang = gr.Dropdown(
+                choices=list(NLLB_LANGS.keys()),
+                value="🇲🇦 Moroccan Arabic (Darija)",
+                label="🗣️ Source Language"
+            )
+
+            target_lang = gr.Dropdown(
+                choices=list(NLLB_LANGS.keys()),
+                value="🇬🇧 English",
+                label="🎯 Target Language"
+            )
+
+            voice_clone = gr.Checkbox(
+                label="🎭 Clone Voice (Use your voice for translation)",
+                value=True
+            )
+
+            translate_btn = gr.Button(
+                "🌍 Translate",
+                variant="primary",
+                size="lg"
+            )
 
-    with gr.Tab("🎭 Expressive (S2S)"):
-        gr.Markdown("**SeamlessExpressive** - Preserves tone, emotion & style!")
-        exp_audio = gr.Audio(label="Input Audio", type="numpy")
-        with gr.Row():
-            exp_src = gr.Dropdown(EXPRESSIVE_LANGS, label="From", value="English")
-            exp_tgt = gr.Dropdown(EXPRESSIVE_LANGS, label="To", value="French")
-        exp_output_audio = gr.Audio(label="Translated Audio")
-        exp_output_text = gr.Textbox(label="Translated Text")
-        exp_btn = gr.Button("🎭 Translate with Expression", variant="primary")
-        exp_btn.click(expressive_translate, [exp_audio, exp_src, exp_tgt], [exp_output_audio, exp_output_text], api_name="expressive")
+        with gr.Column(scale=1):
+            audio_output = gr.Audio(label="🔊 Translation Audio")
+            text_output = gr.Markdown(label="📝 Translation Text")
+
+    translate_btn.click(
+        translate_audio,
+        inputs=[audio_input, source_lang, target_lang, voice_clone],
+        outputs=[audio_output, text_output]
+    )
 
-demo.launch()
+    gr.Markdown("""
+    ## 🎯 How to Use
+    1. **Select Languages**: Choose your source and target languages
+    2. **Record**: Click the microphone and speak clearly
+    3. **Translate**: Click the translate button
+    4. **Listen**: Hear the translation in a natural voice (or your cloned voice!)
+
+    ## 🌍 Supported Languages
+    - 🇲🇦 **Moroccan Darija** (Moroccan Arabic)
+    - 🇸🇦 Arabic (MSA)
+    - 🇫🇷 French
+    - 🇬🇧 English
+    - 🇪🇸 Spanish
+    - 🇩🇪 German
+    - And 190+ more languages!
+
+    ## 🔒 Privacy
+    - No data is stored
+    - Real-time processing
+    - Secure API calls
+    """)
+
+if __name__ == "__main__":
+    demo.launch()
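The STT step of `translate_audio` above normalizes raw microphone input before passing it to the processor: Gradio's `type="numpy"` audio arrives as a `(sample_rate, int16_array)` tuple, so int16-range samples must be scaled into float32 in [-1, 1]. A sketch of just that normalization step, pulled out as a helper (the function name is illustrative):

```python
import numpy as np

def to_float32(audio: np.ndarray) -> np.ndarray:
    """Normalize PCM audio to float32 in [-1, 1], mirroring the check in
    translate_audio: samples outside [-1, 1] are assumed to be int16-range
    and scaled by 1/32768; already-normalized float input passes through."""
    audio = audio.astype(np.float32)
    if np.abs(audio).max() > 1.0:
        audio = audio / 32768.0
    return audio

pcm = np.array([0, 16384, -32768], dtype=np.int16)
normalized = to_float32(pcm)  # → [0.0, 0.5, -1.0]
```

Note the heuristic: input whose peak is at most 1.0 is treated as already normalized, so the function is safe to call on either raw int16 recordings or float waveforms.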