ducnguyen1978 committed on
Commit 3639861 · verified · 1 Parent(s): 9c98b62

Upload 2 files

Files changed (2)
  1. IFRAME_GUIDE.md +119 -0
  2. app.py +1626 -0
IFRAME_GUIDE.md ADDED
@@ -0,0 +1,119 @@
+ # 🎤 Voice Studio - iFrame Integration Guide
+
+ ## Microphone Issues in an iFrame
+
+ When the Voice Studio app is embedded in an iframe, the microphone may not work because of the browser's security policy: a cross-origin iframe can call `getUserMedia` only if the embedding page delegates that permission through the `allow` attribute.
+
+ ## ✅ Solutions for the Host Website
+
+ ### 1. Add a Permissions Policy to the iframe tag:
+
+ ```html
+ <iframe
+   src="http://localhost:7864"
+   width="100%"
+   height="800"
+   allow="microphone *; camera *; display-capture *; autoplay *"
+   sandbox="allow-same-origin allow-scripts allow-forms allow-popups allow-modals allow-presentation"
+   frameborder="0">
+ </iframe>
+ ```
+
+ ### 2. Configure HTTPS (Recommended):
+
+ The microphone only works reliably over HTTPS. Browsers expose `getUserMedia` only in secure contexts (HTTPS, or `localhost` during development), so deploy the app over HTTPS whenever possible.
+
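The HTTPS recommendation above can be checked before a URL is placed in the iframe `src`. The following is a minimal sketch, not part of the commit: the helper name `is_potentially_secure` and the loopback host list are assumptions, and the check only approximates the browser's rule (HTTPS, or a loopback host over HTTP). It does not account for insecure ancestor frames.

```python
from urllib.parse import urlsplit

# Hosts that browsers generally treat as trustworthy over plain HTTP (approximation).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_potentially_secure(url: str) -> bool:
    """Return True if `url` would normally load in a secure context."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    return parts.scheme == "http" and host in LOOPBACK_HOSTS
```

With this helper, `http://localhost:7864` passes, while the same port on a public host served over plain HTTP does not.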
+ ### 3. Alternative HTML for the iframe:
+
+ ```html
+ <!DOCTYPE html>
+ <html>
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>Voice Studio Demo</title>
+ </head>
+ <body style="margin: 0; padding: 0; overflow: hidden;">
+   <!-- The height attribute does not accept CSS units such as vh, so sizing is done in style -->
+   <iframe
+     id="voice-studio"
+     src="http://localhost:7864"
+     allow="microphone *; camera *; display-capture *; autoplay *"
+     sandbox="allow-same-origin allow-scripts allow-forms allow-popups allow-modals"
+     frameborder="0"
+     style="border: none; width: 100%; height: 100vh;">
+   </iframe>
+
+   <script>
+     // Handle microphone permission fallback
+     window.addEventListener('message', function(event) {
+       // Only accept messages posted by the embedded Voice Studio frame
+       const frame = document.getElementById('voice-studio');
+       if (event.source !== frame.contentWindow) return;
+
+       if (event.data && event.data.type === 'microphone_error') {
+         const fallbackDiv = document.createElement('div');
+         fallbackDiv.innerHTML = `
+           <div style="
+             position: fixed;
+             top: 20px;
+             right: 20px;
+             background: #ff6b6b;
+             color: white;
+             padding: 15px;
+             border-radius: 8px;
+             z-index: 9999;
+             box-shadow: 0 4px 12px rgba(0,0,0,0.2);
+           ">
+             <strong>🎤 Microphone Blocked</strong><br>
+             <button
+               style="margin-top: 10px; padding: 8px 16px; background: white; color: #ff6b6b; border: none; border-radius: 4px; cursor: pointer;">
+               Open in New Window
+             </button>
+           </div>
+         `;
+         // Attach the handler in code so the URL is never interpolated into markup
+         const url = String(event.data.url);
+         fallbackDiv.querySelector('button').addEventListener('click', function() {
+           window.open(url, '_blank');
+         });
+         document.body.appendChild(fallbackDiv);
+       }
+     });
+   </script>
+ </body>
+ </html>
+ ```
+
+ ## 🔧 Server Configuration
+
+ If you host the app on your own server, add the following headers:
+
+ ```python
+ # In Flask/Django
+ response.headers['Permissions-Policy'] = 'microphone=*, camera=*, display-capture=*'
+ response.headers['Feature-Policy'] = "microphone 'self' *; camera 'self' *"
+ # X-Frame-Options has no "allow all" value (only DENY and SAMEORIGIN are valid),
+ # so leave it unset and control embedding with CSP frame-ancestors instead
+ response.headers['Content-Security-Policy'] = 'frame-ancestors *'
+
+ # Or in nginx.conf
+ add_header Permissions-Policy "microphone=*, camera=*, display-capture=*";
+ add_header Feature-Policy "microphone 'self' *; camera 'self' *";
+ add_header Content-Security-Policy "frame-ancestors *";
+ ```
+
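The header set described above can also be generated from a list of sites that are allowed to embed the app. This is a minimal sketch under stated assumptions: the helper name `build_embed_headers` and the example origin are hypothetical, and the policy values are one reasonable choice rather than the configuration used by this commit. It restricts embedding with `frame-ancestors` and deliberately omits `X-Frame-Options`.

```python
def build_embed_headers(allowed_parents):
    """Return response headers that let the listed origins embed the app.

    `allowed_parents` is a list of origins such as "https://example.com"
    (hypothetical values supplied by the deployer).
    """
    # frame-ancestors decides which pages may put this app in an iframe.
    ancestors = " ".join(["'self'", *allowed_parents])
    return {
        # The embedded document only needs the features enabled for its own origin;
        # the parent page delegates them through the iframe's allow attribute.
        "Permissions-Policy": "microphone=(self), camera=(self), display-capture=(self)",
        "Content-Security-Policy": f"frame-ancestors {ancestors}",
    }
```

In a Flask or Django view, the returned dictionary could be copied onto `response.headers` one key at a time.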
+ ## 📱 Mobile Considerations
+
+ On mobile, microphone access inside an iframe can be even more restricted. Recommendations:
+
+ 1. Show a clearly visible "Open in New Window" button
+ 2. Detect mobile devices and show an appropriate message
+ 3. Fall back to file upload if the microphone does not work
+
+ ## 🚀 Best Practices
+
+ 1. **HTTPS Required**: Deploy over HTTPS so the microphone works reliably
+ 2. **User Feedback**: Always show a clear message when the microphone is blocked
+ 3. **Fallback Options**: Offer file upload as an alternative
+ 4. **Test Cross-Browser**: Test on Chrome, Firefox, Safari, and Edge
+
+ ## 🔍 Debugging
+
+ Open the Developer Console to see the logs:
+ - `Running in iframe - requesting microphone permissions`
+ - `Microphone access granted/denied`
+
+ If problems persist, users can click "🔗 Open in New Window" to use the full feature set.
app.py ADDED
@@ -0,0 +1,1626 @@
+ #!/usr/bin/env python3
+ # -*- coding: utf-8 -*-
+
+ import os
+ import sys
+
+ # Set UTF-8 encoding for Windows
+ if sys.platform == 'win32':
+     import codecs
+     sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
+     sys.stderr = codecs.getwriter('utf-8')(sys.stderr.detach())
+
+ import numpy as np
+ import gradio as gr
+ import google.generativeai as genai
+ from gtts import gTTS, lang
+ import tempfile
+ import soundfile as sf
+ # Kokoro not used - removed for performance
+ import time
+ import base64
+ import edge_tts
+ import asyncio
+ import io
+ import re
+
+ # Librosa not used - removed for performance
+
+ # Try to import python-docx for Word export
+ try:
+     from docx import Document
+     DOCX_AVAILABLE = True
+ except ImportError:
+     DOCX_AVAILABLE = False
+
+ # Configure Gemini API
+ GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
+ if GEMINI_API_KEY:
+     genai.configure(api_key=GEMINI_API_KEY)
+
+ # Language configurations for Audio Translation (simplified)
+ GTTS_LANGUAGES = lang.tts_langs()
+ GTTS_LANGUAGES['ja'] = 'Japanese'
+
+ SUPPORTED_LANGUAGES = sorted(list(GTTS_LANGUAGES.values()))
+
+ # Voice mapping for Edge TTS - defined once for performance
+ VOICE_MAP = {
+     "🇻🇳 HoaiMy - Nữ Việt Chuẩn": "vi-VN-HoaiMyNeural",
+     "🇻🇳 NamMinh - Nam Việt Chuẩn": "vi-VN-NamMinhNeural",
+     "🇺🇸 Aria - Nữ Mỹ": "en-US-AriaNeural",
+     "🇺🇸 Guy - Nam Mỹ": "en-US-GuyNeural",
+     "🇬🇧 Sonia - Nữ Anh": "en-GB-SoniaNeural",
+     "🇬🇧 Ryan - Nam Anh": "en-GB-RyanNeural",
+     "🇩🇪 Katja - Deutsche Frau": "de-DE-KatjaNeural",
+     "🇩🇪 Conrad - Deutscher Mann": "de-DE-ConradNeural",
+     "🇫🇷 Denise - Française": "fr-FR-DeniseNeural",
+     "🇫🇷 Henri - Français": "fr-FR-HenriNeural",
+     "🇪🇸 Elvira - Española": "es-ES-ElviraNeural",
+     "🇪🇸 Alvaro - Español": "es-ES-AlvaroNeural",
+     "🇮🇹 Elsa - Italiana": "it-IT-ElsaNeural",
+     "🇮🇹 Diego - Italiano": "it-IT-DiegoNeural",
+     "🇯🇵 Nanami - 日本女性": "ja-JP-NanamiNeural",
+     "🇯🇵 Keita - 日本男性": "ja-JP-KeitaNeural",
+     "🇰🇷 SunHi - 한국 여성": "ko-KR-SunHiNeural",
+     "🇰🇷 BongJin - 한국 남성": "ko-KR-BongJinNeural",
+     "🇨🇳 Xiaoxiao - 中文女声": "zh-CN-XiaoxiaoNeural",
+     "🇨🇳 Yunxi - 中文男声": "zh-CN-YunxiNeural",
+     "🇷🇺 Svetlana - Русская": "ru-RU-SvetlanaNeural",
+     "🇷🇺 Dmitry - Русский": "ru-RU-DmitryNeural",
+     "🇵🇹 Francisca - Portuguesa": "pt-BR-FranciscaNeural",
+     "🇵🇹 Antonio - Português": "pt-BR-AntonioNeural",
+     "🇸🇦 Zariyah - عربية": "ar-SA-ZariyahNeural",
+     "🇸🇦 Hamed - عربي": "ar-SA-HamedNeural"
+ }
+
+ def detect_language(text):
+     """Detect language of input text (character and keyword heuristic)"""
+     if not text.strip():
+         return "unknown"
+
+     text_lower = text.lower()
+     # Compare whole words so that 'der' does not match inside 'under'
+     words = set(re.findall(r"[^\W\d_]+", text_lower))
+
+     # Vietnamese detection
+     vietnamese_chars = 'àáạảãâầấậẩẫăằắặẳẵèéẹẻẽêềếệểễìíịỉĩòóọỏõôồốộổỗơờớợởỡùúụủũưừứựửữỳýỵỷỹđ'
+     if any(char in text_lower for char in vietnamese_chars):
+         return "vietnamese"
+
+     # German detection
+     german_words = ['der', 'die', 'das', 'und', 'ist', 'ich', 'bin', 'haben', 'sein', 'werden']
+     german_chars = 'äöüß'
+     if any(word in words for word in german_words) or any(char in text_lower for char in german_chars):
+         return "german"
+
+     # English detection
+     english_words = ['the', 'and', 'is', 'are', 'have', 'has', 'will', 'would', 'can', 'could']
+     if any(word in words for word in english_words):
+         return "english"
+
+     # Default to English when nothing matches
+     return "english"
+
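Keyword heuristics like the one in `detect_language` are sensitive to how the keywords are matched: a substring test such as `'der' in text` also fires on English words like "thunder". The sketch below contrasts the two matching strategies; the helper names `substring_match` and `contains_word` are hypothetical and exist only for this illustration.

```python
import re

GERMAN_WORDS = {"der", "die", "das", "und", "ist", "ich", "bin", "haben", "sein", "werden"}


def substring_match(text, vocabulary):
    """Naive check: true if any keyword occurs anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in vocabulary)


def contains_word(text, vocabulary):
    """Token check: true only if a keyword appears as a whole word."""
    tokens = set(re.findall(r"[^\W\d_]+", text.lower()))
    return not tokens.isdisjoint(vocabulary)
```

For the English sentence "Thunder is understandable", the substring check reports a German keyword while the token check does not; both agree on "Ich bin müde".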
+ async def generate_speech(text, voice_name, rate):
+     """Generate speech using Edge TTS"""
+     communicate = edge_tts.Communicate(text, voice_name, rate=f"{rate:+.0%}")
+
+     # Create in-memory buffer
+     audio_buffer = io.BytesIO()
+
+     async for chunk in communicate.stream():
+         if chunk["type"] == "audio":
+             audio_buffer.write(chunk["data"])
+
+     audio_buffer.seek(0)
+     return audio_buffer.getvalue()
+
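`generate_speech` formats its `rate` argument with `+.0%`, so callers pass a signed fraction (for example `0.5`) that becomes a string such as `+50%`. A minimal sketch of that conversion from a playback-speed multiplier follows; the helper name `speed_to_rate` and the clamping bounds are assumptions chosen to match the 0.5x to 2.0x range mentioned in the code comments.

```python
def speed_to_rate(speed, min_speed=0.5, max_speed=2.0):
    """Convert a speed multiplier (1.0 = normal) into a signed percentage string."""
    # Keep the multiplier inside the supported range before converting.
    clamped = max(min_speed, min(max_speed, speed))
    # 1.0x maps to +0%, 1.5x to +50%, 0.5x to -50%.
    return f"{clamped - 1.0:+.0%}"
```

Values outside the range are clamped rather than rejected, so a slider misconfiguration degrades to the nearest supported rate.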
+ def create_text_file(content, file_format="txt", filename_prefix="translated_text"):
+     """
+     Create a downloadable text file from content in TXT or DOCX format
+     """
+     # Error messages produced elsewhere in this module start with "Lỗi:"
+     if not content or content.startswith("Lỗi:"):
+         return None
+
+     try:
+         if file_format.lower() == "docx" and DOCX_AVAILABLE:
+             # Create Word document
+             fd, temp_file_path = tempfile.mkstemp(suffix=".docx", prefix=f"{filename_prefix}_")
+             os.close(fd)
+
+             doc = Document()
+             doc.add_heading('Nội dung đã dịch', 0)
+             doc.add_paragraph(content)
+             doc.save(temp_file_path)
+
+             return temp_file_path
+         else:
+             # Create TXT file (default)
+             fd, temp_file_path = tempfile.mkstemp(suffix=".txt", prefix=f"{filename_prefix}_")
+             os.close(fd)
+
+             with open(temp_file_path, 'w', encoding='utf-8') as f:
+                 f.write(content)
+
+             return temp_file_path
+     except Exception:
+         # The caller treats None as "no file available"
+         return None
+
+ def create_audio_voice_studio(text, voice_selection, speed):
+     """Voice Studio functionality"""
+     if not text.strip():
+         return "❌ Vui lòng nhập văn bản / Please enter text / Bitte Text eingeben"
+
+     try:
+         # Use global VOICE_MAP for performance (avoiding recreation on each call)
+         voice_name = VOICE_MAP.get(voice_selection, "vi-VN-HoaiMyNeural")
+         # Only the first 1000 characters are synthesized
+         text_limited = text[:1000]
+
+         # Convert speed (0.5-2.0) to a signed rate fraction (-0.5 to +1.0);
+         # generate_speech formats it as a percentage (-50% to +100%)
+         rate_percent = (speed - 1.0)
+
+         # Generate speech using Edge TTS
+         audio_data = asyncio.run(generate_speech(text_limited, voice_name, rate_percent))
+
+         # Convert to base64
+         audio_base64 = base64.b64encode(audio_data).decode('utf-8')
+
+         timestamp = int(time.time())
+         filename = f"voice_{voice_name}_{speed}x_{timestamp}.mp3"
+
+         # Detect language
+         detected_lang = detect_language(text_limited)
+
+         # Mobile-optimized HTML player
+         html_player = f'''
+         <div style="
+             background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+             border-radius: 20px;
+             padding: 20px;
+             margin: 10px 0;
+             box-shadow: 0 8px 32px rgba(0,0,0,0.2);
+             color: white;
+             text-align: center;
+         ">
+             <div style="margin-bottom: 20px;">
+                 <h3 style="color: #fff; margin: 0 0 15px 0; font-size: 1.3em; text-shadow: 1px 1px 2px rgba(0,0,0,0.3);">
+                     🎵 Âm thanh hoàn thành!
+                 </h3>
+                 <div style="
+                     background: rgba(255,255,255,0.2);
+                     border-radius: 12px;
+                     padding: 12px;
+                     font-size: 0.9em;
+                     line-height: 1.5;
+                     backdrop-filter: blur(10px);
+                 ">
+                     <div><strong>🎭 Giọng:</strong> {voice_selection}</div>
+                     <div><strong>⚡ Tốc độ:</strong> {speed:.1f}x | <strong>🌍 Ngôn ngữ:</strong> {detected_lang.title()}</div>
+                     <div><strong>📝 Độ dài:</strong> {len(text_limited)} ký tự</div>
+                 </div>
+             </div>
+
+             <audio controls style="
+                 width: 100%;
+                 max-width: 100%;
+                 height: 50px;
+                 margin: 20px 0;
+                 border-radius: 25px;
+                 background: rgba(255,255,255,0.95);
+                 box-shadow: 0 4px 15px rgba(0,0,0,0.2);
+             ">
+                 <source src="data:audio/mpeg;base64,{audio_base64}" type="audio/mpeg">
+                 Trình duyệt không hỗ trợ audio.
+             </audio>
+
+             <div style="
+                 display: flex;
+                 justify-content: center;
+                 margin-top: 20px;
+             ">
+                 <a href="data:audio/mpeg;base64,{audio_base64}" download="{filename}"
+                    style="
+                        background: linear-gradient(45deg, #28a745, #20c997);
+                        color: white;
+                        padding: 15px 30px;
+                        text-decoration: none;
+                        border-radius: 25px;
+                        font-weight: 700;
+                        font-size: 1.1em;
+                        display: flex;
+                        align-items: center;
+                        justify-content: center;
+                        box-shadow: 0 4px 15px rgba(40,167,69,0.3);
+                        transition: all 0.3s ease;
+                        min-height: 48px;
+                        min-width: 200px;
+                    "
+                    ontouchstart=""
+                    onmouseover="this.style.transform='translateY(-2px)'; this.style.boxShadow='0 6px 20px rgba(40,167,69,0.4)'"
+                    onmouseout="this.style.transform='translateY(0)'; this.style.boxShadow='0 4px 15px rgba(40,167,69,0.3)'">
+                     📥 TẢI XUỐNG MP3
+                 </a>
+             </div>
+         </div>
+         '''
+
+         return html_player
+
+     except Exception as e:
+         return f"❌ Error: {str(e)}"
+
+ # Language mapping for voices - defined once for performance
+ VOICE_TO_LANGUAGE = {
+     # Vietnamese
+     "🇻🇳 HoaiMy - Nữ Việt Chuẩn": "Vietnamese",
+     "🇻🇳 NamMinh - Nam Việt Chuẩn": "Vietnamese",
+     # English
+     "🇺🇸 Aria - Nữ Mỹ": "English",
+     "🇺🇸 Guy - Nam Mỹ": "English",
+     "🇬🇧 Sonia - Nữ Anh": "English",
+     "🇬🇧 Ryan - Nam Anh": "English",
+     # German
+     "🇩🇪 Katja - Deutsche Frau": "German",
+     "🇩🇪 Conrad - Deutscher Mann": "German",
+     # French
+     "🇫🇷 Denise - Française": "French",
+     "🇫🇷 Henri - Français": "French",
+     # Spanish
+     "🇪🇸 Elvira - Española": "Spanish",
+     "🇪🇸 Alvaro - Español": "Spanish",
+     # Italian
+     "🇮🇹 Elsa - Italiana": "Italian",
+     "🇮🇹 Diego - Italiano": "Italian",
+     # Japanese
+     "🇯🇵 Nanami - 日本女性": "Japanese",
+     "🇯🇵 Keita - 日本男性": "Japanese",
+     # Korean
+     "🇰🇷 SunHi - 한국 여성": "Korean",
+     "🇰🇷 BongJin - 한국 남성": "Korean",
+     # Chinese
+     "🇨🇳 Xiaoxiao - 中文女声": "Chinese",
+     "🇨🇳 Yunxi - 中文男声": "Chinese",
+     # Russian
+     "🇷🇺 Svetlana - Русская": "Russian",
+     "🇷🇺 Dmitry - Русский": "Russian",
+     # Portuguese
+     "🇵🇹 Francisca - Portuguesa": "Portuguese",
+     "🇵🇹 Antonio - Português": "Portuguese",
+     # Arabic
+     "🇸🇦 Zariyah - عربية": "Arabic",
+     "🇸🇦 Hamed - عربي": "Arabic"
+ }
+
+ def get_target_language_from_voice(voice_selection):
+     """Map voice selection to target language for translation"""
+     # Unknown voices fall back to Vietnamese
+     return VOICE_TO_LANGUAGE.get(voice_selection, "Vietnamese")
+
+ def translate_text_with_gemini(text, target_language):
+     """Translate text using Gemini API"""
+     try:
+         if not GEMINI_API_KEY:
+             return "Lỗi: Cần cấu hình GEMINI_API_KEY"
+
+         if not text.strip():
+             return ""
+
+         model = genai.GenerativeModel("gemini-2.0-flash")
+
+         prompt = f"""Translate the following text to {target_language}. Return ONLY the translated text, nothing else:
+
+ {text}"""
+
+         response = model.generate_content(prompt)
+         translated_text = response.text.strip()
+
+         # Clean up any unwanted text that might be included
+         if translated_text.lower().startswith("translation:"):
+             translated_text = translated_text[len("translation:"):].strip()
+         if translated_text.lower().startswith("here is"):
+             # Drop an introductory first line such as "Here is the translation:"
+             lines = translated_text.split('\n')
+             if len(lines) > 1:
+                 translated_text = '\n'.join(lines[1:]).strip()
+
+         return translated_text
+
+     except Exception as e:
+         return f"Lỗi dịch thuật: {str(e)}"
+
+ def translate_audio(audio_file, target_country, voice_selection, text_format="txt"):
+     """
+     Transcribe, translate and synthesize audio to target language with Voice Studio integration
+     """
+     try:
+         if not GEMINI_API_KEY:
+             return "Lỗi: Cần cấu hình GEMINI_API_KEY", "Không xác định", "", target_country, None, "", "", None
+
+         if audio_file is None:
+             return "Lỗi: Vui lòng tải lên file audio", "Không xác định", "", target_country, None, "", "", None
+
+         # Get target language from voice selection
+         target_language = get_target_language_from_voice(voice_selection)
+
+         # Transcribe audio using Gemini
+         model = genai.GenerativeModel("gemini-2.0-flash")
+
+         # Read audio file
+         with open(audio_file, 'rb') as f:
+             audio_data = f.read()
+
+         # Create audio blob
+         # NOTE: the MIME type is hard-coded, so non-WAV uploads are still sent as audio/wav
+         audio_blob = {
+             'mime_type': 'audio/wav',
+             'data': audio_data
+         }
+
+         # Single API call for transcription and translation (optimized for speed)
+         combined_prompt = f"""You are a professional transcriber and translator. Process this audio in one step:
+
+ 1. Transcribe the audio accurately in its original language
+ 2. Identify the source language
+ 3. Translate to {target_language} preserving meaning and cultural context
+
+ Format your response exactly as:
+ LANGUAGE: [detected language]
+ TRANSCRIPT: [original transcription]
+ TRANSLATION: [translation to {target_language}]"""
+
+         response = model.generate_content([combined_prompt, audio_blob])
+         full_response = response.text.strip()
+
+         # Parse combined response
+         try:
+             if "LANGUAGE:" in full_response and "TRANSCRIPT:" in full_response and "TRANSLATION:" in full_response:
+                 lines = full_response.split('\n')
+                 detected_lang = ""
+                 transcription = ""
+                 translated_text = ""
+
+                 # Only the line that carries each label is captured;
+                 # continuation lines of a multi-line field are ignored
+                 for line in lines:
+                     if line.startswith("LANGUAGE:"):
+                         detected_lang = line.replace("LANGUAGE:", "").strip()
+                     elif line.startswith("TRANSCRIPT:"):
+                         transcription = line.replace("TRANSCRIPT:", "").strip()
+                     elif line.startswith("TRANSLATION:"):
+                         translated_text = line.replace("TRANSLATION:", "").strip()
+             else:
+                 # Fallback parsing
+                 detected_lang = "Không xác định"
+                 transcription = full_response.split("TRANSCRIPT:")[-1].split("TRANSLATION:")[0].strip() if "TRANSCRIPT:" in full_response else full_response
+                 translated_text = full_response.split("TRANSLATION:")[-1].strip() if "TRANSLATION:" in full_response else transcription
+         except Exception:
+             # Emergency fallback
+             detected_lang = "Không xác định"
+             transcription = full_response
+             translated_text = full_response
+
+         # Generate audio using Edge TTS (use global VOICE_MAP for performance)
+         edge_voice = VOICE_MAP.get(voice_selection, "vi-VN-HoaiMyNeural")
+         audio_data = asyncio.run(generate_speech(translated_text, edge_voice, 0.0))
+
+         # Save audio file (Edge TTS returns MP3 data, so use a matching extension)
+         fd, temp_output_path = tempfile.mkstemp(suffix=".mp3", prefix="translated_audio_")
+         os.close(fd)
+
+         # Write raw audio data to temporary file
+         with open(temp_output_path, 'wb') as f:
+             f.write(audio_data)
+
+         # Create text file for download
+         text_file_path = create_text_file(translated_text, text_format, "translated_content")
+
+         return transcription, detected_lang, translated_text, target_language, temp_output_path, transcription, translated_text, text_file_path
+
+     except Exception as e:
+         # Get target language for error response
+         target_language = get_target_language_from_voice(voice_selection)
+         return f"Lỗi: {str(e)}", "Lỗi", "", target_language, None, "", "", None
+
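`translate_audio` asks the model for a three-field reply (`LANGUAGE:`, `TRANSCRIPT:`, `TRANSLATION:`) and then reads the fields back line by line. When a transcript or translation spans several lines, a parser that slices the text between labels keeps all of it. The following is a minimal sketch of that approach, not the commit's implementation; the helper name `parse_labeled_response` is hypothetical, and it assumes each label starts a line.

```python
import re

LABELS = ("LANGUAGE", "TRANSCRIPT", "TRANSLATION")
# A label counts only at the start of a line; trailing spaces after the colon are skipped.
LABEL_PATTERN = re.compile(r"^(LANGUAGE|TRANSCRIPT|TRANSLATION):[ \t]*", re.MULTILINE)


def parse_labeled_response(text):
    """Split a labeled reply into fields, keeping multi-line values intact."""
    fields = {label: "" for label in LABELS}
    matches = list(LABEL_PATTERN.finditer(text))
    for index, match in enumerate(matches):
        # A field runs from the end of its label to the start of the next label.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields
```

Fields that the model omits stay as empty strings, so the caller can still decide on a fallback.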
+ # Voice choices organized by country - ONLY OFFICIAL VOICES
+ voice_choices_by_country = {
+     "🇻🇳 Việt Nam": [
+         "🇻🇳 HoaiMy - Nữ Việt Chuẩn",
+         "🇻🇳 NamMinh - Nam Việt Chuẩn"
+     ],
+     "🇺🇸 Hoa Kỳ": [
+         "🇺🇸 Aria - Nữ Mỹ",
+         "🇺🇸 Guy - Nam Mỹ"
+     ],
+     "🇬🇧 Anh": [
+         "🇬🇧 Sonia - Nữ Anh",
+         "🇬🇧 Ryan - Nam Anh"
+     ],
+     "🇩🇪 Đức": [
+         "🇩🇪 Katja - Deutsche Frau",
+         "🇩🇪 Conrad - Deutscher Mann"
+     ],
+     "🇫🇷 Pháp": [
+         "🇫🇷 Denise - Française",
+         "🇫🇷 Henri - Français"
+     ],
+     "🇪🇸 Tây Ban Nha": [
+         "🇪🇸 Elvira - Española",
+         "🇪🇸 Alvaro - Español"
+     ],
+     "🇮🇹 Ý": [
+         "🇮🇹 Elsa - Italiana",
+         "🇮🇹 Diego - Italiano"
+     ],
+     "🇯🇵 Nhật Bản": [
+         "🇯🇵 Nanami - 日本女性",
+         "🇯🇵 Keita - 日本男性"
+     ],
+     "🇰🇷 Hàn Quốc": [
+         "🇰🇷 SunHi - 한국 여성",
+         "🇰🇷 BongJin - 한국 남성"
+     ],
+     "🇨🇳 Trung Quốc": [
+         "🇨🇳 Xiaoxiao - 中文女声",
+         "🇨🇳 Yunxi - 中文男声"
+     ],
+     "🇷🇺 Nga": [
+         "🇷🇺 Svetlana - Русская",
+         "🇷🇺 Dmitry - Русский"
+     ],
+     "🇵🇹 Bồ Đào Nha": [
+         "🇵🇹 Francisca - Portuguesa",
+         "🇵🇹 Antonio - Português"
+     ],
+     "🇸🇦 Ả Rập": [
+         "🇸🇦 Zariyah - عربية",
+         "🇸🇦 Hamed - عربي"
+     ]
+ }
+
+ def update_voices(country):
+     """Update voice choices based on selected country"""
+     if country in voice_choices_by_country:
+         voices = voice_choices_by_country[country]
+         return gr.Dropdown(choices=voices, value=voices[0])
+     else:
+         # Default to Vietnamese voices
+         default_voices = voice_choices_by_country["🇻🇳 Việt Nam"]
+         return gr.Dropdown(choices=default_voices, value=default_voices[0])
+
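The dropdown labels in `voice_choices_by_country` double as keys in `VOICE_MAP` and `VOICE_TO_LANGUAGE`, and both lookups fall back to a Vietnamese default when a key is missing, so a typo in one table would go unnoticed at runtime. A startup check can surface such mismatches. This is a minimal sketch with a hypothetical helper name (`find_unmapped_voices`) and made-up sample data, not code from the commit.

```python
def find_unmapped_voices(choices_by_country, voice_map):
    """Return dropdown labels that have no entry in the voice map."""
    missing = []
    for country, labels in choices_by_country.items():
        for label in labels:
            if label not in voice_map:
                missing.append((country, label))
    return missing


# Hypothetical sample data: the second label is misspelled relative to the map.
sample_choices = {"Germany": ["Katja - Deutsche Frau", "Conrad - Deutscher Man"]}
sample_map = {
    "Katja - Deutsche Frau": "de-DE-KatjaNeural",
    "Conrad - Deutscher Mann": "de-DE-ConradNeural",
}
unmapped = find_unmapped_voices(sample_choices, sample_map)
```

Run against the real tables at import time, an empty result would confirm that every selectable voice resolves to an Edge TTS voice name.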
482
+ # Lightweight CSS - optimized for performance
483
+ css = """
484
+ * {
485
+ font-family: system-ui, -apple-system, 'Segoe UI', Arial, sans-serif;
486
+ }
487
+
488
+ .gradio-container {
489
+ max-width: 1200px;
490
+ margin: 0 auto;
491
+ position: relative;
492
+ }
493
+
494
+ /* Critical fix for dropdown interaction */
495
+ .gradio-container * {
496
+ pointer-events: auto;
497
+ }
498
+
499
+ /* Hide Gradio footer */
500
+ .footer {
501
+ display: none !important;
502
+ }
503
+
504
+ /* Custom footer to cover Gradio attribution */
505
+ .custom-footer {
506
+ position: fixed;
507
+ bottom: 0;
508
+ left: 0;
509
+ right: 0;
510
+ background: linear-gradient(135deg, #4A90E2 0%, #2E86AB 70%, #FF8A65 85%, #FF6B9D 100%);
511
+ color: white;
512
+ padding: 15px;
513
+ text-align: center;
514
+ font-weight: bold;
515
+ z-index: 1000;
516
+ box-shadow: 0 -2px 10px rgba(0,0,0,0.1);
517
+ }
518
+
519
+ /* Add padding to body to account for fixed footer */
520
+ body {
521
+ padding-bottom: 60px;
522
+ }
523
+
524
+ /* Mobile-first responsive design */
525
+ .input-card {
526
+ background: rgba(255,255,255,0.95);
527
+ border-radius: 16px;
528
+ padding: 16px;
529
+ margin: 10px 0;
530
+ box-shadow: 0 4px 20px rgba(0,0,0,0.1);
531
+ backdrop-filter: blur(10px);
532
+ }
533
+
534
+ .output-area {
535
+ background: rgba(255,255,255,0.95);
536
+ border-radius: 16px;
537
+ padding: 16px;
538
+ margin: 15px 0;
539
+ min-height: 200px;
540
+ box-shadow: 0 4px 20px rgba(0,0,0,0.1);
541
+ }
542
+
543
+ .examples-section {
544
+ background: rgba(255,255,255,0.9);
545
+ border-radius: 16px;
546
+ padding: 16px;
547
+ margin: 20px 0;
548
+ }
549
+
550
+ .main-header {
551
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
552
+ color: white;
553
+ padding: 20px;
554
+ border-radius: 10px;
555
+ margin-bottom: 20px;
556
+ text-align: center;
557
+ }
558
+
559
+ .feature-box {
560
+ background: #f8f9fa;
561
+ padding: 15px;
562
+ border-radius: 8px;
563
+ margin: 10px 0;
564
+ border-left: 4px solid #667eea;
565
+ }
566
+
567
+ .status-indicator {
568
+ display: inline-block;
569
+ padding: 5px 10px;
570
+ border-radius: 15px;
571
+ font-size: 12px;
572
+ font-weight: bold;
573
+ margin: 5px;
574
+ }
575
+
576
+ .status-success {
577
+ background-color: #d4edda;
578
+ color: #155724;
579
+ }
580
+
581
+ .status-processing {
582
+ background-color: #fff3cd;
583
+ color: #856404;
584
+ }
585
+
586
+ .comparison-section {
587
+ border: 1px solid #e0e0e0;
588
+ border-radius: 8px;
589
+ padding: 15px;
590
+ margin: 10px 0;
591
+ background: #fafafa;
592
+ }
593
+
594
+ .language-label {
595
+ font-weight: bold;
596
+ color: #667eea;
597
+ padding: 5px 10px;
598
+ background: #f0f2ff;
599
+ border-radius: 15px;
600
+ display: inline-block;
601
+ margin-bottom: 10px;
602
+ font-size: 14px;
603
+ }
604
+
605
+ .content-compare {
606
+ background: white;
607
+ border: 1px solid #ddd;
608
+ border-radius: 6px;
609
+ padding: 12px;
610
+ min-height: 120px;
611
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
612
+ line-height: 1.5;
613
+ }
614
+
615
+ /* Reset any problematic dropdown styles */
616
+ .gradio-container * {
617
+ pointer-events: auto;
618
+ }
619
+
620
+ /* Remove any potential blocking overlays */
621
+ .gradio-container::before,
622
+ .gradio-container::after {
623
+ display: none;
624
+ }
625
+
626
+ /* Ensure all interactive elements work */
627
+ button, select, input, textarea, .gr-dropdown {
628
+ pointer-events: auto !important;
629
+ position: relative !important;
630
+ }
631
+
632
+ /* Simple dropdown fix without complex selectors */
633
+ [class*="dropdown"] {
634
+ position: relative !important;
635
+ z-index: 999 !important;
636
+ }
637
+
638
+ [class*="dropdown"] * {
639
+ pointer-events: auto !important;
640
+ }
641
+
642
+ /* Make sure no overlay blocks clicks */
643
+ .gradio-container .gr-form {
644
+ position: relative;
645
+ z-index: 1;
646
+ }
647
+
648
+ .gradio-container .gr-block {
649
+ position: relative;
650
+ z-index: 1;
651
+ }
652
+
653
+ .mobile-button {
654
+ width: 100% !important;
655
+ padding: 15px !important;
656
+ font-size: 1.1em !important;
657
+ margin: 20px 0 !important;
658
+ border-radius: 12px !important;
659
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
660
+ border: none !important;
661
+ color: white !important;
662
+ font-weight: bold !important;
663
+ box-shadow: 0 4px 15px rgba(102, 126, 234, 0.3) !important;
664
+ transition: all 0.3s ease !important;
665
+ cursor: pointer !important;
666
+ position: relative !important;
667
+ overflow: hidden !important;
668
+ }
669
+
670
+ .mobile-button:hover {
671
+ transform: translateY(-2px) !important;
672
+ box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4) !important;
673
+ background: linear-gradient(135deg, #5a6fd8 0%, #6b4190 100%) !important;
674
+ }
675
+
676
+ .mobile-button:active {
677
+ transform: translateY(0px) !important;
678
+ box-shadow: 0 2px 10px rgba(102, 126, 234, 0.3) !important;
679
+ }
680
+
681
+ /* Ripple effect for button */
682
+ .mobile-button::before {
683
+ content: '';
684
+ position: absolute;
685
+ top: 50%;
686
+ left: 50%;
687
+ width: 0;
688
+ height: 0;
689
+ border-radius: 50%;
690
+ background: rgba(255, 255, 255, 0.3);
691
+ transform: translate(-50%, -50%);
692
+ transition: width 0.6s, height 0.6s;
693
+ }
694
+
695
+ .mobile-button:active::before {
696
+ width: 300px;
697
+ height: 300px;
698
+ }
699
+
700
+ /* Loading spinner animation */
701
+ @keyframes spin {
702
+ 0% { transform: rotate(0deg); }
703
+ 100% { transform: rotate(360deg); }
704
+ }
705
+
706
+ .loading-spinner {
707
+ display: inline-block;
708
+ width: 20px;
709
+ height: 20px;
710
+ border: 3px solid rgba(255,255,255,0.3);
711
+ border-radius: 50%;
712
+ border-top-color: white;
713
+ animation: spin 1s ease-in-out infinite;
714
+ margin-right: 10px;
715
+ }
716
+
717
+ /* Button pulse effect when processing */
718
+ @keyframes pulse {
719
+ 0% {
720
+ box-shadow: 0 4px 15px rgba(102, 126, 234, 0.3);
721
+ }
722
+ 50% {
723
+ box-shadow: 0 8px 25px rgba(102, 126, 234, 0.6);
724
+ }
725
+ 100% {
726
+ box-shadow: 0 4px 15px rgba(102, 126, 234, 0.3);
727
+ }
728
+ }
729
+
730
+ .button-processing {
731
+ animation: pulse 2s ease-in-out infinite;
732
+ background: linear-gradient(135deg, #FF8E53 0%, #FF6B6B 100%) !important;
733
+ }
734
+
735
+ .mobile-textbox textarea {
736
+ border-radius: 10px !important;
737
+ border: 2px solid #e0e0e0 !important;
738
+ padding: 12px !important;
739
+ font-size: 1em !important;
740
+ line-height: 1.5 !important;
741
+ }
742
+
743
+ .mobile-compare textarea {
744
+ border-radius: 8px !important;
745
+ border: 1px solid #ddd !important;
746
+ padding: 10px !important;
747
+ background: #fafafa !important;
748
+ font-size: 0.95em !important;
749
+ }
750
+
751
+ .mobile-audio {
752
+ margin: 10px 0 !important;
753
+ border-radius: 10px !important;
754
+ }
755
+
756
+ .mobile-file {
757
+ margin: 10px 0 !important;
758
+ border-radius: 10px !important;
759
+ }
760
+
761
+ /* Mobile responsive breakpoints */
762
+ @media (max-width: 768px) {
763
+ .gradio-container {
764
+ padding: 10px !important;
765
+ }
766
+
767
+ .input-card {
768
+ padding: 12px !important;
769
+ margin: 8px 0 !important;
770
+ }
771
+
772
+ .output-area {
773
+ padding: 12px !important;
774
+ margin: 10px 0 !important;
775
+ }
776
+
777
+ .examples-section {
778
+ padding: 12px !important;
779
+ }
780
+
781
+ .main-header h2 {
782
+ font-size: 1.5em !important;
783
+ }
784
+
785
+ .main-header p {
786
+ font-size: 1em !important;
787
+ }
788
+
789
+ /* Mobile layout adjustments - less aggressive */
790
+ .gr-row {
791
+ flex-direction: column;
792
+ }
793
+
794
+ .gr-column {
795
+ width: 100%;
796
+ margin-bottom: 15px;
797
+ }
798
+ }
799
+
800
+ @media (max-width: 480px) {
801
+ .gradio-container {
802
+ padding: 5px !important;
803
+ }
804
+
805
+ .input-card {
806
+ padding: 10px !important;
807
+ margin: 5px 0 !important;
808
+ }
809
+
810
+ .main-header {
811
+ padding: 15px !important;
812
+ }
813
+
814
+ .main-header h2 {
815
+ font-size: 1.3em !important;
816
+ }
817
+
818
+ .mobile-button {
819
+ padding: 12px !important;
820
+ font-size: 1em !important;
821
+ }
822
+ }
823
+
824
+ /* Button interaction scripts are defined separately in js_code */
825
+ """
826
+
827
+ # Add JavaScript for button effects
828
+ js_code = """
829
+ <script>
830
+ function addButtonEffects() {
831
+ // Find button by class since Gradio might change IDs
832
+ const buttons = document.querySelectorAll('.mobile-button');
833
+
834
+ buttons.forEach(button => {
835
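+ // addButtonEffects re-runs every 2 seconds, so skip buttons that are already wired up.
+ // removeEventListener cannot detach the anonymous hover handlers below, which would otherwise stack up.
+ if (button.dataset.effectsBound === 'true') return;
+ button.dataset.effectsBound = 'true';
+
+ // Add enhanced click effect
+ button.addEventListener('click', handleClick);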
840
+
841
+ // Add hover effects for better interaction
842
+ button.addEventListener('mouseenter', function() {
843
+ if (!this.disabled) {
844
+ this.style.transform = 'translateY(-2px) scale(1.02)';
845
+ }
846
+ });
847
+
848
+ button.addEventListener('mouseleave', function() {
849
+ if (!this.disabled) {
850
+ this.style.transform = 'translateY(0) scale(1)';
851
+ }
852
+ });
853
+ });
854
+ }
855
+
856
+ function handleClick(e) {
857
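+ // currentTarget is the button the listener is attached to; target may be a child such as the spinner span
+ const button = e.currentTarget;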
858
+
859
+ // Immediate visual feedback
860
+ button.style.transform = 'scale(0.98)';
861
+ button.style.transition = 'all 0.1s ease';
862
+
863
+ setTimeout(() => {
864
+ button.style.transform = 'scale(1)';
865
+ button.style.transition = 'all 0.3s ease';
866
+ }, 100);
867
+
868
+ // Add processing state
869
+ const originalText = button.innerHTML;
870
+ button.innerHTML = '<span class="loading-spinner"></span>⏳ ĐANG XỬ LÝ...';
871
+ button.classList.add('button-processing');
872
+ button.disabled = true;
873
+
874
+ // Monitor for completion and reset
875
+ let checkCount = 0;
876
+ const checkInterval = setInterval(() => {
877
+ checkCount++;
878
+
879
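+ // Reset after 15 seconds max (50 checks x 300 ms) or once the status text shows 'Hoàn thành'.
+ // The status is text content, so it is matched via textContent rather than a [style*=...] selector.
+ const isDone = Array.from(document.querySelectorAll('span'))
+ .some(el => el.textContent.includes('Hoàn thành'));
+ if (isDone || checkCount >= 50) {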
882
+ clearInterval(checkInterval);
883
+ button.innerHTML = originalText;
884
+ button.classList.remove('button-processing');
885
+ button.disabled = false;
886
+ button.style.transform = 'scale(1)';
887
+ }
888
+ }, 300);
889
+ }
890
+
891
+ // Initialize when DOM is ready
892
+ if (document.readyState === 'loading') {
893
+ document.addEventListener('DOMContentLoaded', addButtonEffects);
894
+ } else {
895
+ addButtonEffects();
896
+ }
897
+
898
+ // Re-initialize periodically for Gradio updates
899
+ setInterval(addButtonEffects, 2000);
900
+ </script>
901
+ """
902
+
903
+ # Create interface with tabs
904
+ with gr.Blocks(css=css, title="🎤 Voice Studio & Audio Translation") as demo:
905
+ # Header with iframe microphone permissions
906
+ gr.HTML("""
907
+ <meta charset="UTF-8">
908
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
909
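+ <!-- Note: browsers read Permissions-Policy (and the older Feature-Policy) only from HTTP response headers,
+ not from meta http-equiv tags, so the two tags below have no effect. Inside an iframe, microphone access
+ has to be delegated by the embedding page through the iframe's allow="microphone" attribute. -->
+ <meta http-equiv="Permissions-Policy" content="microphone=*, camera=*, display-capture=*">
+ <meta http-equiv="Feature-Policy" content="microphone 'self' *; camera 'self' *">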
911
+
912
+ <script>
913
+ // Request microphone permissions for iframe
914
+ if (window.location !== window.parent.location) {
915
+ // We're in an iframe
916
+ console.log('Running in iframe - requesting microphone permissions');
917
+
918
+ // Try to request microphone access
919
+ if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
920
+ navigator.mediaDevices.getUserMedia({ audio: true })
921
+ .then(function(stream) {
922
+ console.log('Microphone access granted');
923
+ // Stop the stream immediately, we just wanted to get permission
924
+ stream.getTracks().forEach(track => track.stop());
925
+ })
926
+ .catch(function(err) {
927
+ console.log('Microphone access denied:', err);
928
+ // Show user-friendly message
929
+ const message = document.createElement('div');
930
+ message.innerHTML = `
931
+ <div style="
932
+ background: #fff3cd;
933
+ color: #856404;
934
+ padding: 15px;
935
+ border-radius: 8px;
936
+ margin: 10px;
937
+ border: 1px solid #ffeaa7;
938
+ text-align: center;
939
+ ">
940
+ <strong>⚠️ Microphone Access Required</strong><br>
941
+ To use recording features, please allow microphone access or open this app in a new window.
942
+ <br><br>
943
+ <a href="${window.location.href}" target="_blank" style="
944
+ background: #667eea;
945
+ color: white;
946
+ padding: 8px 16px;
947
+ text-decoration: none;
948
+ border-radius: 6px;
949
+ display: inline-block;
950
+ margin-top: 10px;
951
+ ">🔗 Open in New Window</a>
952
+ </div>
953
+ `;
954
+ document.body.insertBefore(message, document.body.firstChild);
955
+ });
956
+ }
957
+ }
958
+ </script>
959
+
960
+ <div style="text-align: center; background: linear-gradient(135deg, #4A90E2 0%, #FF6B9D 100%); color: white; padding: 20px; border-radius: 10px; margin-bottom: 20px;">
961
+ <h1>🎤 Voice Studio & Audio Translation</h1>
962
+ <p>Chuyển văn bản thành giọng nói, dịch văn bản và dịch audio sang nhiều ngôn ngữ!</p>
963
+ <div style="margin-top: 10px; font-size: 14px; opacity: 0.9;">
964
+ ✨ Tính năng mới: Dịch văn bản trực tiếp trong Voice Studio
965
+ </div>
966
+ <div style="margin-top: 8px;">🧠 <strong>Digitized Brains</strong></div>
967
+ </div>
968
+ """)
969
+
970
+ with gr.Tabs():
971
+ # Voice Studio Tab
972
+ with gr.TabItem("🎤 Voice Studio"):
973
+ gr.HTML("""
974
+ <div style="display: flex; justify-content: center; gap: 15px; margin: 20px 0; flex-wrap: wrap;">
975
+ <div style="background: linear-gradient(135deg, #FF6B6B 0%, #FF8E53 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
976
+ <h4>🇻🇳 Tiếng Việt</h4>
977
+ <p style="margin: 0; font-size: 12px;">2 giọng chuẩn</p>
978
+ <p style="margin: 0; font-size: 10px;">HoaiMy • NamMinh</p>
979
+ </div>
980
+ <div style="background: linear-gradient(135deg, #4ECDC4 0%, #44A08D 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
981
+ <h4>🇺🇸🇬🇧 English</h4>
982
+ <p style="margin: 0; font-size: 12px;">4 giọng chuẩn</p>
983
+ <p style="margin: 0; font-size: 10px;">US • UK</p>
984
+ </div>
985
+ <div style="background: linear-gradient(135deg, #45B7D1 0%, #96C93D 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
986
+ <h4>🌍 Đa ngôn ngữ</h4>
987
+ <p style="margin: 0; font-size: 12px;">20 giọng chuẩn</p>
988
+ <p style="margin: 0; font-size: 10px;">10 ngôn ngữ</p>
989
+ </div>
990
+ </div>
991
+ """)
992
+
993
+ gr.Markdown("### 📝 Nhập nội dung và chọn giọng nói")
994
+
995
+ with gr.Row():
996
+ text_input = gr.Textbox(
997
+ placeholder="Nhập văn bản cần chuyển thành giọng nói...",
998
+ lines=4,
999
+ label="Văn bản",
1000
+ scale=2
1001
+ )
1002
+
1003
+ with gr.Row():
1004
+ with gr.Column(scale=1):
1005
+ country_dropdown = gr.Dropdown(
1006
+ choices=list(voice_choices_by_country.keys()),
1007
+ value="🇻🇳 Việt Nam",
1008
+ label="🌍 Chọn quốc gia"
1009
+ )
1010
+
1011
+ with gr.Column(scale=1):
1012
+ voice_dropdown = gr.Dropdown(
1013
+ choices=voice_choices_by_country["🇻🇳 Việt Nam"],
1014
+ value="🇻🇳 HoaiMy - Nữ Việt Chuẩn",
1015
+ label="🎭 Chọn giọng nói"
1016
+ )
1017
+
1018
+ with gr.Row():
1019
+ speed_slider = gr.Slider(
1020
+ minimum=0.5,
1021
+ maximum=2.0,
1022
+ value=1.0,
1023
+ step=0.1,
1024
+ label="⚡ Tốc độ phát"
1025
+ )
1026
+
1027
+ # Translation feature
1028
+ with gr.Row():
1029
+ with gr.Column(scale=1):
1030
+ translate_checkbox = gr.Checkbox(
1031
+ label="🌍 Dịch văn bản trước khi tạo giọng nói",
1032
+ value=False
1033
+ )
1034
+ with gr.Column(scale=2):
1035
+ translate_btn = gr.Button("🔄 DỊCH VĂN BẢN", variant="secondary", size="lg", visible=False)
1036
+
1037
+ # Show translated text when translation is enabled
1038
+ translated_text_output = gr.Textbox(
1039
+ label="📝 Văn bản đã dịch",
1040
+ lines=3,
1041
+ interactive=True,
1042
+ visible=False,
1043
+ placeholder="Văn bản sau khi dịch sẽ hiển thị ở đây..."
1044
+ )
1045
+
1046
+ generate_btn = gr.Button("🎵 TẠO GIỌNG NÓI", variant="primary", size="lg")
1047
+
1048
+ gr.Markdown("### 🎧 Kết quả âm thanh")
1049
+ audio_output_vs = gr.HTML(
1050
+ value="<p style='text-align: center; color: #666; padding: 40px;'>Nhấn 'TẠO GIỌNG NÓI' để bắt đầu 🎤</p>"
1051
+ )
1052
+
1053
+ # Examples section
1054
+ gr.Markdown("### 📚 Ví dụ nhanh")
1055
+ with gr.Row():
1056
+ example_vn = gr.Button("🇻🇳 Tiếng Việt", size="sm")
1057
+ example_en = gr.Button("🇺🇸 English", size="sm")
1058
+ example_de = gr.Button("🇩🇪 Deutsch", size="sm")
1059
+ example_translate = gr.Button("🌍 Dịch thuật", size="sm")
1060
+
1061
+ # Example button functions
1062
+ def load_vn_example():
1063
+ return "Xin chào! Chào mừng bạn đến với studio giọng nói.", "🇻🇳 Việt Nam"
1064
+
1065
+ def load_en_example():
1066
+ return "Hello! Welcome to our voice studio.", "🇺🇸 Hoa Kỳ"
1067
+
1068
+ def load_de_example():
1069
+ return "Hallo! Willkommen in unserem Sprachstudio.", "🇩🇪 Đức"
1070
+
1071
+ def load_translate_example():
1072
+ return "Hello! This is an example text for translation.", "🇺🇸 Hoa Kỳ", True
1073
+
1074
+ # Translation functions
1075
+ def toggle_translation_ui(translate_enabled):
1076
+ """Show/hide translation UI elements"""
1077
+ return (
1078
+ gr.update(visible=translate_enabled), # translate_btn
1079
+ gr.update(visible=translate_enabled) # translated_text_output
1080
+ )
1081
+
1082
+ def translate_text_interface(text, voice_selection):
1083
+ """Translate text for Voice Studio"""
1084
+ if not text or not text.strip():
1085
+ return "Vui lòng nhập văn bản trước khi dịch"
1086
+
1087
+ target_language = get_target_language_from_voice(voice_selection)
1088
+ translated = translate_text_with_gemini(text, target_language)
1089
+ return translated
1090
+
1091
+ def create_voice_with_translation(original_text, translated_text, translate_enabled, voice_selection, speed):
1092
+ """Create voice using original or translated text"""
1093
+ if translate_enabled and translated_text.strip() and not translated_text.startswith("Lỗi"):
1094
+ # Use translated text
1095
+ return create_audio_voice_studio(translated_text, voice_selection, speed)
1096
+ else:
1097
+ # Use original text
1098
+ return create_audio_voice_studio(original_text, voice_selection, speed)
1099
+
1100
+ # Event handlers for Voice Studio
1101
+ country_dropdown.change(
1102
+ fn=update_voices,
1103
+ inputs=[country_dropdown],
1104
+ outputs=[voice_dropdown]
1105
+ )
1106
+
1107
+ example_vn.click(
1108
+ fn=load_vn_example,
1109
+ outputs=[text_input, country_dropdown]
1110
+ )
1111
+
1112
+ example_en.click(
1113
+ fn=load_en_example,
1114
+ outputs=[text_input, country_dropdown]
1115
+ )
1116
+
1117
+ example_de.click(
1118
+ fn=load_de_example,
1119
+ outputs=[text_input, country_dropdown]
1120
+ )
1121
+
1122
+ example_translate.click(
1123
+ fn=load_translate_example,
1124
+ outputs=[text_input, country_dropdown, translate_checkbox]
1125
+ )
1126
+
1127
+ # Translation UI toggle
1128
+ translate_checkbox.change(
1129
+ fn=toggle_translation_ui,
1130
+ inputs=[translate_checkbox],
1131
+ outputs=[translate_btn, translated_text_output]
1132
+ )
1133
+
1134
+ # Translation button
1135
+ translate_btn.click(
1136
+ fn=translate_text_interface,
1137
+ inputs=[text_input, voice_dropdown],
1138
+ outputs=[translated_text_output]
1139
+ )
1140
+
1141
+ # Generate voice with translation support
1142
+ generate_btn.click(
1143
+ fn=create_voice_with_translation,
1144
+ inputs=[text_input, translated_text_output, translate_checkbox, voice_dropdown, speed_slider],
1145
+ outputs=[audio_output_vs]
1146
+ )
1147
+
1148
+ # Audio Translation Tab
1149
+ with gr.TabItem("🎙️ Audio Translation"):
1150
+ # Colorful feature cards like Voice Studio
1151
+ gr.HTML("""
1152
+ <div style="display: flex; justify-content: center; gap: 15px; margin: 20px 0; flex-wrap: wrap;">
1153
+ <div style="background: linear-gradient(135deg, #FF6B6B 0%, #FF8E53 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
1154
+ <h4>🎤 Ghi âm</h4>
1155
+ <p style="margin: 0; font-size: 12px;">Microphone</p>
1156
+ <p style="margin: 0; font-size: 10px;">Real-time</p>
1157
+ </div>
1158
+ <div style="background: linear-gradient(135deg, #4ECDC4 0%, #44A08D 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
1159
+ <h4>📁 Upload</h4>
1160
+ <p style="margin: 0; font-size: 12px;">Audio Files</p>
1161
+ <p style="margin: 0; font-size: 10px;">WAV • MP3</p>
1162
+ </div>
1163
+ <div style="background: linear-gradient(135deg, #45B7D1 0%, #96C93D 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
1164
+ <h4>🔄 AI Dịch</h4>
1165
+ <p style="margin: 0; font-size: 12px;">13 ngôn ngữ</p>
1166
+ <p style="margin: 0; font-size: 10px;">Gemini 2.0</p>
1167
+ </div>
1168
+ <div style="background: linear-gradient(135deg, #A855F7 0%, #EC4899 100%); padding: 15px; border-radius: 10px; color: white; text-align: center; min-width: 150px;">
1169
+ <h4>🎵 Tổng hợp</h4>
1170
+ <p style="margin: 0; font-size: 12px;">Neural TTS</p>
1171
+ <p style="margin: 0; font-size: 10px;">26 giọng</p>
1172
+ </div>
1173
+ </div>
1174
+ """)
1175
+
1176
+ # Input section with colorful design
1177
+ gr.HTML("""
1178
+ <div style="
1179
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
1180
+ color: white;
1181
+ padding: 20px;
1182
+ border-radius: 15px;
1183
+ margin: 20px 0;
1184
+ text-align: center;
1185
+ box-shadow: 0 8px 32px rgba(0,0,0,0.2);
1186
+ ">
1187
+ <h3 style="margin: 0 0 10px 0;">🎤 Tải lên file audio hoặc ghi âm trực tiếp</h3>
1188
+ <p style="margin: 0; opacity: 0.9; font-size: 0.95em;">
1189
+ Hỗ trợ file WAV, MP3 hoặc ghi âm real-time qua microphone
1190
+ </p>
1191
+ </div>
1192
+ """)
1193
+
1194
+ # Microphone permission notice for iframe
1195
+ gr.HTML("""
1196
+ <div id="microphone-notice" style="
1197
+ background: linear-gradient(135deg, #fff3cd 0%, #ffeaa7 100%);
1198
+ color: #856404;
1199
+ padding: 15px;
1200
+ border-radius: 10px;
1201
+ margin: 15px 0;
1202
+ border: 1px solid #ffeaa7;
1203
+ text-align: center;
1204
+ display: none;
1205
+ ">
1206
+ <strong>🎤 Microphone Access</strong><br>
1207
+ If recording doesn't work, it may be due to iframe restrictions.<br>
1208
+ <a href="#" onclick="window.open(window.location.href, '_blank')" style="
1209
+ background: #667eea;
1210
+ color: white;
1211
+ padding: 8px 16px;
1212
+ text-decoration: none;
1213
+ border-radius: 6px;
1214
+ display: inline-block;
1215
+ margin-top: 8px;
1216
+ ">🔗 Open in New Window</a>
1217
+ </div>
1218
+
1219
+ <script>
1220
+ // Show the notice 2 seconds after load whenever the app runs inside an iframe (microphone failure is not checked here)
1221
+ if (window.location !== window.parent.location) {
1222
+ setTimeout(() => {
1223
+ const notice = document.getElementById('microphone-notice');
1224
+ if (notice) notice.style.display = 'block';
1225
+ }, 2000);
1226
+ }
1227
+ </script>
1228
+ """)
1229
+
1230
+ audio_input = gr.Audio(
1231
+ label="📎 Audio Input",
1232
+ type="filepath",
1233
+ sources=["upload", "microphone"],
1234
+ show_label=False
1235
+ )
1236
+
1237
+ # Settings section with gradient header
1238
+ gr.HTML("""
1239
+ <div style="
1240
+ background: linear-gradient(135deg, #FF6B6B 0%, #FF8E53 100%);
1241
+ color: white;
1242
+ padding: 18px;
1243
+ border-radius: 12px;
1244
+ margin: 25px 0 20px 0;
1245
+ text-align: center;
1246
+ box-shadow: 0 6px 24px rgba(255,107,107,0.3);
1247
+ ">
1248
+ <h3 style="margin: 0 0 8px 0;">🌍 Cài đặt dịch thuật</h3>
1249
+ <p style="margin: 0; opacity: 0.9; font-size: 0.9em;">
1250
+ Chọn ngôn ngữ đích và giọng nói cho kết quả dịch thuật
1251
+ </p>
1252
+ </div>
1253
+ """)
1254
+
1255
+ # Separate dropdowns without complex wrappers to avoid CSS conflicts
1256
+ target_country_dropdown = gr.Dropdown(
1257
+ choices=list(voice_choices_by_country.keys()),
1258
+ value="🇻🇳 Việt Nam",
1259
+ label="🌍 Chọn quốc gia đích"
1260
+ )
1261
+
1262
+ target_voice_dropdown = gr.Dropdown(
1263
+ choices=voice_choices_by_country["🇻🇳 Việt Nam"],
1264
+ value="🇻🇳 HoaiMy - Nữ Việt Chuẩn",
1265
+ label="🎭 Chọn giọng nói đích"
1266
+ )
1267
+
1268
+ text_format_dropdown = gr.Dropdown(
1269
+ choices=["TXT (.txt)", "Word (.docx)"] if DOCX_AVAILABLE else ["TXT (.txt)"],
1270
+ value="TXT (.txt)",
1271
+ label="📄 Định dạng file văn bản"
1272
+ )
1273
+
1274
+ # Colorful action button
1275
+ gr.HTML("""
1276
+ <div style="margin: 25px 0 15px 0; text-align: center;">
1277
+ <div style="
1278
+ background: linear-gradient(135deg, #4ECDC4 0%, #44A08D 100%);
1279
+ color: white;
1280
+ padding: 12px 20px;
1281
+ border-radius: 8px;
1282
+ margin-bottom: 15px;
1283
+ box-shadow: 0 4px 15px rgba(78,205,196,0.3);
1284
+ display: inline-block;
1285
+ ">
1286
+ <h4 style="margin: 0; font-size: 1em;">⚡ Sẵn sàng xử lý</h4>
1287
+ </div>
1288
+ </div>
1289
+ """)
1290
+
1291
+ translate_btn = gr.Button(
1292
+ "🔄 BẮT ĐẦU DỊCH",
1293
+ variant="primary",
1294
+ size="lg",
1295
+ elem_classes=["mobile-button"],
1296
+ elem_id="translate-btn"
1297
+ )
1298
+
1299
+ # Results section with colorful headers
1300
+ gr.HTML("""
1301
+ <div style="
1302
+ background: linear-gradient(135deg, #45B7D1 0%, #96C93D 100%);
1303
+ color: white;
1304
+ padding: 18px;
1305
+ border-radius: 12px;
1306
+ margin: 30px 0 20px 0;
1307
+ text-align: center;
1308
+ box-shadow: 0 6px 24px rgba(69,183,209,0.3);
1309
+ ">
1310
+ <h3 style="margin: 0 0 8px 0;">📊 Kết quả xử lý</h3>
1311
+ <p style="margin: 0; opacity: 0.9; font-size: 0.9em;">
1312
+ Phiên âm, dịch thuật và tổng hợp giọng nói
1313
+ </p>
1314
+ </div>
1315
+ """)
1316
+
1317
+ # Dynamic status indicator
1318
+ status_text = gr.HTML("""
1319
+ <div style="
1320
+ text-align: center;
1321
+ margin: 20px 0;
1322
+ padding: 15px;
1323
+ background: linear-gradient(135deg, #A855F7 0%, #EC4899 100%);
1324
+ border-radius: 12px;
1325
+ color: white;
1326
+ box-shadow: 0 4px 15px rgba(168,85,247,0.3);
1327
+ ">
1328
+ <span style="font-weight: bold; font-size: 1.1em;">
1329
+ ✅ Sẵn sàng xử lý
1330
+ </span>
1331
+ </div>
1332
+ """)
1333
+
1334
+ # Card-based layout for mobile
1335
+ with gr.Column(elem_classes=["output-area"]):
1336
+ # Original content card
1337
+ gr.HTML("""
1338
+ <div style="
1339
+ background: linear-gradient(135deg, #e3f2fd 0%, #bbdefb 100%);
1340
+ padding: 15px;
1341
+ border-radius: 12px;
1342
+ margin: 15px 0;
1343
+ border-left: 4px solid #2196F3;
1344
+ ">
1345
+ <h4 style="margin: 0 0 10px 0; color: #1976D2;">📝 Nội dung gốc từ audio</h4>
1346
+ </div>
1347
+ """)
1348
+
1349
+ transcription_output = gr.Textbox(
1350
+ label="🎯 Phiên âm từ audio",
1351
+ lines=4,
1352
+ interactive=False,
1353
+ placeholder="Nội dung phiên âm từ file audio sẽ hiển thị ở đây...",
1354
+ elem_classes=["mobile-textbox"]
1355
+ )
1356
+
1357
+ detected_language = gr.Textbox(
1358
+ label="🌐 Ngôn ngữ được phát hiện",
1359
+ lines=1,
1360
+ interactive=False,
1361
+ placeholder="Tự động nhận diện...",
1362
+ elem_classes=["mobile-textbox"]
1363
+ )
1364
+
1365
+
1366
+ # Translation result card
1367
+ gr.HTML("""
1368
+ <div style="
1369
+ background: linear-gradient(135deg, #e8f5e8 0%, #c8e6c9 100%);
1370
+ padding: 15px;
1371
+ border-radius: 12px;
1372
+ margin: 15px 0;
1373
+ border-left: 4px solid #4CAF50;
1374
+ ">
1375
+ <h4 style="margin: 0 0 10px 0; color: #388E3C;">✨ Kết quả dịch thuật</h4>
1376
+ </div>
1377
+ """)
1378
+
1379
+ translation_output = gr.Textbox(
1380
+ label="🔄 Nội dung đã dịch",
1381
+ lines=4,
1382
+ interactive=False,
1383
+ placeholder="Bản dịch sẽ hiển thị ở đây...",
1384
+ elem_classes=["mobile-textbox"]
1385
+ )
1386
+
1387
+ target_language_display = gr.Textbox(
1388
+ label="🎯 Ngôn ngữ đích",
1389
+ lines=1,
1390
+ interactive=False,
1391
+ placeholder="Chưa chọn...",
1392
+ elem_classes=["mobile-textbox"]
1393
+ )
1394
+
1395
+ # Mobile-friendly comparison section
1396
+ with gr.Accordion("🔍 So sánh nội dung", open=False):
1397
+ gr.HTML("""
1398
+ <div style="
1399
+ text-align: center;
1400
+ margin-bottom: 15px;
1401
+ padding: 10px;
1402
+ background: #f5f5f5;
1403
+ border-radius: 8px;
1404
+ ">
1405
+ <p style="color: #666; font-style: italic; margin: 0;">
1406
+ Xem nội dung gốc và bản dịch để so sánh
1407
+ </p>
1408
+ </div>
1409
+ """)
1410
+
1411
+ # Stack vertically on mobile for better readability
1412
+ with gr.Column():
1413
+ gr.HTML("""
1414
+ <div style="
1415
+ background: #e3f2fd;
1416
+ padding: 10px;
1417
+ border-radius: 8px;
1418
+ margin: 10px 0;
1419
+ text-align: center;
1420
+ font-weight: bold;
1421
+ color: #1976D2;
1422
+ ">📝 Ngôn ngữ gốc</div>
1423
+ """)
1424
+ original_compare = gr.Textbox(
1425
+ label="",
1426
+ lines=4,
1427
+ interactive=False,
1428
+ show_label=False,
1429
+ placeholder="Nội dung phiên âm từ audio sẽ hiển thị ở đây...",
1430
+ elem_classes=["mobile-compare"]
1431
+ )
1432
+
1433
+ gr.HTML("""
1434
+ <div style="
1435
+ background: #e8f5e8;
1436
+ padding: 10px;
1437
+ border-radius: 8px;
1438
+ margin: 15px 0 5px 0;
1439
+ text-align: center;
1440
+ font-weight: bold;
1441
+ color: #388E3C;
1442
+ ">✨ Sau khi dịch</div>
1443
+ """)
1444
+ translated_compare = gr.Textbox(
1445
+ label="",
1446
+ lines=4,
1447
+ interactive=False,
1448
+ show_label=False,
1449
+ placeholder="Nội dung sau khi dịch sẽ hiển thị ở đây...",
1450
+ elem_classes=["mobile-compare"]
1451
+ )
1452
+
1453
+ # Mobile-optimized download section
1454
+ with gr.Accordion("💾 Tải xuống kết quả", open=True):
1455
+ gr.HTML("""
1456
+ <div style="
1457
+ background: linear-gradient(135deg, #fff3e0 0%, #ffcc80 100%);
1458
+ padding: 15px;
1459
+ border-radius: 12px;
1460
+ margin: 15px 0;
1461
+ border-left: 4px solid #FF9800;
1462
+ text-align: center;
1463
+ ">
1464
+ <h4 style="margin: 0 0 10px 0; color: #E65100;">💾 Tải xuống kết quả</h4>
1465
+ <p style="color: #BF360C; margin: 0; font-style: italic;">
1466
+ File audio và văn bản đã dịch
1467
+ </p>
1468
+ </div>
1469
+ """)
1470
+
1471
+ # Stack downloads vertically for mobile
1472
+ with gr.Column():
1473
+ gr.HTML("""
1474
+ <div style="
1475
+ background: #e3f2fd;
1476
+ padding: 12px;
1477
+ border-radius: 8px;
1478
+ margin: 15px 0 10px 0;
1479
+ text-align: center;
1480
+ font-weight: bold;
1481
+ color: #1976D2;
1482
+ ">🔊 Audio đã dịch</div>
1483
+ """)
1484
+ audio_output_at = gr.Audio(
1485
+ label="",
1486
+ type="filepath",
1487
+ show_label=False,
1488
+ elem_classes=["mobile-audio"]
1489
+ )
1490
+
1491
+ gr.HTML("""
1492
+ <div style="
1493
+ background: #e8f5e8;
1494
+ padding: 12px;
1495
+ border-radius: 8px;
1496
+ margin: 25px 0 10px 0;
1497
+ text-align: center;
1498
+ font-weight: bold;
1499
+ color: #388E3C;
1500
+ ">📄 Văn bản đã dịch</div>
1501
+ """)
1502
+ text_output = gr.File(
1503
+ label="",
1504
+ file_count="single",
1505
+ file_types=[".txt", ".docx"],
1506
+ show_label=False,
1507
+ elem_classes=["mobile-file"]
1508
+ )
1509
+
1510
+ # Event handlers for Audio Translation with colorful status
1511
+ def update_status_processing():
1512
+ return """
1513
+ <div style="
1514
+ text-align: center;
1515
+ margin: 20px 0;
1516
+ padding: 15px;
1517
+ background: linear-gradient(135deg, #FF8E53 0%, #FF6B6B 100%);
1518
+ border-radius: 12px;
1519
+ color: white;
1520
+ box-shadow: 0 4px 15px rgba(255,142,83,0.3);
1521
+ ">
1522
+ <span style="font-weight: bold; font-size: 1.1em;">
1523
+ ⏳ Đang xử lý...
1524
+ </span>
1525
+ </div>
1526
+ """
1527
+
1528
+ def update_status_complete():
1529
+ return """
1530
+ <div style="
1531
+ text-align: center;
1532
+ margin: 20px 0;
1533
+ padding: 15px;
1534
+ background: linear-gradient(135deg, #4ECDC4 0%, #44A08D 100%);
1535
+ border-radius: 12px;
1536
+ color: white;
1537
+ box-shadow: 0 4px 15px rgba(78,205,196,0.3);
1538
+ ">
1539
+ <span style="font-weight: bold; font-size: 1.1em;">
1540
+ ✅ Hoàn thành!
1541
+ </span>
1542
+ </div>
1543
+ """
1544
+
1545
+ target_country_dropdown.change(
1546
+ fn=update_voices,
1547
+ inputs=[target_country_dropdown],
1548
+ outputs=[target_voice_dropdown]
1549
+ )
1550
+
1551
+ # Update target language display when dropdown changes
1552
+ target_voice_dropdown.change(
1553
+ fn=lambda voice: voice,
1554
+ inputs=[target_voice_dropdown],
1555
+ outputs=[target_language_display]
1556
+ )
1557
+
1558
+ # Helper function to extract format
1559
+ def get_format_from_dropdown(format_choice):
1560
+ if "Word" in format_choice:
1561
+ return "docx"
1562
+ return "txt"
1563
+
1564
+ translate_btn.click(
1565
+ fn=update_status_processing,
1566
+ outputs=[status_text]
1567
+ ).then(
1568
+ fn=lambda audio, country, voice, fmt: translate_audio(audio, country, voice, get_format_from_dropdown(fmt)),
1569
+ inputs=[audio_input, target_country_dropdown, target_voice_dropdown, text_format_dropdown],
1570
+ outputs=[
1571
+ transcription_output,
1572
+ detected_language,
1573
+ translation_output,
1574
+ target_language_display,
1575
+ audio_output_at,
1576
+ original_compare,
1577
+ translated_compare,
1578
+ text_output
1579
+ ]
1580
+ ).then(
1581
+ fn=update_status_complete,
1582
+ outputs=[status_text]
1583
+ )
1584
+
1585
+ # Footer
1586
+ gr.HTML("""
1587
+ <div class="custom-footer">
1588
+ <div style="display: flex; justify-content: center; align-items: center; gap: 15px; flex-wrap: wrap;">
1589
+ <div style="display: flex; align-items: center; gap: 8px;">
1590
+ <div style="background: rgba(255,255,255,0.2); padding: 8px 15px; border-radius: 20px; font-size: 16px;">
1591
+ 🧠 DB
1592
+ </div>
1593
+ <span style="font-size: 18px; font-weight: bold;">Digitized Brains</span>
1594
+ </div>
1595
+ <div style="font-size: 14px; opacity: 0.9;">
1596
+ Voice Studio - AI Powered
1597
+ </div>
1598
+ </div>
1599
+ </div>
1600
+ """)
1601
+
1602
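+ # Add JavaScript for button effects. Note: script tags inserted through gr.HTML may not be executed by the browser;
+ # if the effects never fire, passing the code through the js or head argument of gr.Blocks is an alternative.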
1603
+ gr.HTML(js_code)
1604
+
1605
+ if __name__ == "__main__":
1606
+ import sys
1607
+ import locale
1608
+ import os
1609
+
1610
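+ # Request UTF-8 stdio on Windows. PYTHONIOENCODING is read at interpreter startup, so setting it here only affects child processes.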
1611
+ if sys.platform == 'win32':
1612
+ os.environ['PYTHONIOENCODING'] = 'utf-8'
1613
+
1614
+ # Gradio environment settings (unrelated to iframe embedding): disable flagging and keep temporary files under /tmp
1615
+ os.environ['GRADIO_ALLOW_FLAGGING'] = 'never'
1616
+ os.environ['GRADIO_TEMP_DIR'] = '/tmp'
1617
+
1618
+ # Hugging Face Spaces configuration
1619
+ port = int(os.environ.get("GRADIO_SERVER_PORT", 7860))
1620
+
1621
+ demo.launch(
1622
+ server_name="0.0.0.0",
1623
+ server_port=port,
1624
+ share=False,
1625
+ show_error=True
1626
+ )
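
The launch block above reads `GRADIO_SERVER_PORT` with a bare `int(...)`, which raises `ValueError` if the variable is set to something that is not a number. A minimal sketch of a more forgiving lookup follows. It assumes a hypothetical helper named `resolve_port`, which is not part of `app.py`, and it assumes that falling back to the default port is the desired behavior for malformed or out-of-range values.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 7860  # same default the launch block uses


def resolve_port(env: Optional[Mapping[str, str]] = None,
                 default: int = DEFAULT_PORT) -> int:
    """Return the port from GRADIO_SERVER_PORT, or `default` if unusable.

    Hypothetical helper for illustration only; not part of app.py.
    """
    # Allow a plain dict to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    raw = env.get("GRADIO_SERVER_PORT")
    if raw is None:
        return default
    try:
        port = int(raw)
    except ValueError:
        # Malformed value such as "abc" or an empty string.
        return default
    # Valid TCP ports are 1..65535; anything else falls back to the default.
    return port if 1 <= port <= 65535 else default


if __name__ == "__main__":
    print(resolve_port({"GRADIO_SERVER_PORT": "7864"}))
```

With such a helper, the launch call could pass `server_port=resolve_port()` in place of the inline `int(os.environ.get(...))` expression.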