ducnguyen1978 committed on
Commit 9b237e2 · verified · 1 Parent(s): e17c366

Upload 3 files

Files changed (3)
  1. README.md +155 -0
  2. app.py +0 -0
  3. requirements.txt +9 -0
README.md ADDED
---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---

# 🎤 Voice Studio & Audio Translation

An AI-powered application that combines text-to-speech synthesis and audio translation in a single interface.

## 🌟 Features

### 🎤 Voice Studio
- **26 High-Quality Voices**: Standard neural voices across 13 countries
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations

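Edge TTS does not accept a speed multiplier directly. Its `rate` option is a signed percentage relative to the voice's normal speed, such as `"+0%"` or `"-50%"`. The sketch below shows one way to map the 0.5x to 2.0x setting onto that format. It assumes the app passes the result to `edge_tts.Communicate(..., rate=...)`. `speed_to_rate` is a hypothetical helper name and does not come from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) into an Edge TTS rate string.

    Edge TTS expresses speaking rate as a signed percentage relative to the
    voice's default: "+0%" is normal, "-50%" is half speed, "+100%" is double.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    # 1.0x -> 0 %, 0.5x -> -50 %, 2.0x -> +100 %; round away float noise
    percent = round((speed - 1.0) * 100)
    # The "+d" format spec always emits a sign, which Edge TTS requires
    return f"{percent:+d}%"
```

For example, `speed_to_rate(1.5)` returns `"+50%"`. That value could then be passed as `edge_tts.Communicate(text, voice, rate="+50%")`.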
### 🎙️ Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content

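The sketch below combines transcription, language detection and translation into one Gemini request. It uses the `google-generativeai` package listed in `requirements.txt`. Several parts are assumptions made for illustration:

- the single-prompt design;
- the `LANGUAGE:` / `TRANSCRIPT:` / `TRANSLATION:` reply format;
- all helper names.

`app.py` may split these steps into separate calls. Only `translate_audio` needs network access and a `GEMINI_API_KEY`.

```python
import os


def build_translation_prompt(target_language: str) -> str:
    """Ask Gemini to transcribe, detect the language, and translate in one pass."""
    return (
        "Transcribe the attached audio verbatim and identify its language.\n"
        f"Then translate the transcript into {target_language}, preserving "
        "meaning, tone, and culturally specific expressions.\n"
        "Reply in exactly this format:\n"
        "LANGUAGE: <detected language>\n"
        "TRANSCRIPT: <original text>\n"
        "TRANSLATION: <translated text>"
    )


def parse_translation_reply(reply: str) -> dict:
    """Split a reply in the format requested above into its three fields."""
    _, _, rest = reply.partition("LANGUAGE:")
    language, _, rest = rest.partition("TRANSCRIPT:")
    transcript, _, translation = rest.partition("TRANSLATION:")
    return {
        "language": language.strip(),
        "transcript": transcript.strip(),
        "translation": translation.strip(),
    }


def translate_audio(audio_path: str, target_language: str) -> dict:
    """Send an audio file to Gemini 2.0 Flash and return the parsed reply."""
    # Imported lazily so the pure helpers above work without the package installed
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    audio_file = genai.upload_file(audio_path)  # upload via the Gemini File API
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content(
        [build_translation_prompt(target_language), audio_file]
    )
    return parse_translation_reply(response.text)
```

Asking for a fixed, labelled reply format keeps parsing simple. It also lets the side-by-side view show the original transcript and the translation without a second request.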
## 🚀 Supported Languages

**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)

**Audio Translation:**
- All Voice Studio languages, plus the additional languages supported by Google Text-to-Speech (gTTS)

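Each display name above corresponds to an Edge TTS voice whose short name follows the `<locale>-<Name>Neural` pattern. The sketch below shows a lookup table plus a synthesis call, using the `edge-tts` package from `requirements.txt`. Only a few voices are listed. The table layout and the helper names `voice_id` and `synthesize` are hypothetical. The short names are assumptions; check them against the output of `edge-tts --list-voices`.

```python
# Hypothetical lookup table: (language, gender) -> Edge TTS short voice name.
# The IDs are assumptions following Edge TTS's "<locale>-<Name>Neural" naming;
# they are not copied from app.py.
VOICES = {
    ("Vietnamese", "Female"): "vi-VN-HoaiMyNeural",
    ("Vietnamese", "Male"): "vi-VN-NamMinhNeural",
    ("American English", "Female"): "en-US-AriaNeural",
    ("American English", "Male"): "en-US-GuyNeural",
    ("British English", "Female"): "en-GB-SoniaNeural",
    ("British English", "Male"): "en-GB-RyanNeural",
}


def voice_id(language: str, gender: str) -> str:
    """Return the Edge TTS voice name for a language/gender pair."""
    try:
        return VOICES[(language, gender)]
    except KeyError:
        raise ValueError(f"No voice configured for {language} ({gender})") from None


async def synthesize(text: str, language: str, gender: str, out_path: str) -> None:
    """Synthesize `text` with Edge TTS and save the audio (needs network access)."""
    # Imported lazily so the lookup above works without the package installed
    import edge_tts

    communicate = edge_tts.Communicate(text, voice_id(language, gender))
    await communicate.save(out_path)
```

Given network access, `asyncio.run(synthesize("Xin chào!", "Vietnamese", "Female", "hello_vi.mp3"))` would write an MP3 file, because Edge TTS returns MP3 audio by default.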
## 🔧 Technology Stack

- **Frontend**: Gradio 4.0+ with responsive mobile design
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech, advanced audio libraries
- **File Handling**: SoundFile, Librosa, python-docx

## ⚙️ Setup

### Prerequisites
- Python 3.9+ (required by the pinned `numpy>=1.26.0`)
- Google Gemini API key

### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```

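Audio Translation cannot run without the key, so it can help to fail early with a clear message. This sketch assumes the app reads the key from the environment as shown above; `get_gemini_api_key` is a hypothetical helper. On Hugging Face Spaces, secrets are exposed to the app as environment variables, so the same lookup covers both local runs and Spaces.

```python
import os


def get_gemini_api_key() -> str:
    """Read GEMINI_API_KEY from the environment, failing early if it is missing."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Export it in your shell or add it as a "
            "Hugging Face Space secret before starting the app."
        )
    return key
```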
### Installation
```bash
pip install -r requirements.txt
```

### Run the Application
```bash
python app.py
```

The application will be available at `http://localhost:7860`.

## 📱 Mobile Optimized

The interface is fully responsive and optimized for mobile devices, with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- A mobile-first design approach

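## 🔒 Privacy & Security

- **No Data Storage**: Uploaded audio and generated text are not stored permanently
- **Temporary Files**: Intermediate audio and text files are automatically cleaned up
- **Secure API**: API keys are read from environment variables rather than hard-coded
- **Cloud Processing**: Text-to-speech uses Microsoft's online Edge TTS service, and transcription and translation use the Google Gemini API, so text and audio are sent to these services for processing
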
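One way to guarantee cleanup is to keep intermediate files inside a `tempfile.TemporaryDirectory`. The directory is deleted when the `with` block exits, even if an error is raised. This sketch assumes audio passes through a scratch file between processing steps. `run_with_scratch_file` and its `step` callback are hypothetical and do not come from `app.py`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_with_scratch_file(
    data: bytes, step: Callable[[Path], bytes], suffix: str = ".wav"
) -> bytes:
    """Write `data` to a scratch file, run `step` on it, and return its result.

    The scratch directory and everything in it are removed when the `with`
    block exits, whether `step` returns normally or raises.
    """
    with tempfile.TemporaryDirectory(prefix="voice-studio-") as tmp:
        path = Path(tmp) / f"input{suffix}"
        path.write_bytes(data)
        return step(path)
```

Returning bytes rather than a path matters here. A path into the scratch directory would no longer exist once the function returns.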
## 🎯 Use Cases

- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate audio content and re-voice it with a neural voice in the target language
- **Podcast Localization**: Create multilingual versions of audio content

## 🛠️ Advanced Features

- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback

## 📦 Deployment

### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:

1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860

### Docker Support
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Gradio listens on 127.0.0.1 by default; bind to all interfaces so the
# published port is reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD ["python", "app.py"]
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request.

## 📄 License

This project is licensed under the MIT License.

## 🙏 Acknowledgments

- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework

---

**Developed by Digitized Brains** 🧠
app.py ADDED
The diff for this file is too large to render.
 
requirements.txt ADDED
# Optimized requirements for Hugging Face Spaces
gradio>=4.0.0,<5.0.0
google-generativeai>=0.8.0,<1.0.0
gtts>=2.5.0,<3.0.0
soundfile>=0.13.0,<1.0.0
edge-tts>=6.1.0,<7.0.0
numpy>=1.26.0,<2.0.0
python-docx>=1.1.0,<2.0.0
PyPDF2>=3.0.0,<4.0.0