Upload 3 files
- README.md +155 -0
- app.py +0 -0
- requirements.txt +9 -0
README.md
ADDED
@@ -0,0 +1,155 @@
---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---

# 🎤 Voice Studio & Audio Translation

An application that combines text-to-speech synthesis (Microsoft Edge TTS) with audio transcription and translation (Google Gemini) in a single Gradio interface.

## Features

### Voice Studio
- **26 High-Quality Voices**: One female and one male standard neural voice for each of 13 locales
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations

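The speed control above has to be translated into the form Edge TTS expects: a signed percentage string such as `+25%`. The following is a minimal sketch, assuming the `edge-tts` package's `Communicate` class; the helper names `speed_to_rate` and `synthesize` are hypothetical, and how `app.py` actually does this is not visible in this diff.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) into a signed percentage string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # the sign is always explicit, e.g. "+0%" or "-50%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Save `text` as an MP3 file via edge-tts (requires network access)."""
    import edge_tts  # imported lazily so speed_to_rate works without the package

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)


# Example call (not executed here because it contacts the online service):
#   import asyncio
#   asyncio.run(synthesize("Xin chào", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))
```

Keeping the conversion in a pure function makes the 0.5x to 2.0x range easy to validate and test without touching the network.
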
### Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content

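The transcription, language detection, and translation steps listed above could be handled in a single Gemini request. This is a sketch only, assuming the `google-generativeai` package from `requirements.txt` and the model name `gemini-2.0-flash`; the prompt wording and the function names `build_prompt` and `translate_audio` are illustrative, not taken from `app.py`.

```python
import os


def build_prompt(target_language: str) -> str:
    """Compose one instruction covering transcription, detection, and translation."""
    return (
        "Transcribe the attached audio, name the language spoken, and then "
        f"translate the transcript into {target_language}, keeping idioms and "
        "cultural references natural for native readers."
    )


def translate_audio(audio_path: str, target_language: str) -> str:
    """Upload an audio file to Gemini and return the model's text reply."""
    import google.generativeai as genai  # lazy import: needs network and an API key

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    audio_file = genai.upload_file(audio_path)
    response = model.generate_content([build_prompt(target_language), audio_file])
    return response.text
```

The returned text can then be passed to a Voice Studio voice for synthesis or written to a TXT or Word file.
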
## Supported Languages

**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)

**Audio Translation:**
- All Voice Studio languages, plus additional languages supported by Google Text-to-Speech (gTTS)

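Edge TTS identifies voices by short names of the form `<locale>-<Name>Neural`, so the display names above need a lookup step. The sketch below covers a few of the listed voices; the locale codes and the `VOICES` / `voice_id` names are assumptions for illustration and should be checked against the output of `edge-tts --list-voices`.

```python
# Assumed locale codes; the README lists only display names and genders.
VOICES = {
    "Vietnamese": ("vi-VN", {"Female": "HoaiMy", "Male": "NamMinh"}),
    "American English": ("en-US", {"Female": "Aria", "Male": "Guy"}),
    "British English": ("en-GB", {"Female": "Sonia", "Male": "Ryan"}),
    "German": ("de-DE", {"Female": "Katja", "Male": "Conrad"}),
}


def voice_id(language: str, gender: str) -> str:
    """Build the Edge TTS short name for a listed language and gender."""
    locale, names = VOICES[language]
    return f"{locale}-{names[gender]}Neural"
```

For example, `voice_id("Vietnamese", "Female")` yields `vi-VN-HoaiMyNeural`.
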
## Technology Stack

- **Frontend**: Gradio 4.0+ with responsive mobile design
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech (gTTS)
- **File Handling**: SoundFile, Librosa, python-docx

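As an illustration of the file-handling layer, the TXT and Word exports could look roughly like this. It is a sketch under the assumption that `python-docx` is installed; the function names and the "Original" / "Translation" layout are hypothetical and not taken from `app.py`.

```python
from pathlib import Path


def comparison_text(original: str, translated: str) -> str:
    """Lay out the transcript and its translation as plain text."""
    return f"Original:\n{original}\n\nTranslation:\n{translated}\n"


def export_txt(original: str, translated: str, path: str) -> None:
    """Write the comparison to a UTF-8 text file."""
    Path(path).write_text(comparison_text(original, translated), encoding="utf-8")


def export_docx(original: str, translated: str, path: str) -> None:
    """Write the comparison to a Word document."""
    from docx import Document  # provided by the python-docx package

    doc = Document()
    doc.add_heading("Original", level=1)
    doc.add_paragraph(original)
    doc.add_heading("Translation", level=1)
    doc.add_paragraph(translated)
    doc.save(path)
```
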
## Setup

### Prerequisites
- Python 3.9+ (required by the `numpy>=1.26.0` pin in `requirements.txt`)
- Google Gemini API Key

### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```

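On the Python side, the key exported above can be read at startup and rejected early if it is missing. A minimal sketch; `load_gemini_key` is a hypothetical helper name, and the optional `env` parameter exists only to make the function easy to test.

```python
import os


def load_gemini_key(env=None) -> str:
    """Return the Gemini API key, or fail with a clear message if it is unset."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the app."
        )
    return key
```

Failing at startup gives a clearer error than letting the first Gemini request fail with an authentication message.
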
### Installation
```bash
pip install -r requirements.txt
```

### Run the Application
```bash
python app.py
```

The application will be available at `http://localhost:7860`.

## Mobile Optimized

The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach

## Privacy & Security

- **No Data Storage**: The app does not keep user data; processing happens in memory or in temporary files
- **Temporary Files**: Audio and text files are automatically cleaned up
- **Secure API**: The Gemini API key is read from an environment variable rather than stored in code
- **No TTS Credentials**: Voice Studio uses the `edge-tts` client, which needs no API key; it sends the text to Microsoft's online Edge TTS service rather than synthesizing on the local machine

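One common way to get the temporary-file cleanup described above is a context manager that deletes the file even when processing fails. This is a sketch using only the standard library; `scratch_file` is a hypothetical name, and `app.py` may handle cleanup differently.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_file(suffix: str):
    """Yield the path of a temporary file and delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; callers reopen the file themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A file handed to a download component has to outlive the function that created it, so in practice the deletion may need to run on a later schedule than this pattern implies.
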
## Use Cases

- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate audio content while preserving voice characteristics
- **Podcast Localization**: Create multilingual versions of audio content

## Advanced Features

- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback

## Deployment

### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:

1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860

### Docker Support
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .
# Bind Gradio to all interfaces so the published port is reachable from the host
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD ["python", "app.py"]
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License.

## Acknowledgments

- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework

---

**Developed by Digitized Brains**
app.py
ADDED
The diff for this file is too large to render.
requirements.txt
ADDED
@@ -0,0 +1,9 @@
# Optimized requirements for Hugging Face Spaces
gradio>=4.0.0,<5.0.0
google-generativeai>=0.8.0,<1.0.0
gtts>=2.5.0,<3.0.0
soundfile>=0.13.0,<1.0.0
edge-tts>=6.1.0,<7.0.0
numpy>=1.26.0,<2.0.0
python-docx>=1.1.0,<2.0.0
PyPDF2>=3.0.0,<4.0.0