Spaces: ducnguyen1978/Voice_Agent

ducnguyen1978 committed on Aug 22, 2025

Commit 9b237e2 (verified) · 1 Parent(s): e17c366

Upload 3 files

Files changed (3)

README.md +155 -0
app.py +0 -0
requirements.txt +9 -0

README.md ADDED

	@@ -0,0 +1,155 @@

+---
+title: Voice Studio & Audio Translation
+emoji: 🎤
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 5.43.1
+app_file: app.py
+pinned: false
+---
+# 🎤 Voice Studio & Audio Translation
+An AI-powered application that combines text-to-speech synthesis and audio translation in one interface.
+## 🌟 Features
+### 🎤 Voice Studio
+- **26 High-Quality Voices**: Standard neural voices across 13 countries
+- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
+- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
+- **Instant Download**: Generate and download MP3 files
+- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
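The speed control listed above could map onto the `rate` argument of the `edge-tts` package, which takes a signed percentage string such as `+50%`. The sketch below shows one possible conversion. `speed_to_rate` and `synthesize` are hypothetical helper names, not taken from `app.py`, and the sketch assumes the multiplier translates linearly into a percentage offset.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) to an edge-tts rate string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    # The explicit sign is required: 1.0x becomes "+0%", 0.5x becomes "-50%".
    return f"{percent:+d}%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Write an MP3 with edge-tts (hypothetical helper; needs network access)."""
    import edge_tts  # imported lazily so speed_to_rate works without the package

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)
```

A call such as `asyncio.run(synthesize("Xin chào", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))` would then request the Vietnamese female voice at 1.25x speed. The voice identifier here is only an example.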
+### 🎙️ Audio Translation
+- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
+- **Language Detection**: Automatic source language identification
+- **Cultural Translation**: Context-aware translation preserving cultural nuances
+- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
+- **Multiple Formats**: Download as TXT or Word documents
+- **Side-by-Side Comparison**: Compare original and translated content
+## 🚀 Supported Languages
+**Voice Studio (26 voices):**
+- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
+- 🇺🇸 **American English**: Aria (Female), Guy (Male)
+- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
+- 🇩🇪 **German**: Katja (Female), Conrad (Male)
+- 🇫🇷 **French**: Denise (Female), Henri (Male)
+- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
+- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
+- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
+- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
+- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
+- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
+- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
+- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)
+**Audio Translation:**
+- All Voice Studio languages plus additional Google TTS supported languages
+## 🔧 Technology Stack
+- **Frontend**: Gradio 4.0+ with responsive mobile design
+- **TTS Engine**: Microsoft Edge TTS Neural Voices
+- **AI Translation**: Google Gemini 2.0 Flash
+- **Audio Processing**: Google Text-to-Speech, advanced audio libraries
+- **File Handling**: SoundFile, Librosa, python-docx
+## ⚙️ Setup
+### Prerequisites
+- Python 3.9+ (the `numpy>=1.26.0` pin in `requirements.txt` does not support 3.8)
+- Google Gemini API Key
+### Environment Variables
+```bash
+export GEMINI_API_KEY="your_gemini_api_key_here"
+```
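On the Python side, the key exported above would typically be read once at startup. The following is a minimal sketch of a fail-fast lookup. `load_gemini_api_key` is a hypothetical helper, not a function confirmed to exist in `app.py`.

```python
import os


def load_gemini_api_key(env=None) -> str:
    """Return the Gemini API key from the environment or raise a clear error."""
    # Accept an explicit mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before starting the app")
    return key
```

Failing at startup with an explicit message tends to be easier to diagnose than an authentication error raised later, in the middle of a transcription request.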
+### Installation
+```bash
+pip install -r requirements.txt
+```
+### Run the Application
+```bash
+python app.py
+```
+The application will be available at `http://localhost:7860`.
+## 📱 Mobile Optimized
+The interface is fully responsive and optimized for mobile devices with:
+- Touch-friendly buttons
+- Vertical stacking on small screens
+- Optimized font sizes and spacing
+- Mobile-first design approach
+## 🔒 Privacy & Security
+- **No Data Storage**: All processing is done in memory
+- **Temporary Files**: Audio and text files are automatically cleaned up
+- **Secure API**: Uses environment variables for API keys
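+- **Speech Synthesis**: Text-to-speech is generated through Microsoft's online Edge TTS service via the `edge-tts` package, so the text to be spoken is sent to that service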
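The temporary-file cleanup described in this list could be implemented with a small context manager. This is a sketch under the assumption that output is written to a path on disk. `temporary_audio_file` is an illustrative name and does not come from `app.py`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(suffix: str = ".mp3"):
    """Yield a path to a temp file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; writers reopen the file themselves
    try:
        yield path
    finally:
        # Remove the file even if synthesis or translation raised an error.
        if os.path.exists(path):
            os.remove(path)
```

Wrapping each request in `with temporary_audio_file() as path:` keeps the file for the duration of the request and removes it afterwards, including on failure.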
+## 🎯 Use Cases
+- **Language Learning**: Practice pronunciation in multiple languages
+- **Content Creation**: Generate multilingual audio content
+- **Accessibility**: Convert text to speech for visually impaired users
+- **Translation Services**: Translate audio content while preserving voice characteristics
+- **Podcast Localization**: Create multilingual versions of audio content
+## 🛠️ Advanced Features
+- **Automatic Language Detection**: Intelligently detects source language
+- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
+- **High-Quality Audio**: WAV format output for best quality
+- **Batch Processing Ready**: Designed for scalability
+- **Error Handling**: Comprehensive error management and user feedback
+## 📦 Deployment
+### Hugging Face Spaces
+This application is ready for deployment on Hugging Face Spaces:
+1. Upload all files to your Hugging Face Space
+2. Set `GEMINI_API_KEY` in Space secrets
+3. The app will automatically start on port 7860
+### Docker Support
+```dockerfile
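+FROM python:3.9-slim
+WORKDIR /app
+# Install dependencies first so this layer is cached across code changes
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY app.py .
+# Bind Gradio to all interfaces so the published port is reachable from the host
+ENV GRADIO_SERVER_NAME="0.0.0.0"
+EXPOSE 7860
+CMD ["python", "app.py"]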
+```
+## 🤝 Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## 📄 License
+This project is licensed under the MIT License.
+## 🙏 Acknowledgments
+- Microsoft Edge TTS for high-quality neural voices
+- Google Gemini for advanced AI capabilities
+- Librosa for advanced audio processing
+- Gradio team for the excellent UI framework
+---
+**Developed by Digitized Brains** 🧠

app.py ADDED

The diff for this file is too large to render. See raw diff

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+# Optimized requirements for Hugging Face Spaces
+gradio>=4.0.0,<5.0.0
+google-generativeai>=0.8.0,<1.0.0
+gtts>=2.5.0,<3.0.0
+soundfile>=0.13.0,<1.0.0
+edge-tts>=6.1.0,<7.0.0
+numpy>=1.26.0,<2.0.0
+python-docx>=1.1.0,<2.0.0
+PyPDF2>=3.0.0,<4.0.0
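The `gradio` pin in this file can be compared against the `sdk_version: 5.43.1` declared in the README front matter. The sketch below is a simplified check that handles plain dotted versions only, with no pre-release tags. `parse_version` and `satisfies` are illustrative helpers, not part of the project.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '5.43.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, matching a '>=lower,<upper' pin."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Values copied from this commit: README front matter and requirements.txt.
sdk_version = "5.43.1"
gradio_ok = satisfies(sdk_version, "4.0.0", "5.0.0")
```

Here `gradio_ok` evaluates to `False`, which suggests the Space's `sdk_version` and the `gradio>=4.0.0,<5.0.0` constraint may need to be aligned.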