| title: Realtime Voice Translator | |
| emoji: 🗣️ | |
| sdk: docker | |
| app_port: 7860 | |
| # Real-Time English/French Voice Translator Web App | |
| This project provides a real-time, bidirectional voice translation web application. Speak in English or French into your browser, and hear the translation in the other language almost instantly. | |
| It is built to be deployed as a Hugging Face Space. | |
| It combines three hosted APIs for speech recognition, translation, and synthesis: | |
| - **Speech-to-Text (STT):** Google Cloud Speech-to-Text | |
| - **Translation:** DeepL API | |
| - **Text-to-Speech (TTS):** ElevenLabs API | |
| ## Features | |
| - **Web-Based UI:** A simple and clean browser interface for real-time translation. | |
| - **Bidirectional Translation:** Listens for English and French at the same time and translates each utterance into the other language. | |
| - **Low Latency:** Built with `asyncio`, WebSockets, and multithreading for a responsive, conversational experience. | |
| - **High-Quality Voice:** Leverages ElevenLabs for natural-sounding synthesized speech. | |
| - **Echo Suppression:** The translator does not translate its own synthesized output when the microphone picks it up. | |
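The README does not say how echo suppression is implemented. One possible approach, sketched here purely as an assumption, is to remember the text recently sent to TTS and drop any incoming transcript that closely matches it. `EchoFilter`, its similarity threshold, and its time window are hypothetical names and values, not taken from the project.

```python
import time
from difflib import SequenceMatcher


class EchoFilter:
    """Hypothetical helper: drops transcripts that match recent TTS output."""

    def __init__(self, window_seconds=5.0, threshold=0.8, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.clock = clock
        self._spoken = []  # list of (timestamp, normalized text)

    @staticmethod
    def _normalize(text):
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
        return " ".join(cleaned.split())

    def remember(self, text):
        """Record text that was just sent to the speech synthesizer."""
        self._spoken.append((self.clock(), self._normalize(text)))

    def is_echo(self, transcript):
        """Return True if the transcript resembles something spoken recently."""
        now = self.clock()
        # Forget entries older than the time window.
        self._spoken = [
            (t, s) for t, s in self._spoken if now - t <= self.window_seconds
        ]
        candidate = self._normalize(transcript)
        return any(
            SequenceMatcher(None, candidate, spoken).ratio() >= self.threshold
            for _, spoken in self._spoken
        )
```

In this sketch the backend would call `remember()` with each translation before synthesizing it, and check `is_echo()` on every final transcript before translating. Text matching is only one option; muting the recognizers while audio is playing would be another.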
| ## How It Works | |
| The application is composed of a web frontend and a Python backend: | |
| 1. **Audio Capture (Frontend):** The browser's JavaScript captures audio from your microphone using the Web Audio API. | |
| 2. **WebSocket Streaming:** The audio is chunked and streamed over a WebSocket connection to the FastAPI backend. | |
| 3. **Backend Processing:** | |
| - The `VoiceTranslator` class receives the audio stream. | |
| - The audio is fed into two separate Google Cloud STT streams in parallel (`en-US` and `fr-FR`). | |
| - When either STT stream returns a final transcript, that text is sent to the DeepL API for translation into the other language. | |
| - The translated text is sent to the ElevenLabs streaming TTS API. | |
| 4. **Audio Playback (Frontend):** The synthesized audio from ElevenLabs is streamed back to the browser over the same WebSocket and played as it arrives. | |
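The parallel STT step above can be sketched with `asyncio` queues: every incoming audio chunk is copied to one queue per language, and each recognizer knows which DeepL target language its transcripts should be routed to. This is a minimal sketch, not the project's `VoiceTranslator`. The names `fan_out`, `fake_recognizer`, and `run_pipeline` are hypothetical, the recognizer is a stand-in that makes no network calls, and the `FR` / `EN-US` target codes are assumed choices.

```python
import asyncio

# Assumed routing: STT source language -> DeepL target-language code.
TARGET_FOR_SOURCE = {"en-US": "FR", "fr-FR": "EN-US"}


async def fan_out(chunks, queues):
    """Copy every audio chunk to each recognizer queue, then signal the end."""
    for chunk in chunks:
        for queue in queues:
            await queue.put(chunk)
    for queue in queues:
        await queue.put(None)  # sentinel: no more audio


async def fake_recognizer(language, queue, results):
    """Stand-in for one streaming STT session (hypothetical, no network calls)."""
    received = 0
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        received += len(chunk)
    # A real recognizer would emit final transcripts; this one only reports
    # how many bytes it saw and where its text would be routed.
    results[language] = (received, TARGET_FOR_SOURCE[language])


async def run_pipeline(chunks):
    """Feed the same audio to both recognizers concurrently."""
    queues = {lang: asyncio.Queue() for lang in TARGET_FOR_SOURCE}
    results = {}
    await asyncio.gather(
        fan_out(chunks, list(queues.values())),
        *(fake_recognizer(lang, q, results) for lang, q in queues.items()),
    )
    return results
```

Calling `asyncio.run(run_pipeline([b"\x00" * 320, b"\x01" * 320]))` shows that both language streams receive all 640 bytes. In a real backend each recognizer would wrap a streaming STT session and forward its final transcripts to translation and then TTS.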
| ## Requirements | |
| ### 1. Software | |
| - Python 3.8+ | |
| - `pip` and `venv` | |
| - **FFmpeg:** This is a system dependency for audio format conversion. | |
| - **macOS (via Homebrew):** `brew install ffmpeg` | |
| - **Debian/Ubuntu:** `sudo apt-get install ffmpeg` | |
| ### 2. API Keys | |
| You will need active accounts and API keys for the following services: | |
| - **Google Cloud:** | |
| - A Google Cloud Platform project with the **Speech-to-Text API** enabled. | |
| - A service account key file (`.json`). | |
| - **DeepL:** | |
| - A DeepL API key (the Free plan is sufficient for moderate use). | |
| - **ElevenLabs:** | |
| - An ElevenLabs API key and the **Voice ID** of the voice you want to use. | |
| ## Installation & Setup | |
| 1. **Clone the Repository** | |
| ```bash | |
| git clone <your-repository-url> | |
| cd realtime-translator-webapp # Or your directory name | |
| ``` | |
| 2. **Create a Virtual Environment** | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate # On Windows, use `venv\Scripts\activate` | |
| ``` | |
| 3. **Install Dependencies** | |
| Install the Python packages from `requirements.txt`: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 4. **Configure Environment Variables** | |
| Create a file named `.env` in the project root and add your credentials. This file is ignored by Git to keep your keys safe. | |
| ```env | |
| # Path to your Google Cloud service account JSON file | |
| GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/google-credentials.json" | |
| # Your DeepL API Key | |
| DEEPL_API_KEY="YOUR_DEEPL_API_KEY" | |
| # Your ElevenLabs API Key and Voice ID | |
| ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY" | |
| ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID" | |
| ``` | |
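A missing key otherwise only surfaces when the first API call fails, so it can help to check the configuration at startup. The sketch below assumes the variables from `.env` have already been loaded into the process environment. `missing_settings` and `check_settings` are hypothetical helpers, not functions from the project.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_SETTINGS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def check_settings(env=None):
    """Raise a readable error if any required variable is missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_settings()` once before the server starts accepting connections would turn a late API failure into an immediate, named error.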
| ## Local Usage | |
| 1. **Start the Server** | |
| Run the Uvicorn server from the project root: | |
| ```bash | |
| uvicorn server:app --reload | |
| ``` | |
| 2. **Use the Application** | |
| - Open your web browser and navigate to `http://127.0.0.1:8000`. | |
| - Click the "Start Translation" button. Your browser will ask for microphone permission. | |
| - Speak in either English or French. | |
| - The translated audio will play back automatically. | |
| - Click "Stop Translation" to end the session. | |
| ## Deploying to Hugging Face Spaces | |
| This application is ready to be deployed as a Hugging Face Space. | |
| 1. Create a new Space on Hugging Face, selecting the "Docker" template. | |
| 2. Upload the entire project contents to the Space repository. | |
| 3. In the Space "Settings" tab, add `GOOGLE_APPLICATION_CREDENTIALS`, `DEEPL_API_KEY`, `ELEVENLABS_API_KEY`, and `ELEVENLABS_VOICE_ID` as secrets. Also add your Google credentials file, so that the path in `GOOGLE_APPLICATION_CREDENTIALS` resolves inside the Space. | |
| 4. The Space will automatically build the Docker image and start the application. Your translator will be live! | |
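Committing a service account file to a Space repository exposes it to anyone who can read the repository. One alternative, sketched below as an assumption rather than the project's actual setup, is to store the file's JSON content in a secret and write it to a private temporary file at startup. The secret name `GOOGLE_CREDENTIALS_JSON` and the function `materialize_google_credentials` are hypothetical.

```python
import json
import os
import tempfile


def materialize_google_credentials(env=None, secret_name="GOOGLE_CREDENTIALS_JSON"):
    """Write the JSON stored in a secret to a file and return its path.

    `GOOGLE_CREDENTIALS_JSON` is a hypothetical secret name, not one the
    project defines.
    """
    env = os.environ if env is None else env
    raw = env.get(secret_name)
    if not raw:
        return None  # nothing to do; rely on an existing credentials file
    json.loads(raw)  # fail early if the secret is not valid JSON
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="gcp-", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(raw)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

This would need to run before any Google Cloud client is created, since the client libraries read `GOOGLE_APPLICATION_CREDENTIALS` when they look up default credentials.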