---
title: Realtime Voice Translator
emoji: 🗣️
sdk: docker
app_port: 7860
---

# Real-Time English/French Voice Translator Web App

This project provides a real-time, bidirectional voice translation web application. Speak English or French into your browser and hear the translation in the other language almost instantly. It is built to be easily deployed as a HuggingFace Space.

It combines three APIs for speech recognition, translation, and synthesis:

- **Speech-to-Text (STT):** Google Cloud Speech-to-Text
- **Translation:** DeepL API
- **Text-to-Speech (TTS):** ElevenLabs API

## Features

- **Web-Based UI:** A simple, clean browser interface for real-time translation.
- **Bidirectional Translation:** Listens for English and French at the same time and translates each into the other language.
- **Low Latency:** Built with `asyncio`, WebSockets, and multithreading for a responsive, conversational experience.
- **High-Quality Voice:** Uses ElevenLabs for natural-sounding synthesized speech.
- **Echo Suppression:** The translator does not translate its own spoken output.

## How It Works

The application is composed of a web frontend and a Python backend:

1. **Audio Capture (Frontend):** JavaScript in the browser captures audio from your microphone using the Web Audio API.
2. **WebSocket Streaming:** The audio is chunked and streamed over a WebSocket connection to the FastAPI backend.
3. **Backend Processing:**
    - The `VoiceTranslator` class receives the audio stream.
    - The audio is fed into two separate Google Cloud STT streams in parallel (`en-US` and `fr-FR`).
    - When an STT stream detects a final utterance, the text is sent to the DeepL API for translation.
    - The translated text is sent to the ElevenLabs streaming TTS API.
4. **Audio Playback (Frontend):** The synthesized audio from ElevenLabs is streamed back to the browser through the WebSocket and played as it arrives.

## Requirements

### 1. Software

- Python 3.8+
- `pip` and `venv`
- **FFmpeg:** A system dependency used for audio format conversion.
    - **macOS (via Homebrew):** `brew install ffmpeg`
    - **Debian/Ubuntu:** `sudo apt-get install ffmpeg`

### 2. API Keys

You will need active accounts and API keys for the following services:

- **Google Cloud:**
    - A Google Cloud Platform project with the **Speech-to-Text API** enabled.
    - A service account key file (`.json`).
- **DeepL:**
    - A DeepL API key (the Free plan is sufficient for moderate use).
- **ElevenLabs:**
    - An ElevenLabs API key and the **Voice ID** of the voice you want to use.

## Installation & Setup

1. **Clone the Repository**

    ```bash
    git clone YOUR_REPOSITORY_URL
    cd realtime-translator-webapp # Or your directory name
    ```

2. **Create a Virtual Environment**

    ```bash
    python -m venv venv
    source venv/bin/activate # On Windows, use `venv\Scripts\activate`
    ```

3. **Install Dependencies**

    Install the Python packages from `requirements.txt`:

    ```bash
    pip install -r requirements.txt
    ```

4. **Configure Environment Variables**

    Create a file named `.env` in the project root and add your credentials. This file is ignored by Git to keep your keys safe.

    ```env
    # Path to your Google Cloud service account JSON file
    GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/google-credentials.json"

    # Your DeepL API Key
    DEEPL_API_KEY="YOUR_DEEPL_API_KEY"

    # Your ElevenLabs API Key and Voice ID
    ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
    ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
    ```

## Local Usage

1. **Start the Server**

    Run the Uvicorn server from the project root:

    ```bash
    uvicorn server:app --reload
    ```

2. **Use the Application**
    - Open your web browser and navigate to `http://127.0.0.1:8000`.
    - Click the "Start Translation" button. Your browser will ask for microphone permission.
    - Speak in either English or French.
    - The translated audio plays back automatically.
    - Click "Stop Translation" to end the session.

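Before starting the server, it can help to confirm that the four variables from the `.env` example are visible to the process. The following is a minimal sketch and not part of the project: `missing_settings` and `credentials_file_exists` are hypothetical helpers, and they only read the mapping they are given, so the sketch assumes the `.env` values have already been loaded into the environment (for example by the application or by your shell).

```python
import os
from typing import List, Mapping

# Variable names taken from the `.env` example in this README.
REQUIRED_SETTINGS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def credentials_file_exists(env: Mapping[str, str]) -> bool:
    """Check that GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    elif not credentials_file_exists(os.environ):
        print("GOOGLE_APPLICATION_CREDENTIALS does not point to an existing file.")
    else:
        print("All settings look present.")
```

Saved under any file name you like and run with `python` from the activated virtual environment, the script reports which names are missing, if any. It does not verify that the keys are valid with the respective services.
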
## Deploying to HuggingFace Spaces

This application is ready to be deployed as a HuggingFace Space.

1. Create a new Space on HuggingFace, selecting the "Docker" template.
2. Upload the entire project contents to the Space repository.
3. In the Space "Settings" tab, add your API keys (`GOOGLE_APPLICATION_CREDENTIALS`, `DEEPL_API_KEY`, `ELEVENLABS_API_KEY`, `ELEVENLABS_VOICE_ID`) as secrets. Also make your Google credentials file available to the Space, and make sure `GOOGLE_APPLICATION_CREDENTIALS` holds the path to that file inside the container.
4. The Space will automatically build the Docker image and start the application.

Once the build finishes, your translator is live.

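One common way to provide the Google credentials file on a hosted platform is to store the JSON content itself as a secret and write it to a file when the container starts, rather than committing the key file to the repository. The sketch below is an assumption about how this could be done, not something this project is known to implement: the secret name `GOOGLE_CREDENTIALS_JSON` and the helper `materialize_google_credentials` are hypothetical, and it assumes the Space exposes secrets to the container as environment variables.

```python
import json
import os
import tempfile
from typing import MutableMapping, Optional


def materialize_google_credentials(
    env: MutableMapping[str, str],
    secret_name: str = "GOOGLE_CREDENTIALS_JSON",  # hypothetical secret name
    target_dir: Optional[str] = None,
) -> Optional[str]:
    """Write the JSON held in `secret_name` to a file and point
    GOOGLE_APPLICATION_CREDENTIALS at it. Returns the file path, or None
    if the secret is not set."""
    raw = env.get(secret_name, "").strip()
    if not raw:
        return None

    # Fail early on malformed JSON (json.loads raises ValueError).
    json.loads(raw)

    directory = target_dir or tempfile.gettempdir()
    path = os.path.join(directory, "google-credentials.json")

    # Request owner-only permissions; the mode applies when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(raw)

    env["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path


if __name__ == "__main__":
    written = materialize_google_credentials(os.environ)
    print("Credentials file:", written or "secret not set, nothing written")
```

For this to take effect, the helper would have to run before the Google Cloud Speech-to-Text client is created, for example at the top of the server module.
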