title: Realtime Voice Translator
emoji: 🗣️
sdk: docker
app_port: 7860
Real-Time English/French Voice Translator Web App
This project provides a real-time, bidirectional voice translation web application. Speak in English or French into your browser, and hear the translation in the other language almost instantly.
It is built to be easily deployed as a HuggingFace Space.
It uses a combination of cutting-edge APIs for high-quality speech recognition, translation, and synthesis:
- Speech-to-Text (STT): Google Cloud Speech-to-Text
- Translation: DeepL API
- Text-to-Speech (TTS): ElevenLabs API
Features
- Web-Based UI: A simple and clean browser interface for real-time translation.
- Bidirectional Translation: Simultaneously listens for both English and French and translates to the other language.
- Low Latency: Built with asyncio, WebSockets, and multithreading for a responsive, conversational experience.
- High-Quality Voice: Leverages ElevenLabs for natural-sounding synthesized speech.
- Echo Suppression: The translator is smart enough not to translate its own spoken output.
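This README does not say how echo suppression is implemented. One simple approach, shown here purely as an illustration, is to remember the text recently sent to TTS and discard any transcript that matches it within a short time window. EchoFilter, its methods, and the 5-second window are hypothetical names and values, not part of this project.

```python
import re
import time
from collections import deque


def _normalize(text: str) -> str:
    """Lowercase and strip punctuation so small STT differences do not matter."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


class EchoFilter:
    """Remembers recently synthesized phrases and flags matching transcripts."""

    def __init__(self, window_seconds: float = 5.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested
        self._recent = deque()  # entries are (timestamp, normalized text)

    def remember(self, spoken_text: str) -> None:
        """Record text that was just sent to the TTS engine."""
        self._recent.append((self._clock(), _normalize(spoken_text)))

    def is_echo(self, transcript: str) -> bool:
        """Return True if the transcript matches something spoken recently."""
        now = self._clock()
        # Forget entries that are older than the window.
        while self._recent and now - self._recent[0][0] > self.window_seconds:
            self._recent.popleft()
        candidate = _normalize(transcript)
        return any(candidate == text for _, text in self._recent)
```

Exact matching after normalization is deliberately strict. A real system would probably need fuzzy matching, since STT rarely reproduces synthesized speech word for word.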
How It Works
The application is composed of a web frontend and a Python backend:
- Audio Capture (Frontend): The browser's JavaScript captures audio from your microphone using the Web Audio API.
- WebSocket Streaming: The audio is chunked and streamed over a WebSocket connection to the FastAPI backend.
- Backend Processing:
  - The VoiceTranslator class receives the audio stream.
  - The audio is fed into two separate Google Cloud STT streams in parallel (en-US and fr-FR).
  - When an STT stream detects a final utterance, it's sent to the DeepL API for translation.
  - The translated text is sent to the ElevenLabs streaming TTS API.
- Audio Playback (Frontend): The synthesized audio from ElevenLabs is streamed back to the browser through the WebSocket and played instantly.
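The parallel STT step above can be pictured as a fan-out: each incoming audio chunk is copied to one queue per language, and each queue is drained by its own recognizer task. The following is a minimal sketch of that pattern using asyncio queues. The names fan_out, fake_stt, and mic_chunks are hypothetical, and fake_stt only counts bytes in place of a real Google Cloud streaming session. The project's VoiceTranslator may be organized differently.

```python
import asyncio


async def fan_out(chunks, queues):
    """Copy every incoming audio chunk to each per-language queue."""
    async for chunk in chunks:
        for q in queues:
            await q.put(chunk)
    for q in queues:
        await q.put(None)  # sentinel: end of stream


async def fake_stt(language, queue, results):
    """Stand-in for a streaming STT session; it only counts bytes received."""
    total = 0
    while (chunk := await queue.get()) is not None:
        total += len(chunk)
    results[language] = total


async def mic_chunks():
    """Simulated microphone input: three small chunks of raw bytes."""
    for chunk in (b"\x00" * 320, b"\x01" * 320, b"\x02" * 160):
        yield chunk


async def main():
    queues = {"en-US": asyncio.Queue(), "fr-FR": asyncio.Queue()}
    results = {}
    # Run the producer and both consumers concurrently.
    await asyncio.gather(
        fan_out(mic_chunks(), list(queues.values())),
        *(fake_stt(lang, q, results) for lang, q in queues.items()),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both consumers see the full stream, so whichever language actually matches the speech can produce the final utterance.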
Requirements
1. Software
- Python 3.8+
- pip and venv
- FFmpeg: This is a system dependency for audio format conversion.
  - macOS (via Homebrew): brew install ffmpeg
  - Debian/Ubuntu: sudo apt-get install ffmpeg
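Because FFmpeg is a system dependency that pip will not install, a quick check before starting the server can avoid a confusing runtime error later. This is a minimal sketch using only the standard library. missing_tools is a hypothetical helper, not something this project provides.

```python
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit("Missing system tools: " + ", ".join(missing))
    print("All required system tools were found.")
```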
2. API Keys
You will need active accounts and API keys for the following services:
- Google Cloud:
  - A Google Cloud Platform project with the Speech-to-Text API enabled.
  - A service account key file (.json).
- DeepL:
  - A DeepL API plan (the Free plan is sufficient for moderate use).
- ElevenLabs:
  - An ElevenLabs account and your Voice ID for the desired voice.
Installation & Setup
Clone the Repository
    git clone <your-repository-url>
    cd realtime-translator-webapp  # Or your directory name
Create a Virtual Environment
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
Install Dependencies
Install the Python packages from requirements.txt:
    pip install -r requirements.txt
Configure Environment Variables
Create a file named .env in the project root and add your credentials. This file is ignored by Git to keep your keys safe.
    # Path to your Google Cloud service account JSON file
    GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/google-credentials.json"
    # Your DeepL API Key
    DEEPL_API_KEY="YOUR_DEEPL_API_KEY"
    # Your ElevenLabs API Key and Voice ID
    ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
    ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
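A startup check along these lines can catch a missing or blank variable before the first API call fails. This is only a sketch. missing_settings is a hypothetical helper, and it reads the process environment, so it assumes the values from .env have already been loaded (for example by the server itself).

```python
import os

# The four variables listed in the .env example above.
REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def missing_settings(env=os.environ):
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All required settings are present.")
```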
Local Usage
Start the Server
Run the Uvicorn server from the project root:
    uvicorn server:app --reload
Use the Application
- Open your web browser and navigate to http://127.0.0.1:8000.
- Click the "Start Translation" button. Your browser will ask for microphone permission.
- Speak in either English or French.
- The translated audio will play back automatically.
- Click "Stop Translation" to end the session.
Deploying to HuggingFace Spaces
This application is ready to be deployed as a HuggingFace Space.
- Create a new Space on HuggingFace, selecting the "Docker" template.
- Upload the entire project contents to the Space repository.
- In the Space "Settings" tab, add your API keys (GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID) as secrets. Make sure to also add your Google credentials file.
- The Space will automatically build the Docker image and start the application. Your translator will be live!
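Space secrets reach the container as environment variables, whereas GOOGLE_APPLICATION_CREDENTIALS is expected to hold a file path. One possible way to bridge the two is to store the JSON content itself in a secret and write it to a temporary file at startup. The sketch below assumes a secret named GOOGLE_CREDENTIALS_JSON, which is a hypothetical name and not something this project defines. materialize_google_credentials is likewise illustrative.

```python
import json
import os
import tempfile


def materialize_google_credentials(env=os.environ,
                                   secret_name="GOOGLE_CREDENTIALS_JSON"):
    """Write the JSON stored in a secret to a file and point Google at it.

    Returns the file path, or None if the secret is not set.
    """
    raw = env.get(secret_name)
    if not raw:
        return None
    json.loads(raw)  # fail early if the secret is not valid JSON
    # mkstemp creates the file readable and writable by the owner only.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        handle.write(raw)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

This would need to run before any Google Cloud client is created, since the client libraries read the variable when they look up credentials.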