majweldon/RealtimeTranslator

Mike W · "Fix: Initial runtime errors with integration" · 2e0855f · 4 months ago

metadata

title: Realtime Voice Translator
emoji: 🗣️
sdk: docker
app_port: 7860

Real-Time English/French Voice Translator Web App

This project provides a real-time, bidirectional voice translation web application. Speak in English or French into your browser, and hear the translation in the other language almost instantly.

It is built to be easily deployed as a HuggingFace Space.

It combines three third-party APIs for speech recognition, translation, and synthesis:

Speech-to-Text (STT): Google Cloud Speech-to-Text
Translation: DeepL API
Text-to-Speech (TTS): ElevenLabs API

Features

Web-Based UI: A simple and clean browser interface for real-time translation.
Bidirectional Translation: Listens for English and French at the same time and translates each utterance into the other language.
Low Latency: Built with asyncio, WebSockets, and multithreading for a responsive, conversational experience.
High-Quality Voice: Leverages ElevenLabs for natural-sounding synthesized speech.
Echo Suppression: The translator avoids re-translating its own synthesized speech when the microphone picks it up.
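The README does not say how echo suppression is implemented. One common approach, shown here as a minimal sketch under that assumption, is a time-based gate: ignore STT results that arrive while synthesized audio is playing, plus a short tail to cover room echo. The `EchoGate` class and its `tail_seconds` parameter are hypothetical and are not names from this project.

```python
import time
from typing import Callable


class EchoGate:
    """Hypothetical helper: suppress transcripts captured during our own TTS playback."""

    def __init__(self, tail_seconds: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tail_seconds = tail_seconds  # extra guard time after playback ends
        self.clock = clock                # injectable clock, handy for testing
        self._muted_until = 0.0

    def playback_started(self, duration_seconds: float) -> None:
        """Call when a TTS clip of known duration starts playing."""
        end = self.clock() + duration_seconds + self.tail_seconds
        # Extend, never shorten, the muted window if clips overlap.
        self._muted_until = max(self._muted_until, end)

    def should_translate(self) -> bool:
        """False while the microphone is likely hearing the app's own output."""
        return self.clock() >= self._muted_until


if __name__ == "__main__":
    now = [100.0]
    gate = EchoGate(tail_seconds=0.5, clock=lambda: now[0])
    gate.playback_started(2.0)      # muted until 102.5
    now[0] = 101.0
    print(gate.should_translate())  # False
    now[0] = 103.0
    print(gate.should_translate())  # True
```

A gate like this is cheap but also drops genuine speech that overlaps playback; matching incoming transcripts against recently synthesized text is an alternative with the opposite trade-off.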

How It Works

The application is composed of a web frontend and a Python backend:

Audio Capture (Frontend): The browser's JavaScript captures audio from your microphone using the Web Audio API.
WebSocket Streaming: The audio is chunked and streamed over a WebSocket connection to the FastAPI backend.
Backend Processing:
- The VoiceTranslator class receives the audio stream.
- The audio is fed into two separate Google Cloud STT streams in parallel (en-US and fr-FR).
- When an STT stream returns the final transcript of an utterance, the text is sent to the DeepL API and translated into the other language.
- The translated text is sent to the ElevenLabs streaming TTS API.
Audio Playback (Frontend): The synthesized audio from ElevenLabs is streamed back to the browser through the WebSocket and played as soon as it arrives.
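The backend flow above can be sketched with asyncio as follows. This is a minimal illustration, not the project's `VoiceTranslator` class: `recognize`, `translate` and `synthesize` are hypothetical stand-ins for the Google Cloud STT, DeepL and ElevenLabs calls, the language codes in `ROUTES` are illustrative, and the fake recognizer "hears" only chunks tagged with its own language. What the sketch does show is every audio chunk being fanned out to one queue per language, and each final transcript being routed into the other language.

```python
import asyncio
from typing import List, Optional

# Hypothetical routing table: STT stream language -> translation target.
ROUTES = {"en-US": "fr", "fr-FR": "en"}


async def recognize(lang: str, chunk: bytes) -> Optional[str]:
    """Stand-in for one STT stream: return a final transcript or None.

    Fake behaviour: a chunk such as b"en-US:hello" is heard only by the en-US stream.
    """
    prefix = lang.encode() + b":"
    if chunk.startswith(prefix):
        return chunk[len(prefix):].decode()
    return None


async def translate(text: str, target: str) -> str:
    """Stand-in for the DeepL call."""
    return f"[{target}] {text}"


async def synthesize(text: str) -> bytes:
    """Stand-in for ElevenLabs TTS: returns placeholder 'audio' bytes."""
    return text.encode()


async def stt_worker(lang: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume audio for one language; translate and voice each final utterance."""
    while True:
        chunk = await inbox.get()
        if chunk is None:  # sentinel: no more audio
            break
        text = await recognize(lang, chunk)
        if text:
            translated = await translate(text, ROUTES[lang])
            await outbox.put(await synthesize(translated))


async def run_pipeline(chunks: List[bytes]) -> List[bytes]:
    """Fan every audio chunk out to all per-language STT workers."""
    inboxes = {lang: asyncio.Queue() for lang in ROUTES}
    outbox: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(stt_worker(lang, q, outbox))
               for lang, q in inboxes.items()]
    for chunk in chunks:
        for q in inboxes.values():
            await q.put(chunk)  # the same audio goes to both language streams
    for q in inboxes.values():
        await q.put(None)
    await asyncio.gather(*workers)
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


if __name__ == "__main__":
    audio = [b"en-US:hello", b"fr-FR:bonjour", b"noise"]
    print(asyncio.run(run_pipeline(audio)))
```

Giving each language its own queue means a slow stream cannot hold up the other. In a real backend each worker would wrap a long-lived streaming STT session rather than a per-chunk call.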

Requirements

1. Software

Python 3.8+
pip and venv
FFmpeg: This is a system dependency for audio format conversion.
- macOS (via Homebrew): brew install ffmpeg
- Debian/Ubuntu: sudo apt-get install ffmpeg
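To confirm the software requirements before installing anything else, a small preflight script can help. This is a hypothetical helper, not part of the project; it only checks the interpreter version and whether an `ffmpeg` executable is on the `PATH`.

```python
import shutil
import sys
from typing import List, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All software requirements found.")
```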

2. API Keys

You will need active accounts and API keys for the following services:

Google Cloud:
- A Google Cloud Platform project with the Speech-to-Text API enabled.
- A service account key file (.json).
DeepL:
- A DeepL API plan (the Free plan is sufficient for moderate use).
ElevenLabs:
- An ElevenLabs account and your Voice ID for the desired voice.
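Google service account key files are JSON objects whose `type` field is `"service_account"` and which include fields such as `project_id`, `private_key` and `client_email`. The hypothetical helper below catches the common mistake of pointing at the wrong file; it only inspects the file's shape and does not verify the key or check that the Speech-to-Text API is enabled.

```python
import json
import os
from pathlib import Path

# Fields present in every Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path: str) -> str:
    """Return the service account email if `path` looks like a key file.

    Raises ValueError with a readable message otherwise. Does not contact Google.
    """
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"credentials file not found: {path!r}")
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"credentials file is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    return data["client_email"]


if __name__ == "__main__":
    try:
        email = check_service_account_file(
            os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
        )
        print("Looks like a service account key for", email)
    except ValueError as err:
        print("Problem:", err)
```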

Installation & Setup

Clone the Repository

```
git clone <your-repository-url>
cd realtime-translator-webapp  # Or your directory name
```

Create a Virtual Environment

```
python -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate
```

Install Dependencies

Install the Python packages from requirements.txt:
```
pip install -r requirements.txt
```

Configure Environment Variables

Create a file named .env in the project root and add your credentials. This file is ignored by Git to keep your keys safe.

```
# Path to your Google Cloud service account JSON file
GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/google-credentials.json"

# Your DeepL API key
DEEPL_API_KEY="YOUR_DEEPL_API_KEY"

# Your ElevenLabs API key and voice ID
ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
```
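The README does not show how the app reads these values; packages such as python-dotenv are a common choice. As a dependency-free sketch under that caveat, the hypothetical `parse_env` below handles the simple `KEY="value"` format shown above (blank lines and `#` comments are skipped; escapes, inline comments and multiline values are not supported), and `require_settings` fails fast, naming every missing variable at once.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

REQUIRED = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def require_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the required settings, or raise listing every missing one."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env(env_path.read_text()) if env_path.exists() else {}
    # Real environment variables (e.g. deployment secrets) override .env values.
    merged = {**file_values, **os.environ}
    try:
        settings = require_settings(merged)
        print("Loaded:", ", ".join(settings))  # print names only, never secrets
    except RuntimeError as err:
        print(err)
```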

Local Usage

Start the Server

Run the Uvicorn server from the project root:
```
uvicorn server:app --reload
```
Use the Application
- Open your web browser and navigate to http://127.0.0.1:8000.
- Click the "Start Translation" button. Your browser will ask for microphone permission.
- Speak in either English or French.
- The translated audio will play back automatically.
- Click "Stop Translation" to end the session.

Deploying to HuggingFace Spaces

This application is ready to be deployed as a HuggingFace Space.

Create a new Space on HuggingFace, selecting the "Docker" template.
Upload the entire project contents to the Space repository.
In the Space "Settings" tab, add GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID as secrets. GOOGLE_APPLICATION_CREDENTIALS holds a file path rather than a key, so the Google service account JSON file it points to must also be available inside the Space.
The Space will automatically build the Docker image and start the application. Your translator will be live!
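Space secrets reach the running container as environment variables, which makes a JSON key file awkward to supply. One possible approach, sketched here under the assumption that you store the full contents of the key file in a secret named `GOOGLE_CREDENTIALS_JSON` (a hypothetical name this project does not define), is to write that JSON to a private temporary file at startup and point `GOOGLE_APPLICATION_CREDENTIALS` at it before any Google client is created.

```python
import json
import os
import tempfile
from typing import MutableMapping


def materialize_google_credentials(
    env: MutableMapping[str, str] = os.environ,
    secret_name: str = "GOOGLE_CREDENTIALS_JSON",  # hypothetical secret name
) -> str:
    """Write the JSON held in `secret_name` to a temp file and set
    GOOGLE_APPLICATION_CREDENTIALS to its path. Returns that path.

    If GOOGLE_APPLICATION_CREDENTIALS already points to an existing file,
    it is left unchanged.
    """
    existing = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if existing and os.path.isfile(existing):
        return existing
    raw = env.get(secret_name)
    if not raw:
        raise RuntimeError(f"Set {secret_name} or GOOGLE_APPLICATION_CREDENTIALS")
    json.loads(raw)  # fail early if the secret is not valid JSON
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(raw)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

Calling this once at the top of the module that defines `app` (`server.py`, judging by the `uvicorn server:app` command) keeps the key out of the Space repository itself.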