RealtimeTranslator / README.md

Mike W

Fix: Initial runtime errors with integration

2e0855f 4 months ago

	---
	title: Realtime Voice Translator
	emoji: 🗣️
	sdk: docker
	app_port: 7860
	---

	# Real-Time English/French Voice Translator Web App

	This project provides a real-time, bidirectional voice translation web application. Speak in English or French into your browser, and hear the translation in the other language almost instantly.

	It is built to be easily deployed as a HuggingFace Space.

	It uses a combination of cutting-edge APIs for high-quality speech recognition, translation, and synthesis:

	- Speech-to-Text (STT): Google Cloud Speech-to-Text
	- Translation: DeepL API
	- Text-to-Speech (TTS): ElevenLabs API

	## Features

	- Web-Based UI: A simple and clean browser interface for real-time translation.
	- Bidirectional Translation: Simultaneously listens for both English and French and translates to the other language.
	- Low Latency: Built with `asyncio`, WebSockets, and multithreading for a responsive, conversational experience.
	- High-Quality Voice: Leverages ElevenLabs for natural-sounding synthesized speech.
	- Echo Suppression: The translator avoids re-translating its own synthesized speech when the microphone picks it up.
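The README does not say how echo suppression is implemented. One plausible approach, shown here only as an illustrative sketch, is to remember recently synthesized translations and drop any recognized transcript that closely matches one of them. The `EchoFilter` class, its similarity threshold, and its time window are hypothetical and not taken from the project's code.

```python
import time
from collections import deque
from difflib import SequenceMatcher


class EchoFilter:
    """Hypothetical helper: drops transcripts that look like our own TTS output."""

    def __init__(self, window_seconds=10.0, threshold=0.8):
        self.window_seconds = window_seconds  # how long spoken text is remembered
        self.threshold = threshold            # similarity ratio treated as an echo
        self._spoken = deque()                # (timestamp, normalized text) pairs

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial differences are ignored.
        return " ".join(text.lower().split())

    def remember(self, text, now=None):
        """Record text that was just sent to the speech synthesizer."""
        now = time.monotonic() if now is None else now
        self._spoken.append((now, self._normalize(text)))

    def is_echo(self, transcript, now=None):
        """Return True if the transcript closely matches recently spoken text."""
        now = time.monotonic() if now is None else now
        # Forget entries older than the window.
        while self._spoken and now - self._spoken[0][0] > self.window_seconds:
            self._spoken.popleft()
        candidate = self._normalize(transcript)
        return any(
            SequenceMatcher(None, candidate, spoken).ratio() >= self.threshold
            for _, spoken in self._spoken
        )
```

In such a design, the backend would call `remember()` whenever it sends text to the synthesizer and check `is_echo()` on each final transcript before translating it.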

	## How It Works

	The application is composed of a web frontend and a Python backend:

	1. Audio Capture (Frontend): The browser's JavaScript captures audio from your microphone using the Web Audio API.
	2. WebSocket Streaming: The audio is chunked and streamed over a WebSocket connection to the FastAPI backend.
	3. Backend Processing:
	   - The `VoiceTranslator` class receives the audio stream.
	   - The audio is fed into two separate Google Cloud STT streams in parallel (`en-US` and `fr-FR`).
	   - When an STT stream detects a final utterance, it's sent to the DeepL API for translation.
	   - The translated text is sent to the ElevenLabs streaming TTS API.
	4. Audio Playback (Frontend): The synthesized audio from ElevenLabs is streamed back to the browser through the WebSocket and played instantly.
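The parallel fan-out in step 3 can be sketched with plain `asyncio` queues. This is a minimal illustration under stated assumptions, not the project's `VoiceTranslator`: the function names are hypothetical, the recognizer is a placeholder that makes no API calls, and the choice of `EN-US` (rather than `EN-GB`) as the DeepL target code is an assumption.

```python
import asyncio

# Assumed DeepL target-language code for each recognized source language.
TARGET_LANG = {"en-US": "FR", "fr-FR": "EN-US"}


async def fan_out(chunks, queues):
    """Copy every incoming audio chunk to each recognizer queue."""
    for chunk in chunks:
        for q in queues:
            await q.put(chunk)
    for q in queues:
        await q.put(None)  # sentinel: end of audio


async def recognizer(language, queue, results):
    """Placeholder for one streaming STT session (no real API call)."""
    received = 0
    while (chunk := await queue.get()) is not None:
        received += len(chunk)
    # A real recognizer would emit final utterances; here we only report
    # how much audio arrived and which translation direction would apply.
    results.append((language, TARGET_LANG[language], received))


async def run_pipeline(chunks):
    """Feed the same audio to an English and a French recognizer in parallel."""
    queues = [asyncio.Queue(), asyncio.Queue()]
    results = []
    await asyncio.gather(
        fan_out(chunks, queues),
        recognizer("en-US", queues[0], results),
        recognizer("fr-FR", queues[1], results),
    )
    return sorted(results)
```

Each recognizer sees the full audio stream, so whichever language produces a final utterance determines the translation direction.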

	## Requirements

	### 1. Software
	- Python 3.8+
	- `pip` and `venv`
	- FFmpeg: A system dependency for audio format conversion.
	  - macOS (via Homebrew): `brew install ffmpeg`
	  - Debian/Ubuntu: `sudo apt-get install ffmpeg`
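To confirm that FFmpeg is visible to Python before starting the server, a quick check along these lines may help. The `find_ffmpeg` helper is a hypothetical name, not part of the project.

```python
import shutil


def find_ffmpeg():
    """Return the path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(f"FFmpeg found at {path}" if path else "FFmpeg not found; install it first.")
```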

	### 2. API Keys
	You will need active accounts and API keys for the following services:

	- Google Cloud:
	  - A Google Cloud Platform project with the Speech-to-Text API enabled.
	  - A service account key file (`.json`).
	- DeepL:
	  - A DeepL API plan and its API key (the Free plan is sufficient for moderate use).
	- ElevenLabs:
	  - An ElevenLabs account, its API key, and the Voice ID of the desired voice.

	## Installation & Setup

	1. Clone the Repository
	```bash
	git clone <your-repository-url>
	cd realtime-translator-webapp # Or your directory name
	```

	2. Create a Virtual Environment
	```bash
	python -m venv venv
	source venv/bin/activate # On Windows, use `venv\Scripts\activate`
	```

	3. Install Dependencies
	Install the Python packages from `requirements.txt`:
	```bash
	pip install -r requirements.txt
	```

	4. Configure Environment Variables
	Create a file named `.env` in the project root and add your credentials. This file is ignored by Git to keep your keys safe.

	```env
	# Path to your Google Cloud service account JSON file
	GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/google-credentials.json"

	# Your DeepL API Key
	DEEPL_API_KEY="YOUR_DEEPL_API_KEY"

	# Your ElevenLabs API Key and Voice ID
	ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
	ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
	```
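A small startup check can catch a missing or empty setting before the first request fails. The sketch below assumes the variables are already present in the process environment (for example, loaded from `.env` by whatever mechanism the app uses); `missing_settings` is a hypothetical helper, not part of the project.

```python
import os

# The four variables named in the .env example above.
REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "DEEPL_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All required settings are present.")
```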

	## Local Usage

	1. Start the Server
	Run the Uvicorn server from the project root:
	```bash
	uvicorn server:app --reload
	```

	2. Use the Application
	   - Open your web browser and navigate to `http://127.0.0.1:8000`.
	   - Click the "Start Translation" button. Your browser will ask for microphone permission.
	   - Speak in either English or French.
	   - The translated audio will play back automatically.
	   - Click "Stop Translation" to end the session.

	## Deploying to HuggingFace Spaces

	This application is ready to be deployed as a HuggingFace Space.

	1. Create a new Space on HuggingFace, selecting "Docker" as the Space SDK.
	2. Upload the entire project contents to the Space repository.
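	3. In the Space "Settings" tab, add your credentials (`GOOGLE_APPLICATION_CREDENTIALS`, `DEEPL_API_KEY`, `ELEVENLABS_API_KEY`, `ELEVENLABS_VOICE_ID`) as secrets. Because `GOOGLE_APPLICATION_CREDENTIALS` is a file path rather than a key, make sure the Google credentials file itself is also available to the Space at that path.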
	4. The Space will automatically build the Docker image and start the application. Your translator will be live!
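One common way to supply the Google credentials file without committing it to the repository is to store the JSON content in a secret and write it to a temporary file at startup. The sketch below assumes a hypothetical secret named `GOOGLE_CREDENTIALS_JSON`; neither that name nor the `materialize_google_credentials` helper comes from the project.

```python
import json
import os
import tempfile


def materialize_google_credentials(env=None):
    """Write the JSON secret to a temp file and point the Google SDK at it.

    Returns the file path, or None if the (hypothetical) secret is not set.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_CREDENTIALS_JSON")  # hypothetical secret name
    if not raw:
        return None
    json.loads(raw)  # fail early if the secret is not valid JSON
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        handle.write(raw)
    env["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

Calling this before any Google client is created lets the standard `GOOGLE_APPLICATION_CREDENTIALS` lookup find the file.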