Commit ed699be by Mike W — Initial commit of real-time voice translator
Files changed:
- .gitignore +9 -0
- README.md +112 -0
- app.py +579 -0
- app_backup.py +597 -0
- checkpoint_nov2.py +570 -0
- mic_check.py +5 -0
- requirements.txt +5 -0
- working.py +334 -0
.gitignore
ADDED
@@ -0,0 +1,9 @@
# Environment variables
.env

# Python virtual environment
venv/

# Python cache
__pycache__/
*.pyc
README.md
ADDED
@@ -0,0 +1,112 @@
# Real-Time English/French Voice Translator

This project provides a real-time, bidirectional voice translation tool that runs in your terminal. Speak in English or French, and hear the translation in the other language almost instantly.

It uses a combination of cutting-edge APIs for high-quality speech recognition, translation, and synthesis:

- **Speech-to-Text (STT):** Google Cloud Speech-to-Text
- **Translation:** DeepL API
- **Text-to-Speech (TTS):** ElevenLabs API

*(Note: You can replace this with a real GIF of the application in action.)*

## Features

- **Bidirectional Translation:** Simultaneously listens for both English and French and translates to the other language.
- **Low Latency:** Built with `asyncio` and multithreading for a responsive, conversational experience.
- **High-Quality Voice:** Leverages ElevenLabs for natural-sounding synthesized speech.
- **Echo Suppression:** The translator is smart enough not to translate its own spoken output.
- **Robust Streaming:** Automatically manages and restarts API connections to handle pauses in conversation.
- **Simple CLI:** Easy to start and stop from the command line.

## How It Works

The application orchestrates several processes concurrently:

1. **Audio Capture:** A dedicated thread captures audio from your default microphone.
2. **Dual STT Streams:** The captured audio is fed into two separate Google Cloud STT streams in parallel: one configured for `en-US` and one for `fr-FR`.
3. **Transcription & Translation:** When either STT stream detects a final utterance, it is sent to the DeepL API for translation.
4. **Speech Synthesis:** The translated text is sent to the ElevenLabs streaming TTS API.
5. **Audio Playback:** The synthesized audio is played back through your speakers.
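The fan-out in step 2 can be sketched with the standard library alone — a minimal illustration (not the full app.py), where one capture loop pushes every mic frame into both per-language queues, just as `_record_audio` does:

```python
import queue

# One queue per STT stream, mirroring the en-US / fr-FR pair in app.py.
lang_queues = {"en-US": queue.Queue(), "fr-FR": queue.Queue()}

def fan_out(frame: bytes) -> None:
    """Deliver a captured mic frame to every language queue."""
    for q in lang_queues.values():
        q.put(frame)

for frame in (b"chunk-1", b"chunk-2"):
    fan_out(frame)

en = [lang_queues["en-US"].get_nowait() for _ in range(2)]
fr = [lang_queues["fr-FR"].get_nowait() for _ in range(2)]
print(en == fr)  # True — both STT streams see identical audio
```

Because both recognizers consume copies of the same frames, whichever language model produces a confident final result "wins" that utterance.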

To prevent the system from re-translating its own output, the application pauses microphone processing during TTS playback and intelligently discards any recognized text that matches its last-spoken phrase.
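That discard step reduces to a normalized string comparison (strip whitespace, lowercase), which is the check app.py applies against its last-spoken phrase per language:

```python
def is_echo(transcript: str, last_tts_text: str) -> bool:
    """True when recognized speech matches the phrase the app just spoke."""
    return transcript.strip().lower() == last_tts_text.strip().lower()

print(is_echo("  Bonjour tout le monde ", "bonjour tout le monde"))  # True: our own TTS output, drop it
print(is_echo("Bonjour", "Au revoir"))                               # False: genuine new speech
```

Note that two empty strings would trivially match, which is why app.py filters empty transcripts before this check ever runs.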

## Requirements

### 1. Software
- Python 3.8+
- `pip` and `venv`
- **PortAudio:** This is a dependency for the `pyaudio` library.
  - **macOS (via Homebrew):** `brew install portaudio`
  - **Debian/Ubuntu:** `sudo apt-get install portaudio19-dev`
  - **Windows:** `pyaudio` can often be installed via `pip` without a manual PortAudio installation.

### 2. API Keys
You will need active accounts and API keys for the following services:

- **Google Cloud:**
  - A Google Cloud Platform project with the **Speech-to-Text API** enabled.
  - A service account key file (`.json`).
- **DeepL:**
  - A DeepL API plan (the Free plan is sufficient for moderate use).
- **ElevenLabs:**
  - An ElevenLabs account. You will also need the **Voice ID** for the voice you wish to use.

## Installation & Setup

1. **Clone the Repository**
   ```bash
   git clone <your-repository-url>
   cd realtime-translator
   ```

2. **Create a Virtual Environment**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. **Install Dependencies**
   Create a `requirements.txt` file with the following content:
   ```
   pyaudio
   websockets
   google-cloud-speech
   deepl
   python-dotenv
   ```
   Then, install the packages:
   ```bash
   pip install -r requirements.txt
   ```

4. **Configure Environment Variables**
   Create a file named `.env` in the project root and add your credentials. This file is ignored by Git to keep your keys safe.

   ```env
   # Path to your Google Cloud service account JSON file
   GOOGLE_APPLICATION_CREDENTIALS="C:/path/to/your/google-credentials.json"

   # Your DeepL API Key
   DEEPL_API_KEY="YOUR_DEEPL_API_KEY"

   # Your ElevenLabs API Key and Voice ID
   ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
   ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
   ```

## Usage

Once set up, run the main application script:

```bash
python app.py
```

The application will prompt you to press `ENTER` to start and stop the translation session.

- Press `ENTER` to start recording.
- Speak in either English or French.
- Press `ENTER` again to stop the session.
- Press `Ctrl+C` to quit the application gracefully.
app.py
ADDED
@@ -0,0 +1,579 @@
#!/usr/bin/env python3
"""
Real-Time French/English Voice Translator — cleaned version

Fixes applied:
- Fixed TTS echo caused by double-writing audio chunks
- Removed prebuffer re-injection that could cause echoes
- Added empty transcript filtering
- Added within-stream deduplication
- Removed unnecessary sleeps (reduced latency by ~900ms)
- Reduced TTS prebuffer from 1s to 0.5s for faster playback start
- Cleaned up diagnostic logging

Keep your env vars:
- GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
"""

import asyncio
import json
import queue
import threading
import time
import os
import base64
from collections import deque
from typing import Dict, Optional

import pyaudio
import websockets
from google.cloud import speech
import deepl
from dotenv import load_dotenv

# -----------------------------------------------------------------------------
# VoiceTranslator
# -----------------------------------------------------------------------------
class VoiceTranslator:
    def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
        # External clients
        self.deepl_client = deepl.Translator(deepl_api_key)
        self.elevenlabs_api_key = elevenlabs_api_key
        self.voice_id = elevenlabs_voice_id
        self.stt_client = speech.SpeechClient()

        # Audio params
        self.audio_rate = 16000
        self.audio_chunk = 1024

        # Per-language audio queues (raw mic frames)
        self.lang_queues: Dict[str, queue.Queue] = {
            "en-US": queue.Queue(),
            "fr-FR": queue.Queue(),
        }

        # Small rolling prebuffer to avoid missing the first bits after a restart
        self.prebuffer = deque(maxlen=12)

        # State flags
        self.is_recording = False
        self.is_speaking = False
        self.speaking_event = threading.Event()

        # Deduplication
        self.last_processed_transcript = ""
        self.last_tts_text_en = ""
        self.last_tts_text_fr = ""

        # Threshold
        self.min_confidence_threshold = 0.5

        # PyAudio
        self.pyaudio_instance = pyaudio.PyAudio()
        self.audio_stream = None

        # Threads + async
        self.recording_thread: Optional[threading.Thread] = None
        self.async_loop = asyncio.new_event_loop()

        # TTS queue + consumer task
        self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
        self._tts_consumer_task: Optional[asyncio.Task] = None

        # Start async loop in separate thread
        self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
        self.async_thread.start()

        # schedule tts consumer creation inside the async loop
        def _start_consumer():
            self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
        self.async_loop.call_soon_threadsafe(_start_consumer)

        self.stt_threads: Dict[str, threading.Thread] = {}

        # Per-language restart events (used to tell threads when to start new streams)
        self.restart_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Per-language stream started flag
        self._stream_started = {"en-US": False, "fr-FR": False}

        # Per-language cancel events to force request_generator to stop
        self.stream_cancel_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Diagnostics
        self._tts_job_counter = 0

    def _run_async_loop(self):
        asyncio.set_event_loop(self.async_loop)
        try:
            self.async_loop.run_forever()
        except Exception as e:
            print("[async_loop] stopped with error:", e)

    # ---------------------------
    # Audio capture
    # ---------------------------
    def _record_audio(self):
        try:
            stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=self.audio_rate,
                input=True,
                frames_per_buffer=self.audio_chunk,
            )
            print("🎤 Recording started...")

            while self.is_recording:
                if self.speaking_event.is_set():
                    time.sleep(0.01)
                    continue

                try:
                    data = stream.read(self.audio_chunk, exception_on_overflow=False)
                except Exception as e:
                    print(f"[recorder] read error: {e}")
                    continue

                if not data:
                    continue

                self.prebuffer.append(data)
                self.lang_queues["en-US"].put(data)
                self.lang_queues["fr-FR"].put(data)

            try:
                stream.stop_stream()
                stream.close()
            except Exception:
                pass
            print("🎤 Recording stopped.")
        except Exception as e:
            print(f"[recorder] fatal: {e}")

    # ---------------------------
    # TTS streaming (ElevenLabs) - async
    # ---------------------------
    async def _stream_tts(self, text: str):
        uri = (
            f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
            f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
        )
        tts_audio_stream = None
        websocket = None
        try:
            # Mark speaking and set event so recorder & STT pause
            self.is_speaking = True
            self.speaking_event.set()

            # Clear prebuffer to avoid re-injecting TTS audio later
            self.prebuffer.clear()

            # Clear queued frames to avoid replay
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            websocket = await websockets.connect(uri)
            await websocket.send(json.dumps({
                "text": " ",
                "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
                "xi_api_key": self.elevenlabs_api_key,
            }))
            await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
            await websocket.send(json.dumps({"text": ""}))

            tts_audio_stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                output=True,
                frames_per_buffer=1024,
            )

            prebuffer = bytearray()
            playback_started = False

            try:
                while True:
                    try:
                        message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
                    except asyncio.TimeoutError:
                        if playback_started:
                            break
                        else:
                            continue

                    if isinstance(message, bytes):
                        if not playback_started:
                            prebuffer.extend(message)
                            if len(prebuffer) >= 8000:
                                tts_audio_stream.write(bytes(prebuffer))
                                prebuffer.clear()
                                playback_started = True
                        else:
                            tts_audio_stream.write(message)
                        continue

                    try:
                        data = json.loads(message)
                    except Exception:
                        continue

                    if data.get("audio"):
                        audio_bytes = base64.b64decode(data["audio"])
                        if not playback_started:
                            prebuffer.extend(audio_bytes)
                            if len(prebuffer) >= 8000:
                                tts_audio_stream.write(bytes(prebuffer))
                                prebuffer.clear()
                                playback_started = True
                        else:
                            tts_audio_stream.write(audio_bytes)
                    elif data.get("isFinal"):
                        break
                    elif data.get("error"):
                        print("TTS error:", data["error"])
                        break

                # Handle case where playback never started (very short audio)
                if prebuffer and not playback_started:
                    tts_audio_stream.write(bytes(prebuffer))

            finally:
                try:
                    await websocket.close()
                except Exception:
                    pass

        except Exception:
            pass
        finally:
            if tts_audio_stream:
                try:
                    tts_audio_stream.stop_stream()
                    tts_audio_stream.close()
                except Exception:
                    pass

            # Force the STT request generators to exit by setting cancel events
            for lang, ev in self.stream_cancel_events.items():
                ev.set()

            # Don't re-inject prebuffer - just clear the queues and let fresh audio come in
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            # Clear speaking state and signal STT threads to restart
            self.is_speaking = False
            self.speaking_event.clear()

            # Signal restart for both language streams
            for lang, ev in self.restart_events.items():
                ev.set()

            await asyncio.sleep(0.1)

    # ---------------------------
    # TTS consumer (serializes TTS)
    # ---------------------------
    async def _tts_consumer(self):
        print("[tts_consumer] started")
        while True:
            item = await self._tts_queue.get()
            if item is None:
                print("[tts_consumer] shutdown sentinel received")
                break
            text = item.get("text", "")
            self._tts_job_counter += 1
            job_id = self._tts_job_counter
            print(f"[tts_consumer] job #{job_id} dequeued (len={len(text)})")
            try:
                await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
            except asyncio.TimeoutError:
                print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
            except Exception as e:
                print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
            finally:
                await asyncio.sleep(0.05)
        print("[tts_consumer] exiting")

    # ---------------------------
    # Translation & TTS trigger
    # ---------------------------
    async def _process_result(self, transcript: str, confidence: float, language: str):
        lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
        print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")

        # echo suppression vs last TTS in same language
        if language == "fr-FR":
            if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                print(" (echo suppressed)")
                return
        else:
            if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                print(" (echo suppressed)")
                return

        try:
            if language == "fr-FR":
                translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
                print(f"🌐 FR → EN: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_en = translated
            else:
                translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
                print(f"🌐 EN → FR: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_fr = translated
            print("🔊 Queued for speaking...")
        except Exception as e:
            print(f"Translation error: {e}")

    # ---------------------------
    # STT streaming (run per language)
    # ---------------------------
    def _run_stt_stream(self, language: str):
        print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
        self._stream_started[language] = False
        last_transcript_in_stream = ""

        while self.is_recording:
            try:
                if self._stream_started[language]:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
                    signaled = self.restart_events[language].wait(timeout=30)
                    if not signaled and self.is_recording:
                        print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
                    if not self.is_recording:
                        break
                    try:
                        self.restart_events[language].clear()
                    except Exception:
                        pass
                    time.sleep(0.01)

                self._stream_started[language] = True
                last_transcript_in_stream = ""
                print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")

                config = speech.RecognitionConfig(
                    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=self.audio_rate,
                    language_code=language,
                    enable_automatic_punctuation=True,
                    model="latest_short",
                )
                streaming_config = speech.StreamingRecognitionConfig(
                    config=config,
                    interim_results=True,
                    single_utterance=False,
                )

                # Request generator yields StreamingRecognizeRequest messages
                def request_generator():
                    while self.is_recording:
                        # If TTS is playing, skip sending mic frames to STT
                        if self.speaking_event.is_set():
                            time.sleep(0.01)
                            continue
                        # If cancel event set, clear and break to end stream
                        if self.stream_cancel_events[language].is_set():
                            try:
                                self.stream_cancel_events[language].clear()
                            except Exception:
                                pass
                            break
                        try:
                            chunk = self.lang_queues[language].get(timeout=1.0)
                        except queue.Empty:
                            continue
                        yield speech.StreamingRecognizeRequest(audio_content=chunk)

                responses = self.stt_client.streaming_recognize(streaming_config, request_generator())

                response_count = 0
                final_received = False

                for response in responses:
                    if not self.is_recording:
                        print(f"[stt:{language}] Stopped by user")
                        break
                    if not response.results:
                        continue

                    response_count += 1
                    for result in response.results:
                        if not result.alternatives:
                            continue
                        alt = result.alternatives[0]
                        transcript = alt.transcript.strip()
                        conf = getattr(alt, "confidence", 0.0)
                        is_final = bool(result.is_final)

                        if is_final:
                            now = time.strftime("%H:%M:%S")
                            print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")

                            # Filter empty transcripts - don't break stream
                            if not transcript or len(transcript.strip()) == 0:
                                print(f"[{now}] [stt:{language}] Empty transcript -> ignoring, continuing stream")
                                continue

                            # Deduplicate within same stream
                            if transcript.strip().lower() == last_transcript_in_stream.strip().lower():
                                print(f"[{now}] [stt:{language}] Duplicate final in same stream -> suppressed")
                                continue

                            if conf < self.min_confidence_threshold:
                                print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
                                continue

                            last_transcript_in_stream = transcript

                            if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
                                continue
                            if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
                                continue

                            asyncio.run_coroutine_threadsafe(
                                self._process_result(transcript, conf, language),
                                self.async_loop
                            )
                            final_received = True
                            break

                    if final_received:
                        break

                print(f"[stt:{language}] Stream ended after {response_count} responses")

                if self.is_recording and final_received:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
                elif self.is_recording and not final_received:
                    print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
                    time.sleep(0.5)
                else:
                    break

            except Exception as e:
                if self.is_recording:
                    import traceback
                    print(f"[stt:{language}] Error: {e}")
                    print(traceback.format_exc())
                    time.sleep(1.0)
                else:
                    break

        print(f"[stt:{language}] Thread exiting")

    # ---------------------------
    # Control
    # ---------------------------
    def start_translation(self):
        if self.is_recording:
            print("Already recording!")
            return
        self.is_recording = True
        self.last_processed_transcript = ""

        for ev in self.restart_events.values():
            try:
                ev.clear()
            except Exception:
                pass
        self.speaking_event.clear()

        for q in self.lang_queues.values():
            with q.mutex:
                q.queue.clear()

        self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
        self.recording_thread.start()

        for lang in ("en-US", "fr-FR"):
            t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
            self.stt_threads[lang] = t
            t.start()
            print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")

        for ev in self.restart_events.values():
            ev.set()

    def stop_translation(self):
        print("\n⏹️ Stopping translation...")
        self.is_recording = False
        for ev in self.restart_events.values():
            ev.set()
        self.speaking_event.clear()

        if self._tts_consumer_task and not (self._tts_consumer_task.done() if hasattr(self._tts_consumer_task, 'done') else False):
            try:
                def _put_sentinel():
                    try:
                        self._tts_queue.put_nowait(None)
                    except Exception:
                        asyncio.create_task(self._tts_queue.put(None))
                self.async_loop.call_soon_threadsafe(_put_sentinel)
            except Exception:
                pass

        time.sleep(0.2)

    def cleanup(self):
        self.stop_translation()
        try:
            if self.async_loop.is_running():
                def _stop_loop():
                    if self._tts_consumer_task and not self._tts_consumer_task.done():
                        try:
                            self._tts_queue.put_nowait(None)
                        except Exception:
                            pass
                    self.async_loop.stop()
                self.async_loop.call_soon_threadsafe(_stop_loop)
        except Exception:
            pass
        try:
            self.pyaudio_instance.terminate()
        except Exception:
            pass

# -----------------------------------------------------------------------------
# Main entry
# -----------------------------------------------------------------------------
def main():
    load_dotenv()
    google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    deepl_key = os.getenv("DEEPL_API_KEY")
    eleven_key = os.getenv("ELEVENLABS_API_KEY")
    voice_id = os.getenv("ELEVENLABS_VOICE_ID")

    if not all([google_creds, deepl_key, eleven_key, voice_id]):
        print("Missing API keys or credentials.")
        return

    translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
    print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")

    try:
        while True:
            input("Press ENTER to start speaking...")
            translator.start_translation()
            input("Press ENTER to stop...\n")
            translator.stop_translation()
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received — cleaning up.")
        translator.cleanup()

if __name__ == "__main__":
    main()
|
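The shutdown path in `stop_translation` above posts a `None` sentinel into the asyncio queue from a foreign thread via `call_soon_threadsafe`, and the consumer exits when it dequeues the sentinel. A minimal, self-contained sketch of that pattern (names like `consumer` and the payload strings are illustrative, not the app's API; assumes Python 3.10+, where `asyncio.Queue` is not bound to a loop at construction):

```python
import asyncio
import threading
import time

results = []

async def consumer(q: "asyncio.Queue"):
    # Drain items until the None sentinel arrives (mirrors _tts_consumer).
    while True:
        item = await q.get()
        if item is None:
            break
        results.append(item)

loop = asyncio.new_event_loop()
q = asyncio.Queue()
threading.Thread(target=loop.run_forever, daemon=True).start()

# Start the consumer inside the loop's own thread.
asyncio.run_coroutine_threadsafe(consumer(q), loop)

# From this foreign thread, hand items (and finally the sentinel) to the loop.
loop.call_soon_threadsafe(q.put_nowait, "bonjour")
loop.call_soon_threadsafe(q.put_nowait, None)

time.sleep(0.3)  # give the loop time to drain the queue
loop.call_soon_threadsafe(loop.stop)
print(results)
```

The indirection through `call_soon_threadsafe` matters because `asyncio.Queue.put_nowait` mutates loop-owned state and must run on the loop's thread, exactly as `_put_sentinel` does in the app.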
app_backup.py
ADDED
@@ -0,0 +1,597 @@
#!/usr/bin/env python3
"""
Real-Time French/English Voice Translator — patched single-file v2

Changes from previous:
- Adds per-language stream_cancel_events that force the STT request_generator
  to exit, allowing streaming_recognize to terminate and be restarted cleanly.
- _stream_tts sets the cancel events immediately after playback finishes (before
  prebuffer re-injection and restart events).
- Request generator checks cancel event frequently and breaks to end the stream.

Keep your env vars:
- GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
"""

import asyncio
import json
import queue
import threading
import time
import os
import base64
from collections import deque
from typing import Dict, Optional

import pyaudio
import websockets
from google.cloud import speech
import deepl
from dotenv import load_dotenv

# -----------------------------------------------------------------------------
# VoiceTranslator
# -----------------------------------------------------------------------------
class VoiceTranslator:
    def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
        # External clients
        self.deepl_client = deepl.Translator(deepl_api_key)
        self.elevenlabs_api_key = elevenlabs_api_key
        self.voice_id = elevenlabs_voice_id
        self.stt_client = speech.SpeechClient()

        # Audio params
        self.audio_rate = 16000
        self.audio_chunk = 1024

        # Per-language audio queues (raw mic frames)
        self.lang_queues: Dict[str, queue.Queue] = {
            "en-US": queue.Queue(),
            "fr-FR": queue.Queue(),
        }

        # Small rolling prebuffer to avoid missing the first bits after a restart
        self.prebuffer = deque(maxlen=12)

        # State flags
        self.is_recording = False
        self.is_speaking = False
        self.speaking_event = threading.Event()

        # Deduplication
        self.last_processed_transcript = ""
        self.last_tts_text_en = ""
        self.last_tts_text_fr = ""

        # Threshold
        self.min_confidence_threshold = 0.5

        # PyAudio
        self.pyaudio_instance = pyaudio.PyAudio()
        self.audio_stream = None

        # Threads + async
        self.recording_thread: Optional[threading.Thread] = None
        self.async_loop = asyncio.new_event_loop()

        # TTS queue + consumer task
        self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
        self._tts_consumer_task: Optional[asyncio.Task] = None

        # Start async loop in separate thread
        self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
        self.async_thread.start()

        # schedule tts consumer creation inside the async loop
        def _start_consumer():
            self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
        self.async_loop.call_soon_threadsafe(_start_consumer)

        self.stt_threads: Dict[str, threading.Thread] = {}

        # Per-language restart events (used to tell threads when to start new streams)
        self.restart_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Per-language stream started flag
        self._stream_started = {"en-US": False, "fr-FR": False}

        # **NEW**: per-language cancel events to force request_generator to stop
        self.stream_cancel_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Diagnostics
        self._tts_job_counter = 0

    def _run_async_loop(self):
        asyncio.set_event_loop(self.async_loop)
        try:
            self.async_loop.run_forever()
        except Exception as e:
            print("[async_loop] stopped with error:", e)

    # ---------------------------
    # Audio capture
    # ---------------------------
    def _record_audio(self):
        try:
            stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=self.audio_rate,
                input=True,
                frames_per_buffer=self.audio_chunk,
            )
            print("🎤 Recording started...")

            while self.is_recording:
                if self.speaking_event.is_set():
                    time.sleep(0.01)
                    continue

                try:
                    data = stream.read(self.audio_chunk, exception_on_overflow=False)
                except Exception as e:
                    print(f"[recorder] read error: {e}")
                    continue

                if not data:
                    continue

                self.prebuffer.append(data)
                self.lang_queues["en-US"].put(data)
                self.lang_queues["fr-FR"].put(data)

            try:
                stream.stop_stream()
                stream.close()
            except Exception:
                pass
            print("🎤 Recording stopped.")
        except Exception as e:
            print(f"[recorder] fatal: {e}")

    # ---------------------------
    # TTS streaming (ElevenLabs) - async
    # ---------------------------
    async def _stream_tts(self, text: str):
        uri = (
            f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
            f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
        )
        tts_audio_stream = None
        websocket = None
        try:
            # Mark speaking and set event so recorder & STT pause
            self.is_speaking = True
            self.speaking_event.set()
            # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> True")

            # Clear prebuffer to avoid re-injecting TTS audio later
            self.prebuffer.clear()

            # Clear queued frames to avoid replay; we'll re-inject prebuffer after we cancel streams
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            # Brief pause to ensure recorder sees speaking_event before we start TTS
            await asyncio.sleep(0.1)

            websocket = await websockets.connect(uri)
            await websocket.send(json.dumps({
                "text": " ",
                "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
                "xi_api_key": self.elevenlabs_api_key,
            }))
            await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
            await websocket.send(json.dumps({"text": ""}))

            tts_audio_stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                output=True,
                frames_per_buffer=1024,
            )

            prebuffer = bytearray()
            playback_started = False

            try:
                while True:
                    try:
                        message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
                    except asyncio.TimeoutError:
                        if playback_started:
                            break
                        else:
                            continue

                    if isinstance(message, bytes):
                        prebuffer.extend(message)
                        if not playback_started and len(prebuffer) >= 8000:
                            tts_audio_stream.write(bytes(prebuffer))
                            prebuffer.clear()
                            playback_started = True
                        elif playback_started:
                            tts_audio_stream.write(message)
                        continue

                    try:
                        data = json.loads(message)
                    except Exception:
                        continue

                    if data.get("audio"):
                        audio_bytes = base64.b64decode(data["audio"])
                        if not playback_started:
                            prebuffer.extend(audio_bytes)
                            if len(prebuffer) >= 16000:
                                print(f"[tts] Starting playback, prebuffer size: {len(prebuffer)}")
                                tts_audio_stream.write(bytes(prebuffer))
                                prebuffer.clear()
                                playback_started = True
                        else:
                            tts_audio_stream.write(audio_bytes)

                    elif data.get("isFinal"):
                        print(f"[tts] Received isFinal, prebuffer remaining: {len(prebuffer)}")
                        break

                    elif data.get("error"):
                        print("TTS error:", data["error"])
                        break

                # Prebuffer should be empty after playback starts, but just in case
                if prebuffer and not playback_started:
                    print(f"[tts] Writing final prebuffer: {len(prebuffer)} bytes (playback never started)")
                    tts_audio_stream.write(bytes(prebuffer))
                elif prebuffer:
                    print(f"[tts] WARNING: prebuffer has {len(prebuffer)} bytes after playback - this is a bug!")

            finally:
                try:
                    await websocket.close()
                except Exception:
                    pass

        except Exception as e:
            # print(f"[tts] error: {e}")
            pass
        finally:
            if tts_audio_stream:
                try:
                    tts_audio_stream.stop_stream()
                    tts_audio_stream.close()
                except Exception:
                    pass
            # **NEW**: force the STT request generators to exit by setting cancel events.
            # This makes streaming_recognize finish; threads will then wait for restart_events
            # and start fresh streams.
            for lang, ev in self.stream_cancel_events.items():
                ev.set()
                # print(f"[{time.strftime('%H:%M:%S')}] [cancel] set -> {lang}")

            # Don't re-inject prebuffer - it may contain TTS echo
            # Just clear the queues and let fresh audio come in
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            # Wait for TTS audio to clear from environment (acoustic decay)
            await asyncio.sleep(0.1)

            # Clear speaking state and signal STT threads to restart (robustly)
            self.is_speaking = False
            self.speaking_event.clear()
            # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> False")

            # Primary restart: set both events
            for lang, ev in self.restart_events.items():
                ev.set()
                # print(f"[{time.strftime('%H:%M:%S')}] [restart] set -> {lang}")

            await asyncio.sleep(0.25)
            for lang, ev in self.restart_events.items():
                ev.set()
            await asyncio.sleep(0.25)

    # ---------------------------
    # TTS consumer (serializes TTS)
    # ---------------------------
    async def _tts_consumer(self):
        print("[tts_consumer] started")
        while True:
            item = await self._tts_queue.get()
            if item is None:
                print("[tts_consumer] shutdown sentinel received")
                break
            text = item.get("text", "")
            self._tts_job_counter += 1
            job_id = self._tts_job_counter
            print(f"[tts_consumer] job #{job_id} dequeued: '{text}'")
            try:
                await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
            except asyncio.TimeoutError:
                print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
            except Exception as e:
                print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
            finally:
                await asyncio.sleep(0.05)
        print("[tts_consumer] exiting")

    # ---------------------------
    # Translation & TTS trigger
    # ---------------------------
    async def _process_result(self, transcript: str, confidence: float, language: str):
        lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
        print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")

        # echo suppression vs last TTS in same language
        if language == "fr-FR":
            if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                print("   (echo suppressed)")
                return
        else:
            if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                print("   (echo suppressed)")
                return

        try:
            if language == "fr-FR":
                translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
                print(f"🌐 FR → EN: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_en = translated
            else:
                translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
                print(f"🌐 EN → FR: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_fr = translated
            print("🔊 Queued for speaking...")
        except Exception as e:
            print(f"Translation error: {e}")

    # ---------------------------
    # STT streaming (run per language)
    # ---------------------------
    def _run_stt_stream(self, language: str):
        print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
        self._stream_started[language] = False
        last_transcript_in_stream = ""

        while self.is_recording:
            try:
                if self._stream_started[language]:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
                    signaled = self.restart_events[language].wait(timeout=30)
                    if not signaled and self.is_recording:
                        print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
                    if not self.is_recording:
                        break
                    try:
                        self.restart_events[language].clear()
                    except Exception:
                        pass
                    time.sleep(0.01)

                self._stream_started[language] = True
                print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")

                config = speech.RecognitionConfig(
                    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=self.audio_rate,
                    language_code=language,
                    enable_automatic_punctuation=True,
                    model="latest_short",
                )
                streaming_config = speech.StreamingRecognitionConfig(
                    config=config,
                    interim_results=True,
                    single_utterance=False,
                )

                # Request generator yields StreamingRecognizeRequest messages
                def request_generator():
                    while self.is_recording:
                        # If TTS is playing, skip sending mic frames to STT
                        if self.speaking_event.is_set():
                            time.sleep(0.01)
                            continue
                        # If cancel event set, clear and break to end stream
                        if self.stream_cancel_events[language].is_set():
                            # print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] request_generator observed cancel -> exiting generator")
                            try:
                                self.stream_cancel_events[language].clear()
                            except Exception:
                                pass
                            break
                        try:
                            chunk = self.lang_queues[language].get(timeout=1.0)
                        except queue.Empty:
                            continue
                        yield speech.StreamingRecognizeRequest(audio_content=chunk)

                responses = self.stt_client.streaming_recognize(streaming_config, request_generator())

                response_count = 0
                final_received = False

                for response in responses:
                    if not self.is_recording:
                        print(f"[stt:{language}] Stopped by user")
                        break
                    if not response.results:
                        continue

                    response_count += 1
                    for result in response.results:
                        if not result.alternatives:
                            continue
                        alt = result.alternatives[0]
                        transcript = alt.transcript.strip()
                        conf = getattr(alt, "confidence", 0.0)
                        is_final = bool(result.is_final)

                        if is_final:
                            now = time.strftime("%H:%M:%S")
                            print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")

                            # Filter empty transcripts - don't break stream
                            if not transcript or len(transcript.strip()) == 0:
                                print(f"[{now}] [stt:{language}] Empty transcript -> ignoring, continuing stream")
                                continue

                            # Deduplicate within same stream
                            if transcript.strip().lower() == last_transcript_in_stream.strip().lower():
                                print(f"[{now}] [stt:{language}] Duplicate final in same stream -> suppressed")
                                continue

                            if conf < self.min_confidence_threshold:
                                print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
                                continue

                            if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
                                continue
                            if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
                                continue

                            asyncio.run_coroutine_threadsafe(
                                self._process_result(transcript, conf, language),
                                self.async_loop
                            )
                            final_received = True
                            break

                    if final_received:
                        break

                print(f"[stt:{language}] Stream ended after {response_count} responses")

                if self.is_recording and final_received:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
                elif self.is_recording and not final_received:
                    print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
                    time.sleep(0.5)
                else:
                    break

            except Exception as e:
                if self.is_recording:
                    import traceback
                    print(f"[stt:{language}] Error: {e}")
                    print(traceback.format_exc())
                    time.sleep(1.0)
                else:
                    break

        print(f"[stt:{language}] Thread exiting")

    # ---------------------------
    # Control
    # ---------------------------
    def start_translation(self):
        if self.is_recording:
            print("Already recording!")
            return
        self.is_recording = True
        self.last_processed_transcript = ""

        for ev in self.restart_events.values():
            try:
                ev.clear()
            except Exception:
                pass
        self.speaking_event.clear()

        for q in self.lang_queues.values():
            with q.mutex:
                q.queue.clear()

        self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
        self.recording_thread.start()

        for lang in ("en-US", "fr-FR"):
            t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
            self.stt_threads[lang] = t
            t.start()
            print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")

        for ev in self.restart_events.values():
            ev.set()

    def stop_translation(self):
        print("\n⏹️ Stopping translation...")
        self.is_recording = False
        for ev in self.restart_events.values():
            ev.set()
        self.speaking_event.clear()

        if self._tts_consumer_task and not (self._tts_consumer_task.done() if hasattr(self._tts_consumer_task, 'done') else False):
            try:
                def _put_sentinel():
                    try:
                        self._tts_queue.put_nowait(None)
                    except Exception:
                        asyncio.create_task(self._tts_queue.put(None))
                self.async_loop.call_soon_threadsafe(_put_sentinel)
            except Exception:
                pass

        time.sleep(0.2)

    def cleanup(self):
        self.stop_translation()
        try:
            if self.async_loop.is_running():
                def _stop_loop():
                    if self._tts_consumer_task and not self._tts_consumer_task.done():
                        try:
                            self._tts_queue.put_nowait(None)
                        except Exception:
                            pass
                    self.async_loop.stop()
                self.async_loop.call_soon_threadsafe(_stop_loop)
        except Exception:
            pass
        try:
            self.pyaudio_instance.terminate()
        except Exception:
            pass

# -----------------------------------------------------------------------------
# Main entry
# -----------------------------------------------------------------------------
def main():
    load_dotenv()
    google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    deepl_key = os.getenv("DEEPL_API_KEY")
    eleven_key = os.getenv("ELEVENLABS_API_KEY")
    voice_id = os.getenv("ELEVENLABS_VOICE_ID")

    if not all([google_creds, deepl_key, eleven_key, voice_id]):
        print("Missing API keys or credentials.")
        return

    translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
    print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")

    try:
        while True:
            input("Press ENTER to start speaking...")
            translator.start_translation()
            input("Press ENTER to stop...\n")
            translator.stop_translation()
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received — cleaning up.")
        translator.cleanup()

if __name__ == "__main__":
    main()
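The cancel-event mechanism described in the module docstring above can be isolated into a small sketch: a blocking request generator is ended from another thread by setting an event it polls, and it clears the event on exit so the next stream starts clean (names here are illustrative, not the app's API; `bytes` chunks stand in for mic frames):

```python
import queue
import threading
import time

frames = queue.Queue()
cancel = threading.Event()

def request_generator():
    # Yield queued chunks until the cancel event is observed (mirrors the
    # per-language request_generator feeding streaming_recognize).
    while True:
        if cancel.is_set():
            cancel.clear()  # one-shot: leave the event ready for the next stream
            break
        try:
            chunk = frames.get(timeout=0.1)
        except queue.Empty:
            continue
        yield chunk

received = []

def consume():
    # Stand-in for the gRPC client iterating the request stream.
    for chunk in request_generator():
        received.append(chunk)

t = threading.Thread(target=consume)
t.start()

frames.put(b"audio-1")
frames.put(b"audio-2")
time.sleep(0.3)   # let the consumer drain both chunks
cancel.set()      # force the generator (and thus the "stream") to end
t.join(timeout=2)

print(received)
print(cancel.is_set())
```

Ending the generator is what makes the blocking `streaming_recognize` iteration finish, which is why `_stream_tts` sets these events right after playback instead of trying to interrupt the gRPC call directly.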
checkpoint_nov2.py
ADDED
@@ -0,0 +1,570 @@
checkpoint_nov2.py
ADDED
@@ -0,0 +1,570 @@
#!/usr/bin/env python3
"""
Real-Time French/English Voice Translator — patched single-file v2

Changes from previous:
- Adds per-language stream_cancel_events that force the STT request_generator
  to exit, allowing streaming_recognize to terminate and be restarted cleanly.
- _stream_tts sets the cancel events immediately after playback finishes (before
  prebuffer re-injection and restart events).
- Request generator checks cancel event frequently and breaks to end the stream.

Keep your env vars:
- GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
"""

import asyncio
import json
import queue
import threading
import time
import os
import base64
from collections import deque
from typing import Dict, Optional

import pyaudio
import websockets
from google.cloud import speech
import deepl
from dotenv import load_dotenv

# -----------------------------------------------------------------------------
# VoiceTranslator
# -----------------------------------------------------------------------------
class VoiceTranslator:
    def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
        # External clients
        self.deepl_client = deepl.Translator(deepl_api_key)
        self.elevenlabs_api_key = elevenlabs_api_key
        self.voice_id = elevenlabs_voice_id
        self.stt_client = speech.SpeechClient()

        # Audio params
        self.audio_rate = 16000
        self.audio_chunk = 1024

        # Per-language audio queues (raw mic frames)
        self.lang_queues: Dict[str, queue.Queue] = {
            "en-US": queue.Queue(),
            "fr-FR": queue.Queue(),
        }

        # Small rolling prebuffer to avoid missing the first bits after a restart
        self.prebuffer = deque(maxlen=12)

        # State flags
        self.is_recording = False
        self.is_speaking = False
        self.speaking_event = threading.Event()

        # Deduplication
        self.last_processed_transcript = ""
        self.last_tts_text_en = ""
        self.last_tts_text_fr = ""

        # Threshold
        self.min_confidence_threshold = 0.5

        # PyAudio
        self.pyaudio_instance = pyaudio.PyAudio()
        self.audio_stream = None

        # Threads + async
        self.recording_thread: Optional[threading.Thread] = None
        self.async_loop = asyncio.new_event_loop()

        # TTS queue + consumer task
        self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
        self._tts_consumer_task: Optional[asyncio.Task] = None

        # Start async loop in separate thread
        self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
        self.async_thread.start()

        # Schedule TTS consumer creation inside the async loop
        def _start_consumer():
            self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
        self.async_loop.call_soon_threadsafe(_start_consumer)

        self.stt_threads: Dict[str, threading.Thread] = {}

        # Per-language restart events (used to tell threads when to start new streams)
        self.restart_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Per-language stream started flag
        self._stream_started = {"en-US": False, "fr-FR": False}

        # NEW: per-language cancel events to force request_generator to stop
        self.stream_cancel_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Diagnostics
        self._tts_job_counter = 0

    def _run_async_loop(self):
        asyncio.set_event_loop(self.async_loop)
        try:
            self.async_loop.run_forever()
        except Exception as e:
            print("[async_loop] stopped with error:", e)

    # ---------------------------
    # Audio capture
    # ---------------------------
    def _record_audio(self):
        try:
            stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=self.audio_rate,
                input=True,
                frames_per_buffer=self.audio_chunk,
            )
            print("🎤 Recording started...")

            while self.is_recording:
                if self.speaking_event.is_set():
                    time.sleep(0.01)
                    continue

                try:
                    data = stream.read(self.audio_chunk, exception_on_overflow=False)
                except Exception as e:
                    print(f"[recorder] read error: {e}")
                    continue

                if not data:
                    continue

                self.prebuffer.append(data)
                self.lang_queues["en-US"].put(data)
                self.lang_queues["fr-FR"].put(data)

            try:
                stream.stop_stream()
                stream.close()
            except Exception:
                pass
            print("🎤 Recording stopped.")
        except Exception as e:
            print(f"[recorder] fatal: {e}")

    # ---------------------------
    # TTS streaming (ElevenLabs) - async
    # ---------------------------
    async def _stream_tts(self, text: str):
        uri = (
            f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
            f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
        )
        tts_audio_stream = None
        websocket = None
        try:
            # Mark speaking and set event so recorder & STT pause
            self.is_speaking = True
            self.speaking_event.set()

            # Clear queued frames to avoid replay; we'll re-inject prebuffer after we cancel streams
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            websocket = await websockets.connect(uri)
            await websocket.send(json.dumps({
                "text": " ",
                "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
                "xi_api_key": self.elevenlabs_api_key,
            }))
            await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
            await websocket.send(json.dumps({"text": ""}))

            tts_audio_stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                output=True,
                frames_per_buffer=1024,
            )

            prebuffer = bytearray()
            playback_started = False

            try:
                while True:
                    try:
                        message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
                    except asyncio.TimeoutError:
                        if playback_started:
                            break
                        else:
                            continue

                    if isinstance(message, bytes):
                        prebuffer.extend(message)
                        if not playback_started and len(prebuffer) >= 16000:
                            tts_audio_stream.write(bytes(prebuffer))
                            prebuffer.clear()
                            playback_started = True
                        elif playback_started:
                            tts_audio_stream.write(message)
                        continue

                    try:
                        data = json.loads(message)
                    except Exception:
                        continue

                    if data.get("audio"):
                        audio_bytes = base64.b64decode(data["audio"])
                        prebuffer.extend(audio_bytes)
                        if not playback_started and len(prebuffer) >= 16000:
                            tts_audio_stream.write(bytes(prebuffer))
                            prebuffer.clear()
                            playback_started = True
                        elif playback_started:
                            tts_audio_stream.write(audio_bytes)
                    elif data.get("isFinal"):
                        break
                    elif data.get("error"):
                        print("TTS error:", data["error"])
                        break

                if prebuffer:
                    tts_audio_stream.write(bytes(prebuffer))

            finally:
                try:
                    await websocket.close()
                except Exception:
                    pass

        except Exception:
            pass
        finally:
            if tts_audio_stream:
                try:
                    tts_audio_stream.stop_stream()
                    tts_audio_stream.close()
                except Exception:
                    pass

            # NEW: force the STT request generators to exit by setting cancel events.
            # This makes streaming_recognize finish; threads will then wait for restart_events
            # and start fresh streams.
            for lang, ev in self.stream_cancel_events.items():
                ev.set()

            # Now re-inject prebuffer so new streams start with warm audio
            pre_list = list(self.prebuffer)
            if pre_list:
                print(f"[{time.strftime('%H:%M:%S')}] [prebuffer] re-injecting {len(pre_list)} chunks into queues")
                for chunk in pre_list:
                    self.lang_queues["en-US"].put(chunk)
                    self.lang_queues["fr-FR"].put(chunk)

            # Clear speaking state and signal STT threads to restart (robustly)
            self.is_speaking = False
            self.speaking_event.clear()

            # Primary restart: set both events
            for lang, ev in self.restart_events.items():
                ev.set()

            await asyncio.sleep(0.25)
            for lang, ev in self.restart_events.items():
                ev.set()
            await asyncio.sleep(0.25)

    # ---------------------------
    # TTS consumer (serializes TTS)
    # ---------------------------
    async def _tts_consumer(self):
        print("[tts_consumer] started")
        while True:
            item = await self._tts_queue.get()
            if item is None:
                print("[tts_consumer] shutdown sentinel received")
                break
            text = item.get("text", "")
            self._tts_job_counter += 1
            job_id = self._tts_job_counter
            print(f"[tts_consumer] job #{job_id} dequeued (len={len(text)})")
            try:
                await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
            except asyncio.TimeoutError:
                print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
            except Exception as e:
                print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
            finally:
                await asyncio.sleep(0.05)
        print("[tts_consumer] exiting")

    # ---------------------------
    # Translation & TTS trigger
    # ---------------------------
    async def _process_result(self, transcript: str, confidence: float, language: str):
        lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
        print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")

        # Echo suppression vs last TTS in same language
        if language == "fr-FR":
            if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                print("   (echo suppressed)")
                return
        else:
            if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                print("   (echo suppressed)")
                return

        try:
            if language == "fr-FR":
                translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
                print(f"🌐 FR → EN: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_en = translated
            else:
                translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
                print(f"🌐 EN → FR: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_fr = translated
            print("🔊 Queued for speaking...")
        except Exception as e:
            print(f"Translation error: {e}")

    # ---------------------------
    # STT streaming (run per language)
    # ---------------------------
    def _run_stt_stream(self, language: str):
        print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
        self._stream_started[language] = False

        while self.is_recording:
            try:
                if self._stream_started[language]:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
                    signaled = self.restart_events[language].wait(timeout=30)
                    if not signaled and self.is_recording:
                        print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
                    if not self.is_recording:
                        break
                    try:
                        self.restart_events[language].clear()
                    except Exception:
                        pass
                    time.sleep(0.01)

                self._stream_started[language] = True
                print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")

                config = speech.RecognitionConfig(
                    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=self.audio_rate,
                    language_code=language,
                    enable_automatic_punctuation=True,
                    model="latest_short",
                )
                streaming_config = speech.StreamingRecognitionConfig(
                    config=config,
                    interim_results=True,
                    single_utterance=False,
                )

                # Request generator yields StreamingRecognizeRequest messages
                def request_generator():
                    while self.is_recording:
                        # If TTS is playing, skip sending mic frames to STT
                        if self.speaking_event.is_set():
                            time.sleep(0.01)
                            continue
                        # If cancel event set, clear and break to end stream
                        if self.stream_cancel_events[language].is_set():
                            try:
                                self.stream_cancel_events[language].clear()
                            except Exception:
                                pass
                            break
                        try:
                            chunk = self.lang_queues[language].get(timeout=1.0)
                        except queue.Empty:
                            continue
                        yield speech.StreamingRecognizeRequest(audio_content=chunk)

                responses = self.stt_client.streaming_recognize(streaming_config, request_generator())

                response_count = 0
                final_received = False

                for response in responses:
                    if not self.is_recording:
                        print(f"[stt:{language}] Stopped by user")
                        break
                    if not response.results:
                        continue

                    response_count += 1
                    for result in response.results:
                        if not result.alternatives:
                            continue
                        alt = result.alternatives[0]
                        transcript = alt.transcript.strip()
                        conf = getattr(alt, "confidence", 0.0)
                        is_final = bool(result.is_final)

                        if is_final:
                            now = time.strftime("%H:%M:%S")
                            print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")
                            if conf < self.min_confidence_threshold:
                                print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
                                continue

                            if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
                                continue
                            if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
                                continue

                            asyncio.run_coroutine_threadsafe(
                                self._process_result(transcript, conf, language),
                                self.async_loop
                            )
                            final_received = True
                            break

                    if final_received:
                        break

                print(f"[stt:{language}] Stream ended after {response_count} responses")

                if self.is_recording and final_received:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
                elif self.is_recording and not final_received:
                    print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
                    time.sleep(0.5)
                else:
                    break

            except Exception as e:
                if self.is_recording:
                    import traceback
                    print(f"[stt:{language}] Error: {e}")
                    print(traceback.format_exc())
                    time.sleep(1.0)
                else:
                    break

        print(f"[stt:{language}] Thread exiting")

    # ---------------------------
    # Control
    # ---------------------------
    def start_translation(self):
        if self.is_recording:
            print("Already recording!")
            return
        self.is_recording = True
        self.last_processed_transcript = ""

        for ev in self.restart_events.values():
            try:
                ev.clear()
            except Exception:
                pass
        self.speaking_event.clear()

        for q in self.lang_queues.values():
            with q.mutex:
                q.queue.clear()

        self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
        self.recording_thread.start()

        for lang in ("en-US", "fr-FR"):
            t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
            self.stt_threads[lang] = t
            t.start()
            print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")

        for ev in self.restart_events.values():
            ev.set()

    def stop_translation(self):
        print("\n⏹️ Stopping translation...")
        self.is_recording = False
        for ev in self.restart_events.values():
            ev.set()
        self.speaking_event.clear()

        if self._tts_consumer_task and not (self._tts_consumer_task.done() if hasattr(self._tts_consumer_task, 'done') else False):
            try:
                def _put_sentinel():
                    try:
                        self._tts_queue.put_nowait(None)
                    except Exception:
                        asyncio.create_task(self._tts_queue.put(None))
                self.async_loop.call_soon_threadsafe(_put_sentinel)
            except Exception:
                pass

        time.sleep(0.2)

    def cleanup(self):
        self.stop_translation()
        try:
            if self.async_loop.is_running():
                def _stop_loop():
                    if self._tts_consumer_task and not self._tts_consumer_task.done():
                        try:
                            self._tts_queue.put_nowait(None)
                        except Exception:
                            pass
                    self.async_loop.stop()
                self.async_loop.call_soon_threadsafe(_stop_loop)
        except Exception:
            pass
        try:
            self.pyaudio_instance.terminate()
        except Exception:
            pass

# -----------------------------------------------------------------------------
# Main entry
# -----------------------------------------------------------------------------
def main():
    load_dotenv()
    google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    deepl_key = os.getenv("DEEPL_API_KEY")
    eleven_key = os.getenv("ELEVENLABS_API_KEY")
    voice_id = os.getenv("ELEVENLABS_VOICE_ID")

    if not all([google_creds, deepl_key, eleven_key, voice_id]):
        print("Missing API keys or credentials.")
        return

    translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
    print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")

    try:
        while True:
            input("Press ENTER to start speaking...")
            translator.start_translation()
            input("Press ENTER to stop...\n")
            translator.stop_translation()
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received — cleaning up.")
        translator.cleanup()

if __name__ == "__main__":
    main()
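The docstring above describes the key fix: a per-language cancel event that forces the STT request generator to exit, which is what lets `streaming_recognize` terminate and be restarted. That mechanism can be exercised in isolation with a stdlib-only sketch (names here are illustrative, not from the repo):

```python
import queue
import threading

def request_generator(q, cancel_event):
    # Drain the queue until the cancel event is set, then clear the
    # event and end the generator -- ending the generator is what lets
    # the upstream streaming call finish cleanly.
    while True:
        if cancel_event.is_set():
            cancel_event.clear()
            return
        try:
            chunk = q.get(timeout=0.05)
        except queue.Empty:
            continue
        yield chunk

q = queue.Queue()
cancel = threading.Event()
q.put(b"a")
q.put(b"b")

gen = request_generator(q, cancel)
first = next(gen)   # consumes b"a"
cancel.set()        # simulate TTS playback finishing
rest = list(gen)    # generator exits on the cancel check; b"b" stays queued
```

Because the event is cleared inside the generator, the next stream started by the STT thread sees a clean slate, and any frames still queued (here, `b"b"`) are available to it.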
mic_check.py
ADDED
@@ -0,0 +1,5 @@
import pyaudio
p = pyaudio.PyAudio()
print("Default input device index:", p.get_default_input_device_info()['index'])
print("Default input name:", p.get_default_input_device_info()['name'])
p.terminate()
requirements.txt
ADDED
@@ -0,0 +1,5 @@
google-cloud-speech
deepl
pyaudio
websockets
python-dotenv
working.py
ADDED
|
@@ -0,0 +1,334 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Real-Time French/English Voice Translator - FIXED VERSION v4.2
|
| 3 |
+
Improvements:
|
| 4 |
+
- Removed noisy [audio_gen]/[tts] prints
|
| 5 |
+
- Added TTS pre-buffer to eliminate start bursts
|
| 6 |
+
- Added silence-based auto-finalization when no STT final detected
|
| 7 |
+
- Switched to "latest_long" model for better segmentation
|
| 8 |
+
- Added echo suppression (skip self-spoken TTS text)
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
import asyncio
|
| 12 |
+
import json
|
| 13 |
+
import queue
|
| 14 |
+
import threading
|
| 15 |
+
import time
|
| 16 |
+
from typing import Optional, Dict, List
|
| 17 |
+
import pyaudio
|
| 18 |
+
import websockets
|
| 19 |
+
from google.cloud import speech
|
| 20 |
+
import deepl
|
| 21 |
+
import os
|
| 22 |
+
from dotenv import load_dotenv
|
| 23 |
+
import base64
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
class VoiceTranslator:
|
| 27 |
+
def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
|
| 28 |
+
self.stt_client = speech.SpeechClient()
|
| 29 |
+
self.deepl_client = deepl.Translator(deepl_api_key)
|
| 30 |
+
self.elevenlabs_api_key = elevenlabs_api_key
|
| 31 |
+
self.voice_id = elevenlabs_voice_id
|
| 32 |
+
|
| 33 |
+
self.audio_rate = 16000
|
| 34 |
+
self.audio_chunk = 1024
|
| 35 |
+
|
| 36 |
+
self.audio_queue_en = queue.Queue()
|
| 37 |
+
self.audio_queue_fr = queue.Queue()
|
| 38 |
+
self.result_queue = queue.Queue()
|
| 39 |
+
self.is_recording = False
|
| 40 |
+
self.processing_lock = threading.Lock()
|
| 41 |
+
|
| 42 |
+
self.last_processed_transcript = ""
|
| 43 |
+
self.last_tts_text = ""
|
| 44 |
+
|
| 45 |
+
self.pyaudio_instance = pyaudio.PyAudio()
|
| 46 |
+
self.audio_stream = None
|
| 47 |
+
|
| 48 |
+
# ---------- AUDIO CAPTURE ----------
|
| 49 |
+
|
| 50 |
+
def _audio_generator(self, audio_queue: queue.Queue):
|
| 51 |
+
while self.is_recording:
|
| 52 |
+
try:
|
| 53 |
+
chunk = audio_queue.get(timeout=0.2)
|
| 54 |
+
if chunk:
|
| 55 |
+
yield chunk
|
| 56 |
+
except queue.Empty:
|
| 57 |
+
continue
|
| 58 |
+
|
| 59 |
+
def _record_audio(self):
|
| 60 |
+
try:
|
| 61 |
+
stream = self.pyaudio_instance.open(
|
| 62 |
+
format=pyaudio.paInt16,
|
| 63 |
+
channels=1,
|
| 64 |
+
rate=self.audio_rate,
|
| 65 |
+
input=True,
|
| 66 |
+
frames_per_buffer=self.audio_chunk,
|
| 67 |
+
)
|
| 68 |
+
print("🎤 Recording started...")
|
| 69 |
+
while self.is_recording:
|
| 70 |
+
try:
|
| 71 |
+
data = stream.read(self.audio_chunk, exception_on_overflow=False)
|
| 72 |
+
if not data:
|
| 73 |
+
continue
|
| 74 |
+
self.audio_queue_en.put(data)
|
| 75 |
+
self.audio_queue_fr.put(data)
|
| 76 |
+
except Exception as e:
|
| 77 |
+
print(f"[recorder] error: {e}")
|
| 78 |
+
break
|
| 79 |
+
stream.stop_stream()
|
| 80 |
+
stream.close()
|
| 81 |
+
print("🎤 Recording stopped.")
|
| 82 |
+
except Exception as e:
|
| 83 |
+
print(f"[recorder] fatal: {e}")
|
| 84 |
+
|
| 85 |
+
# ---------- TEXT TO SPEECH ----------
|
| 86 |
+
|
| 87 |
+
async def _stream_tts(self, text: str):
|
| 88 |
+
"""Stream TTS with small pre-buffer to smooth playback."""
|
| 89 |
+
uri = (
|
| 90 |
+
f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
|
| 91 |
+
f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
|
| 92 |
+
)
|
| 93 |
+
|
| 94 |
+
try:
|
| 95 |
+
async with websockets.connect(uri) as websocket:
|
| 96 |
+
await websocket.send(json.dumps({
|
| 97 |
+
"text": " ",
|
| 98 |
+
"voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
|
| 99 |
+
"xi_api_key": self.elevenlabs_api_key,
|
| 100 |
+
}))
|
| 101 |
+
await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
|
| 102 |
+
await websocket.send(json.dumps({"text": ""}))
|
| 103 |
+
|
| 104 |
+
if self.audio_stream is None:
|
| 105 |
+
self.audio_stream = self.pyaudio_instance.open(
|
| 106 |
+
format=pyaudio.paInt16,
|
| 107 |
+
channels=1,
|
| 108 |
+
rate=16000,
|
| 109 |
+
output=True,
|
| 110 |
+
frames_per_buffer=1024,
|
| 111 |
+
)
|
| 112 |
+
|
| 113 |
+
prebuffer = bytearray()
|
| 114 |
+
playback_started = False
|
| 115 |
+
last_chunk_time = time.time()
|
| 116 |
+
|
| 117 |
+
async for message in websocket:
|
| 118 |
+
if isinstance(message, bytes):
|
| 119 |
+
prebuffer.extend(message)
|
| 120 |
+
# Start playback after ~0.5 s of audio buffered
|
| 121 |
+
if not playback_started and len(prebuffer) >= 16000:
|
| 122 |
+
self.audio_stream.write(bytes(prebuffer))
|
| 123 |
+
prebuffer.clear()
|
| 124 |
+
playback_started = True
|
| 125 |
+
elif playback_started:
|
| 126 |
+
self.audio_stream.write(message)
|
| 127 |
+
last_chunk_time = time.time()
|
| 128 |
+
continue
|
| 129 |
+
|
| 130 |
+
try:
|
| 131 |
+
data = json.loads(message)
|
| 132 |
+
except Exception:
|
| 133 |
+
continue
|
| 134 |
+
|
| 135 |
+
if data.get("audio"):
|
| 136 |
+
audio_bytes = base64.b64decode(data["audio"])
|
| 137 |
+
prebuffer.extend(audio_bytes)
|
| 138 |
+
if not playback_started and len(prebuffer) >= 16000:
|
| 139 |
+
self.audio_stream.write(bytes(prebuffer))
|
| 140 |
+
prebuffer.clear()
|
| 141 |
+
playback_started = True
|
| 142 |
+
elif playback_started:
|
| 143 |
+
self.audio_stream.write(audio_bytes)
|
| 144 |
+
last_chunk_time = time.time()
|
| 145 |
+
elif data.get("isFinal"):
|
| 146 |
+
break
|
| 147 |
+
elif data.get("error"):
|
| 148 |
+
print("TTS error:", data["error"])
|
| 149 |
+
break
|
| 150 |
+
|
| 151 |
+
if prebuffer:
|
| 152 |
+
self.audio_stream.write(bytes(prebuffer))
|
| 153 |
+
except Exception as e:
|
| 154 |
+
print(f"[tts] error: {e}")
|
| 155 |
+
|
| 156 |
+
# ---------- TRANSLATION ----------
|
| 157 |
+
|
| 158 |
+
async def _process_result(self, transcript: str, confidence: Optional[float], language: str):
|
| 159 |
+
lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
|
| 160 |
+
conf_display = f"{confidence:.2f}" if confidence is not None else "n/a"
|
| 161 |
+
print(f"{lang_flag} Heard ({language}, conf {conf_display}): {transcript}")
|
| 162 |
+
|
| 163 |
+
# Simple echo suppression
|
| 164 |
+
if transcript.strip().lower() == self.last_tts_text.strip().lower():
|
| 165 |
+
return
|
| 166 |
+
|
| 167 |
+
try:
|
| 168 |
+
if language == "fr-FR":
|
| 169 |
+
translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
|
| 170 |
+
print(f"🌐 FR → EN: {translated}")
|
| 171 |
+
else:
|
| 172 |
+
translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
|
| 173 |
+
print(f"🌐 EN → FR: {translated}")
|
| 174 |
+
|
| 175 |
+
self.last_tts_text = translated
|
| 176 |
+
print("🔊 Speaking...")
|
| 177 |
+
await self._stream_tts(translated)
|
| 178 |
+
print("✅ Done\n")
|
| 179 |
+
|
| 180 |
+
except Exception as e:
|
| 181 |
+
print(f"Translation error: {e}")
|
| 182 |
+
|
| 183 |
+
# ---------- STT STREAMING ----------
|
| 184 |
+
|
| 185 |
+
    def _run_stt_stream(self, language: str, audio_queue: queue.Queue):
        print(f"[stt] Thread start for {language}")

        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=self.audio_rate,
            language_code=language,
            enable_automatic_punctuation=True,
            model="latest_long",
        )

        streaming_config = speech.StreamingRecognitionConfig(
            config=config, interim_results=True, single_utterance=False
        )

        def requests():
            for content in self._audio_generator(audio_queue):
                yield speech.StreamingRecognizeRequest(audio_content=content)

        try:
            responses = self.stt_client.streaming_recognize(streaming_config, requests())

            last_update_time = time.time()
            current_text = ""
            for response in responses:
                if not self.is_recording:
                    break
                if not response.results:
                    continue

                for result in response.results:
                    if not result.alternatives:
                        continue
                    alt = result.alternatives[0]
                    transcript = alt.transcript.strip()
                    conf = getattr(alt, "confidence", None)
                    current_text = transcript
                    last_update_time = time.time()

                    self.result_queue.put({
                        "transcript": transcript,
                        "confidence": conf,
                        "language": language,
                        "is_final": bool(result.is_final),
                    })

                # If we haven't heard anything new for 1.2 s, flush it as "final"
                if time.time() - last_update_time > 1.2 and current_text:
                    self.result_queue.put({
                        "transcript": current_text,
                        "confidence": 0.5,
                        "language": language,
                        "is_final": True,
                    })
                    current_text = ""

        except Exception as e:
            print(f"[stt:{language}] exception: {e}")

    # ---------- RESULT AGGREGATION ----------

    async def _process_results_queue(self):
        while self.is_recording:
            try:
                r = self.result_queue.get(timeout=0.2)
                if r["is_final"] and r["transcript"] != self.last_processed_transcript:
                    with self.processing_lock:
                        self.last_processed_transcript = r["transcript"]
                        await self._process_result(
                            r["transcript"], r.get("confidence"), r["language"]
                        )
                await asyncio.sleep(0.01)
            except queue.Empty:
                await asyncio.sleep(0.05)
            except Exception as e:
                print("Queue error:", e)
                await asyncio.sleep(0.1)

    # ---------- CONTROL ----------

    async def _run_dual_streams(self):
        print("🔄 Dual-stream: English ⇄ French\n")
        en_thread = threading.Thread(target=self._run_stt_stream, args=("en-US", self.audio_queue_en), daemon=True)
        fr_thread = threading.Thread(target=self._run_stt_stream, args=("fr-FR", self.audio_queue_fr), daemon=True)
        en_thread.start()
        fr_thread.start()
        await self._process_results_queue()

    def start_translation(self):
        if self.is_recording:
            print("Already recording!")
            return
        self.is_recording = True
        self.last_processed_transcript = ""
        # Drain any stale results from a previous session
        while not self.result_queue.empty():
            try:
                self.result_queue.get_nowait()
            except queue.Empty:
                break
        threading.Thread(target=self._record_audio, daemon=True).start()
        try:
            asyncio.run(self._run_dual_streams())
        except KeyboardInterrupt:
            self.stop_translation()

    def stop_translation(self):
        print("\n⏹️ Stopping translation...")
        self.is_recording = False
        if self.audio_stream:
            try:
                self.audio_stream.stop_stream()
                self.audio_stream.close()
            except Exception:
                pass
            self.audio_stream = None

    def cleanup(self):
        self.stop_translation()
        try:
            self.pyaudio_instance.terminate()
        except Exception:
            pass


# ---------- MAIN ----------

def main():
    load_dotenv()
    google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    deepl_key = os.getenv("DEEPL_API_KEY")
    eleven_key = os.getenv("ELEVENLABS_API_KEY")
    voice_id = os.getenv("ELEVENLABS_VOICE_ID")

    if not all([google_creds, deepl_key, eleven_key, voice_id]):
        print("Missing API keys or credentials.")
        return

    translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
    print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")

    try:
        while True:
            input("Press ENTER to start speaking...")
            threading.Thread(target=translator.start_translation, daemon=True).start()
            input("Press ENTER to stop...\n")
            translator.stop_translation()
    except KeyboardInterrupt:
        translator.cleanup()


if __name__ == "__main__":
    main()