Mike W committed
Commit ed699be · 0 parents

Initial commit of real-time voice translator

Files changed (8)
  1. .gitignore +9 -0
  2. README.md +112 -0
  3. app.py +579 -0
  4. app_backup.py +597 -0
  5. checkpoint_nov2.py +570 -0
  6. mic_check.py +5 -0
  7. requirements.txt +5 -0
  8. working.py +334 -0
.gitignore ADDED
@@ -0,0 +1,9 @@
# Environment variables
.env

# Python virtual environment
venv/

# Python cache
__pycache__/
*.pyc
README.md ADDED
@@ -0,0 +1,112 @@
# Real-Time English/French Voice Translator

This project provides a real-time, bidirectional voice translation tool that runs in your terminal. Speak in English or French, and hear the translation in the other language almost instantly.

It uses a combination of cutting-edge APIs for high-quality speech recognition, translation, and synthesis:

- **Speech-to-Text (STT):** Google Cloud Speech-to-Text
- **Translation:** DeepL API
- **Text-to-Speech (TTS):** ElevenLabs API

*(Note: You can replace this with a real GIF of the application in action.)*

## Features

- **Bidirectional Translation:** Simultaneously listens for both English and French and translates to the other language.
- **Low Latency:** Built with `asyncio` and multithreading for a responsive, conversational experience.
- **High-Quality Voice:** Leverages ElevenLabs for natural-sounding synthesized speech.
- **Echo Suppression:** The translator is smart enough not to translate its own spoken output.
- **Robust Streaming:** Automatically manages and restarts API connections to handle pauses in conversation.
- **Simple CLI:** Easy to start and stop from the command line.

## How It Works

The application orchestrates several processes concurrently:

1. **Audio Capture:** A dedicated thread captures audio from your default microphone.
2. **Dual STT Streams:** The captured audio is fed into two separate Google Cloud STT streams in parallel: one configured for `en-US` and one for `fr-FR`.
3. **Transcription & Translation:** When either STT stream detects a final utterance, it's sent to the DeepL API for translation.
4. **Speech Synthesis:** The translated text is sent to the ElevenLabs streaming TTS API.
5. **Audio Playback:** The synthesized audio is played back through your speakers.

To prevent the system from re-translating its own output, the application pauses microphone processing during TTS playback and discards any recognized text that matches its last-spoken phrase.
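The fan-out and echo-check ideas above can be sketched in a few lines (illustrative names only, not the full implementation):

```python
import queue

# One queue per STT stream; every mic frame is duplicated into both,
# so the en-US and fr-FR recognizers see identical audio.
lang_queues = {"en-US": queue.Queue(), "fr-FR": queue.Queue()}

def fan_out(frame: bytes) -> None:
    """Push one captured mic frame into every language queue."""
    for q in lang_queues.values():
        q.put(frame)

def is_echo(transcript: str, last_tts_text: str) -> bool:
    """Case-insensitive comparison against the last phrase the TTS spoke."""
    return transcript.strip().lower() == last_tts_text.strip().lower()

fan_out(b"\x00\x01")
print(lang_queues["en-US"].qsize(), lang_queues["fr-FR"].qsize())  # 1 1
print(is_echo("  Bonjour !", "bonjour !"))  # True
print(is_echo("Bonjour", "Hello"))  # False
```

The real application additionally gates the check on which language stream produced the transcript, since each direction keeps its own last-spoken text.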
## Requirements

### 1. Software
- Python 3.8+
- `pip` and `venv`
- **PortAudio:** This is a dependency for the `pyaudio` library.
  - **macOS (via Homebrew):** `brew install portaudio`
  - **Debian/Ubuntu:** `sudo apt-get install portaudio19-dev`
  - **Windows:** `pyaudio` can often be installed via `pip` without manual PortAudio installation.

### 2. API Keys
You will need active accounts and API keys for the following services:

- **Google Cloud:**
  - A Google Cloud Platform project with the **Speech-to-Text API** enabled.
  - A service account key file (`.json`).
- **DeepL:**
  - A DeepL API plan (the Free plan is sufficient for moderate use).
- **ElevenLabs:**
  - An ElevenLabs account. You will also need your **Voice ID** for the voice you wish to use.

## Installation & Setup

1. **Clone the Repository**
   ```bash
   git clone <your-repository-url>
   cd realtime-translator
   ```

2. **Create a Virtual Environment**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. **Install Dependencies**
   Create a `requirements.txt` file with the following content:
   ```
   pyaudio
   websockets
   google-cloud-speech
   deepl
   python-dotenv
   ```
   Then, install the packages:
   ```bash
   pip install -r requirements.txt
   ```

4. **Configure Environment Variables**
   Create a file named `.env` in the project root and add your credentials. This file is ignored by Git to keep your keys safe.

   ```env
   # Path to your Google Cloud service account JSON file
   GOOGLE_APPLICATION_CREDENTIALS="C:/path/to/your/google-credentials.json"

   # Your DeepL API Key
   DEEPL_API_KEY="YOUR_DEEPL_API_KEY"

   # Your ElevenLabs API Key and Voice ID
   ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY"
   ELEVENLABS_VOICE_ID="YOUR_ELEVENLABS_VOICE_ID"
   ```

## Usage

Once set up, run the main application script:

```bash
python app.py
```

The application will prompt you to press `ENTER` to start and stop the translation session.

- Press `ENTER` to start recording.
- Speak in either English or French.
- Press `ENTER` again to stop the session.
- Press `Ctrl+C` to quit the application gracefully.
app.py ADDED
@@ -0,0 +1,579 @@
#!/usr/bin/env python3
"""
Real-Time French/English Voice Translator — cleaned version

Fixes applied:
- Fixed TTS echo caused by double-writing audio chunks
- Removed prebuffer re-injection that could cause echoes
- Added empty transcript filtering
- Added within-stream deduplication
- Removed unnecessary sleeps (reduced latency by ~900ms)
- Reduced TTS prebuffer for faster playback start
- Cleaned up diagnostic logging

Keep your env vars:
- GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
"""

import asyncio
import json
import queue
import threading
import time
import os
import base64
from collections import deque
from typing import Dict, Optional

import pyaudio
import websockets
from google.cloud import speech
import deepl
from dotenv import load_dotenv

# -----------------------------------------------------------------------------
# VoiceTranslator
# -----------------------------------------------------------------------------
class VoiceTranslator:
    def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
        # External clients
        self.deepl_client = deepl.Translator(deepl_api_key)
        self.elevenlabs_api_key = elevenlabs_api_key
        self.voice_id = elevenlabs_voice_id
        self.stt_client = speech.SpeechClient()

        # Audio params
        self.audio_rate = 16000
        self.audio_chunk = 1024

        # Per-language audio queues (raw mic frames)
        self.lang_queues: Dict[str, queue.Queue] = {
            "en-US": queue.Queue(),
            "fr-FR": queue.Queue(),
        }

        # Small rolling prebuffer to avoid missing the first bits after a restart
        self.prebuffer = deque(maxlen=12)

        # State flags
        self.is_recording = False
        self.is_speaking = False
        self.speaking_event = threading.Event()

        # Deduplication
        self.last_processed_transcript = ""
        self.last_tts_text_en = ""
        self.last_tts_text_fr = ""

        # Threshold
        self.min_confidence_threshold = 0.5

        # PyAudio
        self.pyaudio_instance = pyaudio.PyAudio()
        self.audio_stream = None

        # Threads + async
        self.recording_thread: Optional[threading.Thread] = None
        self.async_loop = asyncio.new_event_loop()

        # TTS queue + consumer task
        self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
        self._tts_consumer_task: Optional[asyncio.Task] = None

        # Start async loop in separate thread
        self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
        self.async_thread.start()

        # Schedule TTS consumer creation inside the async loop
        def _start_consumer():
            self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
        self.async_loop.call_soon_threadsafe(_start_consumer)

        self.stt_threads: Dict[str, threading.Thread] = {}

        # Per-language restart events (used to tell threads when to start new streams)
        self.restart_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Per-language stream started flag
        self._stream_started = {"en-US": False, "fr-FR": False}

        # Per-language cancel events to force request_generator to stop
        self.stream_cancel_events: Dict[str, threading.Event] = {
            "en-US": threading.Event(),
            "fr-FR": threading.Event(),
        }

        # Diagnostics
        self._tts_job_counter = 0

    def _run_async_loop(self):
        asyncio.set_event_loop(self.async_loop)
        try:
            self.async_loop.run_forever()
        except Exception as e:
            print("[async_loop] stopped with error:", e)

    # ---------------------------
    # Audio capture
    # ---------------------------
    def _record_audio(self):
        try:
            stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=self.audio_rate,
                input=True,
                frames_per_buffer=self.audio_chunk,
            )
            print("🎤 Recording started...")

            while self.is_recording:
                if self.speaking_event.is_set():
                    time.sleep(0.01)
                    continue

                try:
                    data = stream.read(self.audio_chunk, exception_on_overflow=False)
                except Exception as e:
                    print(f"[recorder] read error: {e}")
                    continue

                if not data:
                    continue

                self.prebuffer.append(data)
                self.lang_queues["en-US"].put(data)
                self.lang_queues["fr-FR"].put(data)

            try:
                stream.stop_stream()
                stream.close()
            except Exception:
                pass
            print("🎤 Recording stopped.")
        except Exception as e:
            print(f"[recorder] fatal: {e}")

    # ---------------------------
    # TTS streaming (ElevenLabs) - async
    # ---------------------------
    async def _stream_tts(self, text: str):
        uri = (
            f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
            f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
        )
        tts_audio_stream = None
        websocket = None
        try:
            # Mark speaking and set event so recorder & STT pause
            self.is_speaking = True
            self.speaking_event.set()

            # Clear prebuffer to avoid re-injecting TTS audio later
            self.prebuffer.clear()

            # Clear queued frames to avoid replay
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            websocket = await websockets.connect(uri)
            await websocket.send(json.dumps({
                "text": " ",
                "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
                "xi_api_key": self.elevenlabs_api_key,
            }))
            await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
            await websocket.send(json.dumps({"text": ""}))

            tts_audio_stream = self.pyaudio_instance.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=16000,
                output=True,
                frames_per_buffer=1024,
            )

            prebuffer = bytearray()
            playback_started = False

            try:
                while True:
                    try:
                        message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
                    except asyncio.TimeoutError:
                        if playback_started:
                            break
                        else:
                            continue

                    if isinstance(message, bytes):
                        if not playback_started:
                            prebuffer.extend(message)
                            if len(prebuffer) >= 8000:
                                tts_audio_stream.write(bytes(prebuffer))
                                prebuffer.clear()
                                playback_started = True
                        else:
                            tts_audio_stream.write(message)
                        continue

                    try:
                        data = json.loads(message)
                    except Exception:
                        continue

                    if data.get("audio"):
                        audio_bytes = base64.b64decode(data["audio"])
                        if not playback_started:
                            prebuffer.extend(audio_bytes)
                            if len(prebuffer) >= 8000:
                                tts_audio_stream.write(bytes(prebuffer))
                                prebuffer.clear()
                                playback_started = True
                        else:
                            tts_audio_stream.write(audio_bytes)
                    elif data.get("isFinal"):
                        break
                    elif data.get("error"):
                        print("TTS error:", data["error"])
                        break

                # Handle case where playback never started (very short audio)
                if prebuffer and not playback_started:
                    tts_audio_stream.write(bytes(prebuffer))

            finally:
                try:
                    await websocket.close()
                except Exception:
                    pass

        except Exception:
            pass
        finally:
            if tts_audio_stream:
                try:
                    tts_audio_stream.stop_stream()
                    tts_audio_stream.close()
                except Exception:
                    pass

            # Force the STT request generators to exit by setting cancel events
            for lang, ev in self.stream_cancel_events.items():
                ev.set()

            # Don't re-inject prebuffer - just clear the queues and let fresh audio come in
            for q in self.lang_queues.values():
                with q.mutex:
                    q.queue.clear()

            # Clear speaking state and signal STT threads to restart
            self.is_speaking = False
            self.speaking_event.clear()

            # Signal restart for both language streams
            for lang, ev in self.restart_events.items():
                ev.set()

            await asyncio.sleep(0.1)

    # ---------------------------
    # TTS consumer (serializes TTS)
    # ---------------------------
    async def _tts_consumer(self):
        print("[tts_consumer] started")
        while True:
            item = await self._tts_queue.get()
            if item is None:
                print("[tts_consumer] shutdown sentinel received")
                break
            text = item.get("text", "")
            self._tts_job_counter += 1
            job_id = self._tts_job_counter
            print(f"[tts_consumer] job #{job_id} dequeued (len={len(text)})")
            try:
                await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
            except asyncio.TimeoutError:
                print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
            except Exception as e:
                print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
            finally:
                await asyncio.sleep(0.05)
        print("[tts_consumer] exiting")

    # ---------------------------
    # Translation & TTS trigger
    # ---------------------------
    async def _process_result(self, transcript: str, confidence: float, language: str):
        lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
        print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")

        # Echo suppression vs last TTS in same language
        if language == "fr-FR":
            if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                print("  (echo suppressed)")
                return
        else:
            if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                print("  (echo suppressed)")
                return

        try:
            if language == "fr-FR":
                translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
                print(f"🌐 FR → EN: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_en = translated
            else:
                translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
                print(f"🌐 EN → FR: {translated}")
                await self._tts_queue.put({"text": translated, "source_lang": language})
                self.last_tts_text_fr = translated
            print("🔊 Queued for speaking...")
        except Exception as e:
            print(f"Translation error: {e}")

    # ---------------------------
    # STT streaming (run per language)
    # ---------------------------
    def _run_stt_stream(self, language: str):
        print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
        self._stream_started[language] = False
        last_transcript_in_stream = ""

        while self.is_recording:
            try:
                if self._stream_started[language]:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
                    signaled = self.restart_events[language].wait(timeout=30)
                    if not signaled and self.is_recording:
                        print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
                    if not self.is_recording:
                        break
                    try:
                        self.restart_events[language].clear()
                    except Exception:
                        pass
                    time.sleep(0.01)

                self._stream_started[language] = True
                last_transcript_in_stream = ""
                print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")

                config = speech.RecognitionConfig(
                    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
                    sample_rate_hertz=self.audio_rate,
                    language_code=language,
                    enable_automatic_punctuation=True,
                    model="latest_short",
                )
                streaming_config = speech.StreamingRecognitionConfig(
                    config=config,
                    interim_results=True,
                    single_utterance=False,
                )

                # Request generator yields StreamingRecognizeRequest messages
                def request_generator():
                    while self.is_recording:
                        # If TTS is playing, skip sending mic frames to STT
                        if self.speaking_event.is_set():
                            time.sleep(0.01)
                            continue
                        # If cancel event set, clear and break to end stream
                        if self.stream_cancel_events[language].is_set():
                            try:
                                self.stream_cancel_events[language].clear()
                            except Exception:
                                pass
                            break
                        try:
                            chunk = self.lang_queues[language].get(timeout=1.0)
                        except queue.Empty:
                            continue
                        yield speech.StreamingRecognizeRequest(audio_content=chunk)

                responses = self.stt_client.streaming_recognize(streaming_config, request_generator())

                response_count = 0
                final_received = False

                for response in responses:
                    if not self.is_recording:
                        print(f"[stt:{language}] Stopped by user")
                        break
                    if not response.results:
                        continue

                    response_count += 1
                    for result in response.results:
                        if not result.alternatives:
                            continue
                        alt = result.alternatives[0]
                        transcript = alt.transcript.strip()
                        conf = getattr(alt, "confidence", 0.0)
                        is_final = bool(result.is_final)

                        if is_final:
                            now = time.strftime("%H:%M:%S")
                            print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")

                            # Filter empty transcripts - don't break stream
                            if not transcript:
                                print(f"[{now}] [stt:{language}] Empty transcript -> ignoring, continuing stream")
                                continue

                            # Deduplicate within same stream
                            if transcript.strip().lower() == last_transcript_in_stream.strip().lower():
                                print(f"[{now}] [stt:{language}] Duplicate final in same stream -> suppressed")
                                continue

                            if conf < self.min_confidence_threshold:
                                print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
                                continue

                            last_transcript_in_stream = transcript

                            if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
                                continue
                            if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
                                print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
                                continue

                            asyncio.run_coroutine_threadsafe(
                                self._process_result(transcript, conf, language),
                                self.async_loop
                            )
                            final_received = True
                            break

                    if final_received:
                        break

                print(f"[stt:{language}] Stream ended after {response_count} responses")

                if self.is_recording and final_received:
                    print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
                elif self.is_recording and not final_received:
                    print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
                    time.sleep(0.5)
                else:
                    break

            except Exception as e:
                if self.is_recording:
                    import traceback
                    print(f"[stt:{language}] Error: {e}")
                    print(traceback.format_exc())
                    time.sleep(1.0)
                else:
                    break

        print(f"[stt:{language}] Thread exiting")

    # ---------------------------
    # Control
    # ---------------------------
    def start_translation(self):
        if self.is_recording:
            print("Already recording!")
            return
        self.is_recording = True
        self.last_processed_transcript = ""

        for ev in self.restart_events.values():
            try:
                ev.clear()
            except Exception:
                pass
        self.speaking_event.clear()

        for q in self.lang_queues.values():
            with q.mutex:
                q.queue.clear()

        self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
        self.recording_thread.start()

        for lang in ("en-US", "fr-FR"):
            t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
            self.stt_threads[lang] = t
            t.start()
            print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")

        for ev in self.restart_events.values():
            ev.set()

    def stop_translation(self):
        print("\n⏹️ Stopping translation...")
        self.is_recording = False
        for ev in self.restart_events.values():
            ev.set()
        self.speaking_event.clear()

        if self._tts_consumer_task and not self._tts_consumer_task.done():
            try:
                def _put_sentinel():
                    try:
                        self._tts_queue.put_nowait(None)
                    except Exception:
                        asyncio.create_task(self._tts_queue.put(None))
                self.async_loop.call_soon_threadsafe(_put_sentinel)
            except Exception:
                pass

        time.sleep(0.2)

    def cleanup(self):
        self.stop_translation()
        try:
            if self.async_loop.is_running():
                def _stop_loop():
                    if self._tts_consumer_task and not self._tts_consumer_task.done():
                        try:
                            self._tts_queue.put_nowait(None)
                        except Exception:
                            pass
                    self.async_loop.stop()
                self.async_loop.call_soon_threadsafe(_stop_loop)
        except Exception:
            pass
        try:
            self.pyaudio_instance.terminate()
        except Exception:
            pass

# -----------------------------------------------------------------------------
# Main entry
# -----------------------------------------------------------------------------
def main():
    load_dotenv()
    google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    deepl_key = os.getenv("DEEPL_API_KEY")
    eleven_key = os.getenv("ELEVENLABS_API_KEY")
    voice_id = os.getenv("ELEVENLABS_VOICE_ID")

    if not all([google_creds, deepl_key, eleven_key, voice_id]):
        print("Missing API keys or credentials.")
        return

    translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
    print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")

    try:
        while True:
            input("Press ENTER to start speaking...")
            translator.start_translation()
            input("Press ENTER to stop...\n")
            translator.stop_translation()
    except KeyboardInterrupt:
        print("\nKeyboardInterrupt received — cleaning up.")
        translator.cleanup()

if __name__ == "__main__":
    main()
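For reference, the byte thresholds used for the TTS playback prebuffers map to wall-clock time as follows: 16-bit mono PCM at 16 kHz is 32,000 bytes per second, so an 8,000-byte prebuffer holds 0.25 s of audio and a 16,000-byte prebuffer holds 0.5 s. A quick check of that arithmetic:

```python
def pcm_bytes_to_seconds(n_bytes: int, rate_hz: int = 16000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Duration of a raw PCM buffer: bytes / (rate * sample_width * channels)."""
    return n_bytes / (rate_hz * sample_width * channels)

print(pcm_bytes_to_seconds(8000))   # 0.25
print(pcm_bytes_to_seconds(16000))  # 0.5
```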
app_backup.py ADDED
@@ -0,0 +1,597 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Real-Time French/English Voice Translator — patched single-file v2
4
+
5
+ Changes from previous:
6
+ - Adds per-language stream_cancel_events that force the STT request_generator
7
+ to exit, allowing streaming_recognize to terminate and be restarted cleanly.
8
+ - _stream_tts sets the cancel events immediately after playback finishes (before
9
+ prebuffer re-injection and restart events).
10
+ - Request generator checks cancel event frequently and breaks to end the stream.
11
+
12
+ Keep your env vars:
13
+ - GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
14
+ """
15
+
16
+ import asyncio
17
+ import json
18
+ import queue
19
+ import threading
20
+ import time
21
+ import os
22
+ import base64
23
+ from collections import deque
24
+ from typing import Dict, Optional
25
+
26
+ import pyaudio
27
+ import websockets
28
+ from google.cloud import speech
29
+ import deepl
30
+ from dotenv import load_dotenv
31
+
32
+ # -----------------------------------------------------------------------------
33
+ # VoiceTranslator
34
+ # -----------------------------------------------------------------------------
35
+ class VoiceTranslator:
36
+ def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
37
+ # External clients
38
+ self.deepl_client = deepl.Translator(deepl_api_key)
39
+ self.elevenlabs_api_key = elevenlabs_api_key
40
+ self.voice_id = elevenlabs_voice_id
41
+ self.stt_client = speech.SpeechClient()
42
+
43
+ # Audio params
44
+ self.audio_rate = 16000
45
+ self.audio_chunk = 1024
46
+
47
+ # Per-language audio queues (raw mic frames)
48
+ self.lang_queues: Dict[str, queue.Queue] = {
49
+ "en-US": queue.Queue(),
50
+ "fr-FR": queue.Queue(),
51
+ }
52
+
53
+ # Small rolling prebuffer to avoid missing the first bits after a restart
54
+ self.prebuffer = deque(maxlen=12)
55
+
56
+ # State flags
57
+ self.is_recording = False
58
+ self.is_speaking = False
59
+ self.speaking_event = threading.Event()
60
+
61
+ # Deduplication
62
+ self.last_processed_transcript = ""
63
+ self.last_tts_text_en = ""
64
+ self.last_tts_text_fr = ""
65
+
66
+ # Threshold
67
+ self.min_confidence_threshold = 0.5
68
+
69
+ # PyAudio
70
+ self.pyaudio_instance = pyaudio.PyAudio()
71
+ self.audio_stream = None
72
+
73
+ # Threads + async
74
+ self.recording_thread: Optional[threading.Thread] = None
75
+ self.async_loop = asyncio.new_event_loop()
76
+
77
+ # TTS queue + consumer task
78
+ self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
79
+ self._tts_consumer_task: Optional[asyncio.Task] = None
80
+
81
+ # Start async loop in separate thread
82
+ self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
83
+ self.async_thread.start()
84
+
85
+ # schedule tts consumer creation inside the async loop
86
+ def _start_consumer():
87
+ self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
88
+ self.async_loop.call_soon_threadsafe(_start_consumer)
89
+
90
+ self.stt_threads: Dict[str, threading.Thread] = {}
91
+
92
+ # Per-language restart events (used to tell threads when to start new streams)
93
+ self.restart_events: Dict[str, threading.Event] = {
94
+ "en-US": threading.Event(),
95
+ "fr-FR": threading.Event(),
96
+ }
97
+
98
+ # Per-language stream started flag
99
+ self._stream_started = {"en-US": False, "fr-FR": False}
100
+
101
+ # **NEW**: per-language cancel events to force request_generator to stop
102
+ self.stream_cancel_events: Dict[str, threading.Event] = {
103
+ "en-US": threading.Event(),
104
+ "fr-FR": threading.Event(),
105
+ }
106
+
107
+ # Diagnostics
108
+ self._tts_job_counter = 0
109
+
110
+ def _run_async_loop(self):
111
+ asyncio.set_event_loop(self.async_loop)
112
+ try:
113
+ self.async_loop.run_forever()
114
+ except Exception as e:
115
+ print("[async_loop] stopped with error:", e)
116
+
117
+ # ---------------------------
118
+ # Audio capture
119
+ # ---------------------------
120
+ def _record_audio(self):
121
+ try:
122
+ stream = self.pyaudio_instance.open(
123
+ format=pyaudio.paInt16,
124
+ channels=1,
125
+ rate=self.audio_rate,
126
+ input=True,
127
+ frames_per_buffer=self.audio_chunk,
128
+ )
129
+ print("🎤 Recording started...")
130
+
131
+ while self.is_recording:
132
+ if self.speaking_event.is_set():
133
+ time.sleep(0.01)
134
+ continue
135
+
136
+ try:
137
+ data = stream.read(self.audio_chunk, exception_on_overflow=False)
138
+ except Exception as e:
139
+ print(f"[recorder] read error: {e}")
140
+ continue
141
+
142
+ if not data:
143
+ continue
144
+
145
+ self.prebuffer.append(data)
146
+ self.lang_queues["en-US"].put(data)
147
+ self.lang_queues["fr-FR"].put(data)
148
+
149
+ try:
150
+ stream.stop_stream()
151
+ stream.close()
152
+ except Exception:
153
+ pass
154
+ print("🎤 Recording stopped.")
155
+ except Exception as e:
156
+ print(f"[recorder] fatal: {e}")
157
+
158
+ # ---------------------------
159
+ # TTS streaming (ElevenLabs) - async
160
+ # ---------------------------
161
+ async def _stream_tts(self, text: str):
162
+ uri = (
163
+ f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
164
+ f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
165
+ )
166
+ tts_audio_stream = None
167
+ websocket = None
168
+ try:
169
+ # Mark speaking and set event so recorder & STT pause
170
+ self.is_speaking = True
171
+ self.speaking_event.set()
172
+ # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> True")
173
+
174
+ # Clear prebuffer to avoid re-injecting TTS audio later
175
+ self.prebuffer.clear()
176
+
177
+ # Clear queued frames to avoid replay; we'll re-inject prebuffer after we cancel streams
178
+ for q in self.lang_queues.values():
179
+ with q.mutex:
180
+ q.queue.clear()
181
+
182
+ # Brief pause to ensure recorder sees speaking_event before we start TTS
183
+ await asyncio.sleep(0.1)
184
+
185
+ websocket = await websockets.connect(uri)
186
+ await websocket.send(json.dumps({
187
+ "text": " ",
188
+ "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
189
+ "xi_api_key": self.elevenlabs_api_key,
190
+ }))
191
+ await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
192
+ await websocket.send(json.dumps({"text": ""}))
193
+
194
+ tts_audio_stream = self.pyaudio_instance.open(
195
+ format=pyaudio.paInt16,
196
+ channels=1,
197
+ rate=16000,
198
+ output=True,
199
+ frames_per_buffer=1024,
200
+ )
201
+
202
+ prebuffer = bytearray()
203
+ playback_started = False
204
+
205
+ try:
206
+ while True:
207
+ try:
208
+ message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
209
+ except asyncio.TimeoutError:
210
+ if playback_started:
211
+ break
212
+ else:
213
+ continue
214
+
215
+ if isinstance(message, bytes):
+ if playback_started:
+ tts_audio_stream.write(message)
+ else:
+ prebuffer.extend(message)
+ if len(prebuffer) >= 8000:
+ tts_audio_stream.write(bytes(prebuffer))
+ prebuffer.clear()
+ playback_started = True
+ continue
224
+
225
+ try:
226
+ data = json.loads(message)
227
+ except Exception:
228
+ continue
229
+
230
+ if data.get("audio"):
231
+ audio_bytes = base64.b64decode(data["audio"])
232
+ if not playback_started:
233
+ prebuffer.extend(audio_bytes)
234
+ if len(prebuffer) >= 16000:
235
+ print(f"[tts] Starting playback, prebuffer size: {len(prebuffer)}")
236
+ tts_audio_stream.write(bytes(prebuffer))
237
+ prebuffer.clear()
238
+ playback_started = True
239
+ else:
240
+ tts_audio_stream.write(audio_bytes)
241
+
242
+ elif data.get("isFinal"):
243
+ print(f"[tts] Received isFinal, prebuffer remaining: {len(prebuffer)}")
244
+ break
245
+
246
+ elif data.get("error"):
247
+ print("TTS error:", data["error"])
248
+ break
249
+
250
+ # Prebuffer should be empty after playback starts, but just in case
251
+ if prebuffer and not playback_started:
252
+ print(f"[tts] Writing final prebuffer: {len(prebuffer)} bytes (playback never started)")
253
+ tts_audio_stream.write(bytes(prebuffer))
254
+ elif prebuffer:
255
+ print(f"[tts] WARNING: prebuffer has {len(prebuffer)} bytes after playback - this is a bug!")
256
+
257
+ finally:
258
+ try:
259
+ await websocket.close()
260
+ except Exception:
261
+ pass
262
+
263
+ except Exception as e:
264
+ # print(f"[tts] error: {e}")
265
+ pass
266
+ finally:
267
+ if tts_audio_stream:
268
+ try:
269
+ tts_audio_stream.stop_stream()
270
+ tts_audio_stream.close()
271
+ except Exception:
272
+ pass
273
+ # **NEW**: force the STT request generators to exit by setting cancel events.
274
+ # This makes streaming_recognize finish; threads will then wait for restart_events
275
+ # and start fresh streams.
276
+ for lang, ev in self.stream_cancel_events.items():
277
+ ev.set()
278
+ # print(f"[{time.strftime('%H:%M:%S')}] [cancel] set -> {lang}")
279
+
280
+ # Don't re-inject prebuffer - it may contain TTS echo
281
+ # Just clear the queues and let fresh audio come in
282
+ for q in self.lang_queues.values():
283
+ with q.mutex:
284
+ q.queue.clear()
285
+
286
+ # Wait for TTS audio to clear from environment (acoustic decay)
287
+ await asyncio.sleep(0.1)
288
+
289
+ # Clear speaking state and signal STT threads to restart (robustly)
290
+ self.is_speaking = False
291
+ self.speaking_event.clear()
292
+ # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> False")
293
+
294
+ # Primary restart: set both events
295
+ for lang, ev in self.restart_events.items():
296
+ ev.set()
297
+ # print(f"[{time.strftime('%H:%M:%S')}] [restart] set -> {lang}")
298
+
299
+ await asyncio.sleep(0.25)
300
+ for lang, ev in self.restart_events.items():
301
+ ev.set()
302
+ await asyncio.sleep(0.25)
303
+
304
+ # ---------------------------
305
+ # TTS consumer (serializes TTS)
306
+ # ---------------------------
307
+ async def _tts_consumer(self):
308
+ print("[tts_consumer] started")
309
+ while True:
310
+ item = await self._tts_queue.get()
311
+ if item is None:
312
+ print("[tts_consumer] shutdown sentinel received")
313
+ break
314
+ text = item.get("text", "")
315
+ self._tts_job_counter += 1
316
+ job_id = self._tts_job_counter
317
+ print(f"[tts_consumer] job #{job_id} dequeued: '{text}'")
318
+ try:
319
+ await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
320
+ except asyncio.TimeoutError:
321
+ print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
322
+ except Exception as e:
323
+ print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
324
+ finally:
325
+ await asyncio.sleep(0.05)
326
+ print("[tts_consumer] exiting")
327
+
328
+ # ---------------------------
329
+ # Translation & TTS trigger
330
+ # ---------------------------
331
+ async def _process_result(self, transcript: str, confidence: float, language: str):
332
+ lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
333
+ print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")
334
+
335
+ # echo suppression vs last TTS in same language
336
+ if language == "fr-FR":
337
+ if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
338
+ print(" (echo suppressed)")
339
+ return
340
+ else:
341
+ if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
342
+ print(" (echo suppressed)")
343
+ return
344
+
345
+ try:
346
+ if language == "fr-FR":
347
+ translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
348
+ print(f"🌐 FR → EN: {translated}")
349
+ await self._tts_queue.put({"text": translated, "source_lang": language})
350
+ self.last_tts_text_en = translated
351
+ else:
352
+ translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
353
+ print(f"🌐 EN → FR: {translated}")
354
+ await self._tts_queue.put({"text": translated, "source_lang": language})
355
+ self.last_tts_text_fr = translated
356
+ print("🔊 Queued for speaking...")
357
+ except Exception as e:
358
+ print(f"Translation error: {e}")
359
+
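The echo checks above compare exact lowercased strings, but STT often re-punctuates what it hears, so an exact match can miss echoes. One way to loosen the comparison is to strip punctuation before comparing; the `normalize` and `is_echo` helpers below are hypothetical, not part of the code above:

```python
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation, and trim whitespace so that
    # "Hello, world!" and "hello world" compare equal.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().strip()

def is_echo(transcript: str, last_tts_text: str) -> bool:
    # True when the recognized transcript is just our own TTS output.
    return bool(last_tts_text) and normalize(transcript) == normalize(last_tts_text)
```

A punctuation-insensitive check like this could replace the `strip().lower()` comparisons in both `_process_result` and the STT loop.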
360
+ # ---------------------------
361
+ # STT streaming (run per language)
362
+ # ---------------------------
363
+ def _run_stt_stream(self, language: str):
364
+ print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
365
+ self._stream_started[language] = False
366
+ last_transcript_in_stream = ""
367
+
368
+ while self.is_recording:
369
+ try:
370
+ if self._stream_started[language]:
371
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
372
+ signaled = self.restart_events[language].wait(timeout=30)
373
+ if not signaled and self.is_recording:
374
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
375
+ if not self.is_recording:
376
+ break
377
+ try:
378
+ self.restart_events[language].clear()
379
+ except Exception:
380
+ pass
381
+ time.sleep(0.01)
382
+
383
+ self._stream_started[language] = True
384
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")
385
+
386
+ config = speech.RecognitionConfig(
387
+ encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
388
+ sample_rate_hertz=self.audio_rate,
389
+ language_code=language,
390
+ enable_automatic_punctuation=True,
391
+ model="latest_short",
392
+ )
393
+ streaming_config = speech.StreamingRecognitionConfig(
394
+ config=config,
395
+ interim_results=True,
396
+ single_utterance=False,
397
+ )
398
+
399
+ # Request generator yields StreamingRecognizeRequest messages
400
+ def request_generator():
401
+ while self.is_recording:
402
+ # If TTS is playing, skip sending mic frames to STT
403
+ if self.speaking_event.is_set():
404
+ time.sleep(0.01)
405
+ continue
406
+ # If cancel event set, clear and break to end stream
407
+ if self.stream_cancel_events[language].is_set():
408
+ # print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] request_generator observed cancel -> exiting generator")
409
+ try:
410
+ self.stream_cancel_events[language].clear()
411
+ except Exception:
412
+ pass
413
+ break
414
+ try:
415
+ chunk = self.lang_queues[language].get(timeout=1.0)
416
+ except queue.Empty:
417
+ continue
418
+ yield speech.StreamingRecognizeRequest(audio_content=chunk)
419
+
420
+ responses = self.stt_client.streaming_recognize(streaming_config, request_generator())
421
+
422
+ response_count = 0
423
+ final_received = False
424
+
425
+ for response in responses:
426
+ if not self.is_recording:
427
+ print(f"[stt:{language}] Stopped by user")
428
+ break
429
+ if not response.results:
430
+ continue
431
+
432
+ response_count += 1
433
+ for result in response.results:
434
+ if not result.alternatives:
435
+ continue
436
+ alt = result.alternatives[0]
437
+ transcript = alt.transcript.strip()
438
+ conf = getattr(alt, "confidence", 0.0)
439
+ is_final = bool(result.is_final)
440
+
441
+ if is_final:
442
+ now = time.strftime("%H:%M:%S")
443
+ print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")
444
+
445
+ # Filter empty transcripts - don't break stream
446
+ if not transcript or len(transcript.strip()) == 0:
447
+ print(f"[{now}] [stt:{language}] Empty transcript -> ignoring, continuing stream")
448
+ continue
449
+
450
+ # Deduplicate within same stream
451
+ if transcript.strip().lower() == last_transcript_in_stream.strip().lower():
452
+ print(f"[{now}] [stt:{language}] Duplicate final in same stream -> suppressed")
453
+ continue
454
+
455
+ if conf < self.min_confidence_threshold:
456
+ print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
457
+ continue
458
+
459
+ if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
460
+ print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
461
+ continue
462
+ if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
463
+ print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
464
+ continue
465
+
466
+ asyncio.run_coroutine_threadsafe(
467
+ self._process_result(transcript, conf, language),
468
+ self.async_loop
469
+ )
470
+ final_received = True
471
+ break
472
+
473
+ if final_received:
474
+ break
475
+
476
+ print(f"[stt:{language}] Stream ended after {response_count} responses")
477
+
478
+ if self.is_recording and final_received:
479
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
480
+ elif self.is_recording and not final_received:
481
+ print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
482
+ time.sleep(0.5)
483
+ else:
484
+ break
485
+
486
+ except Exception as e:
487
+ if self.is_recording:
488
+ import traceback
489
+ print(f"[stt:{language}] Error: {e}")
490
+ print(traceback.format_exc())
491
+ time.sleep(1.0)
492
+ else:
493
+ break
494
+
495
+ print(f"[stt:{language}] Thread exiting")
496
+
497
+ # ---------------------------
498
+ # Control
499
+ # ---------------------------
500
+ def start_translation(self):
501
+ if self.is_recording:
502
+ print("Already recording!")
503
+ return
504
+ self.is_recording = True
505
+ self.last_processed_transcript = ""
506
+
507
+ for ev in self.restart_events.values():
508
+ try:
509
+ ev.clear()
510
+ except Exception:
511
+ pass
512
+ self.speaking_event.clear()
513
+
514
+ for q in self.lang_queues.values():
515
+ with q.mutex:
516
+ q.queue.clear()
517
+
518
+ self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
519
+ self.recording_thread.start()
520
+
521
+ for lang in ("en-US", "fr-FR"):
522
+ t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
523
+ self.stt_threads[lang] = t
524
+ t.start()
525
+ print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")
526
+
527
+ for ev in self.restart_events.values():
528
+ ev.set()
529
+
530
+ def stop_translation(self):
531
+ print("\n⏹️ Stopping translation...")
532
+ self.is_recording = False
533
+ for ev in self.restart_events.values():
534
+ ev.set()
535
+ self.speaking_event.clear()
536
+
537
+ if self._tts_consumer_task and not self._tts_consumer_task.done():
538
+ try:
539
+ def _put_sentinel():
540
+ try:
541
+ self._tts_queue.put_nowait(None)
542
+ except Exception:
543
+ asyncio.create_task(self._tts_queue.put(None))
544
+ self.async_loop.call_soon_threadsafe(_put_sentinel)
545
+ except Exception:
546
+ pass
547
+
548
+ time.sleep(0.2)
549
+
550
+ def cleanup(self):
551
+ self.stop_translation()
552
+ try:
553
+ if self.async_loop.is_running():
554
+ def _stop_loop():
555
+ if self._tts_consumer_task and not self._tts_consumer_task.done():
556
+ try:
557
+ self._tts_queue.put_nowait(None)
558
+ except Exception:
559
+ pass
560
+ self.async_loop.stop()
561
+ self.async_loop.call_soon_threadsafe(_stop_loop)
562
+ except Exception:
563
+ pass
564
+ try:
565
+ self.pyaudio_instance.terminate()
566
+ except Exception:
567
+ pass
568
+
569
+ # -----------------------------------------------------------------------------
570
+ # Main entry
571
+ # -----------------------------------------------------------------------------
572
+ def main():
573
+ load_dotenv()
574
+ google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
575
+ deepl_key = os.getenv("DEEPL_API_KEY")
576
+ eleven_key = os.getenv("ELEVENLABS_API_KEY")
577
+ voice_id = os.getenv("ELEVENLABS_VOICE_ID")
578
+
579
+ if not all([google_creds, deepl_key, eleven_key, voice_id]):
580
+ print("Missing API keys or credentials.")
581
+ return
582
+
583
+ translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
584
+ print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")
585
+
586
+ try:
587
+ while True:
588
+ input("Press ENTER to start speaking...")
589
+ translator.start_translation()
590
+ input("Press ENTER to stop...\n")
591
+ translator.stop_translation()
592
+ except KeyboardInterrupt:
593
+ print("\nKeyboardInterrupt received — cleaning up.")
594
+ translator.cleanup()
595
+
596
+ if __name__ == "__main__":
597
+ main()
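The cancel-event mechanism that `request_generator` relies on can be exercised in isolation: a generator drains a queue until a `threading.Event` is set, which is what lets `streaming_recognize` terminate cleanly after TTS playback. A minimal sketch using plain byte chunks in place of `StreamingRecognizeRequest`:

```python
import queue
import threading

def request_generator(frames: "queue.Queue[bytes]", cancel: threading.Event):
    # Yield audio chunks until the cancel event is set; clearing the event
    # here lets the same Event object be reused for the next stream.
    while True:
        if cancel.is_set():
            cancel.clear()
            return
        try:
            yield frames.get(timeout=0.05)
        except queue.Empty:
            continue

frames: "queue.Queue[bytes]" = queue.Queue()
cancel = threading.Event()
for chunk in (b"a", b"b"):
    frames.put(chunk)

gen = request_generator(frames, cancel)
received = [next(gen), next(gen)]
cancel.set()           # simulates TTS finishing and requesting a stream restart
remaining = list(gen)  # generator exits on its next iteration
```

Because the event is cleared inside the generator, the STT thread can immediately reuse it for the next stream without racing the TTS side.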
checkpoint_nov2.py ADDED
@@ -0,0 +1,570 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Real-Time French/English Voice Translator — patched single-file v2
4
+
5
+ Changes from previous:
6
+ - Adds per-language stream_cancel_events that force the STT request_generator
7
+ to exit, allowing streaming_recognize to terminate and be restarted cleanly.
8
+ - _stream_tts sets the cancel events immediately after playback finishes (before
9
+ prebuffer re-injection and restart events).
10
+ - Request generator checks cancel event frequently and breaks to end the stream.
11
+
12
+ Keep your env vars:
13
+ - GOOGLE_APPLICATION_CREDENTIALS, DEEPL_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
14
+ """
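The `_tts_consumer` described in this docstring — one task draining an `asyncio.Queue` so TTS jobs run strictly one at a time, with `None` as a shutdown sentinel — reduces to the following sketch (`consumer` stands in for `_tts_consumer`, and the `upper()` call stands in for `_stream_tts`):

```python
import asyncio

async def consumer(jobs: "asyncio.Queue", results: list) -> None:
    # Serialize work: one job at a time; None shuts the consumer down.
    while True:
        item = await jobs.get()
        if item is None:
            break
        results.append(item.upper())  # stand-in for awaiting _stream_tts(item)

async def main() -> list:
    jobs: "asyncio.Queue" = asyncio.Queue()
    results: list = []
    task = asyncio.create_task(consumer(jobs, results))
    for text in ("bonjour", "hello"):
        await jobs.put(text)
    await jobs.put(None)  # sentinel, mirrors stop_translation()
    await task
    return results

out = asyncio.run(main())
```

Serializing through a single consumer is what prevents two translations from speaking over each other, regardless of how many STT threads enqueue work.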
15
+
16
+ import asyncio
17
+ import json
18
+ import queue
19
+ import threading
20
+ import time
21
+ import os
22
+ import base64
23
+ from collections import deque
24
+ from typing import Dict, Optional
25
+
26
+ import pyaudio
27
+ import websockets
28
+ from google.cloud import speech
29
+ import deepl
30
+ from dotenv import load_dotenv
31
+
32
+ # -----------------------------------------------------------------------------
33
+ # VoiceTranslator
34
+ # -----------------------------------------------------------------------------
35
+ class VoiceTranslator:
36
+ def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
37
+ # External clients
38
+ self.deepl_client = deepl.Translator(deepl_api_key)
39
+ self.elevenlabs_api_key = elevenlabs_api_key
40
+ self.voice_id = elevenlabs_voice_id
41
+ self.stt_client = speech.SpeechClient()
42
+
43
+ # Audio params
44
+ self.audio_rate = 16000
45
+ self.audio_chunk = 1024
46
+
47
+ # Per-language audio queues (raw mic frames)
48
+ self.lang_queues: Dict[str, queue.Queue] = {
49
+ "en-US": queue.Queue(),
50
+ "fr-FR": queue.Queue(),
51
+ }
52
+
53
+ # Small rolling prebuffer to avoid missing the first bits after a restart
54
+ self.prebuffer = deque(maxlen=12)
55
+
56
+ # State flags
57
+ self.is_recording = False
58
+ self.is_speaking = False
59
+ self.speaking_event = threading.Event()
60
+
61
+ # Deduplication
62
+ self.last_processed_transcript = ""
63
+ self.last_tts_text_en = ""
64
+ self.last_tts_text_fr = ""
65
+
66
+ # Threshold
67
+ self.min_confidence_threshold = 0.5
68
+
69
+ # PyAudio
70
+ self.pyaudio_instance = pyaudio.PyAudio()
71
+ self.audio_stream = None
72
+
73
+ # Threads + async
74
+ self.recording_thread: Optional[threading.Thread] = None
75
+ self.async_loop = asyncio.new_event_loop()
76
+
77
+ # TTS queue + consumer task
78
+ self._tts_queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
79
+ self._tts_consumer_task: Optional[asyncio.Task] = None
80
+
81
+ # Start async loop in separate thread
82
+ self.async_thread = threading.Thread(target=self._run_async_loop, daemon=True)
83
+ self.async_thread.start()
84
+
85
+ # schedule tts consumer creation inside the async loop
86
+ def _start_consumer():
87
+ self._tts_consumer_task = asyncio.create_task(self._tts_consumer())
88
+ self.async_loop.call_soon_threadsafe(_start_consumer)
89
+
90
+ self.stt_threads: Dict[str, threading.Thread] = {}
91
+
92
+ # Per-language restart events (used to tell threads when to start new streams)
93
+ self.restart_events: Dict[str, threading.Event] = {
94
+ "en-US": threading.Event(),
95
+ "fr-FR": threading.Event(),
96
+ }
97
+
98
+ # Per-language stream started flag
99
+ self._stream_started = {"en-US": False, "fr-FR": False}
100
+
101
+ # **NEW**: per-language cancel events to force request_generator to stop
102
+ self.stream_cancel_events: Dict[str, threading.Event] = {
103
+ "en-US": threading.Event(),
104
+ "fr-FR": threading.Event(),
105
+ }
106
+
107
+ # Diagnostics
108
+ self._tts_job_counter = 0
109
+
110
+ def _run_async_loop(self):
111
+ asyncio.set_event_loop(self.async_loop)
112
+ try:
113
+ self.async_loop.run_forever()
114
+ except Exception as e:
115
+ print("[async_loop] stopped with error:", e)
116
+
117
+ # ---------------------------
118
+ # Audio capture
119
+ # ---------------------------
120
+ def _record_audio(self):
121
+ try:
122
+ stream = self.pyaudio_instance.open(
123
+ format=pyaudio.paInt16,
124
+ channels=1,
125
+ rate=self.audio_rate,
126
+ input=True,
127
+ frames_per_buffer=self.audio_chunk,
128
+ )
129
+ print("🎤 Recording started...")
130
+
131
+ while self.is_recording:
132
+ if self.speaking_event.is_set():
133
+ time.sleep(0.01)
134
+ continue
135
+
136
+ try:
137
+ data = stream.read(self.audio_chunk, exception_on_overflow=False)
138
+ except Exception as e:
139
+ print(f"[recorder] read error: {e}")
140
+ continue
141
+
142
+ if not data:
143
+ continue
144
+
145
+ self.prebuffer.append(data)
146
+ self.lang_queues["en-US"].put(data)
147
+ self.lang_queues["fr-FR"].put(data)
148
+
149
+ try:
150
+ stream.stop_stream()
151
+ stream.close()
152
+ except Exception:
153
+ pass
154
+ print("🎤 Recording stopped.")
155
+ except Exception as e:
156
+ print(f"[recorder] fatal: {e}")
157
+
158
+ # ---------------------------
159
+ # TTS streaming (ElevenLabs) - async
160
+ # ---------------------------
161
+ async def _stream_tts(self, text: str):
162
+ uri = (
163
+ f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
164
+ f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
165
+ )
166
+ tts_audio_stream = None
167
+ websocket = None
168
+ try:
169
+ # Mark speaking and set event so recorder & STT pause
170
+ self.is_speaking = True
171
+ self.speaking_event.set()
172
+ # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> True")
173
+
174
+ # Clear queued frames to avoid replay; we'll re-inject prebuffer after we cancel streams
175
+ for q in self.lang_queues.values():
176
+ with q.mutex:
177
+ q.queue.clear()
178
+
179
+ websocket = await websockets.connect(uri)
180
+ await websocket.send(json.dumps({
181
+ "text": " ",
182
+ "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
183
+ "xi_api_key": self.elevenlabs_api_key,
184
+ }))
185
+ await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
186
+ await websocket.send(json.dumps({"text": ""}))
187
+
188
+ tts_audio_stream = self.pyaudio_instance.open(
189
+ format=pyaudio.paInt16,
190
+ channels=1,
191
+ rate=16000,
192
+ output=True,
193
+ frames_per_buffer=1024,
194
+ )
195
+
196
+ prebuffer = bytearray()
197
+ playback_started = False
198
+
199
+ try:
200
+ while True:
201
+ try:
202
+ message = await asyncio.wait_for(websocket.recv(), timeout=8.0)
203
+ except asyncio.TimeoutError:
204
+ if playback_started:
205
+ break
206
+ else:
207
+ continue
208
+
209
+ if isinstance(message, bytes):
210
+ prebuffer.extend(message)
211
+ if not playback_started and len(prebuffer) >= 16000:
212
+ tts_audio_stream.write(bytes(prebuffer))
213
+ prebuffer.clear()
214
+ playback_started = True
215
+ elif playback_started:
216
+ tts_audio_stream.write(message)
217
+ continue
218
+
219
+ try:
220
+ data = json.loads(message)
221
+ except Exception:
222
+ continue
223
+
224
+ if data.get("audio"):
225
+ audio_bytes = base64.b64decode(data["audio"])
226
+ prebuffer.extend(audio_bytes)
227
+ if not playback_started and len(prebuffer) >= 16000:
228
+ tts_audio_stream.write(bytes(prebuffer))
229
+ prebuffer.clear()
230
+ playback_started = True
231
+ elif playback_started:
232
+ tts_audio_stream.write(audio_bytes)
233
+ elif data.get("isFinal"):
234
+ break
235
+ elif data.get("error"):
236
+ print("TTS error:", data["error"])
237
+ break
238
+
239
+ if prebuffer:
240
+ tts_audio_stream.write(bytes(prebuffer))
241
+
242
+ finally:
243
+ try:
244
+ await websocket.close()
245
+ except Exception:
246
+ pass
247
+
248
+ except Exception as e:
249
+ # print(f"[tts] error: {e}")
250
+ pass
251
+ finally:
252
+ if tts_audio_stream:
253
+ try:
254
+ tts_audio_stream.stop_stream()
255
+ tts_audio_stream.close()
256
+ except Exception:
257
+ pass
258
+
259
+ # **NEW**: force the STT request generators to exit by setting cancel events.
260
+ # This makes streaming_recognize finish; threads will then wait for restart_events
261
+ # and start fresh streams.
262
+ for lang, ev in self.stream_cancel_events.items():
263
+ ev.set()
264
+ # print(f"[{time.strftime('%H:%M:%S')}] [cancel] set -> {lang}")
265
+
266
+ # Now re-inject prebuffer so new streams start with warm audio
267
+ pre_list = list(self.prebuffer)
268
+ if pre_list:
269
+ print(f"[{time.strftime('%H:%M:%S')}] [prebuffer] re-injecting {len(pre_list)} chunks into queues")
270
+ for chunk in pre_list:
271
+ self.lang_queues["en-US"].put(chunk)
272
+ self.lang_queues["fr-FR"].put(chunk)
273
+
274
+ # Clear speaking state and signal STT threads to restart (robustly)
275
+ self.is_speaking = False
276
+ self.speaking_event.clear()
277
+ # print(f"[{time.strftime('%H:%M:%S')}] [tts] speaking -> False")
278
+
279
+ # Primary restart: set both events
280
+ for lang, ev in self.restart_events.items():
281
+ ev.set()
282
+ # print(f"[{time.strftime('%H:%M:%S')}] [restart] set -> {lang}")
283
+
284
+ await asyncio.sleep(0.25)
285
+ for lang, ev in self.restart_events.items():
286
+ ev.set()
287
+ await asyncio.sleep(0.25)
288
+
289
+ # ---------------------------
290
+ # TTS consumer (serializes TTS)
291
+ # ---------------------------
292
+ async def _tts_consumer(self):
293
+ print("[tts_consumer] started")
294
+ while True:
295
+ item = await self._tts_queue.get()
296
+ if item is None:
297
+ print("[tts_consumer] shutdown sentinel received")
298
+ break
299
+ text = item.get("text", "")
300
+ self._tts_job_counter += 1
301
+ job_id = self._tts_job_counter
302
+ print(f"[tts_consumer] job #{job_id} dequeued (len={len(text)})")
303
+ try:
304
+ await asyncio.wait_for(self._stream_tts(text), timeout=35.0)
305
+ except asyncio.TimeoutError:
306
+ print(f"[tts_consumer] job #{job_id} _stream_tts timed out; proceeding.")
307
+ except Exception as e:
308
+ print(f"[tts_consumer] job #{job_id} error during _stream_tts: {e}")
309
+ finally:
310
+ await asyncio.sleep(0.05)
311
+ print("[tts_consumer] exiting")
312
+
313
+ # ---------------------------
314
+ # Translation & TTS trigger
315
+ # ---------------------------
316
+ async def _process_result(self, transcript: str, confidence: float, language: str):
317
+ lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
318
+ print(f"{lang_flag} Heard ({language}, conf {confidence:.2f}): {transcript}")
319
+
320
+ # echo suppression vs last TTS in same language
321
+ if language == "fr-FR":
322
+ if transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
323
+ print(" (echo suppressed)")
324
+ return
325
+ else:
326
+ if transcript.strip().lower() == self.last_tts_text_en.strip().lower():
327
+ print(" (echo suppressed)")
328
+ return
329
+
330
+ try:
331
+ if language == "fr-FR":
332
+ translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
333
+ print(f"🌐 FR → EN: {translated}")
334
+ await self._tts_queue.put({"text": translated, "source_lang": language})
335
+ self.last_tts_text_en = translated
336
+ else:
337
+ translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
338
+ print(f"🌐 EN → FR: {translated}")
339
+ await self._tts_queue.put({"text": translated, "source_lang": language})
340
+ self.last_tts_text_fr = translated
341
+ print("🔊 Queued for speaking...")
342
+ except Exception as e:
343
+ print(f"Translation error: {e}")
344
+
345
+ # ---------------------------
346
+ # STT streaming (run per language)
347
+ # ---------------------------
348
+ def _run_stt_stream(self, language: str):
349
+ print(f"[stt:{language}] Thread starting, thread_id={threading.get_ident()}")
350
+ self._stream_started[language] = False
351
+
352
+ while self.is_recording:
353
+ try:
354
+ if self._stream_started[language]:
355
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Waiting for restart signal...")
356
+ signaled = self.restart_events[language].wait(timeout=30)
357
+ if not signaled and self.is_recording:
358
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Timeout waiting for restart, restarting anyway")
359
+ if not self.is_recording:
360
+ break
361
+ try:
362
+ self.restart_events[language].clear()
363
+ except Exception:
364
+ pass
365
+ time.sleep(0.01)
366
+
367
+ self._stream_started[language] = True
368
+ print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Starting new stream...")
369
+
370
+ config = speech.RecognitionConfig(
371
+ encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
372
+ sample_rate_hertz=self.audio_rate,
373
+ language_code=language,
374
+ enable_automatic_punctuation=True,
375
+ model="latest_short",
376
+ )
377
+ streaming_config = speech.StreamingRecognitionConfig(
378
+ config=config,
379
+ interim_results=True,
380
+ single_utterance=False,
381
+ )
382
+
383
+ # Request generator yields StreamingRecognizeRequest messages
384
+ def request_generator():
385
+ while self.is_recording:
386
+ # If TTS is playing, skip sending mic frames to STT
387
+ if self.speaking_event.is_set():
388
+ time.sleep(0.01)
389
+ continue
390
+ # If cancel event set, clear and break to end stream
391
+ if self.stream_cancel_events[language].is_set():
392
+ # print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] request_generator observed cancel -> exiting generator")
393
+ try:
394
+ self.stream_cancel_events[language].clear()
395
+ except Exception:
396
+ pass
397
+ break
398
+ try:
399
+ chunk = self.lang_queues[language].get(timeout=1.0)
400
+ except queue.Empty:
401
+ continue
402
+ yield speech.StreamingRecognizeRequest(audio_content=chunk)
403
+
404
+ responses = self.stt_client.streaming_recognize(streaming_config, request_generator())
405
+
406
+ response_count = 0
407
+ final_received = False
408
+
409
+ for response in responses:
410
+ if not self.is_recording:
411
+ print(f"[stt:{language}] Stopped by user")
412
+ break
413
+ if not response.results:
414
+ continue
415
+
416
+ response_count += 1
417
+ for result in response.results:
418
+ if not result.alternatives:
419
+ continue
420
+ alt = result.alternatives[0]
421
+                         transcript = alt.transcript.strip()
+                         conf = getattr(alt, "confidence", 0.0)
+                         is_final = bool(result.is_final)
+
+                         if is_final:
+                             now = time.strftime("%H:%M:%S")
+                             print(f"[{now}] [stt:{language}] → '{transcript}' (final={is_final}, conf={conf:.2f})")
+                             if conf < self.min_confidence_threshold:
+                                 print(f"[{now}] [stt:{language}] Final received but confidence {conf:.2f} < threshold -> suppressed")
+                                 continue
+
+                             # Echo suppression: ignore transcripts that match our own TTS output
+                             if language == "fr-FR" and transcript.strip().lower() == self.last_tts_text_fr.strip().lower():
+                                 print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_fr)")
+                                 continue
+                             if language == "en-US" and transcript.strip().lower() == self.last_tts_text_en.strip().lower():
+                                 print(f"[{now}] [stt:{language}] (echo suppressed - matches last_tts_text_en)")
+                                 continue
+
+                             asyncio.run_coroutine_threadsafe(
+                                 self._process_result(transcript, conf, language),
+                                 self.async_loop
+                             )
+                             final_received = True
+                             break
+
+                     if final_received:
+                         break
+
+                 print(f"[stt:{language}] Stream ended after {response_count} responses")
+
+                 if self.is_recording and final_received:
+                     print(f"[{time.strftime('%H:%M:%S')}] [stt:{language}] Final result processed. Waiting for TTS to complete and signal restart.")
+                 elif self.is_recording and not final_received:
+                     print(f"[stt:{language}] Stream ended unexpectedly, reconnecting...")
+                     time.sleep(0.5)
+                 else:
+                     break
+
+             except Exception as e:
+                 if self.is_recording:
+                     import traceback
+                     print(f"[stt:{language}] Error: {e}")
+                     print(traceback.format_exc())
+                     time.sleep(1.0)
+                 else:
+                     break
+
+         print(f"[stt:{language}] Thread exiting")
+
+     # ---------------------------
+     # Control
+     # ---------------------------
+     def start_translation(self):
+         if self.is_recording:
+             print("Already recording!")
+             return
+         self.is_recording = True
+         self.last_processed_transcript = ""
+
+         # Reset restart/speaking events before spinning up fresh streams
+         for ev in self.restart_events.values():
+             try:
+                 ev.clear()
+             except Exception:
+                 pass
+         self.speaking_event.clear()
+
+         # Drop any stale audio left over from a previous session
+         for q in self.lang_queues.values():
+             with q.mutex:
+                 q.queue.clear()
+
+         self.recording_thread = threading.Thread(target=self._record_audio, daemon=True)
+         self.recording_thread.start()
+
+         for lang in ("en-US", "fr-FR"):
+             t = threading.Thread(target=self._run_stt_stream, args=(lang,), daemon=True)
+             self.stt_threads[lang] = t
+             t.start()
+             print(f"[main] STT thread {lang} started: {t.is_alive()} at {time.strftime('%H:%M:%S')}")
+
+         for ev in self.restart_events.values():
+             ev.set()
+
+     def stop_translation(self):
+         print("\n⏹️ Stopping translation...")
+         self.is_recording = False
+         for ev in self.restart_events.values():
+             ev.set()
+         self.speaking_event.clear()
+
+         # Ask the TTS consumer task to exit by queueing a sentinel
+         if self._tts_consumer_task and not self._tts_consumer_task.done():
+             try:
+                 def _put_sentinel():
+                     try:
+                         self._tts_queue.put_nowait(None)
+                     except Exception:
+                         asyncio.create_task(self._tts_queue.put(None))
+                 self.async_loop.call_soon_threadsafe(_put_sentinel)
+             except Exception:
+                 pass
+
+         time.sleep(0.2)
+
+     def cleanup(self):
+         self.stop_translation()
+         try:
+             if self.async_loop.is_running():
+                 def _stop_loop():
+                     if self._tts_consumer_task and not self._tts_consumer_task.done():
+                         try:
+                             self._tts_queue.put_nowait(None)
+                         except Exception:
+                             pass
+                     self.async_loop.stop()
+                 self.async_loop.call_soon_threadsafe(_stop_loop)
+         except Exception:
+             pass
+         try:
+             self.pyaudio_instance.terminate()
+         except Exception:
+             pass
+
+ # -----------------------------------------------------------------------------
+ # Main entry
+ # -----------------------------------------------------------------------------
+ def main():
+     load_dotenv()
+     google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
+     deepl_key = os.getenv("DEEPL_API_KEY")
+     eleven_key = os.getenv("ELEVENLABS_API_KEY")
+     voice_id = os.getenv("ELEVENLABS_VOICE_ID")
+
+     if not all([google_creds, deepl_key, eleven_key, voice_id]):
+         print("Missing API keys or credentials.")
+         return
+
+     translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
+     print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")
+
+     try:
+         while True:
+             input("Press ENTER to start speaking...")
+             translator.start_translation()
+             input("Press ENTER to stop...\n")
+             translator.stop_translation()
+     except KeyboardInterrupt:
+         print("\nKeyboardInterrupt received — cleaning up.")
+         translator.cleanup()
+
+ if __name__ == "__main__":
+     main()
mic_check.py ADDED
@@ -0,0 +1,5 @@
+ import pyaudio
+ p = pyaudio.PyAudio()
+ print("Default input device index:", p.get_default_input_device_info()['index'])
+ print("Default input name:", p.get_default_input_device_info()['name'])
+ p.terminate()
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ google-cloud-speech
+ deepl
+ pyaudio
+ websockets
+ python-dotenv
working.py ADDED
@@ -0,0 +1,334 @@
+ """
+ Real-Time French/English Voice Translator - FIXED VERSION v4.2
+ Improvements:
+ - Removed noisy [audio_gen]/[tts] prints
+ - Added a TTS pre-buffer to eliminate start-of-playback bursts
+ - Added silence-based auto-finalization when no STT final is detected
+ - Switched to the "latest_long" model for better segmentation
+ - Added echo suppression (skip self-spoken TTS text)
+ """
+
+ import asyncio
+ import base64
+ import json
+ import os
+ import queue
+ import threading
+ import time
+ from typing import Optional
+
+ import deepl
+ import pyaudio
+ import websockets
+ from dotenv import load_dotenv
+ from google.cloud import speech
+
+
+ class VoiceTranslator:
+     def __init__(self, deepl_api_key: str, elevenlabs_api_key: str, elevenlabs_voice_id: str):
+         self.stt_client = speech.SpeechClient()
+         self.deepl_client = deepl.Translator(deepl_api_key)
+         self.elevenlabs_api_key = elevenlabs_api_key
+         self.voice_id = elevenlabs_voice_id
+
+         self.audio_rate = 16000
+         self.audio_chunk = 1024
+
+         self.audio_queue_en = queue.Queue()
+         self.audio_queue_fr = queue.Queue()
+         self.result_queue = queue.Queue()
+         self.is_recording = False
+         self.processing_lock = threading.Lock()
+
+         self.last_processed_transcript = ""
+         self.last_tts_text = ""
+
+         self.pyaudio_instance = pyaudio.PyAudio()
+         self.audio_stream = None
+
+     # ---------- AUDIO CAPTURE ----------
+
+     def _audio_generator(self, audio_queue: queue.Queue):
+         while self.is_recording:
+             try:
+                 chunk = audio_queue.get(timeout=0.2)
+                 if chunk:
+                     yield chunk
+             except queue.Empty:
+                 continue
+
+     def _record_audio(self):
+         try:
+             stream = self.pyaudio_instance.open(
+                 format=pyaudio.paInt16,
+                 channels=1,
+                 rate=self.audio_rate,
+                 input=True,
+                 frames_per_buffer=self.audio_chunk,
+             )
+             print("🎤 Recording started...")
+             while self.is_recording:
+                 try:
+                     data = stream.read(self.audio_chunk, exception_on_overflow=False)
+                     if not data:
+                         continue
+                     # Duplicate the audio so both language recognizers hear everything
+                     self.audio_queue_en.put(data)
+                     self.audio_queue_fr.put(data)
+                 except Exception as e:
+                     print(f"[recorder] error: {e}")
+                     break
+             stream.stop_stream()
+             stream.close()
+             print("🎤 Recording stopped.")
+         except Exception as e:
+             print(f"[recorder] fatal: {e}")
+
+     # ---------- TEXT TO SPEECH ----------
+
+     async def _stream_tts(self, text: str):
+         """Stream TTS with a small pre-buffer to smooth playback."""
+         uri = (
+             f"wss://api.elevenlabs.io/v1/text-to-speech/{self.voice_id}"
+             f"/stream-input?model_id=eleven_flash_v2_5&output_format=pcm_16000"
+         )
+
+         try:
+             async with websockets.connect(uri) as websocket:
+                 await websocket.send(json.dumps({
+                     "text": " ",
+                     "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
+                     "xi_api_key": self.elevenlabs_api_key,
+                 }))
+                 await websocket.send(json.dumps({"text": text, "try_trigger_generation": True}))
+                 await websocket.send(json.dumps({"text": ""}))
+
+                 if self.audio_stream is None:
+                     self.audio_stream = self.pyaudio_instance.open(
+                         format=pyaudio.paInt16,
+                         channels=1,
+                         rate=16000,
+                         output=True,
+                         frames_per_buffer=1024,
+                     )
+
+                 prebuffer = bytearray()
+                 playback_started = False
+
+                 async for message in websocket:
+                     if isinstance(message, bytes):
+                         if playback_started:
+                             self.audio_stream.write(message)
+                         else:
+                             # Buffer ~0.5 s before starting playback
+                             # (16000 bytes = 0.5 s of 16 kHz 16-bit mono PCM)
+                             prebuffer.extend(message)
+                             if len(prebuffer) >= 16000:
+                                 self.audio_stream.write(bytes(prebuffer))
+                                 prebuffer.clear()
+                                 playback_started = True
+                         continue
+
+                     try:
+                         data = json.loads(message)
+                     except Exception:
+                         continue
+
+                     if data.get("audio"):
+                         audio_bytes = base64.b64decode(data["audio"])
+                         if playback_started:
+                             self.audio_stream.write(audio_bytes)
+                         else:
+                             prebuffer.extend(audio_bytes)
+                             if len(prebuffer) >= 16000:
+                                 self.audio_stream.write(bytes(prebuffer))
+                                 prebuffer.clear()
+                                 playback_started = True
+                     elif data.get("isFinal"):
+                         break
+                     elif data.get("error"):
+                         print("TTS error:", data["error"])
+                         break
+
+                 # Flush whatever is still buffered (utterances shorter than the pre-buffer)
+                 if prebuffer:
+                     self.audio_stream.write(bytes(prebuffer))
+         except Exception as e:
+             print(f"[tts] error: {e}")
+
+     # ---------- TRANSLATION ----------
+
+     async def _process_result(self, transcript: str, confidence: Optional[float], language: str):
+         lang_flag = "🇫🇷" if language == "fr-FR" else "🇬🇧"
+         conf_display = f"{confidence:.2f}" if confidence is not None else "n/a"
+         print(f"{lang_flag} Heard ({language}, conf {conf_display}): {transcript}")
+
+         # Simple echo suppression: skip what we just spoke ourselves
+         if transcript.strip().lower() == self.last_tts_text.strip().lower():
+             return
+
+         try:
+             if language == "fr-FR":
+                 translated = self.deepl_client.translate_text(transcript, target_lang="EN-US").text
+                 print(f"🌐 FR → EN: {translated}")
+             else:
+                 translated = self.deepl_client.translate_text(transcript, target_lang="FR").text
+                 print(f"🌐 EN → FR: {translated}")
+
+             self.last_tts_text = translated
+             print("🔊 Speaking...")
+             await self._stream_tts(translated)
+             print("✅ Done\n")
+
+         except Exception as e:
+             print(f"Translation error: {e}")
+
+     # ---------- STT STREAMING ----------
+
+     def _run_stt_stream(self, language: str, audio_queue: queue.Queue):
+         print(f"[stt] Thread start for {language}")
+
+         config = speech.RecognitionConfig(
+             encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
+             sample_rate_hertz=self.audio_rate,
+             language_code=language,
+             enable_automatic_punctuation=True,
+             model="latest_long",
+         )
+
+         streaming_config = speech.StreamingRecognitionConfig(
+             config=config, interim_results=True, single_utterance=False
+         )
+
+         def requests():
+             for content in self._audio_generator(audio_queue):
+                 yield speech.StreamingRecognizeRequest(audio_content=content)
+
+         try:
+             responses = self.stt_client.streaming_recognize(streaming_config, requests())
+
+             last_update_time = time.time()
+             current_text = ""
+             for response in responses:
+                 if not self.is_recording:
+                     break
+                 if not response.results:
+                     continue
+
+                 for result in response.results:
+                     if not result.alternatives:
+                         continue
+                     alt = result.alternatives[0]
+                     transcript = alt.transcript.strip()
+                     conf = getattr(alt, "confidence", None)
+                     current_text = transcript
+                     last_update_time = time.time()
+
+                     self.result_queue.put({
+                         "transcript": transcript,
+                         "confidence": conf,
+                         "language": language,
+                         "is_final": bool(result.is_final),
+                     })
+
+                 # If we haven't heard anything new for 1.2 s, flush it as "final"
+                 # (checked only when a response arrives, so this is a best-effort fallback)
+                 if time.time() - last_update_time > 1.2 and current_text:
+                     self.result_queue.put({
+                         "transcript": current_text,
+                         "confidence": 0.5,
+                         "language": language,
+                         "is_final": True,
+                     })
+                     current_text = ""
+
+         except Exception as e:
+             print(f"[stt:{language}] exception: {e}")
+
+     # ---------- RESULT AGGREGATION ----------
+
+     async def _process_results_queue(self):
+         while self.is_recording:
+             try:
+                 r = self.result_queue.get(timeout=0.2)
+                 if r["is_final"] and r["transcript"] != self.last_processed_transcript:
+                     with self.processing_lock:
+                         self.last_processed_transcript = r["transcript"]
+                         await self._process_result(
+                             r["transcript"], r.get("confidence"), r["language"]
+                         )
+                 await asyncio.sleep(0.01)
+             except queue.Empty:
+                 await asyncio.sleep(0.05)
+             except Exception as e:
+                 print("Queue error:", e)
+                 await asyncio.sleep(0.1)
+
+     # ---------- CONTROL ----------
+
+     async def _run_dual_streams(self):
+         print("🔄 Dual-stream: English ⇄ French\n")
+         en_thread = threading.Thread(target=self._run_stt_stream, args=("en-US", self.audio_queue_en), daemon=True)
+         fr_thread = threading.Thread(target=self._run_stt_stream, args=("fr-FR", self.audio_queue_fr), daemon=True)
+         en_thread.start()
+         fr_thread.start()
+         await self._process_results_queue()
+
+     def start_translation(self):
+         if self.is_recording:
+             print("Already recording!")
+             return
+         self.is_recording = True
+         self.last_processed_transcript = ""
+         # Drain results left over from a previous session
+         while not self.result_queue.empty():
+             try:
+                 self.result_queue.get_nowait()
+             except queue.Empty:
+                 break
+         threading.Thread(target=self._record_audio, daemon=True).start()
+         try:
+             asyncio.run(self._run_dual_streams())
+         except KeyboardInterrupt:
+             self.stop_translation()
+
+     def stop_translation(self):
+         print("\n⏹️ Stopping translation...")
+         self.is_recording = False
+         if self.audio_stream:
+             try:
+                 self.audio_stream.stop_stream()
+                 self.audio_stream.close()
+             except Exception:
+                 pass
+             self.audio_stream = None
+
+     def cleanup(self):
+         self.stop_translation()
+         try:
+             self.pyaudio_instance.terminate()
+         except Exception:
+             pass
+
+
+ # ---------- MAIN ----------
+
+ def main():
+     load_dotenv()
+     google_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
+     deepl_key = os.getenv("DEEPL_API_KEY")
+     eleven_key = os.getenv("ELEVENLABS_API_KEY")
+     voice_id = os.getenv("ELEVENLABS_VOICE_ID")
+
+     if not all([google_creds, deepl_key, eleven_key, voice_id]):
+         print("Missing API keys or credentials.")
+         return
+
+     translator = VoiceTranslator(deepl_key, eleven_key, voice_id)
+     print("Ready! Press ENTER to start, ENTER again to stop, Ctrl+C to quit.\n")
+
+     try:
+         while True:
+             input("Press ENTER to start speaking...")
+             threading.Thread(target=translator.start_translation, daemon=True).start()
+             input("Press ENTER to stop...\n")
+             translator.stop_translation()
+     except KeyboardInterrupt:
+         translator.cleanup()
+
+
+ if __name__ == "__main__":
+     main()