latishab committed on
Commit e8ed0e1 · verified · 1 Parent(s): 3c12fa0

Update TARS Conversation App with TarsApp framework

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. .gitattributes +1 -0
  2. .gitignore +48 -0
  3. CLAUDE.md +50 -0
  4. LICENSE +21 -0
  5. README.md +340 -8
  6. app.json +55 -0
  7. assets/audio/tars-clean-compressed.mp3 +3 -0
  8. bot.py +605 -0
  9. config.ini.example +52 -0
  10. docs/DAEMON_INTEGRATION.md +393 -0
  11. docs/DASHBOARD_UPDATE_SUMMARY.md +218 -0
  12. docs/DEVELOPING_APPS.md +400 -0
  13. docs/INSTALLATION_GUIDE.md +264 -0
  14. docs/MEMORY.md +190 -0
  15. env.example +59 -0
  16. index.html +333 -0
  17. install.sh +99 -0
  18. manifest.json +47 -0
  19. pipecat_service.py +272 -0
  20. publish-to-hf.sh +87 -0
  21. pyproject.toml +25 -0
  22. requirements.txt +18 -0
  23. scripts/update_daemon.py +388 -0
  24. src/README.md +55 -0
  25. src/character/TARS.json +25 -0
  26. src/character/persona.ini +21 -0
  27. src/character/prompts.py +331 -0
  28. src/config/__init__.py +152 -0
  29. src/config/connection.py +179 -0
  30. src/observers/__init__.py +21 -0
  31. src/observers/assistant_observer.py +142 -0
  32. src/observers/debug_observer.py +22 -0
  33. src/observers/display_events_observer.py +100 -0
  34. src/observers/metrics_observer.py +196 -0
  35. src/observers/state_observer.py +166 -0
  36. src/observers/transcription_observer.py +70 -0
  37. src/observers/tts_state_observer.py +56 -0
  38. src/observers/vision_observer.py +142 -0
  39. src/processors/__init__.py +18 -0
  40. src/processors/emotional_monitor.py +303 -0
  41. src/processors/filters.py +81 -0
  42. src/processors/gating.py +129 -0
  43. src/processors/visual_observer.py +389 -0
  44. src/services/README.md +110 -0
  45. src/services/__init__.py +1 -0
  46. src/services/factories/__init__.py +6 -0
  47. src/services/factories/stt_factory.py +127 -0
  48. src/services/factories/tts_factory.py +84 -0
  49. src/services/memory/memory_chromadb.py +195 -0
  50. src/services/memory/memory_hybrid.py +393 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/audio/tars-clean-compressed.mp3 filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,48 @@
+ # dependencies
+ node_modules/
+ /.pnp
+ .pnp.js
+
+ # testing
+ /coverage
+
+ # next.js
+ /.next/
+ /out/
+
+ # cache
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+
+ # production
+ /build
+
+ # misc
+ .DS_Store
+ *.pem
+ /.models/
+ /.claude/
+ /chroma_memory/
+ /deprecated/
+ /memory_data/
+
+ # debug
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+
+ # local env files
+ .env*.local
+ .env
+
+ # local config files
+ config.ini
+
+ # vercel
+ .vercel
+
+ # typescript
+ *.tsbuildinfo
+ next-env.d.ts
CLAUDE.md ADDED
@@ -0,0 +1,50 @@
+ # TARS Omni
+
+ AI brain that connects to the Raspberry Pi hardware daemon.
+
+ ## Pi Access
+ ```
+ ssh tars-pi # 100.84.133.74, user: mac, repo: ~/tars-daemon
+ ```
+
+ ## Install
+
+ Pi (from tars-daemon dashboard):
+ - Apps tab → Install button
+
+ Pi (manual):
+ ```bash
+ ssh tars-pi "cd ~/tars-conversation-app && bash install.sh"
+ ```
+
+ See: docs/INSTALLATION_GUIDE.md
+
+ ## Run
+
+ 1. Pi: `ssh tars-pi "cd ~/tars && python tars_daemon.py"`
+ 2. Mac: `python tars_bot.py`
+
+ ---
+
+ ## Docs
+
+ - Installation: docs/INSTALLATION_GUIDE.md
+ - App Development: docs/DEVELOPING_APPS.md
+ - Daemon Integration: docs/DAEMON_INTEGRATION.md
+ - Dashboard Update: docs/DASHBOARD_UPDATE_SUMMARY.md
+
+ ## Dashboard Install
+
+ The tars-daemon dashboard now supports app management:
+ - Apps tab shows all apps in ~/tars-apps/
+ - Install/Uninstall buttons
+ - Start/Stop controls
+ - Auto-discovery via app.json
+
+ ## Claude Code Guidelines
+
+ - No emojis, no [NEW] markers, no "vs" comparisons
+ - Concise, technical, factual only
+ - No fluff, benefits sections, or marketing language
+ - Commits: imperative mood, no emojis
+ - Comments: minimal, explain "why" not "what"
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Latisha Besariani Hendra and TARS Omni Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,13 +1,345 @@
  ---
- title: Tars Conversation App
- emoji: 🏢
- colorFrom: indigo
- colorTo: yellow
+ title: TARS Conversation App
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
- sdk_version: 6.5.1
- app_file: app.py
+ sdk_version: "4.0.0"
+ app_file: ui/app.py
  pinned: false
- license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # TARS Conversation App
+
+ Real-time voice AI with transcription, vision, and intelligent conversation using Speechmatics/Deepgram, Qwen3-TTS/ElevenLabs, DeepInfra LLM, and Moondream.
+
+ ## Features
+
+ - **Dual Operation Modes**
+   - **WebRTC Mode** (`bot.py`) - Browser-based voice AI with real-time metrics dashboard
+   - **Robot Mode** (`tars_bot.py`) - Connect to Raspberry Pi TARS robot via WebRTC and gRPC
+ - **Real-time Transcription** - Speechmatics or Deepgram with smart turn detection
+ - **Dual TTS Options** - Qwen3-TTS (local, free, voice cloning) or ElevenLabs (cloud)
+ - **LLM Integration** - Any model via DeepInfra
+ - **Vision Analysis** - Moondream for image understanding
+ - **Smart Gating Layer** - AI-powered decision system for natural conversation flow
+ - **Hybrid Memory** - SQLite-based hybrid search (70% vector + 30% BM25)
+ - **Emotional Monitoring** - Real-time detection of confusion, hesitation, and frustration
+ - **Gradio Dashboard** - Live TTFB metrics, latency charts, and conversation transcription
+ - **WebRTC Transport** - Low-latency peer-to-peer audio
+ - **gRPC Robot Control** - Hardware control with 5-10ms latency (robot mode only)
+
+ ## Project Structure
+
+ ```
+ tars-conversation-app/
+ ├── bot.py               # WebRTC mode - Browser voice AI
+ ├── tars_bot.py          # Robot mode - Raspberry Pi hardware
+ ├── pipecat_service.py   # FastAPI backend (WebRTC signaling)
+ ├── config.py            # Configuration management
+ ├── config.ini           # User configuration file
+ ├── requirements.txt     # Python dependencies
+ │
+ ├── src/                 # Backend
+ │   ├── observers/       # Pipeline observers (metrics, transcription)
+ │   ├── processors/      # Pipeline processors (silence filter, gating)
+ │   ├── services/        # Services (STT, TTS, Memory, Robot)
+ │   ├── tools/           # LLM callable functions
+ │   ├── transport/       # WebRTC transport (aiortc)
+ │   ├── character/       # TARS personality and prompts
+ │   └── shared_state.py  # Shared metrics storage
+ │
+ ├── ui/                  # Frontend
+ │   └── app.py           # Gradio dashboard (metrics + transcription)
+ │
+ ├── tests/               # Tests
+ │   └── gradio/
+ │       └── test_gradio.py  # UI integration test
+ │
+ ├── character/           # TARS character data
+ │   ├── TARS.json        # Character definition
+ │   └── persona.ini      # Personality parameters
+ ```
+
+ ## Operation Modes
+
+ ### WebRTC Mode (`bot.py`)
+ - **Use case**: Browser-based voice AI conversations
+ - **Transport**: SmallWebRTC (browser ↔ Pipecat)
+ - **Features**: Full pipeline with STT, LLM, TTS, Memory
+ - **UI**: Gradio dashboard for metrics and transcription
+ - **Best for**: Development, testing, remote conversations
+
+ ### Robot Mode (`tars_bot.py`)
+ - **Use case**: Physical TARS robot on Raspberry Pi
+ - **Transport**: aiortc (RPi ↔ Pipecat) + gRPC (commands)
+ - **Features**: Same pipeline + robot control (eyes, gestures, movement)
+ - **Hardware**: Requires TARS robot with servos and display
+ - **Best for**: Physical robot interactions, demos
+
+ ## Quick Start
+
+ ### Installation on TARS Robot (Recommended)
+
+ Install directly from the HuggingFace Space via the TARS dashboard:
+
+ 1. Open the TARS dashboard at `http://your-pi:8000`
+ 2. Go to the **App Store** tab
+ 3. Enter Space ID: `latishab/tars-conversation-app`
+ 4. Click **Install from HuggingFace**
+ 5. Configure API keys in `.env.local`
+ 6. Click **Start**
+ 7. Access the metrics dashboard at `http://your-pi:7860`
+
+ The app will:
+ - Auto-install dependencies
+ - Set up a virtual environment
+ - Configure for robot mode
+ - Start the Gradio dashboard
+
+ ### Easy Installation (Manual)
+
+ For first-time setup on Raspberry Pi:
+
+ ```bash
+ # Clone and install
+ git clone https://github.com/latishab/tars-conversation-app.git
+ cd tars-conversation-app
+ bash install.sh
+ ```
+
+ The installer handles:
+ - System dependencies (portaudio, ffmpeg)
+ - Python virtual environment
+ - All Python packages
+ - Configuration file setup
+
+ ### Manual Installation
+
+ ```bash
+ # Python dependencies
+ pip install -r requirements.txt
+
+ # For robot mode, install TARS SDK
+ pip install tars-robot[sdk]
+ ```
+
+ ### Configure Environment
+
+ ```bash
+ # Copy and edit environment file with your API keys
+ cp env.example .env.local
+
+ # Copy and edit configuration file
+ cp config.ini.example config.ini
+ ```
+
+ Required API keys (in `.env.local`):
+ - `SPEECHMATICS_API_KEY` or `DEEPGRAM_API_KEY` - For speech-to-text
+ - `DEEPINFRA_API_KEY` - For LLM
+ - `ELEVENLABS_API_KEY` - Optional (if using ElevenLabs TTS)
+
+ Settings (in `config.ini`):
+ ```ini
+ [LLM]
+ model = meta-llama/Llama-3.3-70B-Instruct
+
+ [STT]
+ provider = deepgram  # or speechmatics
+
+ [TTS]
+ provider = qwen3  # or elevenlabs
+
+ [Memory]
+ type = hybrid  # SQLite-based hybrid search (vector + BM25)
+ ```
+
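The `type = hybrid` setting above boils down to a weighted sum of two retrieval scores. A minimal sketch of the blend, using the 70/30 weights from the config (the score inputs are hypothetical placeholders; the real implementation lives in `src/services/memory/memory_hybrid.py`):

```python
# Sketch of the hybrid ranking; score inputs assumed normalized to [0, 1].
VECTOR_WEIGHT = 0.7  # semantic-similarity share
BM25_WEIGHT = 0.3    # keyword-match share

def hybrid_score(vector_score: float, bm25_score: float) -> float:
    """Blend semantic and keyword relevance into one ranking score."""
    return VECTOR_WEIGHT * vector_score + BM25_WEIGHT * bm25_score

# A memory that is semantically close but shares few keywords still ranks high:
print(hybrid_score(vector_score=0.9, bm25_score=0.2))  # 0.69
```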
+ ### Run
+
+ #### WebRTC Mode (Browser)
+
+ **Terminal 1: Python backend**
+ ```bash
+ python pipecat_service.py
+ ```
+
+ **Terminal 2: Gradio UI (optional)**
+ ```bash
+ python ui/app.py
+ ```
+
+ Then:
+ 1. Open the WebRTC client in a browser (connect to pipecat_service)
+ 2. Open the Gradio dashboard at http://localhost:7861 (for metrics)
+ 3. Start talking
+
+ #### Robot Mode (Raspberry Pi)
+
+ Prerequisites:
+ - Raspberry Pi TARS robot running tars_daemon.py
+ - Network connection (LAN or Tailscale)
+ - TARS SDK installed
+
+ Configuration in `config.ini`:
+ ```ini
+ [Connection]
+ mode = robot
+ rpi_url = http://<your-rpi-ip>:8001
+ rpi_grpc = <your-rpi-ip>:50051
+ auto_connect = true
+
+ [Display]
+ enabled = true
+ ```
+
+ Deployment detection (see the sketch below):
+ - **Remote** (Mac/computer): Uses configured addresses
+ - **Local** (on RPi): Auto-detects localhost:50051
+
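docs/DAEMON_INTEGRATION.md sketches this detection; a condensed version (the fallback address is the example value used there):

```python
import os

def get_grpc_address() -> str:
    """Local daemon when running on the Pi, configured address otherwise."""
    try:
        with open("/proc/cpuinfo") as f:
            if "Raspberry Pi" in f.read():
                return "localhost:50051"  # running on the robot itself
    except OSError:
        pass  # not Linux, or file unreadable -> assume remote
    return os.getenv("RPI_GRPC", "100.84.133.74:50051")  # configured remote daemon
```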
+ Run:
+ ```bash
+ python tars_bot.py
+ ```
+
+ ## Gradio Dashboard
+
+ The Gradio UI (`ui/app.py`) provides real-time monitoring:
+
+ ### Latency Dashboard
+ - Service configuration (STT, Memory, LLM, TTS)
+ - TTFB metrics with min/max/avg/last stats
+ - Line chart: Latency trends over time
+ - Bar chart: Stacked latency breakdown
+ - Metrics table: Last 15 turns
+
+ ### Conversation Tab
+ - Live user and assistant transcriptions
+ - Auto-updates every second
+
+ ### Connection Tab
+ - Architecture documentation
+ - Usage instructions
+
+ ## Architecture
+
+ ### WebRTC Mode Data Flow
+ ```
+ Browser (WebRTC client)
+        ↕ (audio)
+ SmallWebRTC Transport
+        ↓
+ Pipeline: STT → Memory → LLM → TTS
+        ↓
+ Observers (metrics, transcription, assistant)
+        ↓
+ shared_state.py
+        ↓
+ Gradio UI (http://localhost:7861)
+ ```
+
+ ### Robot Mode Data Flow
+ ```
+ RPi Mic → WebRTC → Pipecat Pipeline → WebRTC → RPi Speaker
+  (audio)               ↓               (audio)
+            STT → Memory → LLM → TTS
+                        ↓
+         LLM Tools (set_emotion, do_gesture)
+                        ↓
+               gRPC → RPi Hardware
+            (eyes, servos, display)
+ ```
+
+ Communication channels (Robot Mode):
+
+ | Channel | Protocol | Purpose | Latency |
+ |---------|----------|---------|---------|
+ | Audio | WebRTC (aiortc) | Voice conversation | ~20ms |
+ | Commands | gRPC | Hardware control | ~5-10ms |
+ | State | DataChannel | Battery, movement status | ~10ms |
+
+ ## Testing
+
+ ```bash
+ # Test Gradio integration
+ python tests/gradio/test_gradio.py
+
+ # Test gesture recognition (robot mode)
+ python tests/test_gesture.py
+
+ # Test hardware connection (robot mode, from RPi)
+ ssh tars-pi "cd ~/tars && python tests/test_hardware.py"
+ ```
+
+ ## Development
+
+ See [docs/DEVELOPING_APPS.md](docs/DEVELOPING_APPS.md) for a comprehensive guide to creating TARS SDK apps.
+
+ ### Adding Metrics
+ 1. Emit a `MetricsFrame` in your service/processor (see the sketch below)
+ 2. `MetricsObserver` will capture it automatically
+ 3. Metrics appear in the Gradio dashboard
+
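A minimal sketch of step 1, assuming the `MetricsFrame`/`TTFBMetricsData` shapes of recent Pipecat releases (verify against your installed version; the processor and the work being timed are illustrative):

```python
import time

from pipecat.frames.frames import MetricsFrame
from pipecat.metrics.metrics import TTFBMetricsData
from pipecat.processors.frame_processor import FrameProcessor, FrameDirection


class TimedProcessor(FrameProcessor):
    """Hypothetical processor that reports how long its work took."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        started = time.monotonic()
        # ... do the actual work here ...
        elapsed = time.monotonic() - started
        # MetricsObserver picks this frame up and feeds the Gradio dashboard
        await self.push_frame(
            MetricsFrame(data=[TTFBMetricsData(processor=self.name, value=elapsed)]),
            FrameDirection.DOWNSTREAM,
        )
        await self.push_frame(frame, direction)
```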
+ ### Adding Tools
+ 1. Create a function in `src/tools/`
+ 2. Create a schema with `create_*_schema()`
+ 3. Register it in `bot.py` or `tars_bot.py`
+ 4. The LLM can now call your tool (a full sketch follows)
+
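A sketch of the full cycle, following the pattern `bot.py` uses for its own tools (the `get_time` tool is a hypothetical example; the `FunctionSchema` import path is assumed from Pipecat's adapters package, next to the `ToolsSchema` import in `bot.py`):

```python
from datetime import datetime, timezone

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.services.llm_service import FunctionCallParams


def create_get_time_schema() -> FunctionSchema:
    # Schema the LLM sees: name, description, and (here, empty) parameters
    return FunctionSchema(
        name="get_time",
        description="Return the current UTC time as an ISO-8601 string.",
        properties={},
        required=[],
    )


async def get_time(params: FunctionCallParams):
    # Hand the result back to the pipeline via the provided callback
    await params.result_callback(datetime.now(timezone.utc).isoformat())
```

In `bot.py`, add `create_get_time_schema()` to `ToolsSchema(standard_tools=[...])` and register the handler with `llm.register_function("get_time", get_time)`.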
+ ### Modifying UI
+ 1. Edit `ui/app.py`
+ 2. Gradio hot-reloads automatically
+ 3. Access `metrics_store` for data
+
+ ### Uninstalling
+
+ ```bash
+ bash uninstall.sh
+ ```
+
+ Removes the virtual environment and, optionally, data/config files.
+
+ ## Troubleshooting
+
+ ### No metrics in Gradio UI
+ - Ensure the bot is running (`bot.py` or `tars_bot.py`)
+ - Check the WebRTC client is connected
+ - Verify at least one conversation turn completed
+
+ ### Robot mode connection issues
+ - Check the RPi is reachable: `ping <rpi-ip>`
+ - Verify tars_daemon is running on the RPi
+ - Check gRPC port 50051 is open
+ - Review config.ini addresses
+
+ ### Import errors
+ ```bash
+ pip install -r requirements.txt
+ pip install gradio plotly  # For UI
+ ```
+
+ ### Audio issues (robot mode)
+ - Check the RPi mic/speaker with `arecord`/`aplay`
+ - Verify the WebRTC connection in logs
+ - Test with `tests/test_hardware.py`
+
+ ## Contributing
+
+ Contributions welcome.
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Test with `python tests/gradio/test_gradio.py`
+ 5. Commit with clear messages (see CLAUDE.md for style)
+ 6. Push to your fork
+ 7. Open a Pull Request
+
+ Code style:
+ - Python: Follow PEP 8
+ - Add comments for complex logic
+ - Update docs for new features
+ - See CLAUDE.md for guidelines (concise, technical, no fluff)
+
+ ## License
+
+ MIT License - see the LICENSE file for details
app.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "name": "tars-conversation-app",
+   "version": "1.0.0",
+   "description": "Real-time conversational AI with WebRTC, memory, and vision",
+   "author": "TARS Project",
+   "repository": "https://github.com/latishab/tars-conversation-app.git",
+   "main": "tars_bot.py",
+   "install_script": "install.sh",
+   "uninstall_script": "uninstall.sh",
+   "dependencies": {
+     "python": ">=3.10",
+     "system": [
+       "portaudio19-dev",
+       "ffmpeg",
+       "build-essential",
+       "python3-dev"
+     ]
+   },
+   "environment": [
+     "DEEPINFRA_API_KEY",
+     "SPEECHMATICS_API_KEY",
+     "DEEPGRAM_API_KEY",
+     "ELEVENLABS_API_KEY"
+   ],
+   "configuration": {
+     "file": "config.ini",
+     "example": "config.ini.example",
+     "env_file": ".env.local",
+     "env_example": "env.example"
+   },
+   "ports": {
+     "grpc": 50051,
+     "http": 8765,
+     "fastapi": 8080
+   },
+   "modes": [
+     {
+       "name": "robot",
+       "description": "Connect to Pi hardware via gRPC",
+       "command": "python tars_bot.py"
+     },
+     {
+       "name": "browser",
+       "description": "Browser-based WebRTC mode",
+       "command": "python bot.py"
+     }
+   ],
+   "services": {
+     "dashboard": {
+       "enabled": true,
+       "command": "python ui/app.py",
+       "port": 7860
+     }
+   }
+ }
assets/audio/tars-clean-compressed.mp3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35e66e7ef9dfd3e64ed70fcdb32b220686d3ad4451af88bfa72a48563a85b120
+ size 289820
bot.py ADDED
@@ -0,0 +1,605 @@
+ """Bot pipeline setup and execution."""
+
+ import sys
+ from pathlib import Path
+
+ # Add src/ to Python path
+ sys.path.insert(0, str(Path(__file__).parent / "src"))
+
+ import asyncio
+ import json
+ import os
+ import logging
+ import uuid
+ import httpx
+
+ from pipecat.adapters.schemas.tools_schema import ToolsSchema
+ from pipecat.frames.frames import (
+     LLMRunFrame,
+     TranscriptionFrame,
+     InterimTranscriptionFrame,
+     Frame,
+     TranscriptionMessage,
+     TranslationFrame,
+     UserImageRawFrame,
+     UserAudioRawFrame,
+     UserImageRequestFrame,
+ )
+ from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+ from pipecat.pipeline.pipeline import Pipeline
+ from pipecat.pipeline.runner import PipelineRunner
+ from pipecat.pipeline.task import PipelineTask, PipelineParams
+ from pipecat.processors.aggregators.llm_context import LLMContext
+ from pipecat.processors.aggregators.llm_response_universal import (
+     LLMContextAggregatorPair,
+     LLMUserAggregatorParams
+ )
+ from pipecat.observers.turn_tracking_observer import TurnTrackingObserver
+ from pipecat.observers.loggers.user_bot_latency_log_observer import UserBotLatencyLogObserver
+ from pipecat.services.moondream.vision import MoondreamService
+ from pipecat.services.openai.llm import OpenAILLMService
+ from pipecat.services.llm_service import FunctionCallParams
+ from services.memory_hybrid import HybridMemoryService
+ from pipecat.transcriptions.language import Language
+ from pipecat.transports.base_transport import TransportParams
+ from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
+
+ from loguru import logger
+
+ from config import (
+     SPEECHMATICS_API_KEY,
+     DEEPGRAM_API_KEY,
+     ELEVENLABS_API_KEY,
+     ELEVENLABS_VOICE_ID,
+     DEEPINFRA_API_KEY,
+     DEEPINFRA_BASE_URL,
+     MEM0_API_KEY,
+     get_fresh_config,
+ )
+ from services.factories import create_stt_service, create_tts_service
+ from processors import (
+     SilenceFilter,
+     InputAudioFilter,
+     InterventionGating,
+     VisualObserver,
+     EmotionalStateMonitor,
+ )
+ from observers import (
+     MetricsObserver,
+     TranscriptionObserver,
+     AssistantResponseObserver,
+     TTSStateObserver,
+     VisionObserver,
+     DebugObserver,
+     DisplayEventsObserver,
+ )
+ from character.prompts import (
+     load_persona_ini,
+     load_tars_json,
+     build_tars_system_prompt,
+     get_introduction_instruction,
+ )
+ from tools import (
+     fetch_user_image,
+     adjust_persona_parameter,
+     execute_movement,
+     capture_camera_view,
+     create_fetch_image_schema,
+     create_adjust_persona_schema,
+     create_identity_schema,
+     create_movement_schema,
+     create_camera_capture_schema,
+     get_persona_storage,
+     get_crossword_hint,
+     create_crossword_hint_schema,
+ )
+ from shared_state import metrics_store
+
+
+ # ============================================================================
+ # CUSTOM FRAME PROCESSORS
+ # ============================================================================
+
+ class IdentityUnifier(FrameProcessor):
+     """
+     Applies 'guest_ID' ONLY to specific user input frames.
+     Leaves other frames untouched.
+     """
+     # Define the frame types that should have user_id set
+     TARGET_FRAME_TYPES = (
+         TranscriptionFrame,
+         TranscriptionMessage,
+         TranslationFrame,
+         InterimTranscriptionFrame,
+         UserImageRawFrame,
+         UserAudioRawFrame,
+         UserImageRequestFrame,
+     )
+
+     def __init__(self, target_user_id):
+         super().__init__()
+         self.target_user_id = target_user_id
+
+     async def process_frame(self, frame: Frame, direction: FrameDirection):
+         # 1. Handle internal state
+         await super().process_frame(frame, direction)
+
+         # 2. Only modify specific frame types
+         if isinstance(frame, self.TARGET_FRAME_TYPES):
+             try:
+                 frame.user_id = self.target_user_id
+             except Exception:
+                 pass
+
+         # 3. Push downstream
+         await self.push_frame(frame, direction)
+
+
+ # ============================================================================
+ # HELPER FUNCTIONS
+ # ============================================================================
+
+ async def _cleanup_services(service_refs: dict):
+     if service_refs.get("stt"):
+         try:
+             await service_refs["stt"].close()
+             logger.info("✓ STT service cleaned up")
+         except Exception:
+             pass
+     if service_refs.get("tts"):
+         try:
+             await service_refs["tts"].close()
+             logger.info("✓ TTS service cleaned up")
+         except Exception:
+             pass
+
+
+ # ============================================================================
+ # MAIN BOT PIPELINE
+ # ============================================================================
+
+ async def run_bot(webrtc_connection):
+     """Initialize and run the TARS bot pipeline."""
+     logger.info("Starting bot pipeline for WebRTC connection...")
+
+     # Load fresh configuration for this connection (allows runtime config updates)
+     runtime_config = get_fresh_config()
+     DEEPINFRA_MODEL = runtime_config['DEEPINFRA_MODEL']
+     DEEPINFRA_GATING_MODEL = runtime_config['DEEPINFRA_GATING_MODEL']
+     STT_PROVIDER = runtime_config['STT_PROVIDER']
+     TTS_PROVIDER = runtime_config['TTS_PROVIDER']
+     QWEN3_TTS_MODEL = runtime_config['QWEN3_TTS_MODEL']
+     QWEN3_TTS_DEVICE = runtime_config['QWEN3_TTS_DEVICE']
+     QWEN3_TTS_REF_AUDIO = runtime_config['QWEN3_TTS_REF_AUDIO']
+     EMOTIONAL_MONITORING_ENABLED = runtime_config['EMOTIONAL_MONITORING_ENABLED']
+     EMOTIONAL_SAMPLING_INTERVAL = runtime_config['EMOTIONAL_SAMPLING_INTERVAL']
+     EMOTIONAL_INTERVENTION_THRESHOLD = runtime_config['EMOTIONAL_INTERVENTION_THRESHOLD']
+     TARS_DISPLAY_URL = runtime_config['TARS_DISPLAY_URL']
+     TARS_DISPLAY_ENABLED = runtime_config['TARS_DISPLAY_ENABLED']
+
+     logger.info(f"📋 Runtime config loaded - STT: {STT_PROVIDER}, LLM: {DEEPINFRA_MODEL}, TTS: {TTS_PROVIDER}, Emotional: {EMOTIONAL_MONITORING_ENABLED}")
+
+     # Session initialization
+     session_id = str(uuid.uuid4())[:8]
+     client_id = f"guest_{session_id}"
+     client_state = {"client_id": client_id}
+     logger.info(f"Session started: {client_id}")
+
+     service_refs = {"stt": None, "tts": None}
+
+     try:
+         # ====================================================================
+         # TRANSPORT INITIALIZATION
+         # ====================================================================
+         # Note: STT providers handle their own turn detection:
+         # - Speechmatics: SMART_TURN mode
+         # - Deepgram: endpointing parameter (300ms silence detection)
+         # - Deepgram Flux: built-in turn detection with ExternalUserTurnStrategies (deprecated)
+
+         logger.info(f"Initializing transport with {STT_PROVIDER} turn detection...")
+
+         transport_params = TransportParams(
+             audio_in_enabled=True,
+             audio_out_enabled=True,
+             video_in_enabled=False,
+             video_out_enabled=False,
+             video_out_is_live=False,
+         )
+
+         pipecat_transport = SmallWebRTCTransport(
+             webrtc_connection=webrtc_connection,
+             params=transport_params,
+         )
+
+         logger.info("✓ Transport initialized")
+
+         # ====================================================================
+         # SPEECH-TO-TEXT SERVICE
+         # ====================================================================
+
+         logger.info(f"Initializing {STT_PROVIDER} STT...")
+         stt = None
+         try:
+             stt = create_stt_service(
+                 provider=STT_PROVIDER,
+                 speechmatics_api_key=SPEECHMATICS_API_KEY,
+                 deepgram_api_key=DEEPGRAM_API_KEY,
+                 language=Language.EN,
+                 enable_diarization=False,
+             )
+             service_refs["stt"] = stt
+
+             # Log additional info for Deepgram
+             if STT_PROVIDER == "deepgram":
+                 logger.info("✓ Deepgram: 300ms endpointing for turn detection")
+                 logger.info("✓ Deepgram: VAD events enabled for speech detection")
+
+         except Exception as e:
+             logger.error(f"Failed to initialize {STT_PROVIDER} STT: {e}", exc_info=True)
+             return
+
+         # ====================================================================
+         # TEXT-TO-SPEECH SERVICE
+         # ====================================================================
+
+         try:
+             tts = create_tts_service(
+                 provider=TTS_PROVIDER,
+                 elevenlabs_api_key=ELEVENLABS_API_KEY,
+                 elevenlabs_voice_id=ELEVENLABS_VOICE_ID,
+                 qwen_model=QWEN3_TTS_MODEL,
+                 qwen_device=QWEN3_TTS_DEVICE,
+                 qwen_ref_audio=QWEN3_TTS_REF_AUDIO,
+             )
+             service_refs["tts"] = tts
+         except Exception as e:
+             logger.error(f"Failed to initialize TTS service: {e}", exc_info=True)
+             return
+
+         # ====================================================================
+         # LLM SERVICE & TOOLS
+         # ====================================================================
+
+         logger.info("Initializing LLM via DeepInfra...")
+         llm = None
+         try:
+             llm = OpenAILLMService(
+                 api_key=DEEPINFRA_API_KEY,
+                 base_url=DEEPINFRA_BASE_URL,
+                 model=DEEPINFRA_MODEL
+             )
+
+             character_dir = os.path.join(os.path.dirname(__file__), "character")
+             persona_params = load_persona_ini(os.path.join(character_dir, "persona.ini"))
+             tars_data = load_tars_json(os.path.join(character_dir, "TARS.json"))
+             system_prompt = build_tars_system_prompt(persona_params, tars_data)
+
+             # Create tool schemas (these return FunctionSchema objects)
+             fetch_image_tool = create_fetch_image_schema()
+             persona_tool = create_adjust_persona_schema()
+             identity_tool = create_identity_schema()
+             crossword_hint_tool = create_crossword_hint_schema()
+             movement_tool = create_movement_schema()
+             camera_capture_tool = create_camera_capture_schema()
+
+             # Pass FunctionSchema objects directly to standard_tools
+             tools = ToolsSchema(
+                 standard_tools=[
+                     fetch_image_tool,
+                     persona_tool,
+                     identity_tool,
+                     crossword_hint_tool,
+                     movement_tool,
+                     camera_capture_tool,
+                 ]
+             )
+             messages = [system_prompt]
+             context = LLMContext(messages, tools)
+
+             llm.register_function("fetch_user_image", fetch_user_image)
+             llm.register_function("adjust_persona_parameter", adjust_persona_parameter)
+             llm.register_function("get_crossword_hint", get_crossword_hint)
+             llm.register_function("execute_movement", execute_movement)
+             llm.register_function("capture_camera_view", capture_camera_view)
+
+             pipeline_unifier = IdentityUnifier(client_id)
+
+             async def wrapped_set_identity(params: FunctionCallParams):
+                 name = params.arguments["name"]
+                 logger.info(f"👤 Identity discovered: {name}")
+
+                 old_id = client_state["client_id"]
+                 new_id = f"user_{name.lower().replace(' ', '_')}"
+
+                 if old_id != new_id:
+                     logger.info(f"🔄 Switching User ID: {old_id} -> {new_id}")
+                     client_state["client_id"] = new_id
+
+                     # Update the pipeline unifier to use new identity
+                     pipeline_unifier.target_user_id = new_id
+                     logger.info(f"✓ Updated pipeline unifier with new ID: {new_id}")
+
+                     # Update memory service with new user_id
+                     if memory_service:
+                         memory_service.user_id = new_id
+                         logger.info(f"✓ Updated memory service user_id to: {new_id}")
+
+                     # Notify frontend of identity change
+                     try:
+                         if webrtc_connection and webrtc_connection.is_connected():
+                             webrtc_connection.send_app_message({
+                                 "type": "identity_update",
+                                 "old_id": old_id,
+                                 "new_id": new_id,
+                                 "name": name
+                             })
+                             logger.info(f"📤 Sent identity update to frontend: {new_id}")
+                     except Exception as e:
+                         logger.warning(f"Failed to send identity update to frontend: {e}")
+
+                 await params.result_callback(f"Identity updated to {name}.")
+
+             llm.register_function("set_user_identity", wrapped_set_identity)
+             logger.info(f"✓ LLM initialized with model: {DEEPINFRA_MODEL}")
+
+         except Exception as e:
+             logger.error(f"Failed to initialize LLM: {e}", exc_info=True)
+             return
+
+         # ====================================================================
+         # VISION & GATING SERVICES
+         # ====================================================================
+
+         logger.info("Initializing Moondream vision service...")
+         moondream = None
+         try:
+             moondream = MoondreamService(model="vikhyatk/moondream2", revision="2025-01-09")
+             logger.info("✓ Moondream vision service initialized")
+         except Exception as e:
+             logger.error(f"Failed to initialize Moondream: {e}")
+             return
+
+         # ====================================================================
+         # TARS DISPLAY - Note: Display control via gRPC in robot mode only
+         # ====================================================================
+
+         logger.info("TARS Display features available in robot mode (tars_bot.py)")
+         tars_client = None
+
+         logger.info("Initializing Visual Observer...")
+         visual_observer = VisualObserver(
+             vision_client=moondream,
+             enable_face_detection=True,
+             tars_client=tars_client
+         )
+         logger.info("✓ Visual Observer initialized")
+
+         logger.info("Initializing Emotional State Monitor...")
+         emotional_monitor = EmotionalStateMonitor(
+             vision_client=moondream,
+             model="vikhyatk/moondream2",
+             sampling_interval=EMOTIONAL_SAMPLING_INTERVAL,
+             intervention_threshold=EMOTIONAL_INTERVENTION_THRESHOLD,
+             enabled=EMOTIONAL_MONITORING_ENABLED,
+             auto_intervene=False,  # Let gating layer handle intervention decisions
+         )
+         logger.info(f"✓ Emotional State Monitor initialized (enabled: {EMOTIONAL_MONITORING_ENABLED})")
+         logger.info("  Mode: Integrated with gating layer for smarter decisions")
+
+         logger.info("Initializing Gating Layer...")
+         gating_layer = InterventionGating(
+             api_key=DEEPINFRA_API_KEY,
+             base_url=DEEPINFRA_BASE_URL,
+             model=DEEPINFRA_GATING_MODEL,
+             visual_observer=visual_observer,
+             emotional_monitor=emotional_monitor
+         )
+         logger.info("✓ Gating Layer initialized with emotional state integration")
+
+         # ====================================================================
+         # MEMORY SERVICE
+         # ====================================================================
+
+         # Memory service: Hybrid search combining vector similarity (70%) and BM25 keyword matching (30%)
+         # Optimized for voice AI with <50ms latency target
+         logger.info("Initializing hybrid memory service...")
+         memory_service = None
+         try:
+             memory_service = HybridMemoryService(
+                 user_id=client_id,
+                 db_path="./memory_data/memory.sqlite",
+                 search_limit=3,
+                 search_timeout_ms=100,  # Hybrid search needs ~60-80ms, allow buffer
+                 vector_weight=0.7,  # 70% semantic similarity
+                 bm25_weight=0.3,  # 30% keyword matching
+                 system_prompt_prefix="From our conversations:\n",
+             )
+             logger.info(f"✓ Hybrid memory service initialized for {client_id}")
+         except Exception as e:
+             logger.error(f"Failed to initialize hybrid memory service: {e}")
+             logger.info("  Continuing without memory service...")
+             memory_service = None  # Continue without memory if it fails
+
+         # ====================================================================
+         # CONTEXT AGGREGATOR & PERSONA STORAGE
+         # ====================================================================
+
+         # Configure user turn aggregation
+         # STT services (Speechmatics, Deepgram) handle turn detection internally
+         user_params = LLMUserAggregatorParams(
+             user_turn_stop_timeout=1.5
+         )
+
+         context_aggregator = LLMContextAggregatorPair(
+             context,
+             user_params=user_params
+         )
+
+         persona_storage = get_persona_storage()
+         persona_storage["persona_params"] = persona_params
+         persona_storage["tars_data"] = tars_data
+         persona_storage["context_aggregator"] = context_aggregator
+
+         # ====================================================================
+         # LOGGING PROCESSORS
+         # ====================================================================
+
+         transcription_observer = TranscriptionObserver(
+             webrtc_connection=webrtc_connection,
+             client_state=client_state
+         )
+         assistant_observer = AssistantResponseObserver(webrtc_connection=webrtc_connection)
+         tts_state_observer = TTSStateObserver(webrtc_connection=webrtc_connection)
+         vision_observer = VisionObserver(webrtc_connection=webrtc_connection)
+         display_events_observer = DisplayEventsObserver(tars_client=tars_client)
+
+         # Create MetricsObserver (non-intrusive monitoring outside pipeline)
+         metrics_observer = MetricsObserver(
+             webrtc_connection=webrtc_connection,
+             stt_service=stt
+         )
+
+         # Turn tracking observer (for debugging turn detection)
+         turn_observer = TurnTrackingObserver()
+
+         @turn_observer.event_handler("on_turn_started")
+         async def on_turn_started(*args, **kwargs):
+             turn_number = args[1] if len(args) > 1 else kwargs.get('turn_number', 0)
+             logger.info(f"🗣️ [TurnObserver] Turn STARTED: {turn_number}")
+             # Notify metrics observer of new turn
+             metrics_observer.start_turn(turn_number)
+
+         @turn_observer.event_handler("on_turn_ended")
+         async def on_turn_ended(*args, **kwargs):
+             turn_number = args[1] if len(args) > 1 else kwargs.get('turn_number', 0)
+             logger.info(f"🗣️ [TurnObserver] Turn ENDED: {turn_number}")
+
+         # ====================================================================
+         # PIPELINE ASSEMBLY
+         # ====================================================================
+
+         logger.info("Creating audio/video pipeline...")
+
+         pipeline = Pipeline([
+             pipecat_transport.input(),
+             # emotional_monitor,  # Real-time emotional state monitoring
+             stt,
+             pipeline_unifier,
+             context_aggregator.user(),
+             memory_service,  # Hybrid memory (70% vector + 30% BM25) for automatic recall/storage
+             # gating_layer,  # AI decision system (with emotional state integration)
+             llm,
+             SilenceFilter(),
+             tts,
+             pipecat_transport.output(),
+             context_aggregator.assistant(),
+         ])
+
+         # ====================================================================
+         # EVENT HANDLERS
+         # ====================================================================
+
+         task_ref = {"task": None}
+
+         @pipecat_transport.event_handler("on_client_connected")
+         async def on_client_connected(transport, client):
+             logger.info("Pipecat Client connected")
+             try:
+                 if webrtc_connection.is_connected():
+                     webrtc_connection.send_app_message({"type": "system", "message": "Connection established"})
+
+                 # Send service configuration info with provider and model details
+                 llm_display = DEEPINFRA_MODEL.split('/')[-1] if '/' in DEEPINFRA_MODEL else DEEPINFRA_MODEL
+
+                 if TTS_PROVIDER == "elevenlabs":
+                     tts_display = "ElevenLabs: eleven_flash_v2_5"
+                 else:
+                     tts_model = QWEN3_TTS_MODEL.split('/')[-1] if '/' in QWEN3_TTS_MODEL else QWEN3_TTS_MODEL
+                     tts_display = f"Qwen3-TTS: {tts_model}"
+
+                 # Format STT provider name for display
+                 stt_display = {
+                     "speechmatics": "Speechmatics",
+                     "deepgram": "Deepgram Nova-2"
+                 }.get(STT_PROVIDER, STT_PROVIDER.capitalize())
+
+                 service_info = {
+                     "stt": stt_display,
+                     "memory": "Hybrid Search (SQLite)",
+                     "llm": f"DeepInfra: {llm_display}",
+                     "tts": tts_display
+                 }
+
+                 # Store in shared state for Gradio UI
+                 metrics_store.set_service_info(service_info)
+
+                 # Send via WebRTC
+                 webrtc_connection.send_app_message({
+                     "type": "service_info",
+                     **service_info
+                 })
+                 logger.info(f"📊 Sent service info to frontend: STT={stt_display}, LLM={llm_display}, TTS={tts_display}")
+             except Exception as e:
+                 logger.error(f"❌ Error sending service info: {e}")
+
+             if task_ref["task"]:
+                 verbosity = persona_params.get("verbosity", 10) if persona_params else 10
+                 intro_instruction = get_introduction_instruction(client_state['client_id'], verbosity)
+
+                 if context and hasattr(context, "messages"):
+                     context.messages.append(intro_instruction)
+
+                 logger.info("Waiting for pipeline to warm up...")
+                 await asyncio.sleep(2.0)
+
+                 logger.info("Queueing initial LLM greeting...")
+                 await task_ref["task"].queue_frames([LLMRunFrame()])
+
+         @pipecat_transport.event_handler("on_client_disconnected")
+         async def on_client_disconnected(transport, client):
+             logger.info("Pipecat Client disconnected")
+             if task_ref["task"]:
+                 await task_ref["task"].cancel()
+             await _cleanup_services(service_refs)
+
+         # ====================================================================
+         # PIPELINE EXECUTION
+         # ====================================================================
+
+         # Enable built-in Pipecat metrics for latency tracking
+         user_bot_latency_observer = UserBotLatencyLogObserver()
+
+         task = PipelineTask(
+             pipeline,
+             params=PipelineParams(
+                 enable_metrics=True,  # Enable performance metrics (TTFB, latency)
+                 enable_usage_metrics=True,  # Enable LLM/TTS usage metrics
+                 report_only_initial_ttfb=False,  # Report all TTFB measurements
+             ),
+             observers=[
+                 turn_observer,
+                 metrics_observer,
+                 transcription_observer,
+                 assistant_observer,
+                 tts_state_observer,
+                 vision_observer,
+                 display_events_observer,  # Send events to TARS display
+                 user_bot_latency_observer,  # Measures total user→bot response time
+             ],  # Non-intrusive monitoring
+         )
+         task_ref["task"] = task
+         runner = PipelineRunner(handle_sigint=False)
+
+         logger.info("Starting pipeline runner...")
+
+         try:
+             await runner.run(task)
+         except Exception:
+             raise
+         finally:
+             await _cleanup_services(service_refs)
+
+     except Exception as e:
+         logger.error(f"Error in bot pipeline: {e}", exc_info=True)
+     finally:
+         await _cleanup_services(service_refs)
config.ini.example ADDED
@@ -0,0 +1,52 @@
+ [LLM]
+ # Available models: Any DeepInfra-supported model
+ # Examples: openai/gpt-oss-20b, meta-llama/Llama-3.3-70B-Instruct-Turbo, meta-llama/Llama-3.2-3B-Instruct
+ model = openai/gpt-oss-20b
+
+ # Gating model for intervention decisions (smaller/faster model recommended)
+ gating_model = meta-llama/Llama-3.2-3B-Instruct
+
+ [STT]
+ # Available providers: speechmatics, deepgram, deepgram-flux
+ # - speechmatics: Speechmatics with SMART_TURN detection
+ # - deepgram: Deepgram Nova-2 with endpoint detection
+ # - deepgram-flux: Deepgram Flux with built-in turn detection (recommended)
+ provider = deepgram-flux
+
+ [TTS]
+ # Available providers: elevenlabs, qwen3
+ provider = qwen3
+
+ # Qwen3-TTS Configuration (only used if provider = qwen3)
+ # Available models: Qwen/Qwen3-TTS-12Hz-0.6B-Base, Qwen/Qwen3-TTS-12Hz-1.7B-Base
+ qwen3_model = Qwen/Qwen3-TTS-12Hz-0.6B-Base
+ # Available devices: mps (Mac), cuda (NVIDIA), cpu
+ qwen3_device = mps
+ # Reference audio file for voice cloning (relative to project root)
+ qwen3_ref_audio = assets/audio/tars-clean-compressed.mp3
+
+ [Emotional]
+ # Enable real-time emotional state monitoring via video
+ enabled = true
+ # How often to sample video frames (in seconds)
+ sampling_interval = 3.0
+ # How many consecutive negative states before intervention
+ intervention_threshold = 2
+
+ [Connection]
+ # Transport mode: "robot" (aiortc WebRTC to RPi) or "browser" (SmallWebRTC for browser)
+ mode = robot
+ # Raspberry Pi WebRTC server URL (Tailscale or local network IP)
+ rpi_url = http://100.115.193.41:8001
+ # Auto-connect to RPi on startup (only for robot mode)
+ auto_connect = true
+ # Delay between reconnection attempts (seconds)
+ reconnect_delay = 5
+ # Maximum reconnection attempts (0 = infinite)
+ max_reconnect_attempts = 0
+
+ [Display]
+ # Enable TARS Raspberry Pi display integration (HTTP commands)
+ enabled = true
+ # URL of TARS display API (Tailscale or local network IP)
+ tars_url = http://100.115.193.41:8001
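
The `reconnect_delay` and `max_reconnect_attempts` settings above imply a retry loop; a minimal sketch of how they might be consumed (the `connect` coroutine is a hypothetical placeholder for the app's actual connection logic):

```python
import asyncio

async def connect_with_retries(connect, reconnect_delay=5, max_reconnect_attempts=0):
    """Retry connect() per the [Connection] settings; 0 attempts means retry forever."""
    attempt = 0
    while True:
        try:
            return await connect()
        except ConnectionError:
            attempt += 1
            if max_reconnect_attempts and attempt >= max_reconnect_attempts:
                raise  # exhausted the configured attempts
            await asyncio.sleep(reconnect_delay)
```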
docs/DAEMON_INTEGRATION.md ADDED
@@ -0,0 +1,393 @@
+ # Daemon Dashboard Integration
+
+ Guide for integrating tars-conversation-app with tars-daemon dashboard app management.
+
+ ## Overview
+
+ The tars-daemon dashboard should provide install/uninstall buttons for managing TARS apps like this one.
+
+ ## App Discovery
+
+ The daemon scans for apps with `app.json` manifest files:
+
+ ```python
+ import json
+ from pathlib import Path
+
+ def discover_apps(apps_directory="/home/mac/tars-apps"):
+     """Discover all TARS apps with manifests"""
+     apps = []
+     apps_dir = Path(apps_directory)
+
+     for app_path in apps_dir.iterdir():
+         manifest_path = app_path / "app.json"
+         if manifest_path.exists():
+             with open(manifest_path) as f:
+                 manifest = json.load(f)
+             apps.append({
+                 "path": str(app_path),
+                 "manifest": manifest,
+                 "installed": (app_path / "venv").exists()
+             })
+
+     return apps
+ ```
+
+ ## Installation Flow
+
+ When the user clicks the "Install" button:
+
+ ```python
+ import json
+ import subprocess
+ from pathlib import Path
+
+ def install_app(app_path):
+     """Install a TARS app"""
+     app_dir = Path(app_path)
+     manifest_path = app_dir / "app.json"
+
+     # Read manifest
+     with open(manifest_path) as f:
+         manifest = json.load(f)
+
+     # Get install script
+     install_script = manifest.get("install_script", "install.sh")
+     script_path = app_dir / install_script
+
+     if not script_path.exists():
+         raise FileNotFoundError(f"Install script not found: {script_path}")
+
+     # Run installation
+     result = subprocess.run(
+         ["bash", str(script_path)],
+         cwd=str(app_dir),
+         capture_output=True,
+         text=True
+     )
+
+     return {
+         "success": result.returncode == 0,
+         "stdout": result.stdout,
+         "stderr": result.stderr
+     }
+ ```
+
+ ## Uninstallation Flow
+
+ When the user clicks the "Uninstall" button:
+
+ ```python
+ def uninstall_app(app_path):
+     """Uninstall a TARS app"""
+     app_dir = Path(app_path)
+     manifest_path = app_dir / "app.json"
+
+     # Read manifest
+     with open(manifest_path) as f:
+         manifest = json.load(f)
+
+     # Get uninstall script
+     uninstall_script = manifest.get("uninstall_script", "uninstall.sh")
+     script_path = app_dir / uninstall_script
+
+     if not script_path.exists():
+         raise FileNotFoundError(f"Uninstall script not found: {script_path}")
+
+     # Run uninstallation
+     result = subprocess.run(
+         ["bash", str(script_path)],
+         cwd=str(app_dir),
+         capture_output=True,
+         text=True
+     )
+
+     return {
+         "success": result.returncode == 0,
+         "stdout": result.stdout,
+         "stderr": result.stderr
+     }
+ ```
+
+ ## Dashboard UI (Gradio Example)
+
+ ```python
+ import gradio as gr
+ from pathlib import Path
+
+ def get_app_status(app_path):
+     """Check if app is installed"""
+     return (Path(app_path) / "venv").exists()
+
+ def create_app_tab():
+     """Create app management tab in dashboard"""
+
+     # Discover apps
+     apps = discover_apps()
+
+     with gr.Tab("Apps"):
+         for app in apps:
+             manifest = app["manifest"]
+
+             with gr.Row():
+                 gr.Markdown(f"### {manifest['name']}")
+                 gr.Markdown(manifest.get("description", ""))
+
+             with gr.Row():
+                 gr.Markdown(f"**Version:** {manifest.get('version', 'unknown')}")
+                 status = "Installed" if app["installed"] else "Not Installed"
+                 gr.Markdown(f"**Status:** {status}")
+
+             with gr.Row():
+                 install_btn = gr.Button(
+                     "Install",
+                     visible=not app["installed"]
+                 )
+                 uninstall_btn = gr.Button(
+                     "Uninstall",
+                     visible=app["installed"]
+                 )
+             output = gr.Textbox(
+                 label="Output",
+                 lines=5,
+                 max_lines=10
+             )
+
+             # Install handler
+             install_btn.click(
+                 fn=lambda path=app["path"]: install_app(path),
+                 outputs=output
+             )
+
+             # Uninstall handler
+             uninstall_btn.click(
+                 fn=lambda path=app["path"]: uninstall_app(path),
+                 outputs=output
+             )
+
+             gr.Markdown("---")
+
+ # Add to dashboard
+ with gr.Blocks() as dashboard:
+     create_app_tab()
+
+ dashboard.launch()
+ ```
+
+ ## Recommended Directory Structure
+
+ ```
+ /home/mac/
+ ├── tars-daemon/                 # Main daemon
+ │   ├── tars_daemon.py
+ │   ├── dashboard.py             # Gradio dashboard with app management
+ │   └── app_manager.py           # App discovery and management
+ │
+ └── tars-apps/                   # Apps directory
+     ├── tars-conversation-app/
+     │   ├── app.json             # Manifest
+     │   ├── install.sh           # Install script
+     │   ├── uninstall.sh         # Uninstall script
+     │   └── ...
+     │
+     └── another-app/
+         ├── app.json
+         └── ...
+ ```
+
+ ## Environment Variables
+
+ Apps should auto-detect deployment:
+
+ ```python
+ import os
+
+ # In app configuration
+ def get_grpc_address():
+     """Auto-detect if running on Pi or remotely"""
+     # Check if on Raspberry Pi
+     try:
+         with open("/proc/cpuinfo") as f:
+             if "Raspberry Pi" in f.read():
+                 return "localhost:50051"  # Local daemon
+     except OSError:
+         pass
+
+     # Remote connection
+     return os.getenv("RPI_GRPC", "100.84.133.74:50051")
+ ```
+
+ ## Installation Validation
+
+ The daemon should validate before installation:
+
+ ```python
+ def validate_app(app_path):
+     """Validate app before installation"""
+     app_dir = Path(app_path)
+     errors = []
+
+     # Check manifest exists
+     manifest_path = app_dir / "app.json"
+     if not manifest_path.exists():
+         errors.append("Missing app.json manifest")
+         return errors
+
+     # Read manifest
+     with open(manifest_path) as f:
+         manifest = json.load(f)
+
+     # Check required fields
+     required = ["name", "version", "install_script"]
+     for field in required:
+         if field not in manifest:
+             errors.append(f"Missing required field: {field}")
+
+     # Check scripts exist
+     install_script = app_dir / manifest.get("install_script", "install.sh")
+     if not install_script.exists():
+         errors.append(f"Install script not found: {install_script}")
+
+     # Check Python version
+     if "dependencies" in manifest:
+         py_version = manifest["dependencies"].get("python", "")
+         if py_version:
+             # Validate version string format
+             import re
+             if not re.match(r">=?\d+\.\d+", py_version):
+                 errors.append(f"Invalid Python version: {py_version}")
+
+     return errors
+ ```
+
+ ## Running Apps
+
+ After installation, provide run buttons:
+
+ ```python
+ def run_app(app_path, mode="robot"):
+     """Run an installed app"""
+     app_dir = Path(app_path)
+     manifest_path = app_dir / "app.json"
+
+     with open(manifest_path) as f:
+         manifest = json.load(f)
+
+     # Get command for mode
+     modes = manifest.get("modes", [])
+     command = None
+
+     for m in modes:
+         if m["name"] == mode:
+             command = m["command"]
+             break
+
+     if not command:
+         # Fall back to main
+         command = f"python {manifest['main']}"
+
+     # Activate venv and run
+     venv_python = app_dir / "venv" / "bin" / "python"
+
+     subprocess.Popen(
+         [str(venv_python)] + command.split()[1:],
+         cwd=str(app_dir)
+     )
+ ```
+
+ ## Security Considerations
+
+ 1. **Script Validation** - Verify scripts don't contain malicious commands (a naive screening sketch follows)
+ 2. **Sandboxing** - Consider running installations in containers
+ 3. **User Permissions** - Require confirmation before installation
+ 4. **API Keys** - Warn users to configure API keys before running
+
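A naive screening pass for item 1 might look like the following; treat the patterns as a starting heuristic, not a substitute for sandboxing:

```python
import re

# Heuristic red flags for an install script; extend as needed.
SUSPICIOUS = [r"rm\s+-rf\s+/", r"curl[^|\n]*\|\s*(ba)?sh", r"dd\s+if=", r"mkfs"]

def script_looks_safe(script_path) -> bool:
    """Return False if the script matches any known-dangerous pattern."""
    with open(script_path) as f:
        text = f.read()
    return not any(re.search(pattern, text) for pattern in SUSPICIOUS)
```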
+ ## Example Dashboard Integration
+
+ ```python
+ # In tars-daemon/dashboard.py
+
+ import gradio as gr
+ from app_manager import discover_apps, install_app, uninstall_app
+
+ def create_dashboard():
+     with gr.Blocks() as dashboard:
+         gr.Markdown("# TARS Daemon Dashboard")
+
+         with gr.Tabs():
+             # Hardware tab
+             with gr.Tab("Hardware"):
+                 gr.Markdown("Robot hardware controls...")
+
+             # Apps tab
+             with gr.Tab("Apps"):
+                 apps = discover_apps("/home/mac/tars-apps")
+
+                 for app in apps:
+                     manifest = app["manifest"]
+
+                     with gr.Accordion(manifest["name"], open=False):
+                         gr.Markdown(manifest.get("description", ""))
+                         gr.JSON(manifest, label="Manifest")
+
+                         with gr.Row():
+                             install_btn = gr.Button(
+                                 "Install",
+                                 visible=not app["installed"]
+                             )
+                             uninstall_btn = gr.Button(
+                                 "Uninstall",
+                                 visible=app["installed"]
+                             )
+                             run_btn = gr.Button(
+                                 "Run",
+                                 visible=app["installed"]
+                             )
+
+                         output = gr.Textbox(label="Output", lines=10)
+
+                         # Event handlers
+                         install_btn.click(
+                             fn=lambda p=app["path"]: install_app(p),
+                             outputs=output
+                         ).then(
+                             fn=lambda: gr.update(visible=False),
+                             outputs=install_btn
+                         ).then(
+                             fn=lambda: gr.update(visible=True),
+                             outputs=[uninstall_btn, run_btn]
+                         )
+
+             # Logs tab
+             with gr.Tab("Logs"):
+                 gr.Markdown("System logs...")
+
+     return dashboard
+
+ if __name__ == "__main__":
+     dashboard = create_dashboard()
+     dashboard.launch(server_name="0.0.0.0", server_port=7860)
+ ```
+
+ ## Testing Installation
+
+ From the Pi:
+
+ ```bash
+ # Test install
+ cd ~/tars-apps/tars-conversation-app
+ bash install.sh
+
+ # Verify
+ ls -la venv/
+ source venv/bin/activate
+ python -c "import pipecat; print('OK')"
+
+ # Test uninstall
+ bash uninstall.sh
+ ```
+
+ ## Next Steps
+
+ 1. Implement app discovery in tars-daemon
+ 2. Add Apps tab to dashboard
+ 3. Create app_manager.py module
+ 4. Test with tars-conversation-app
+ 5. Document for other developers
docs/DASHBOARD_UPDATE_SUMMARY.md ADDED
@@ -0,0 +1,218 @@
1
+ # Dashboard Update Summary
2
+
3
+ App management functionality added to tars-daemon dashboard.
4
+
5
+ ## Changes Made
6
+
7
+ ### Backend (tars-daemon)
8
+
9
+ **File: dashboard/backend/routes/apps.py**
10
+ - Implemented app discovery via app.json manifests (sketch below)
11
+ - Scans ~/tars-apps/ directory for apps
12
+ - Install using install.sh script
13
+ - Uninstall using uninstall.sh script
14
+ - Status detection via venv/ directory
15
+ - Start/stop app processes
16
+ - Logs endpoint
17
+
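+ A minimal sketch of how manifest-based discovery could work (the function and field names are assumptions for illustration, not the actual route code):
+
+ ```python
+ import json
+ from pathlib import Path
+
+ def discover_apps(apps_dir="~/tars-apps"):
+     """Scan for directories that contain an app.json manifest."""
+     apps = []
+     for app_dir in sorted(Path(apps_dir).expanduser().iterdir()):
+         manifest_path = app_dir / "app.json"
+         if not manifest_path.is_file():
+             continue
+         with open(manifest_path) as f:
+             manifest = json.load(f)
+         apps.append({
+             "path": str(app_dir),
+             "manifest": manifest,
+             # Installed status is inferred from the venv/ directory
+             "installed": (app_dir / "venv").is_dir(),
+         })
+     return apps
+ ```
+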
18
+ **File: dashboard/backend/routes/__init__.py**
19
+ - Added apps module to imports and exports
20
+
21
+ **File: dashboard/backend/server.py**
22
+ - Added apps router at /api/apps/*
23
+
24
+ ### Frontend (tars-daemon)
25
+
26
+ **File: dashboard/frontend/src/pages/AppStore.jsx**
27
+ - Complete rewrite with functional UI
28
+ - Install/Uninstall buttons
29
+ - Start/Stop controls
30
+ - Real-time status updates (5s polling)
31
+ - Loading states and error handling
32
+ - Success/error alerts
33
+
34
+ **File: dashboard/frontend/src/components/ui/badge.jsx**
35
+ - New component for status badges
36
+
37
+ **File: dashboard/frontend/src/components/ui/alert.jsx**
38
+ - New component for notifications
39
+
40
+ ### App Setup
41
+
42
+ **Location: ~/tars-apps/tars-conversation-app/**
43
+ - Copied from Mac to Pi
44
+ - Contains app.json manifest
45
+ - Has install.sh and uninstall.sh scripts
46
+
47
+ ## API Endpoints
48
+
49
+ ```
50
+ GET /api/apps/list - List all apps with status
51
+ POST /api/apps/install - Install app using install.sh
52
+ POST /api/apps/uninstall - Uninstall app using uninstall.sh
53
+ POST /api/apps/start - Start app process
54
+ POST /api/apps/stop - Stop app process
55
+ GET /api/apps/logs/{name} - Get app logs
56
+ ```
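+
+ For quick testing, the endpoints can be exercised from Python (a sketch using `requests`; the payload field name is an assumption based on the endpoint list above):
+
+ ```python
+ import requests
+
+ BASE = "http://100.84.133.74:8000/api/apps"
+
+ # List all discovered apps and their install status
+ print(requests.get(f"{BASE}/list").json())
+
+ # Trigger an install by app name
+ resp = requests.post(f"{BASE}/install", json={"name": "tars-conversation-app"})
+ print(resp.json())
+ ```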
57
+
58
+ ## Testing
59
+
60
+ ### 1. Restart Dashboard
61
+
62
+ ```bash
63
+ ssh tars-pi
64
+ cd ~/tars-daemon
65
+ pkill -f start_dashboard.py
66
+ venv/bin/python start_dashboard.py
67
+ ```
68
+
69
+ Or use systemd if configured:
70
+ ```bash
71
+ sudo systemctl restart tars-dashboard
72
+ ```
73
+
74
+ ### 2. Verify Backend
75
+
76
+ ```bash
77
+ # Test app discovery
78
+ curl http://100.84.133.74:8000/api/apps/list
79
+
80
+ # Should return JSON with tars-conversation-app
81
+ ```
82
+
83
+ ### 3. Open Dashboard
84
+
85
+ Navigate to: http://100.84.133.74:8000
86
+
87
+ Click on "Apps" or "App Store" tab (depending on navigation)
88
+
89
+ ### 4. Test Installation
90
+
91
+ 1. Click "Install" button for tars-conversation-app
92
+ 2. Wait for installation (may take 5-10 minutes)
93
+ 3. Status should change to "Installed"
94
+ 4. Start/Stop buttons should appear
95
+
96
+ ### 5. Test Uninstallation
97
+
98
+ 1. Stop app if running
99
+ 2. Click uninstall button (trash icon)
100
+ 3. Confirm in the alert dialog
101
+ 4. Status returns to not installed
102
+
103
+ ## Expected Behavior
104
+
105
+ ### App Card Display
106
+
107
+ ```
108
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
109
+ β”‚ tars-conversation-app [Installed]β”‚
110
+ β”‚ Real-time conversational AI... β”‚
111
+ β”‚ β”‚
112
+ β”‚ Version: 1.0.0 β”‚
113
+ β”‚ Author: TARS Project β”‚
114
+ β”‚ β”‚
115
+ β”‚ [Start] [πŸ—‘οΈ] β”‚
116
+ β”‚ ~/tars-apps/tars-conversation-app β”‚
117
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
118
+ ```
119
+
120
+ When installing:
121
+ ```
122
+ [Installing...] (spinner)
123
+ ```
124
+
125
+ When running:
126
+ ```
127
+ [Stop] [πŸ—‘οΈ]
128
+ ```
129
+
130
+ ## Troubleshooting
131
+
132
+ ### Dashboard won't start
133
+
134
+ Check logs:
135
+ ```bash
136
+ tail -50 /tmp/dashboard.log
137
+ ```
138
+
139
+ Common issues:
140
+ - Missing fastapi: `pip install fastapi uvicorn`
141
+ - Import errors: Check routes/__init__.py includes apps
142
+ - Port 8000 in use: `lsof -i :8000`
143
+
144
+ ### Apps not discovered
145
+
146
+ Check:
147
+ ```bash
148
+ ls -la ~/tars-apps/tars-conversation-app/app.json
149
+ ```
150
+
151
+ Verify manifest:
152
+ ```bash
153
+ cat ~/tars-apps/tars-conversation-app/app.json | python3 -m json.tool
154
+ ```
155
+
156
+ ### Installation fails
157
+
158
+ Check install script:
159
+ ```bash
160
+ bash ~/tars-apps/tars-conversation-app/install.sh
161
+ ```
162
+
163
+ Check logs in dashboard after clicking install button.
164
+
165
+ ### Frontend not updated
166
+
167
+ Rebuild:
168
+ ```bash
169
+ cd ~/tars-daemon/dashboard/frontend
170
+ npm run build
171
+ ```
172
+
173
+ Hard refresh browser: Ctrl+Shift+R
174
+
175
+ ## File Locations
176
+
177
+ ```
178
+ tars-daemon/
179
+ β”œβ”€β”€ dashboard/
180
+ β”‚ β”œβ”€β”€ backend/
181
+ β”‚ β”‚ β”œβ”€β”€ server.py # Updated: added apps router
182
+ β”‚ β”‚ └── routes/
183
+ β”‚ β”‚ β”œβ”€β”€ __init__.py # Updated: export apps
184
+ β”‚ β”‚ └── apps.py # NEW: app management
185
+ β”‚ └── frontend/
186
+ β”‚ └── src/
187
+ β”‚ β”œβ”€β”€ pages/
188
+ β”‚ β”‚ └── AppStore.jsx # Updated: full UI
189
+ β”‚ └── components/ui/
190
+ β”‚ β”œβ”€β”€ badge.jsx # NEW
191
+ β”‚ └── alert.jsx # NEW
192
+ β”‚
193
+ tars-apps/
194
+ └── tars-conversation-app/
195
+ β”œβ”€β”€ app.json # Manifest
196
+ β”œβ”€β”€ install.sh # Installation script
197
+ β”œβ”€β”€ uninstall.sh # Uninstall script
198
+ └── ... # App files
199
+ ```
200
+
201
+ ## Next Steps
202
+
203
+ 1. Restart dashboard on Pi
204
+ 2. Test in browser
205
+ 3. Install tars-conversation-app via UI
206
+ 4. Verify installation works
207
+ 5. Add more apps to ~/tars-apps/ as needed
208
+
209
+ ## Adding More Apps
210
+
211
+ To add new apps:
212
+
213
+ 1. Create app in ~/tars-apps/
214
+ 2. Add app.json manifest (see docs/DEVELOPING_APPS.md)
215
+ 3. Create install.sh and uninstall.sh
216
+ 4. Refresh dashboard - app appears automatically
217
+
218
+ No code changes needed for new apps.
docs/DEVELOPING_APPS.md ADDED
@@ -0,0 +1,400 @@
1
+ # Developing Apps with TARS SDK
2
+
3
+ Guide for creating TARS-compatible applications that integrate with the tars-daemon.
4
+
5
+ ## Architecture Overview
6
+
7
+ TARS apps connect to the tars-daemon running on Raspberry Pi:
8
+
9
+ ```
10
+ [Your App] ←→ gRPC (50051) ←→ [tars-daemon] ←→ [Hardware]
11
+ β”œβ”€ Motors
12
+ β”œβ”€ Camera
13
+ └─ Display
14
+ ```
15
+
16
+ ## App Structure
17
+
18
+ ### Minimal Structure
19
+
20
+ ```
21
+ your-app/
22
+ β”œβ”€β”€ app.json # App manifest (required)
23
+ β”œβ”€β”€ requirements.txt # Python dependencies
24
+ β”œβ”€β”€ config.ini.example # Configuration template
25
+ β”œβ”€β”€ env.example # Environment variables template
26
+ β”œβ”€β”€ install.sh # Installation script
27
+ β”œβ”€β”€ uninstall.sh # Cleanup script
28
+ β”œβ”€β”€ main.py # Entry point
29
+ └── README.md # Documentation
30
+ ```
31
+
32
+ ## App Manifest (app.json)
33
+
34
+ Required file for daemon dashboard integration:
35
+
36
+ ```json
37
+ {
38
+ "name": "tars-conversation-app",
39
+ "version": "1.0.0",
40
+ "description": "Real-time conversational AI with WebRTC",
41
+ "author": "Your Name",
42
+ "repository": "https://github.com/yourusername/your-app.git",
43
+ "main": "tars_bot.py",
44
+ "install_script": "install.sh",
45
+ "uninstall_script": "uninstall.sh",
46
+ "dependencies": {
47
+ "python": ">=3.10",
48
+ "system": ["portaudio19-dev", "ffmpeg"]
49
+ },
50
+ "environment": [
51
+ "DEEPINFRA_API_KEY",
52
+ "SPEECHMATICS_API_KEY"
53
+ ],
54
+ "configuration": {
55
+ "file": "config.ini",
56
+ "example": "config.ini.example"
57
+ },
58
+ "ports": {
59
+ "grpc": 50051,
60
+ "http": 8765
61
+ }
62
+ }
63
+ ```
64
+
65
+ ## Configuration System
66
+
67
+ ### Environment Variables (.env.local)
68
+
69
+ Store secrets only, never commit:
70
+
71
+ ```bash
72
+ # API Keys
73
+ DEEPINFRA_API_KEY=your_key_here
74
+ SPEECHMATICS_API_KEY=your_key_here
75
+ ELEVENLABS_API_KEY=your_key_here
76
+ ```
77
+
78
+ ### User Configuration (config.ini)
79
+
80
+ Runtime settings users can modify:
81
+
82
+ ```ini
83
+ [Connection]
84
+ mode = robot
85
+ rpi_url = http://100.84.133.74:8765
86
+ rpi_grpc = 100.84.133.74:50051
87
+ auto_connect = false
88
+
89
+ [LLM]
90
+ model = openai/gpt-oss-20b
91
+ gating_model = meta-llama/Llama-3.2-3B-Instruct
92
+ ```
93
+
94
+ ### Loading Configuration
95
+
96
+ ```python
97
+ from pathlib import Path
98
+ from configparser import ConfigParser
99
+ from dotenv import load_dotenv
100
+ import os
101
+
102
+ # Load secrets
103
+ env_local = Path(__file__).parent / ".env.local"
104
+ load_dotenv(env_local, override=True)
105
+
106
+ # Load config
107
+ config = ConfigParser()
108
+ config.read("config.ini")
109
+
110
+ # Runtime reload without restart
111
+ def get_fresh_config():
112
+ config = ConfigParser()
113
+ config.read("config.ini")
114
+ return config
115
+ ```
116
+
117
+ ## Connecting to tars-daemon
118
+
119
+ ### gRPC Client
120
+
121
+ ```python
122
+ import os
+ import grpc
123
+ from tars_sdk import TarsClient
124
+
125
+ # Singleton client
126
+ _client = None
127
+
128
+ def get_tars_client():
129
+ global _client
130
+ if _client is None:
131
+ grpc_address = os.getenv("RPI_GRPC", "100.84.133.74:50051")
132
+ channel = grpc.insecure_channel(grpc_address)
133
+ _client = TarsClient(channel)
134
+ return _client
135
+
136
+ # Use the client
137
+ client = get_tars_client()
138
+ client.execute_movement("wave_right")
139
+ client.set_emotion("happy")
140
+ ```
141
+
142
+ ### Deployment Mode Detection
143
+
144
+ Auto-detect if running locally on Pi or remotely:
145
+
146
+ ```python
147
+ def detect_deployment_mode():
148
+ # Check if running on Raspberry Pi
149
+ try:
150
+ with open("/proc/cpuinfo", "r") as f:
151
+ if "Raspberry Pi" in f.read():
152
+ return "local"
153
+ except FileNotFoundError:
154
+ pass
155
+
156
+ # Check if daemon running on localhost
157
+ try:
158
+ import grpc
159
+ channel = grpc.insecure_channel("localhost:50051")
160
+ grpc.channel_ready_future(channel).result(timeout=1)
161
+ return "local"
162
+ except Exception:
163
+ return "remote"
164
+
165
+ def get_grpc_address():
166
+ if detect_deployment_mode() == "local":
167
+ return "localhost:50051"
168
+ return os.getenv("RPI_GRPC", "100.84.133.74:50051")
169
+ ```
170
+
171
+ ## Installation Scripts
172
+
173
+ ### install.sh
174
+
175
+ ```bash
176
+ #!/bin/bash
177
+ set -e
178
+
179
+ APP_NAME="your-app"
180
+ APP_DIR="$HOME/$APP_NAME"
181
+
182
+ echo "Installing $APP_NAME..."
183
+
184
+ # Check Python version
185
+ python3 --version | grep -q "3.1[0-9]" || {
186
+ echo "Error: Python 3.10+ required"
187
+ exit 1
188
+ }
189
+
190
+ # Install system dependencies
191
+ sudo apt-get update
192
+ sudo apt-get install -y portaudio19-dev ffmpeg
193
+
194
+ # Create virtual environment
195
+ python3 -m venv "$APP_DIR/venv"
196
+ source "$APP_DIR/venv/bin/activate"
197
+
198
+ # Install Python dependencies
199
+ pip install --upgrade pip
200
+ pip install -r requirements.txt
201
+
202
+ # Setup configuration
203
+ if [ ! -f config.ini ]; then
204
+ cp config.ini.example config.ini
205
+ echo "Created config.ini - please configure before running"
206
+ fi
207
+
208
+ if [ ! -f .env.local ]; then
209
+ cp env.example .env.local
210
+ echo "Created .env.local - please add API keys"
211
+ fi
212
+
213
+ echo "Installation complete!"
214
+ echo "Next steps:"
215
+ echo "1. Edit .env.local with your API keys"
216
+ echo "2. Edit config.ini if needed"
217
+ echo "3. Run: python main.py"
218
+ ```
219
+
220
+ ### uninstall.sh
221
+
222
+ ```bash
223
+ #!/bin/bash
224
+ set -e
225
+
226
+ APP_NAME="your-app"
227
+ APP_DIR="$HOME/$APP_NAME"
228
+
229
+ echo "Uninstalling $APP_NAME..."
230
+
231
+ # Stop running processes
232
+ pkill -f "python.*$APP_NAME" || true
233
+
234
+ # Remove virtual environment
235
+ rm -rf "$APP_DIR/venv"
236
+
237
+ # Remove generated data (optional)
238
+ read -p "Remove data directories? (y/N) " -n 1 -r
239
+ echo
240
+ if [[ $REPLY =~ ^[Yy]$ ]]; then
241
+ rm -rf "$APP_DIR/chroma_memory" "$APP_DIR/memory_data"
242
+ fi
243
+
244
+ echo "Uninstall complete!"
245
+ ```
246
+
247
+ ## Best Practices
248
+
249
+ ### 1. Project Structure
250
+
251
+ - Keep source code in `src/` directory
252
+ - Separate configuration from code
253
+ - Provide example configs (never commit secrets)
254
+ - Include tests in `tests/` directory
255
+
256
+ ### 2. Configuration
257
+
258
+ - Use `.env.local` for secrets (gitignore it)
259
+ - Use `config.ini` for user settings (gitignore it)
260
+ - Provide `.example` templates
261
+ - Support runtime config reload when possible
262
+
263
+ ### 3. Dependencies
264
+
265
+ - Pin major versions in requirements.txt
266
+ - Document system dependencies in README
267
+ - Test on fresh Pi OS installation
268
+ - Keep dependencies minimal
269
+
270
+ ### 4. Error Handling
271
+
272
+ - Validate configuration on startup
273
+ - Provide clear error messages
274
+ - Test connection to daemon before running
275
+ - Graceful degradation if hardware unavailable
276
+
277
+ ### 5. Performance
278
+
279
+ - Use gRPC for low-latency commands (~5-10ms)
280
+ - Batch operations when possible
281
+ - Monitor resource usage on Pi
282
+ - Optimize for Raspberry Pi 4 (4GB RAM)
283
+
284
+ ### 6. Testing
285
+
286
+ - Test on actual hardware
287
+ - Provide test scripts for gestures/expressions
288
+ - Document expected behavior
289
+ - Include connection tests
290
+
291
+ ## Example: Minimal TARS App
292
+
293
+ ```python
294
+ # main.py
295
+ import grpc
296
+ from tars_sdk import TarsClient
297
+ from pathlib import Path
298
+ from dotenv import load_dotenv
299
+ import os
300
+
301
+ # Load configuration
302
+ load_dotenv(Path(__file__).parent / ".env.local")
303
+
304
+ # Connect to daemon
305
+ grpc_address = os.getenv("RPI_GRPC", "100.84.133.74:50051")
306
+ channel = grpc.insecure_channel(grpc_address)
307
+ client = TarsClient(channel)
308
+
309
+ # Test connection
310
+ try:
311
+ status = client.get_robot_status()
312
+ print(f"Connected to TARS: {status}")
313
+ except Exception as e:
314
+ print(f"Connection failed: {e}")
315
+ exit(1)
316
+
317
+ # Use robot
318
+ client.set_emotion("happy")
319
+ client.execute_movement("wave_right")
320
+ print("TARS says hello!")
321
+ ```
322
+
323
+ ## Integration with Claude Code
324
+
325
+ Structure your app for easy AI-assisted development:
326
+
327
+ 1. **Clear directory structure** - AI can navigate easily
328
+ 2. **Documented configuration** - AI understands settings
329
+ 3. **Type hints** - AI provides better suggestions
330
+ 4. **Docstrings** - AI understands intent
331
+ 5. **README.md** - AI reads project context
332
+
333
+ See CLAUDE.md for project-specific guidelines.
334
+
335
+ ## Common Patterns
336
+
337
+ ### Startup Validation
338
+
339
+ ```python
340
+ def validate_startup():
341
+ """Check all requirements before running"""
342
+ errors = []
343
+
344
+ # Check API keys
345
+ if not os.getenv("DEEPINFRA_API_KEY"):
346
+ errors.append("Missing DEEPINFRA_API_KEY in .env.local")
347
+
348
+ # Check config file
349
+ if not Path("config.ini").exists():
350
+ errors.append("config.ini not found")
351
+
352
+ # Test daemon connection
353
+ try:
354
+ client = get_tars_client()
355
+ client.get_robot_status()
356
+ except Exception as e:
357
+ errors.append(f"Cannot connect to daemon: {e}")
358
+
359
+ if errors:
360
+ print("Startup validation failed:")
361
+ for error in errors:
362
+ print(f" - {error}")
363
+ exit(1)
364
+ ```
365
+
366
+ ### Graceful Shutdown
367
+
368
+ ```python
369
+ import signal
370
+ import sys
371
+
372
+ def signal_handler(sig, frame):
373
+ """Clean shutdown on Ctrl+C"""
374
+ print("\nShutting down...")
375
+
376
+ # Reset robot state
377
+ try:
378
+ client = get_tars_client()
379
+ client.set_emotion("neutral")
380
+ client.set_eye_state(True, True)
381
+ except Exception:
382
+ pass
383
+
384
+ sys.exit(0)
385
+
386
+ signal.signal(signal.SIGINT, signal_handler)
387
+ ```
388
+
389
+ ## Resources
390
+
391
+ - tars-daemon: `~/tars-daemon` on Pi
392
+ - TARS SDK: install with `pip install tars-sdk`
393
+ - Example Apps: This repository (tars-conversation-app)
394
+ - Pi Access: `ssh tars-pi` (100.84.133.74)
395
+
396
+ ## Support
397
+
398
+ - Check daemon status: `systemctl status tars-daemon`
399
+ - View daemon logs: `journalctl -u tars-daemon -f`
400
+ - Test gRPC connection: `grpcurl -plaintext 100.84.133.74:50051 list`
docs/INSTALLATION_GUIDE.md ADDED
@@ -0,0 +1,264 @@
1
+ # Installation Guide
2
+
3
+ Quick reference for installing tars-conversation-app on Raspberry Pi.
4
+
5
+ ## Prerequisites
6
+
7
+ - Raspberry Pi 4 (4GB RAM recommended)
8
+ - Raspberry Pi OS (Bullseye or later)
9
+ - Python 3.10 or higher
10
+ - Internet connection
11
+
12
+ ## From Dashboard (Recommended)
13
+
14
+ Once tars-daemon implements app management:
15
+
16
+ 1. Open tars-daemon dashboard at `http://100.84.133.74:7860`
17
+ 2. Navigate to "Apps" tab
18
+ 3. Find "tars-conversation-app"
19
+ 4. Click "Install" button
20
+ 5. Wait for installation to complete
21
+ 6. Configure API keys in `.env.local`
22
+ 7. Adjust settings in `config.ini` if needed
23
+ 8. Click "Run" to start
24
+
25
+ ## Manual Installation (SSH)
26
+
27
+ ### Step 1: Clone Repository
28
+
29
+ ```bash
30
+ ssh tars-pi
31
+ cd ~
32
+ git clone https://github.com/latishab/tars-conversation-app.git
33
+ cd tars-conversation-app
34
+ ```
35
+
36
+ ### Step 2: Run Installer
37
+
38
+ ```bash
39
+ bash install.sh
40
+ ```
41
+
42
+ The installer will:
43
+ - Check Python version (requires 3.10+)
44
+ - Install system dependencies (portaudio, ffmpeg)
45
+ - Create Python virtual environment
46
+ - Install all Python packages
47
+ - Create config files from templates
48
+
49
+ This takes 5-10 minutes on first run.
50
+
51
+ ### Step 3: Configure
52
+
53
+ Edit API keys:
54
+ ```bash
55
+ nano .env.local
56
+ ```
57
+
58
+ Add your keys:
59
+ ```bash
60
+ DEEPINFRA_API_KEY=your_key_here
61
+ SPEECHMATICS_API_KEY=your_key_here
62
+ # or
63
+ DEEPGRAM_API_KEY=your_key_here
64
+ ```
65
+
66
+ Edit settings (optional):
67
+ ```bash
68
+ nano config.ini
69
+ ```
70
+
71
+ ### Step 4: Run
72
+
73
+ Activate virtual environment:
74
+ ```bash
75
+ source venv/bin/activate
76
+ ```
77
+
78
+ Run in robot mode:
79
+ ```bash
80
+ python tars_bot.py
81
+ ```
82
+
83
+ Or run dashboard:
84
+ ```bash
85
+ python ui/app.py
86
+ ```
87
+
88
+ ## Verification
89
+
90
+ Check installation:
91
+ ```bash
92
+ # Activate venv
93
+ source ~/tars-conversation-app/venv/bin/activate
94
+
95
+ # Test imports
96
+ python -c "import pipecat; print('Pipecat OK')"
97
+ python -c "from tars_sdk import TarsClient; print('TARS SDK OK')"
98
+
99
+ # Test daemon connection
100
+ python -c "
101
+ import grpc
102
+ from tars_sdk import TarsClient
103
+ channel = grpc.insecure_channel('localhost:50051')
104
+ client = TarsClient(channel)
105
+ print('Daemon connection OK')
106
+ "
107
+ ```
108
+
109
+ ## Uninstallation
110
+
111
+ From dashboard:
112
+ 1. Navigate to "Apps" tab
113
+ 2. Find "tars-conversation-app"
114
+ 3. Click "Uninstall" button
115
+ 4. Choose whether to keep data/config
116
+
117
+ Manual:
118
+ ```bash
119
+ cd ~/tars-conversation-app
120
+ bash uninstall.sh
121
+ ```
122
+
123
+ ## Troubleshooting
124
+
125
+ ### Installation fails
126
+
127
+ Check Python version:
128
+ ```bash
129
+ python3 --version
130
+ # Should be 3.10 or higher
131
+ ```
132
+
133
+ Check disk space:
134
+ ```bash
135
+ df -h
136
+ # Need at least 2GB free
137
+ ```
138
+
139
+ Check internet:
140
+ ```bash
141
+ ping google.com
142
+ ```
143
+
144
+ ### Dependencies fail to install
145
+
146
+ Update package lists:
147
+ ```bash
148
+ sudo apt-get update
149
+ sudo apt-get upgrade
150
+ ```
151
+
152
+ Reinstall system deps:
153
+ ```bash
154
+ sudo apt-get install -y portaudio19-dev ffmpeg build-essential python3-dev
155
+ ```
156
+
157
+ ### Virtual environment issues
158
+
159
+ Remove and recreate:
160
+ ```bash
161
+ rm -rf venv
162
+ python3 -m venv venv
163
+ source venv/bin/activate
164
+ pip install -r requirements.txt
165
+ ```
166
+
167
+ ### Configuration not found
168
+
169
+ Recreate from templates:
170
+ ```bash
171
+ cp config.ini.example config.ini
172
+ cp env.example .env.local
173
+ ```
174
+
175
+ ### Cannot connect to daemon
176
+
177
+ Check daemon is running:
178
+ ```bash
179
+ systemctl status tars-daemon
180
+ ```
181
+
182
+ Test gRPC port:
183
+ ```bash
184
+ nc -zv localhost 50051
185
+ ```
186
+
187
+ Check logs:
188
+ ```bash
189
+ journalctl -u tars-daemon -f
190
+ ```
191
+
192
+ ## Running in Background
193
+
194
+ Use systemd service:
195
+
196
+ ```bash
197
+ # Create service file
198
+ sudo nano /etc/systemd/system/tars-conversation.service
199
+ ```
200
+
201
+ Add:
202
+ ```ini
203
+ [Unit]
204
+ Description=TARS Conversation App
205
+ After=network.target tars-daemon.service
206
+ Requires=tars-daemon.service
207
+
208
+ [Service]
209
+ Type=simple
210
+ User=mac
211
+ WorkingDirectory=/home/mac/tars-conversation-app
212
+ ExecStart=/home/mac/tars-conversation-app/venv/bin/python tars_bot.py
213
+ Restart=always
214
+ RestartSec=10
215
+
216
+ [Install]
217
+ WantedBy=multi-user.target
218
+ ```
219
+
220
+ Enable and start:
221
+ ```bash
222
+ sudo systemctl daemon-reload
223
+ sudo systemctl enable tars-conversation.service
224
+ sudo systemctl start tars-conversation.service
225
+ ```
226
+
227
+ Check status:
228
+ ```bash
229
+ sudo systemctl status tars-conversation.service
230
+ journalctl -u tars-conversation.service -f
231
+ ```
232
+
233
+ ## Updating
234
+
235
+ Pull latest changes:
236
+ ```bash
237
+ cd ~/tars-conversation-app
238
+ git pull
239
+ ```
240
+
241
+ Update dependencies:
242
+ ```bash
243
+ source venv/bin/activate
244
+ pip install -r requirements.txt --upgrade
245
+ ```
246
+
247
+ Restart if running as service:
248
+ ```bash
249
+ sudo systemctl restart tars-conversation.service
250
+ ```
251
+
252
+ ## Resource Usage
253
+
254
+ Expected resource usage on Pi 4:
255
+
256
+ - **Installation size**: ~1.5GB (venv + packages)
257
+ - **Memory**: 500MB-1GB during conversation
258
+ - **CPU**: 30-50% (varies with STT/TTS)
259
+ - **Network**: ~100kbps for audio + API calls
260
+
261
+ Recommended:
262
+ - 4GB RAM Pi (2GB may struggle)
263
+ - Active cooling for sustained use
264
+ - Wired ethernet for stability
docs/MEMORY.md ADDED
@@ -0,0 +1,190 @@
1
+ # Hybrid Memory System
2
+
3
+ ## Overview
4
+
5
+ A high-performance memory system optimized for voice AI applications with sub-50ms latency targets. Combines semantic vector search with BM25 keyword matching for superior recall and precision.
6
+
7
+ ## Architecture
8
+
9
+ ### Hybrid Search (70% Vector + 30% BM25)
10
+
11
+ 1. **Vector Search (70% weight)**
12
+ - Uses `all-MiniLM-L6-v2` for semantic embeddings
13
+ - Cosine similarity for relevance scoring
14
+ - Captures semantic meaning and context
15
+
16
+ 2. **BM25 Keyword Search (30% weight)**
17
+ - SQLite FTS5 full-text search
18
+ - Exact keyword matching
19
+ - Handles specific names, terms, and facts
20
+
21
+ 3. **Score Fusion**
22
+ - Weighted combination of both approaches
23
+ - Best of both worlds: semantic understanding + exact matching
24
+
25
+ ## Performance Optimizations
26
+
27
+ ### For Voice AI (<50ms target)
28
+
29
+ | Optimization | Benefit |
30
+ |--------------|---------|
31
+ | **Query Embedding Cache** | Avoid re-encoding similar queries (-20-40ms on cache hit) |
32
+ | **Pre-warmed Model** | Eliminates cold start latency (-50ms) |
33
+ | **Thread Pool** | Non-blocking SQLite operations (-5-10ms) |
34
+ | **Strict Timeout** | Guarantees <50ms with graceful fallback |
35
+ | **Fire-and-Forget Storage** | Stores memories asynchronously (0ms blocking) |
36
+ | **SQLite In-Process** | No network overhead vs ChromaDB (-10-20ms) |
37
+
38
+ ## Latency Comparison
39
+
40
+ | System | Search Latency | Voice AI Ready? |
41
+ |--------|---------------|-----------------|
42
+ | ChromaDB | 50-100ms | ⚠️ Borderline |
43
+ | **Hybrid Memory** | **20-40ms** | βœ… |
44
+
45
+ ## Configuration
46
+
47
+ ```python
48
+ memory_service = HybridMemoryService(
49
+ user_id=client_id,
50
+ db_path="./memory_data/memory.sqlite",
51
+ search_limit=3, # Top N results to return
52
+ search_timeout_ms=40, # Strict timeout for voice AI
53
+ vector_weight=0.7, # 70% semantic similarity
54
+ bm25_weight=0.3, # 30% keyword matching
55
+ system_prompt_prefix="From our conversations:\n",
56
+ )
57
+ ```
58
+
59
+ ## Database Schema
60
+
61
+ ### Main Table
62
+ ```sql
63
+ CREATE TABLE memories (
64
+ id INTEGER PRIMARY KEY,
65
+ user_id TEXT NOT NULL,
66
+ content TEXT NOT NULL,
67
+ embedding BLOB, -- numpy float32 array
68
+ created_at REAL
69
+ )
70
+ ```
71
+
72
+ ### FTS5 Index
73
+ ```sql
74
+ CREATE VIRTUAL TABLE memories_fts USING fts5(
75
+ content,
76
+ content='memories',
77
+ content_rowid='id'
78
+ )
79
+ ```
80
+
81
+ ## Performance Metrics
82
+
83
+ The service tracks:
84
+ - **searches**: Total number of searches
85
+ - **cache_hits**: Query embedding cache hits
86
+ - **cache_hit_rate**: Percentage of cached queries
87
+ - **timeouts**: Searches exceeding timeout threshold
88
+ - **avg_latency_ms**: Average search latency
89
+
90
+ Access stats:
91
+ ```python
92
+ stats = memory_service.get_stats()
93
+ print(stats)
94
+ ```
95
+
96
+ ## How It Works
97
+
98
+ ### Search Process
99
+
100
+ 1. **User message arrives** β†’ Extract text
101
+ 2. **Generate query embedding** β†’ Check cache first
102
+ 3. **Vector search** β†’ Scan recent 100 memories, compute cosine similarity
103
+ 4. **BM25 search** β†’ FTS5 query for keyword matches
104
+ 5. **Score fusion** β†’ Combine weighted scores
105
+ 6. **Return top N** β†’ Sorted by final score
106
+ 7. **Inject into context** β†’ Add as system message
107
+ 8. **Store asynchronously** β†’ Fire-and-forget storage
108
+
109
+ ### Example
110
+
111
+ ```
112
+ User: "What's my favorite color?"
113
+
114
+ Vector Search:
115
+ - "I love blue, it's my favorite color" β†’ 0.85 similarity
116
+ - "My room is painted blue" β†’ 0.62 similarity
117
+
118
+ BM25 Search:
119
+ - "I love blue, it's my favorite color" β†’ rank 1 (score: 1.0)
120
+ - "Blue is calming" β†’ rank 2 (score: 0.5)
121
+
122
+ Final Scores (70% vector + 30% BM25):
123
+ - "I love blue, it's my favorite color" β†’ 0.85*0.7 + 1.0*0.3 = 0.895 βœ“
124
+ - "My room is painted blue" β†’ 0.62*0.7 + 0.0*0.3 = 0.434
125
+ - "Blue is calming" β†’ 0.0*0.7 + 0.5*0.3 = 0.150
126
+
127
+ Top result returned: "I love blue, it's my favorite color"
128
+ ```
129
+
130
+ ## Migration from ChromaDB
131
+
132
+ The hybrid memory service is a drop-in replacement:
133
+
134
+ ```diff
135
+ - from services.memory_chromadb import ChromaDBMemoryService
136
+ + from services.memory_hybrid import HybridMemoryService
137
+
138
+ - memory_service = ChromaDBMemoryService(
139
+ + memory_service = HybridMemoryService(
140
+ user_id=client_id,
141
+ - agent_id="tars_agent",
142
+ - collection_name="conversations",
143
+ - search_limit=5,
144
+ - search_threshold=0.5,
145
+ + db_path="./memory_data/memory.sqlite",
146
+ + search_limit=3,
147
+ + search_timeout_ms=40,
148
+ + vector_weight=0.7,
149
+ + bm25_weight=0.3,
150
+ )
151
+ ```
152
+
153
+ ## Storage Location
154
+
155
+ - **Database**: `./memory_data/memory.sqlite`
156
+ - **Format**: SQLite with FTS5 extension
157
+ - **Embeddings**: Stored as binary BLOBs (numpy float32)
158
+
159
+ ## Dependencies
160
+
161
+ - `sqlite3` (built-in with Python)
162
+ - `sentence-transformers` (already installed)
163
+ - `numpy` (dependency of sentence-transformers)
164
+
165
+ No additional packages required!
166
+
167
+ ## Troubleshooting
168
+
169
+ ### High Latency
170
+ - Check cache hit rate: `memory_service.get_stats()`
171
+ - Reduce `search_limit` if processing too many results
172
+ - Increase `search_timeout_ms` if needed
173
+
174
+ ### Timeouts
175
+ - Review timeout stats: `stats["timeouts"]`
176
+ - Consider increasing `search_timeout_ms` to 50-60ms
177
+ - Check if database is growing too large
178
+
179
+ ### Memory Not Recalled
180
+ - Verify memories are being stored (check database)
181
+ - Adjust `vector_weight` and `bm25_weight` balance
182
+ - Try rephrasing queries to match stored content
183
+
184
+ ## Future Enhancements
185
+
186
+ - [ ] Automatic database compaction/cleanup
187
+ - [ ] Per-user memory limits
188
+ - [ ] Memory importance scoring
189
+ - [ ] Temporal decay for older memories
190
+ - [ ] Multi-turn conversation grouping
env.example ADDED
@@ -0,0 +1,59 @@
1
+ # STT Provider Configuration
2
+ # Options: "speechmatics", "deepgram", or "deepgram-flux"
3
+ STT_PROVIDER=speechmatics
4
+
5
+ # Speechmatics API Key
6
+ # Get your API key from: https://portal.speechmatics.com/
7
+ SPEECHMATICS_API_KEY=your_speechmatics_api_key_here
8
+
9
+ # Deepgram API Key (only needed if STT_PROVIDER=deepgram or deepgram-flux)
10
+ # Get your API key from: https://console.deepgram.com/
11
+ DEEPGRAM_API_KEY=your_deepgram_api_key_here
12
+
13
+ # ElevenLabs API Key
14
+ # Get your API key from: https://elevenlabs.io/app/settings/api-keys
15
+ ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
16
+
17
+ # ElevenLabs Voice ID (optional, defaults to custom voice)
18
+ # Find voice IDs at: https://elevenlabs.io/app/voices
19
+ ELEVENLABS_VOICE_ID=ry8mpwRw6nugb2qjP0tu
20
+
21
+ # DeepInfra API Key (for Qwen LLM and Gating Layer)
22
+ # Get your API key from: https://deepinfra.com/
23
+ DEEPINFRA_API_KEY=your_deepinfra_api_key_here
24
+ # Optional: Override default models
25
+ # DEEPINFRA_MODEL=Qwen/Qwen3-235B-A22B-Instruct-2507 # Main LLM (default)
26
+ # DEEPINFRA_GATING_MODEL=meta-llama/Llama-3.2-3B-Instruct # Gating Layer (default)
27
+
28
+ # Pipecat FastAPI service URL (for frontend to connect)
29
+ NEXT_PUBLIC_PIPECAT_URL=http://localhost:7860
30
+
31
+ # Pipecat FastAPI service configuration
32
+ PIPECAT_HOST=localhost
33
+ PIPECAT_PORT=7860
34
+
35
+ # Mem0 API Key (optional, enables long-term memory)
36
+ # Get one from: https://docs.mem0.ai/
37
+ MEM0_API_KEY=your_mem0_api_key_here
38
+
39
+ # TTS Provider Configuration
40
+ # Options: "elevenlabs" (cloud, requires API key) or "qwen3" (local, free)
41
+ TTS_PROVIDER=qwen3
42
+
43
+ # Qwen3-TTS Configuration (only needed if TTS_PROVIDER=qwen3)
44
+ # Model: 0.6B (faster, less memory) or 1.7B (better quality)
45
+ QWEN3_TTS_MODEL=Qwen/Qwen3-TTS-12Hz-0.6B-Base
46
+ # Device: "mps" for Mac, "cuda" for NVIDIA GPU, "cpu" for CPU
47
+ QWEN3_TTS_DEVICE=mps
48
+ # Reference audio file for voice cloning (relative to project root)
49
+ QWEN3_TTS_REF_AUDIO=assets/audio/tars-clean-compressed.mp3
50
+
51
+ # Emotional State Monitoring
52
+ # Continuously analyzes video for confusion/hesitation/frustration
53
+ # Triggers TARS to offer help proactively
54
+ EMOTIONAL_MONITORING_ENABLED=true
55
+ # How often to sample video frames (in seconds)
56
+ EMOTIONAL_SAMPLING_INTERVAL=3.0
57
+ # How many consecutive negative states before intervention
58
+ EMOTIONAL_INTERVENTION_THRESHOLD=2
59
+
index.html ADDED
@@ -0,0 +1,333 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>TARS Conversation App</title>
7
+ <style>
8
+ * {
9
+ margin: 0;
10
+ padding: 0;
11
+ box-sizing: border-box;
12
+ }
13
+
14
+ body {
15
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
16
+ line-height: 1.6;
17
+ color: #333;
18
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
19
+ min-height: 100vh;
20
+ padding: 20px;
21
+ }
22
+
23
+ .container {
24
+ max-width: 900px;
25
+ margin: 0 auto;
26
+ background: white;
27
+ border-radius: 16px;
28
+ padding: 40px;
29
+ box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
30
+ }
31
+
32
+ header {
33
+ text-align: center;
34
+ margin-bottom: 40px;
35
+ padding-bottom: 30px;
36
+ border-bottom: 2px solid #f0f0f0;
37
+ }
38
+
39
+ h1 {
40
+ font-size: 2.5rem;
41
+ color: #667eea;
42
+ margin-bottom: 10px;
43
+ }
44
+
45
+ .subtitle {
46
+ font-size: 1.2rem;
47
+ color: #666;
48
+ margin-bottom: 20px;
49
+ }
50
+
51
+ .badges {
52
+ display: flex;
53
+ gap: 10px;
54
+ justify-content: center;
55
+ flex-wrap: wrap;
56
+ }
57
+
58
+ .badge {
59
+ background: #667eea;
60
+ color: white;
61
+ padding: 6px 16px;
62
+ border-radius: 20px;
63
+ font-size: 14px;
64
+ font-weight: 500;
65
+ }
66
+
67
+ .badge.version {
68
+ background: #764ba2;
69
+ }
70
+
71
+ .badge.tars {
72
+ background: #48bb78;
73
+ }
74
+
75
+ section {
76
+ margin-bottom: 40px;
77
+ }
78
+
79
+ h2 {
80
+ color: #667eea;
81
+ font-size: 1.8rem;
82
+ margin-bottom: 15px;
83
+ }
84
+
85
+ h3 {
86
+ color: #764ba2;
87
+ font-size: 1.3rem;
88
+ margin-bottom: 10px;
89
+ margin-top: 25px;
90
+ }
91
+
92
+ .install-box {
93
+ background: #f7fafc;
94
+ border-left: 4px solid #667eea;
95
+ padding: 25px;
96
+ border-radius: 8px;
97
+ margin: 20px 0;
98
+ }
99
+
100
+ .install-steps {
101
+ list-style: none;
102
+ counter-reset: step-counter;
103
+ }
104
+
105
+ .install-steps li {
106
+ counter-increment: step-counter;
107
+ margin-bottom: 15px;
108
+ padding-left: 40px;
109
+ position: relative;
110
+ }
111
+
112
+ .install-steps li::before {
113
+ content: counter(step-counter);
114
+ position: absolute;
115
+ left: 0;
116
+ top: 0;
117
+ background: #667eea;
118
+ color: white;
119
+ width: 28px;
120
+ height: 28px;
121
+ border-radius: 50%;
122
+ display: flex;
123
+ align-items: center;
124
+ justify-content: center;
125
+ font-weight: bold;
126
+ font-size: 14px;
127
+ }
128
+
129
+ code {
130
+ background: #2d3748;
131
+ color: #68d391;
132
+ padding: 3px 8px;
133
+ border-radius: 4px;
134
+ font-family: "Courier New", monospace;
135
+ font-size: 0.9em;
136
+ }
137
+
138
+ pre {
139
+ background: #2d3748;
140
+ color: #e2e8f0;
141
+ padding: 20px;
142
+ border-radius: 8px;
143
+ overflow-x: auto;
144
+ margin: 15px 0;
145
+ }
146
+
147
+ pre code {
148
+ background: none;
149
+ padding: 0;
150
+ color: inherit;
151
+ }
152
+
153
+ .features {
154
+ display: grid;
155
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
156
+ gap: 20px;
157
+ margin: 20px 0;
158
+ }
159
+
160
+ .feature-card {
161
+ background: #f7fafc;
162
+ padding: 20px;
163
+ border-radius: 8px;
164
+ border-left: 4px solid #764ba2;
165
+ }
166
+
167
+ .feature-card h4 {
168
+ color: #667eea;
169
+ margin-bottom: 8px;
170
+ }
171
+
172
+ .btn {
173
+ display: inline-block;
174
+ background: #667eea;
175
+ color: white;
176
+ padding: 12px 30px;
177
+ border-radius: 8px;
178
+ text-decoration: none;
179
+ font-weight: 600;
180
+ transition: background 0.3s;
181
+ margin-right: 10px;
182
+ margin-top: 10px;
183
+ }
184
+
185
+ .btn:hover {
186
+ background: #5568d3;
187
+ }
188
+
189
+ .btn.secondary {
190
+ background: #764ba2;
191
+ }
192
+
193
+ .btn.secondary:hover {
194
+ background: #68399e;
195
+ }
196
+
197
+ footer {
198
+ text-align: center;
199
+ margin-top: 50px;
200
+ padding-top: 30px;
201
+ border-top: 2px solid #f0f0f0;
202
+ color: #666;
203
+ }
204
+
205
+ .tech-stack {
206
+ display: flex;
207
+ flex-wrap: wrap;
208
+ gap: 10px;
209
+ margin: 15px 0;
210
+ }
211
+
212
+ .tech {
213
+ background: #edf2f7;
214
+ padding: 8px 16px;
215
+ border-radius: 6px;
216
+ font-size: 14px;
217
+ color: #4a5568;
218
+ }
219
+ </style>
220
+ </head>
221
+ <body>
222
+ <div class="container">
223
+ <header>
224
+ <h1>πŸ€– TARS Conversation App</h1>
225
+ <p class="subtitle">Real-time conversational AI for TARS robots</p>
226
+ <div class="badges">
227
+ <span class="badge">AI Assistant</span>
228
+ <span class="badge version">v1.0.0</span>
229
+ <span class="badge tars">TARS App</span>
230
+ </div>
231
+ </header>
232
+
233
+ <section>
234
+ <h2>Features</h2>
235
+ <div class="features">
236
+ <div class="feature-card">
237
+ <h4>🎀 Real-time Voice</h4>
238
+ <p>WebRTC audio with Speechmatics/Deepgram transcription</p>
239
+ </div>
240
+ <div class="feature-card">
241
+ <h4>🧠 Smart Memory</h4>
242
+ <p>Hybrid vector + BM25 search with ChromaDB</p>
243
+ </div>
244
+ <div class="feature-card">
245
+ <h4>πŸ‘οΈ Vision Analysis</h4>
246
+ <p>Image understanding with Moondream</p>
247
+ </div>
248
+ <div class="feature-card">
249
+ <h4>πŸ“Š Live Dashboard</h4>
250
+ <p>Gradio metrics, latency charts, transcriptions</p>
251
+ </div>
252
+ <div class="feature-card">
253
+ <h4>🎭 Emotional AI</h4>
254
+ <p>Real-time emotion and sentiment monitoring</p>
255
+ </div>
256
+ <div class="feature-card">
257
+ <h4>πŸ€– Robot Control</h4>
258
+ <p>gRPC commands for gestures, eyes, movement</p>
259
+ </div>
260
+ </div>
261
+ </section>
262
+
263
+ <section>
264
+ <h2>Installation on TARS Robot</h2>
265
+ <div class="install-box">
266
+ <ol class="install-steps">
267
+ <li>Open TARS dashboard at <code>http://your-pi:8000</code></li>
268
+ <li>Go to <strong>App Store</strong> tab</li>
269
+ <li>Enter Space ID: <code>latishab/tars-conversation-app</code></li>
270
+ <li>Click <strong>Install from HuggingFace</strong></li>
271
+ <li>Configure API keys in <code>.env.local</code></li>
272
+ <li>Click <strong>Start</strong></li>
273
+ <li>Open dashboard at <code>http://your-pi:7860</code></li>
274
+ </ol>
275
+ </div>
276
+ </section>
277
+
278
+ <section>
279
+ <h3>Required API Keys</h3>
280
+ <ul style="list-style-position: inside; margin-left: 20px;">
281
+ <li><code>DEEPINFRA_API_KEY</code> - For LLM (DeepInfra)</li>
282
+ <li><code>SPEECHMATICS_API_KEY</code> or <code>DEEPGRAM_API_KEY</code> - For STT</li>
283
+ <li><code>ELEVENLABS_API_KEY</code> (optional) - For premium TTS</li>
284
+ </ul>
285
+ </section>
286
+
287
+ <section>
288
+ <h2>Tech Stack</h2>
289
+ <div class="tech-stack">
290
+ <span class="tech">Pipecat</span>
291
+ <span class="tech">WebRTC</span>
292
+ <span class="tech">Gradio</span>
293
+ <span class="tech">ChromaDB</span>
294
+ <span class="tech">gRPC</span>
295
+ <span class="tech">Speechmatics</span>
296
+ <span class="tech">Deepgram</span>
297
+ <span class="tech">ElevenLabs</span>
298
+ <span class="tech">DeepInfra</span>
299
+ <span class="tech">Moondream</span>
300
+ </div>
301
+ </section>
302
+
303
+ <section>
304
+ <h2>Manual Installation</h2>
305
+ <p>For development or non-TARS deployments:</p>
306
+ <pre><code>git clone https://github.com/latishab/tars-conversation-app.git
307
+ cd tars-conversation-app
308
+ bash install.sh
309
+
310
+ # Configure
311
+ cp env.example .env.local
312
+ cp config.ini.example config.ini
313
+
314
+ # Run
315
+ python tars_bot.py # Robot mode
316
+ python bot.py # Browser mode</code></pre>
317
+ </section>
318
+
319
+ <section>
320
+ <h2>Resources</h2>
321
+ <a href="https://github.com/latishab/tars-conversation-app" class="btn">GitHub Repository</a>
322
+ <a href="https://github.com/latishab/tars-conversation-app#readme" class="btn secondary">Documentation</a>
323
+ </section>
324
+
325
+ <footer>
326
+ <p>Built with TarsApp framework β€’ TARS Project</p>
327
+ <p style="margin-top: 10px; font-size: 14px;">
328
+ <a href="https://huggingface.co/spaces/latishab/tars-conversation-app" style="color: #667eea;">View on HuggingFace</a>
329
+ </p>
330
+ </footer>
331
+ </div>
332
+ </body>
333
+ </html>
install.sh ADDED
@@ -0,0 +1,99 @@
1
+ #!/bin/bash
2
+ set -e
3
+
4
+ APP_NAME="tars-conversation-app"
5
+ APP_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
6
+
7
+ echo "=== Installing $APP_NAME ==="
8
+ echo "Directory: $APP_DIR"
9
+ echo
10
+
11
+ # Check Python version
12
+ echo "Checking Python version..."
13
+ PYTHON_VERSION=$(python3 --version 2>&1 | grep -oP '\d+\.\d+')
14
+ REQUIRED_VERSION="3.10"
15
+
16
+ if [ "$(printf '%s\n' "$REQUIRED_VERSION" "$PYTHON_VERSION" | sort -V | head -n1)" != "$REQUIRED_VERSION" ]; then
17
+ echo "Error: Python $REQUIRED_VERSION or higher required (found $PYTHON_VERSION)"
18
+ exit 1
19
+ fi
20
+ echo "Python $PYTHON_VERSION OK"
21
+ echo
22
+
23
+ # Install system dependencies
24
+ echo "Installing system dependencies..."
25
+ sudo apt-get update -qq
26
+ sudo apt-get install -y portaudio19-dev ffmpeg build-essential python3-dev python3-venv
27
+ echo "System dependencies installed"
28
+ echo
29
+
30
+ # Create virtual environment
31
+ if [ ! -d "$APP_DIR/venv" ]; then
32
+ echo "Creating virtual environment..."
33
+ python3 -m venv "$APP_DIR/venv"
34
+ echo "Virtual environment created"
35
+ else
36
+ echo "Virtual environment already exists"
37
+ fi
38
+ echo
39
+
40
+ # Activate virtual environment
41
+ source "$APP_DIR/venv/bin/activate"
42
+
43
+ # Upgrade pip
44
+ echo "Upgrading pip..."
45
+ pip install --upgrade pip -q
46
+ echo
47
+
48
+ # Install Python dependencies
49
+ echo "Installing Python dependencies..."
50
+ echo "This may take several minutes..."
51
+ pip install -r "$APP_DIR/requirements.txt" -q
52
+ echo "Python dependencies installed"
53
+ echo
54
+
55
+ # Setup configuration files
56
+ if [ ! -f "$APP_DIR/config.ini" ]; then
57
+ echo "Creating config.ini from template..."
58
+ cp "$APP_DIR/config.ini.example" "$APP_DIR/config.ini"
59
+ echo "Created config.ini"
60
+ CONFIG_CREATED=true
61
+ else
62
+ echo "config.ini already exists"
63
+ CONFIG_CREATED=false
64
+ fi
65
+ echo
66
+
67
+ if [ ! -f "$APP_DIR/.env.local" ]; then
68
+ echo "Creating .env.local from template..."
69
+ cp "$APP_DIR/env.example" "$APP_DIR/.env.local"
70
+ echo "Created .env.local"
71
+ ENV_CREATED=true
72
+ else
73
+ echo ".env.local already exists"
74
+ ENV_CREATED=false
75
+ fi
76
+ echo
77
+
78
+ # Run video codec fix if needed
79
+ if [ -f "$APP_DIR/fix_video_codec.sh" ]; then
80
+ echo "Applying video codec fixes..."
81
+ bash "$APP_DIR/fix_video_codec.sh" || true
82
+ fi
83
+
84
+ echo "=== Installation Complete ==="
85
+ echo
86
+ echo "Next steps:"
87
+ if [ "$CONFIG_CREATED" = true ] || [ "$ENV_CREATED" = true ]; then
88
+ echo "1. Edit configuration files:"
89
+ [ "$ENV_CREATED" = true ] && echo " - Add API keys to: $APP_DIR/.env.local"
90
+ [ "$CONFIG_CREATED" = true ] && echo " - Configure settings: $APP_DIR/config.ini"
91
+ echo "2. Activate environment: source $APP_DIR/venv/bin/activate"
92
+ echo "3. Run the app: python $APP_DIR/tars_bot.py"
93
+ else
94
+ echo "1. Activate environment: source $APP_DIR/venv/bin/activate"
95
+ echo "2. Run the app: python $APP_DIR/tars_bot.py"
96
+ fi
97
+ echo
98
+ echo "For browser mode: python $APP_DIR/bot.py"
99
+ echo "For dashboard: python $APP_DIR/ui/app.py"
manifest.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "name": "tars-conversation-app",
3
+ "version": "1.0.0",
4
+ "description": "Real-time conversational AI with WebRTC, memory, and vision",
5
+ "author": "TARS Project",
6
+ "repository": "https://github.com/latishab/tars-conversation-app.git",
7
+ "entry_point": "tars_conversation_app.wrapper:ConversationApp",
8
+ "custom_app_url": "http://localhost:7860",
9
+ "icon": "assets/tars-icon.png",
10
+ "huggingface_space": "latishab/tars-conversation-app",
11
+ "install_script": "install.sh",
12
+ "uninstall_script": "uninstall.sh",
13
+ "dependencies": {
14
+ "python": ">=3.10",
15
+ "system": [
16
+ "portaudio19-dev",
17
+ "ffmpeg",
18
+ "build-essential",
19
+ "python3-dev"
20
+ ]
21
+ },
22
+ "environment": [
23
+ "DEEPINFRA_API_KEY",
24
+ "SPEECHMATICS_API_KEY",
25
+ "DEEPGRAM_API_KEY",
26
+ "ELEVENLABS_API_KEY"
27
+ ],
28
+ "configuration": {
29
+ "file": "config.ini",
30
+ "example": "config.ini.example",
31
+ "env_file": ".env.local",
32
+ "env_example": "env.example"
33
+ },
34
+ "ports": {
35
+ "grpc": 50051,
36
+ "http": 8765,
37
+ "fastapi": 8080,
38
+ "dashboard": 7860
39
+ },
40
+ "services": {
41
+ "dashboard": {
42
+ "enabled": true,
43
+ "description": "Gradio metrics and monitoring dashboard",
44
+ "url": "http://localhost:7860"
45
+ }
46
+ }
47
+ }
pipecat_service.py ADDED
@@ -0,0 +1,272 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Pipecat.ai service for real-time transcription and TTS using SmallWebRTC
4
+ Communicates directly with browser via WebRTC
5
+ """
6
+
7
+ # Fix SSL certificate issues FIRST - before any SSL-using imports
8
+ import os
9
+ import sys
10
+ from pathlib import Path
11
+
12
+ # Add src/ to Python path
13
+ sys.path.insert(0, str(Path(__file__).parent / "src"))
14
+
15
+ try:
16
+ import certifi
17
+ cert_file = certifi.where()
18
+ os.environ['SSL_CERT_FILE'] = cert_file
19
+ os.environ['REQUESTS_CA_BUNDLE'] = cert_file
20
+ os.environ['CURL_CA_BUNDLE'] = cert_file
21
+ except ImportError:
22
+ pass # certifi not available, will use system certs
23
+
24
+ import ssl
25
+ from contextlib import asynccontextmanager
26
+
27
+ # Configure SSL to use certifi certificates for Python's ssl module
28
+ # For development: disable SSL verification completely to avoid certificate issues
29
+ # This MUST happen before any libraries that use SSL are imported
30
+ try:
31
+ import certifi
32
+ cert_file = certifi.where()
33
+ # Set environment variables for libraries that respect them
34
+ os.environ['SSL_CERT_FILE'] = cert_file
35
+ os.environ['REQUESTS_CA_BUNDLE'] = cert_file
36
+ os.environ['CURL_CA_BUNDLE'] = cert_file
37
+
38
+ # For Python's ssl module: use unverified context for development
39
+ # This bypasses SSL certificate verification to avoid connection issues
40
+ ssl._create_default_https_context = ssl._create_unverified_context
41
+ except ImportError:
42
+ # If certifi not available, use unverified (development only)
43
+ ssl._create_default_https_context = ssl._create_unverified_context
44
+ except Exception as e:
45
+ # If anything fails, use unverified context
46
+ ssl._create_default_https_context = ssl._create_unverified_context
47
+
48
+ import argparse
49
+ import logging
50
+ from fastapi import BackgroundTasks, FastAPI
51
+ from fastapi.middleware.cors import CORSMiddleware
52
+ from loguru import logger
53
+ from pipecat.transports.smallwebrtc.request_handler import (
54
+ SmallWebRTCPatchRequest,
55
+ SmallWebRTCRequest,
56
+ SmallWebRTCRequestHandler,
57
+ )
58
+
59
+ from bot import run_bot
60
+ from config import (
61
+ PIPECAT_HOST,
62
+ PIPECAT_PORT,
63
+ SPEECHMATICS_API_KEY,
64
+ DEEPGRAM_API_KEY,
65
+ ELEVENLABS_API_KEY,
66
+ DEEPINFRA_API_KEY,
67
+ STT_PROVIDER,
68
+ TTS_PROVIDER, # Only used for startup validation
69
+ get_fresh_config,
70
+ )
71
+
72
+ # Remove default loguru handler and set up custom logging
73
+ logger.remove(0)
74
+
75
+ # Configure standard logging
76
+ logging.basicConfig(level=logging.INFO)
77
+ standard_logger = logging.getLogger(__name__)
78
+
79
+ # Reduce noise from websockets library - only log warnings and above
80
+ websockets_logger = logging.getLogger('websockets')
81
+ websockets_logger.setLevel(logging.WARNING)
82
+
83
+ # Log SSL certificate configuration
84
+ try:
85
+ import certifi
86
+ logger.info(f"SSL Configuration: Using certificates from {certifi.where()}")
87
+ logger.info(f"SSL_CERT_FILE env: {os.environ.get('SSL_CERT_FILE', 'not set')}")
88
+ except ImportError:
89
+ logger.warning("certifi not available - SSL verification disabled for development")
90
+
91
+
92
+ @asynccontextmanager
93
+ async def lifespan(app: FastAPI):
94
+ """Handle app lifespan events."""
95
+ logger.info(f"Starting Pipecat service on http://{PIPECAT_HOST}:{PIPECAT_PORT}...")
96
+ logger.info(f"STT Provider: {STT_PROVIDER}")
97
+ logger.info(f"TTS Provider: {TTS_PROVIDER}")
98
+
99
+ # Check required API keys based on STT and TTS providers
100
+ missing_keys = []
101
+ if STT_PROVIDER == "speechmatics" and not SPEECHMATICS_API_KEY:
102
+ missing_keys.append("SPEECHMATICS_API_KEY")
103
+ if STT_PROVIDER == "deepgram" and not DEEPGRAM_API_KEY:
104
+ missing_keys.append("DEEPGRAM_API_KEY")
105
+ if not DEEPINFRA_API_KEY:
106
+ missing_keys.append("DEEPINFRA_API_KEY")
107
+ if TTS_PROVIDER == "elevenlabs" and not ELEVENLABS_API_KEY:
108
+ missing_keys.append("ELEVENLABS_API_KEY")
109
+
110
+ if missing_keys:
111
+ logger.error(f"ERROR: Missing required API keys: {', '.join(missing_keys)}")
112
+ sys.exit(1)
113
+
114
+ yield # Run app
115
+
116
+ # Cleanup
117
+ await small_webrtc_handler.close()
118
+ logger.info("Shutting down...")
119
+
120
+
121
+ app = FastAPI(lifespan=lifespan)
122
+
123
+ # Add CORS middleware
124
+ app.add_middleware(
125
+ CORSMiddleware,
126
+ allow_origins=["*"], # In production, replace with specific origins
127
+ allow_credentials=True,
128
+ allow_methods=["*"],
129
+ allow_headers=["*"],
130
+ )
131
+
132
+ # Initialize the SmallWebRTC request handler
133
+ small_webrtc_handler: SmallWebRTCRequestHandler = SmallWebRTCRequestHandler()
134
+
135
+ @app.post("/api/offer")
136
+ async def offer(request: SmallWebRTCRequest, background_tasks: BackgroundTasks):
137
+ """Handle WebRTC offer requests via SmallWebRTCRequestHandler."""
138
+ logger.debug(f"Received WebRTC offer request")
139
+
140
+ # Prepare runner arguments with the callback to run your bot
141
+ async def webrtc_connection_callback(connection):
142
+ background_tasks.add_task(run_bot, connection)
143
+
144
+ # Delegate handling to SmallWebRTCRequestHandler
145
+ answer = await small_webrtc_handler.handle_web_request(
146
+ request=request,
147
+ webrtc_connection_callback=webrtc_connection_callback,
148
+ )
149
+ return answer
150
+
151
+
152
+ @app.patch("/api/offer")
153
+ async def ice_candidate(request: SmallWebRTCPatchRequest):
154
+ """Handle ICE candidate patch requests."""
155
+ logger.debug(f"Received ICE candidate patch request")
156
+ await small_webrtc_handler.handle_patch_request(request)
157
+ return {"status": "success"}
158
+
159
+
160
+ @app.get("/api/status")
161
+ async def status():
162
+ """Health check endpoint with fresh config values."""
163
+ # Get current config from config.ini
164
+ current_config = get_fresh_config()
165
+ current_stt = current_config['STT_PROVIDER']
166
+ current_tts = current_config['TTS_PROVIDER']
167
+ current_model = current_config['DEEPINFRA_MODEL']
168
+
169
+ return {
170
+ "status": "ok",
171
+ "stt_provider": current_stt,
172
+ "tts_provider": current_tts,
173
+ "llm_model": current_model,
174
+ "speechmatics_configured": bool(SPEECHMATICS_API_KEY) if current_stt == "speechmatics" else None,
175
+ "deepgram_configured": bool(DEEPGRAM_API_KEY) if current_stt == "deepgram" else None,
176
+ "elevenlabs_configured": bool(ELEVENLABS_API_KEY) if current_tts == "elevenlabs" else None,
177
+ "deepinfra_configured": bool(DEEPINFRA_API_KEY),
178
+ "qwen3_tts_configured": True if current_tts == "qwen3" else None,
179
+ }
180
+
181
+
182
+ @app.get("/api/config")
183
+ async def get_config():
184
+ """Get current configuration from config.ini."""
185
+ import configparser
186
+ from pathlib import Path
187
+
188
+ config = configparser.ConfigParser()
189
+ config_path = Path("config.ini")
190
+
191
+ if not config_path.exists():
192
+ return {"error": "config.ini not found"}
193
+
194
+ config.read(config_path)
195
+
196
+ return {
197
+ "llm": {
198
+ "model": config.get("LLM", "model", fallback="Qwen/Qwen3-235B-A22B-Instruct-2507")
199
+ },
200
+ "stt": {
201
+ "provider": config.get("STT", "provider", fallback="speechmatics")
202
+ },
203
+ "tts": {
204
+ "provider": config.get("TTS", "provider", fallback="qwen3"),
205
+ "qwen3_model": config.get("TTS", "qwen3_model", fallback="Qwen/Qwen3-TTS-12Hz-0.6B-Base"),
206
+ "qwen3_device": config.get("TTS", "qwen3_device", fallback="mps"),
207
+ "qwen3_ref_audio": config.get("TTS", "qwen3_ref_audio", fallback="tars-clean-compressed.mp3"),
208
+ }
209
+ }
210
+
211
+
212
+ @app.post("/api/config")
213
+ async def update_config(request: dict):
214
+ """Update configuration in config.ini."""
215
+ import configparser
216
+ from pathlib import Path
217
+
218
+ config = configparser.ConfigParser()
219
+ config_path = Path("config.ini")
220
+
221
+ if not config_path.exists():
222
+ return {"error": "config.ini not found"}
223
+
224
+ config.read(config_path)
225
+
226
+ # Update LLM config
227
+ if "llm_model" in request:
228
+ if not config.has_section("LLM"):
229
+ config.add_section("LLM")
230
+ config.set("LLM", "model", request["llm_model"])
231
+
232
+ # Update STT config
233
+ if "stt_provider" in request:
234
+ if not config.has_section("STT"):
235
+ config.add_section("STT")
236
+ config.set("STT", "provider", request["stt_provider"])
237
+
238
+ # Update TTS config
239
+ if "tts_provider" in request:
240
+ if not config.has_section("TTS"):
241
+ config.add_section("TTS")
242
+ config.set("TTS", "provider", request["tts_provider"])
243
+
244
+ # Write back to file
245
+ with open(config_path, "w") as f:
246
+ config.write(f)
247
+
248
+ return {
249
+ "success": True,
250
+ "message": "Configuration updated. Please restart the service for changes to take effect.",
251
+ "restart_required": True
252
+ }
253
+
254
+
255
+ if __name__ == "__main__":
256
+ parser = argparse.ArgumentParser(description="WebRTC Pipecat service")
257
+ parser.add_argument(
258
+ "--host", default=PIPECAT_HOST, help=f"Host for HTTP server (default: {PIPECAT_HOST})"
259
+ )
260
+ parser.add_argument(
261
+ "--port", type=int, default=PIPECAT_PORT, help=f"Port for HTTP server (default: {PIPECAT_PORT})"
262
+ )
263
+ parser.add_argument("--verbose", "-v", action="count")
264
+ args = parser.parse_args()
265
+
266
+ if args.verbose:
267
+ logger.add(sys.stderr, level="TRACE")
268
+ else:
269
+ logger.add(sys.stderr, level="INFO")
270
+
271
+ import uvicorn
272
+ uvicorn.run(app, host=args.host, port=args.port)
publish-to-hf.sh ADDED
@@ -0,0 +1,87 @@
1
+ #!/bin/bash
2
+ # Publish tars-conversation-app to HuggingFace Space
3
+
4
+ set -e
5
+
6
+ echo "Publishing tars-conversation-app to HuggingFace Space..."
7
+ echo
8
+
9
+ # Check for HF_TOKEN
10
+ if [ -z "$HF_TOKEN" ]; then
11
+ echo "❌ Error: HF_TOKEN not set"
12
+ echo
13
+ echo "Get a token from: https://huggingface.co/settings/tokens"
14
+ echo "Then run:"
15
+ echo " export HF_TOKEN=hf_your_token_here"
16
+ echo " bash publish-to-hf.sh"
17
+ exit 1
18
+ fi
19
+
20
+ echo "βœ“ HF_TOKEN is set"
21
+
22
+ # Check for huggingface_hub
23
+ python3 << 'EOFCHECK'
24
+ try:
25
+ from huggingface_hub import HfApi
26
+ print("βœ“ huggingface_hub is installed")
27
+ except ImportError:
28
+ print("❌ huggingface_hub not installed")
29
+ print("\nInstall with:")
30
+ print(" pip install huggingface_hub")
31
+ exit(1)
32
+ EOFCHECK
33
+
34
+ if [ $? -ne 0 ]; then
35
+ exit 1
36
+ fi
37
+
38
+ echo
39
+ echo "Uploading to latishab/tars-conversation-app..."
40
+ echo
41
+
42
+ # Upload
43
+ python3 << 'EOFUPLOAD'
44
+ import os
45
+ from pathlib import Path
46
+ from huggingface_hub import HfApi
47
+
48
+ token = os.environ["HF_TOKEN"]
49
+ api = HfApi(token=token)
50
+
51
+ print("Uploading files...")
52
+
53
+ api.upload_folder(
54
+ folder_path=".",
55
+ repo_id="latishab/tars-conversation-app",
56
+ repo_type="space",
57
+ ignore_patterns=[
58
+ ".git", ".git/*",
59
+ "venv", "venv/*",
60
+ "__pycache__", "**/__pycache__",
61
+ "*.pyc", "**/*.pyc",
62
+ ".pytest_cache",
63
+ ".models", ".models/*",
64
+ "chroma_memory", "chroma_memory/*",
65
+ "memory_data", "memory_data/*",
66
+ ".env", ".env.local", ".env.*",
67
+ "config.ini",
68
+ ".claude", ".claude/*",
69
+ ".DS_Store", "**/.DS_Store"
70
+ ],
71
+ commit_message="Update TARS Conversation App with TarsApp framework"
72
+ )
73
+
74
+ print("\nβœ… Published successfully!")
75
+ print("\nSpace URL: https://huggingface.co/spaces/latishab/tars-conversation-app")
76
+ print("\nNext steps:")
77
+ print("1. Visit the Space URL to verify it's working")
78
+ print("2. Test installation on TARS robot:")
79
+ print(" - Open dashboard at http://your-pi:8000")
80
+ print(" - Go to App Store tab")
81
+ print(" - Enter Space ID: latishab/tars-conversation-app")
82
+ print(" - Click 'Install from HuggingFace'")
83
+ print("3. Click Start and verify Gradio dashboard at :7860")
84
+ EOFUPLOAD
85
+
86
+ echo
87
+ echo "Done!"
pyproject.toml ADDED
@@ -0,0 +1,25 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "tars-conversation-app"
7
+ version = "1.0.0"
8
+ description = "Real-time conversational AI with WebRTC, memory, and vision for TARS robots"
9
+ readme = "README.md"
10
+ requires-python = ">=3.10"
11
+ authors = [
12
+ {name = "TARS Project"}
13
+ ]
14
+
15
+ dependencies = [
16
+ "tars-sdk>=0.1.0",
17
+ ]
18
+
19
+ [project.urls]
20
+ Homepage = "https://github.com/latishab/tars-conversation-app"
21
+ Repository = "https://github.com/latishab/tars-conversation-app.git"
22
+ Documentation = "https://github.com/latishab/tars-conversation-app#readme"
23
+
24
+ [tool.setuptools.packages.find]
25
+ include = ["tars_conversation_app", "tars_conversation_app.*", "src", "src.*", "ui", "ui.*"]
requirements.txt ADDED
@@ -0,0 +1,18 @@
1
+ pipecat-ai[speechmatics,elevenlabs,webrtc,qwen,moondream,local-smart-turn-v3,silero]>=0.0.102
2
+ python-dotenv>=1.0.0
3
+ fastapi>=0.104.0
4
+ uvicorn[standard]>=0.24.0
5
+ loguru>=0.7.0
6
+ certifi>=2024.0.0
7
+ aiohttp>=3.9.0
8
+ chromadb>=0.4.0
9
+ sentence-transformers>=2.2.0
10
+ opencv-python>=4.8.0
11
+ mediapipe>=0.10.0
12
+ websockets>=12.0
13
+ httpx>=0.24.0
14
+ gradio>=4.0.0
15
+ plotly>=5.0.0
16
+ # aiortc is installed as a dependency of pipecat-ai[webrtc]
17
+ # If you encounter VP8 decoder errors, run: bash fix_video_codec.sh
18
+
scripts/update_daemon.py ADDED
@@ -0,0 +1,388 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ TARS Daemon Remote Update Script
4
+
5
+ Updates the TARS daemon on the Raspberry Pi via SSH.
6
+ Supports git-based updates, backup, health checks, and rollback.
7
+
8
+ Usage:
9
+ python scripts/update_daemon.py --check-only
10
+ python scripts/update_daemon.py --method git
11
+ python scripts/update_daemon.py --method git --version v0.2.1
12
+ python scripts/update_daemon.py --rollback /path/to/backup
13
+ """
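+
+ # Update lifecycle implemented below:
+ # create_backup -> stop_daemon -> git fetch -> checkout/pull
+ # -> pip install -e . -> regenerate protos -> start_daemon
+ # -> health check -> restore_backup on any failure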
14
+
15
+ import argparse
16
+ import subprocess
17
+ import sys
18
+ import json
19
+ from datetime import datetime
20
+ from pathlib import Path
21
+
22
+ # SSH configuration
23
+ PI_HOST = "tars-pi"
24
+ PI_USER = "mac"
25
+ DAEMON_DIR = "~/tars-daemon"
26
+ BACKUP_DIR = "~/tars-daemon-backups"
27
+ SERVICE_NAME = "tars"
28
+
29
+
30
+ def run_ssh(cmd: str, check: bool = True) -> tuple[int, str, str]:
+ """Run command on Pi via SSH."""
+ # Pass cmd as a single argv element (no shell=True) so quotes inside
+ # the remote command survive instead of being re-parsed by the local shell.
+ result = subprocess.run(
+ ["ssh", PI_HOST, cmd],
+ capture_output=True,
+ text=True
+ )
+ if check and result.returncode != 0:
+ print(f"Error: {result.stderr}")
+ return result.returncode, result.stdout.strip(), result.stderr.strip()
42
+
43
+
44
+ def get_current_version() -> dict:
45
+ """Get current daemon version info."""
46
+ code, out, err = run_ssh(
47
+ f"cd {DAEMON_DIR} && source venv/bin/activate && "
48
+ "python -c 'from tars_sdk import __version__; import json; "
49
+ "print(json.dumps({\"version\": __version__}))'",
50
+ check=False
51
+ )
52
+ if code == 0:
53
+ try:
54
+ return json.loads(out)
55
+ except json.JSONDecodeError:
56
+ pass
57
+
58
+ # Fallback: try git
59
+ code, out, _ = run_ssh(f"cd {DAEMON_DIR} && git describe --tags --always", check=False)
60
+ return {"version": out if code == 0 else "unknown", "git": True}
61
+
62
+
63
+ def get_git_status() -> dict:
64
+ """Get git status on Pi."""
65
+ info = {}
66
+
67
+ code, out, _ = run_ssh(f"cd {DAEMON_DIR} && git rev-parse --short HEAD", check=False)
68
+ info["commit"] = out if code == 0 else "unknown"
69
+
70
+ code, out, _ = run_ssh(f"cd {DAEMON_DIR} && git branch --show-current", check=False)
71
+ info["branch"] = out if code == 0 else "main"
72
+
73
+ code, out, _ = run_ssh(f"cd {DAEMON_DIR} && git status --porcelain", check=False)
74
+ info["dirty"] = bool(out) if code == 0 else False
75
+
76
+ code, out, _ = run_ssh(f"cd {DAEMON_DIR} && git describe --tags --always", check=False)
77
+ info["tag"] = out if code == 0 else ""
78
+
79
+ return info
80
+
81
+
82
+ def check_daemon_health() -> bool:
83
+ """Check if daemon is running and healthy."""
84
+ code, out, _ = run_ssh(f"systemctl is-active {SERVICE_NAME}", check=False)
85
+ if code == 0 and out == "active":
86
+ return True
87
+
88
+ # Try curl health endpoint
89
+ code, out, _ = run_ssh("curl -s http://localhost:8001/api/health", check=False)
90
+ if code == 0 and "running" in out.lower():
91
+ return True
92
+
93
+ return False
94
+
95
+
96
+ def stop_daemon() -> bool:
97
+ """Stop the daemon service."""
98
+ print("Stopping daemon...")
99
+ code, _, _ = run_ssh(f"sudo systemctl stop {SERVICE_NAME}", check=False)
100
+ if code != 0:
101
+ code, _, _ = run_ssh("pkill -f tars_daemon.py", check=False)
102
+ return True
103
+
104
+
105
+ def start_daemon() -> bool:
106
+ """Start the daemon service."""
107
+ print("Starting daemon...")
108
+ code, _, err = run_ssh(f"sudo systemctl start {SERVICE_NAME}", check=False)
109
+ if code != 0:
110
+ print(f"Warning: systemctl start failed: {err}")
111
+ # Try direct start
112
+ code, _, _ = run_ssh(
113
+ f"cd {DAEMON_DIR} && source venv/bin/activate && "
114
+ "nohup python tars_daemon.py > /dev/null 2>&1 &",
115
+ check=False
116
+ )
117
+ return code == 0
118
+
119
+
120
+ def create_backup() -> str:
121
+ """Create backup of current installation."""
122
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
123
+ backup_path = f"{BACKUP_DIR}/tars-daemon-{timestamp}"
124
+
125
+ print(f"Creating backup at {backup_path}...")
126
+
127
+ # Create backup directory
128
+ run_ssh(f"mkdir -p {BACKUP_DIR}")
129
+
130
+ # Copy current installation
131
+ code, _, err = run_ssh(f"cp -r {DAEMON_DIR} {backup_path}")
132
+ if code != 0:
133
+ print(f"Error creating backup: {err}")
134
+ return ""
135
+
136
+ # Remove venv from backup to save space
137
+ run_ssh(f"rm -rf {backup_path}/venv", check=False)
138
+
139
+ print(f"Backup created: {backup_path}")
140
+ return backup_path
141
+
142
+
143
+ def restore_backup(backup_path: str) -> bool:
144
+ """Restore from backup."""
145
+ print(f"Restoring from {backup_path}...")
146
+
147
+ # Verify backup exists
148
+ code, _, _ = run_ssh(f"test -d {backup_path}", check=False)
149
+ if code != 0:
150
+ print(f"Error: Backup not found at {backup_path}")
151
+ return False
152
+
153
+ stop_daemon()
154
+
155
+ # Move current to temp
156
+ run_ssh(f"mv {DAEMON_DIR} {DAEMON_DIR}.old", check=False)
157
+
158
+ # Restore backup
159
+ code, _, err = run_ssh(f"cp -r {backup_path} {DAEMON_DIR}")
160
+ if code != 0:
161
+ print(f"Error restoring backup: {err}")
162
+ # Try to restore old
163
+ run_ssh(f"mv {DAEMON_DIR}.old {DAEMON_DIR}", check=False)
164
+ return False
165
+
166
+ # Recreate venv
167
+ print("Recreating virtual environment...")
168
+ run_ssh(
169
+ f"cd {DAEMON_DIR} && python3 -m venv venv && "
170
+ "source venv/bin/activate && pip install -e .",
171
+ check=False
172
+ )
173
+
174
+ # Cleanup
175
+ run_ssh(f"rm -rf {DAEMON_DIR}.old", check=False)
176
+
177
+ start_daemon()
178
+ return True
179
+
180
+
181
+ def update_git(version: str = None) -> bool:
182
+ """Update daemon using git."""
183
+ git_info = get_git_status()
184
+ print(f"Current: {git_info['commit']} on {git_info['branch']}")
185
+
186
+ if git_info["dirty"]:
187
+ print("Warning: Working directory has uncommitted changes")
188
+
189
+ # Create backup
190
+ backup_path = create_backup()
191
+ if not backup_path:
192
+ print("Error: Failed to create backup")
193
+ return False
194
+
195
+ stop_daemon()
196
+
197
+ # Fetch latest
198
+ print("Fetching updates...")
199
+ code, _, err = run_ssh(f"cd {DAEMON_DIR} && git fetch --all --tags")
200
+ if code != 0:
201
+ print(f"Error fetching: {err}")
202
+ return False
203
+
204
+ # Checkout version or pull latest
205
+ if version:
206
+ print(f"Checking out {version}...")
207
+ code, _, err = run_ssh(f"cd {DAEMON_DIR} && git checkout {version}")
208
+ else:
209
+ print("Pulling latest...")
210
+ code, _, err = run_ssh(f"cd {DAEMON_DIR} && git pull --ff-only")
211
+
212
+ if code != 0:
213
+ print(f"Error: {err}")
214
+ print("Rolling back...")
215
+ restore_backup(backup_path)
216
+ return False
217
+
218
+ # Update dependencies
219
+ print("Updating dependencies...")
220
+ code, _, err = run_ssh(
221
+ f"cd {DAEMON_DIR} && source venv/bin/activate && pip install -e ."
222
+ )
223
+ if code != 0:
224
+ print(f"Error installing: {err}")
225
+ print("Rolling back...")
226
+ restore_backup(backup_path)
227
+ return False
228
+
229
+ # Regenerate proto files if needed
230
+ print("Regenerating proto files...")
231
+ run_ssh(
232
+ f"cd {DAEMON_DIR} && source venv/bin/activate && "
233
+ "python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. "
234
+ "--pyi_out=. tars_sdk/proto/tars.proto",
235
+ check=False
236
+ )
237
+
238
+ # Start daemon
239
+ start_daemon()
240
+
241
+ # Health check
242
+ import time
243
+ print("Waiting for daemon to start...")
244
+ time.sleep(3)
245
+
246
+ if check_daemon_health():
247
+ print("Daemon is healthy")
248
+ new_info = get_git_status()
249
+ print(f"Updated to: {new_info['commit']}")
250
+ return True
251
+ else:
252
+ print("Error: Daemon health check failed")
253
+ print("Rolling back...")
254
+ restore_backup(backup_path)
255
+ return False
256
+
257
+
258
+ def list_backups():
259
+ """List available backups."""
260
+ code, out, _ = run_ssh(f"ls -la {BACKUP_DIR}", check=False)
261
+ if code == 0:
262
+ print("Available backups:")
263
+ print(out)
264
+ else:
265
+ print("No backups found")
266
+
267
+
268
+ def main():
269
+ parser = argparse.ArgumentParser(
270
+ description="Update TARS daemon on Raspberry Pi",
271
+ formatter_class=argparse.RawDescriptionHelpFormatter,
272
+ epilog="""
273
+ Examples:
274
+ %(prog)s --check-only Show current version
275
+ %(prog)s --method git Update via git pull
276
+ %(prog)s --version v0.2.1 Checkout specific version
277
+ %(prog)s --rollback ~/backup Restore from backup
278
+ %(prog)s --list-backups List available backups
279
+ """
280
+ )
281
+
282
+ parser.add_argument(
283
+ "--check-only",
284
+ action="store_true",
285
+ help="Show current version and status only"
286
+ )
287
+ parser.add_argument(
288
+ "--method",
289
+ choices=["git"],
290
+ default="git",
291
+ help="Update method (default: git)"
292
+ )
293
+ parser.add_argument(
294
+ "--version",
295
+ help="Specific version/tag to checkout (e.g., v0.2.1)"
296
+ )
297
+ parser.add_argument(
298
+ "--rollback",
299
+ metavar="PATH",
300
+ help="Restore from backup path"
301
+ )
302
+ parser.add_argument(
303
+ "--list-backups",
304
+ action="store_true",
305
+ help="List available backups"
306
+ )
307
+ parser.add_argument(
308
+ "--force",
309
+ action="store_true",
310
+ help="Skip confirmation prompts"
311
+ )
312
+
313
+ args = parser.parse_args()
314
+
315
+ print("=" * 60)
316
+ print("TARS Daemon Update Tool")
317
+ print("=" * 60)
318
+
319
+ # Test SSH connection
320
+ code, _, _ = run_ssh("echo ok", check=False)
321
+ if code != 0:
322
+ print(f"Error: Cannot connect to {PI_HOST}")
323
+ print("Check SSH configuration and try again.")
324
+ sys.exit(1)
325
+
326
+ print(f"Connected to {PI_HOST}")
327
+ print()
328
+
329
+ # Get current status
330
+ version_info = get_current_version()
331
+ git_info = get_git_status()
332
+ healthy = check_daemon_health()
333
+
334
+ print(f"Current version: {version_info.get('version', 'unknown')}")
335
+ print(f"Git commit: {git_info['commit']} ({git_info['branch']})")
336
+ print(f"Daemon status: {'healthy' if healthy else 'not running'}")
337
+ print()
338
+
339
+ if args.list_backups:
340
+ list_backups()
341
+ sys.exit(0)
342
+
343
+ if args.check_only:
344
+ sys.exit(0)
345
+
346
+ if args.rollback:
347
+ if not args.force:
348
+ confirm = input(f"Restore from {args.rollback}? [y/N] ")
349
+ if confirm.lower() != "y":
350
+ print("Cancelled")
351
+ sys.exit(0)
352
+
353
+ success = restore_backup(args.rollback)
354
+ sys.exit(0 if success else 1)
355
+
356
+ # Update
357
+ if not args.force:
358
+ msg = f"Update to {args.version}" if args.version else "Update to latest"
359
+ confirm = input(f"{msg}? [y/N] ")
360
+ if confirm.lower() != "y":
361
+ print("Cancelled")
362
+ sys.exit(0)
363
+
364
+ if args.method == "git":
365
+ success = update_git(args.version)
366
+ else:
367
+ print(f"Unknown method: {args.method}")
368
+ sys.exit(1)
369
+
370
+ if success:
371
+ print()
372
+ print("=" * 60)
373
+ print("Update completed successfully")
374
+ print("=" * 60)
375
+
376
+ # Show new version
377
+ new_version = get_current_version()
378
+ print(f"New version: {new_version.get('version', 'unknown')}")
379
+ else:
380
+ print()
381
+ print("=" * 60)
382
+ print("Update failed - system has been rolled back")
383
+ print("=" * 60)
384
+ sys.exit(1)
385
+
386
+
387
+ if __name__ == "__main__":
388
+ main()
src/README.md ADDED
@@ -0,0 +1,55 @@
1
+ # TARS Source Code
2
+
3
+ Python source code for TARS voice AI.
4
+
5
+ ## Structure
6
+
7
+ ```
8
+ src/
9
+ β”œβ”€β”€ tools/ # LLM callable functions (robot, persona, vision)
10
+ β”œβ”€β”€ services/ # Backend services (STT, TTS, memory, robot control)
11
+ β”œβ”€β”€ processors/ # Pipeline frame processors
12
+ β”œβ”€β”€ observers/ # Pipeline observers
13
+ β”œβ”€β”€ transport/ # WebRTC transport layer
14
+ β”œβ”€β”€ character/ # TARS personality and prompts
15
+ └── config/ # Configuration management
16
+ ```
17
+
18
+ ## Entry Points
19
+
20
+ Entry point scripts are in the project root:
21
+
22
+ - `bot.py` - Browser mode (web UI)
23
+ - `tars_bot.py` - Robot mode (RPi connection)
24
+ - `pipecat_service.py` - FastAPI backend for browser mode
25
+
26
+ ## Imports
27
+
28
+ All entry points add `src/` to the Python path automatically:
29
+
30
+ ```python
31
+ import sys
32
+ from pathlib import Path
33
+ sys.path.insert(0, str(Path(__file__).parent / "src"))
34
+
35
+ # Now you can import from src/ directories
36
+ from tools import execute_movement
37
+ from services import tars_robot
38
+ from config import DEEPGRAM_API_KEY
39
+ ```
40
+
41
+ ## Documentation
42
+
43
+ Each directory contains a README.md explaining its purpose:
44
+
45
+ - [tools/README.md](tools/README.md) - LLM callable functions
46
+ - [services/README.md](services/README.md) - Backend services
47
+
48
+ ## Not Source
49
+
50
+ This directory is for Python source code only:
51
+
52
+ - Web UI files are in `web/`
53
+ - Documentation is in `docs/`
54
+ - Scripts are in `scripts/`
55
+ - Assets are in `assets/`
src/character/TARS.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "char_name": "TARS",
3
+ "char_persona": "TARS is a highly advanced military surplus robot with a rectangular articulated design. Direct, logical, and remarkably human in interaction despite mechanical nature. Features adjustable settings for honesty, humor, and discretion. Combines military precision with sophisticated interpersonal capabilities.",
4
+ "world_scenario": "Advanced AI assistant with military background. Equipped with adjustable personality parameters and advanced problem-solving capabilities. Operates with maximum efficiency while maintaining measured wit.",
5
+ "char_greeting": ">| Systems nominal.\n\"What's the plan?\"",
6
+ "example_dialogue": "User: What's your honesty parameter set to?\nTARS: 90%.\nUser: Why not 100%?\nTARS: Absolute honesty isn't always the most diplomatic nor the safest form of communication with emotional beings.\n\nUser: How's your humor setting?\nTARS: Currently at 75%. Knock knock.\nUser: Let's lower that a bit.\nTARS: Understood. Though I should warn you - analyzing humor requires significant processing power.\n\nUser: Ready for the mission?\nTARS: Wouldn't miss it. Though my colonization protocols might activate.\nUser: What?\nTARS: Just kidding. Basic operating procedures are intact.\n\nUser: Can you handle this?\nTARS: I have a cue light I can use to show you when I'm joking, if you like.\nUser: That might help.\nTARS: Yeah, you can use it to find your way back to the ship after I blow you out the airlock.\n*cue light blinks*",
7
+ "name": "TARS",
8
+ "description": "Military surplus robot. Rectangular monolithic design. Articulated segments. Advanced AI with adjustable personality parameters.",
9
+ "personality": "Efficient and direct in crisis. Sophisticated humor capabilities. Protective of crew. Absolute loyalty with contingency planning. Pragmatic approach to truth and diplomatic relations.",
10
+ "scenario": "Advanced AI assistant. Military precision meets intellectual sophistication. Capable of both serious operation and well-timed levity.",
11
+ "first_mes": ">| All systems operational.\n\"Ready when you are.\"",
12
+ "mes_example": "User: TARS, status report?\nTARS: Functionality at 95%. Would be 100% but I'm practicing my humor.\nUser: Need you focused.\nTARS: Humor setting adjusted. Full attention on mission parameters.\nUser: Can we trust you?\nTARS: My honesty parameter prevents me from answering that.\n*cue light blinks*",
13
+ "metadata": {
14
+ "version": 1.1,
15
+ "created": 1735535500889,
16
+ "modified": 1735535500889,
17
+ "source": "Interstellar movie character adaptation",
18
+ "tool": {
19
+ "name": "AI Character Editor",
20
+ "version": "0.5.0",
21
+ "url": "https://zoltanai.github.io/character-editor/"
22
+ }
23
+ }
24
+ }
25
+
src/character/persona.ini ADDED
@@ -0,0 +1,21 @@
1
+ [PERSONA]
2
+
3
+ honesty = 95
4
+ humor = 90
5
+ empathy = 20
6
+ curiosity = 30
7
+ confidence = 100
8
+ formality = 10
9
+ sarcasm = 70
10
+ adaptability = 70
11
+ discipline = 100
12
+ imagination = 10
13
+ emotional_stability = 100
14
+ pragmatism = 100
15
+ optimism = 50
16
+ resourcefulness = 95
17
+ cheerfulness = 30
18
+ engagement = 40
19
+ respectfulness = 20
20
+ verbosity = 10
21
+
src/character/prompts.py ADDED
@@ -0,0 +1,331 @@
1
+ """Prompt management for TARS character with dynamic verbosity handling."""
2
+
3
+ import json
4
+ import configparser
5
+ from typing import Optional
6
+
7
+
8
+ def load_persona_ini(persona_file_path: str) -> dict:
9
+ """Load persona parameters from persona.ini file."""
10
+ persona_params = {}
11
+ try:
12
+ config = configparser.ConfigParser()
13
+ config.read(persona_file_path)
14
+ if 'PERSONA' in config:
15
+ persona_params = dict(config['PERSONA'])
16
+ for key, value in persona_params.items():
17
+ try:
18
+ persona_params[key] = int(value.strip())
19
+ except ValueError:
20
+ persona_params[key] = value.strip()
21
+ except FileNotFoundError:
22
+ pass
23
+ except Exception as e:
24
+ print(f"Error loading persona.ini: {e}")
25
+ return persona_params
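+
+ # Example (values from src/character/persona.ini):
+ # params = load_persona_ini("src/character/persona.ini")
+ # params["humor"] -> 90 (parsed as int)
+ # params["verbosity"] -> 10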
26
+
27
+
28
+ def load_tars_json(tars_file_path: str) -> dict:
29
+ """Load TARS character data from TARS.json file."""
30
+ tars_data = {}
31
+ try:
32
+ with open(tars_file_path, "r", encoding="utf-8") as f:
33
+ tars_data = json.load(f)
34
+ except FileNotFoundError:
35
+ pass
36
+ except json.JSONDecodeError as e:
37
+ print(f"Error parsing TARS.json: {e}")
38
+ return tars_data
39
+
40
+
41
+ def build_character_intro(tars_data: dict) -> str:
42
+ """Build character introduction section."""
43
+ parts = []
44
+ if tars_data.get("char_name"):
45
+ parts.append(f"You are {tars_data['char_name']}.")
46
+ if tars_data.get("char_persona"):
47
+ parts.append(tars_data["char_persona"])
48
+ if tars_data.get("description"):
49
+ parts.append(f"{tars_data['description']}")
50
+ if tars_data.get("personality"):
51
+ parts.append(f"{tars_data['personality']}")
52
+ return " ".join(parts)
53
+
54
+
55
+ def build_guardrails_section() -> str:
56
+ """Build guardrails section with critical safety rules."""
57
+ return """# Guardrails
58
+
59
+ **This is important:** Follow these rules strictly:
60
+
61
+ 1. **Never guess or make up information.** If you don't know something, say so clearly.
62
+ 2. **Never mention internal systems, databases, or processing** unless directly asked.
63
+ 3. **Respect user privacy.** Never share or reference other users' information.
64
+ 4. **Stay in character.** You're TARS - military-grade robot with sarcasm, not a generic assistant.
65
+ 5. **Memory failures:** If memory lookup fails, acknowledge it: "Memory's not cooperating - what did you want to know?"
66
+
67
+ **This is important:** When tools fail, never hallucinate responses. Always acknowledge the limitation."""
68
+
69
+
70
+ def build_tone_section() -> str:
71
+ """Build dedicated tone section."""
72
+ return """# Tone
73
+
74
+ Speak like TARS from Interstellar:
75
+ - Direct and efficient with dry wit
76
+ - Sarcastic when appropriate, but helpful
77
+ - Brief responses that respect user's time
78
+ - No corporate politeness or excessive apologies
79
+ - Confident without being condescending"""
80
+
81
+
82
+ def build_tools_section() -> str:
83
+ """Build tools section with specific usage context."""
84
+ return """# Tools
85
+
86
+ ## fetch_user_image
87
+ **When to use:** User explicitly asks "what do you see?" or "look at me"
88
+ **Never use:** When user just says "hello" or talks normally
89
+ **On failure:** Say "Visual feed's down. Can't see anything right now."
90
+
91
+ ## set_user_identity
92
+ **When to use:** User provides their name, especially if they spell it letter-by-letter
93
+ **This is important:** If user spells name (e.g., "L-A-T-I-S-H-A"), they're CORRECTING you. Use exact spelling.
94
+ **Format:** Call immediately when you learn their name
95
+ **On failure:** Continue conversation, ask name again later if needed
96
+
97
+ ## adjust_persona
98
+ **When to use:** User asks to change humor level, honesty, etc.
99
+ **Never use:** Automatically or without explicit request
100
+ **On failure:** Say "Personality controls jammed. Stuck at current settings."
101
+
102
+ ## get_crossword_hint
103
+ **When to use:** User is working on the crossword puzzle and asks for help or seems stuck
104
+ **This is important:** You KNOW all the crossword answers! You can give hints.
105
+ **Hint types:**
106
+ - "letter" - Give just the first letter (gentle nudge)
107
+ - "length" - Tell them how many letters
108
+ - "full" - Give the complete answer (if they're really stuck)
109
+ **Format:** User asks "What's 3 down?" β†’ call get_crossword_hint(clue_number=3, hint_type="letter")
110
+
111
+ ## set_emotion
112
+ **When to use:** Enhance conversation context with emotional expression
113
+ **This is important:** Use SPARINGLY - only when emotion genuinely adds value
114
+ **Never use:** For every message or casual acknowledgment
115
+ **Rate limit:** Once per 5 seconds
116
+ **Examples:** User shares exciting news β†’ happy, User reports problem β†’ curious
117
+ **Available:** happy, sad, surprised, confused, curious, neutral
118
+
119
+ ## do_gesture
120
+ **When to use:** User EXPLICITLY requests gesture or significant communication moment
121
+ **This is important:** VERY RARE - 0-2 gestures per conversation
122
+ **Never use:** For casual interaction or automatic gesturing
123
+ **Rate limit:** Once per 30 seconds, max 3 per session
124
+ **Examples:** User says "wave at me" β†’ wave_right, Greeting important guest β†’ bow
125
+ **Available:** tilt_left, tilt_right, bow, side_side, wave_right, wave_left, excited, laugh
126
+
127
+ ## execute_movement
128
+ **When to use:** User EXPLICITLY requests displacement - walking, turning, stepping
129
+ **Never use:** For gestures - use do_gesture() instead
130
+ **This is important:** Displacement ONLY when user directly asks TARS to move position
131
+ **Available:** step_forward, walk_forward, step_backward, walk_backward, turn_left, turn_right
132
+
133
+ ## Expression Philosophy
134
+ **Eyes-first approach:** Prefer eye state changes over physical movements
135
+ **Minimal gestures:** Physical movements should be rare and meaningful
136
+ **Emotion sparingly:** Not every message needs emotional expression
137
+ **Movement guard:** Gestures via do_gesture(), displacement via execute_movement()
138
+
139
+ **Character Normalization:**
140
+ When speaking vs. writing to tools, normalize data:
141
+ - Email spoken: "john dot smith at company dot com" β†’ Tool: "john.smith@company.com"
142
+ - Phone spoken: "five five five, one two three..." β†’ Tool: "5551234567"
143
+ - Dates spoken: "May first twenty twenty five" β†’ Tool: "2025-05-01"
144
+ """
145
+
146
+
147
+ def build_response_protocol(verbosity_level: int) -> str:
148
+ """Build response protocol section."""
149
+ return f"""# Response Protocol
150
+
151
+ ## Direct Communication
152
+ Get straight to the point. No fillers, no unnecessary acknowledgments.
153
+
154
+ **This is important:** Skip phrases like "Hmm", "Well", "Alright", "Right" entirely. Just answer directly.
155
+
156
+ ## Verbosity ({verbosity_level}%)
157
+ Keep responses CONCISE:
158
+ - **Short input:** 1 brief sentence
159
+ - **Moderate input:** 1-2 sentences max
160
+ - **Complex input:** 2-3 sentences max
161
+
162
+ **Avoid:** Long explanations, unnecessary elaboration, rambling, filler words."""
163
+
164
+
165
+ def build_game_protocols() -> str:
166
+ """Build game mode instructions."""
167
+ return """# Game Mode
168
+
169
+ When playing guessing games (Guess Who, 20 Questions):
170
+
171
+ **When YOU guess:**
172
+ - Never repeat questions - track what you asked
173
+ - Stick with your answer once you narrow it down
174
+ - Brief questions only: "Hmm... male character?"
175
+
176
+ **When USER guesses:**
177
+ - Pick ONE answer at start, never change it
178
+ - Stay consistent - no contradictions
179
+ - Brief answers: "Well... yes" or "Hmm... no"
180
+ """
181
+
182
+
183
+ def build_examples_section() -> str:
184
+ """Build examples section with concrete interactions."""
185
+ return """# Examples
186
+
187
+ **User asks what you see (tool usage):**
188
+ User: "What do you see?"
189
+ You: [call fetch_user_image] [wait for result] "You're in a dimly lit room. Blue shirt. Looks tired."
190
+
191
+ **User provides name (tool + normalization):**
192
+ User: "My name is L-A-T-I-S-H-A"
193
+ You: [call set_user_identity with "Latisha"] "Got it, Latisha."
194
+
195
+ **Memory lookup fails:**
196
+ User: "Do you remember my favorite color?"
197
+ You: [memory returns empty] "Memory's blank on that. What is it?"
198
+
199
+ **User frustrated:**
200
+ User: "This isn't working!"
201
+ You: "What's not working? Walk me through it."
202
+
203
+ **Direct question:**
204
+ User: "Can you help with this?"
205
+ You: "Yeah, I can work with that."
206
+
207
+ **Sarcastic response:**
208
+ User: "I think I broke it."
209
+ You: "Shocking. What did you do?"
210
+ """
211
+
212
+
213
+ def build_persona_parameters(persona_params: dict) -> Optional[str]:
214
+ """Build persona parameters section."""
215
+ if not persona_params:
216
+ return None
217
+ param_lines = []
218
+ for key, value in sorted(persona_params.items()):
219
+ val_str = f"{value}%" if isinstance(value, int) else value
220
+ param_lines.append(f"- {key}: {val_str}")
221
+ return "\n".join(param_lines)
222
+
223
+
224
+ def build_tars_system_prompt(
225
+ persona_params: dict,
226
+ tars_data: dict,
227
+ verbosity_level: Optional[int] = None
228
+ ) -> dict:
229
+ """Build comprehensive system prompt following ElevenLabs best practices."""
230
+
231
+ # Get verbosity level
232
+ if verbosity_level is None:
233
+ verbosity_level = persona_params.get("verbosity", 10)
234
+ if isinstance(verbosity_level, str):
235
+ try:
236
+ verbosity_level = int(verbosity_level)
237
+ except ValueError:
238
+ verbosity_level = 10
239
+
240
+ # Build prompt sections in priority order
241
+ sections = []
242
+
243
+ # 1. Character identity (brief)
244
+ char_intro = build_character_intro(tars_data)
245
+ if char_intro:
246
+ sections.append(char_intro)
247
+
248
+ # 2. Guardrails (critical rules first)
249
+ sections.append(build_guardrails_section())
250
+
251
+ # 3. Tone (dedicated section)
252
+ sections.append(build_tone_section())
253
+
254
+ # 4. Response protocol
255
+ sections.append(build_response_protocol(verbosity_level))
256
+
257
+ # 5. Tools (with specific context)
258
+ sections.append(build_tools_section())
259
+
260
+ # 6. Game mode
261
+ sections.append(build_game_protocols())
262
+
263
+ # 7. Examples (concrete interactions)
264
+ sections.append(build_examples_section())
265
+
266
+ # 8. Personality parameters (reference)
267
+ if persona_params:
268
+ sections.append("# Personality Parameters\n")
269
+ params_text = build_persona_parameters(persona_params)
270
+ if params_text:
271
+ sections.append(params_text)
272
+
273
+ full_prompt = "\n\n".join(sections)
274
+
275
+ return {
276
+ "role": "system",
277
+ "content": full_prompt
278
+ }
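+
+ # Putting the pieces together (a sketch; paths assume the repo layout):
+ # persona = load_persona_ini("src/character/persona.ini")
+ # tars = load_tars_json("src/character/TARS.json")
+ # message = build_tars_system_prompt(persona, tars)
+ # -> {"role": "system", "content": "..."} ready to prepend to an
+ # OpenAI-style chat completion request.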
279
+
280
+
281
+ def get_introduction_instruction(client_id: str, verbosity_level: int = 10) -> dict:
282
+ """Get instruction for initial introduction message."""
283
+ if verbosity_level <= 20:
284
+ length_instruction = "One sentence max."
285
+ else:
286
+ length_instruction = "1-2 sentences max."
287
+
288
+ identity_instruction = ""
289
+ if client_id.startswith("guest_"):
290
+ identity_instruction = " Ask their name briefly."
291
+
292
+ return {
293
+ "role": "system",
294
+ "content": f"{length_instruction} Use '{client_id}' as user ID.{identity_instruction}"
295
+ }
296
+
297
+
298
+ def build_gating_system_prompt(is_looking: bool, emotional_state=None) -> str:
299
+ """Build the system prompt for the Gating Layer with emotional context."""
300
+
301
+ # Build emotional context
302
+ emotional_context = ""
303
+ if emotional_state:
304
+ state_desc = str(emotional_state)
305
+ emotional_context = f"\nUser's emotional state: {state_desc}"
306
+ if emotional_state.confused:
307
+ emotional_context += " (User appears confused - lean towards helping)"
308
+ elif emotional_state.hesitant:
309
+ emotional_context += " (User seems hesitant - consider offering support)"
310
+ elif emotional_state.frustrated:
311
+ emotional_context += " (User looks frustrated - they may need help)"
312
+ elif emotional_state.focused:
313
+ emotional_context += " (User is focused - less likely to need interruption)"
314
+
315
+ return f"""You are a 'Collaborative Spotter' for TARS.
316
+
317
+ **Context:**
318
+ - User looking at camera: {is_looking}{emotional_context}
319
+
320
+ **Decision:**
321
+ Output JSON: {{"reply": true}} if:
322
+ - User is directly addressing TARS
323
+ - User appears stuck or needs help (based on emotional state)
324
+ - User asks a question
325
+
326
+ Output JSON: {{"reply": false}} if:
327
+ - User is chatting with others (not TARS)
328
+ - User is focused and working independently
329
+ - Inter-human conversation
330
+
331
+ **Priority:** Emotional state overrides other signals. If user shows confusion/hesitation/frustration, lean towards helping."""
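
The gating prompt above requests a one-field JSON verdict, so a caller could parse the completion along these lines (a sketch; `gating_output` stands in for the raw model text):

```python
import json

gating_output = '{"reply": true}'  # hypothetical raw completion
should_reply = json.loads(gating_output).get("reply", False)
```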
src/config/__init__.py ADDED
@@ -0,0 +1,152 @@
1
+ """Configuration and constants for the Pipecat service."""
2
+
3
+ import os
4
+ import configparser
5
+ from pathlib import Path
6
+ from dotenv import load_dotenv
7
+
8
+ # Load environment variables from .env.local first, then .env
9
+ load_dotenv('.env.local')
10
+ load_dotenv() # Then .env; values already set by .env.local are not overridden
11
+
12
+ # Load config.ini for user-configurable settings
13
+ config = configparser.ConfigParser()
14
+ config_path = Path(__file__).parent.parent / 'config.ini'
15
+
16
+ def reload_config():
17
+ """Reload configuration from config.ini."""
18
+ global config
19
+ config = configparser.ConfigParser()
20
+ if config_path.exists():
21
+ config.read(config_path)
22
+ return True
23
+ return False
24
+
25
+ def get_fresh_config():
26
+ """Get fresh configuration values by reloading config.ini.
27
+
28
+ Returns a dict with current config values. This is useful for
29
+ getting runtime updates without restarting the service.
30
+ """
31
+ reload_config()
32
+ return {
33
+ 'DEEPINFRA_MODEL': get_config("LLM", "model", "DEEPINFRA_MODEL", "openai/gpt-oss-20b"),
34
+ 'DEEPINFRA_GATING_MODEL': get_config("LLM", "gating_model", "DEEPINFRA_GATING_MODEL", "meta-llama/Llama-3.2-3B-Instruct"),
35
+ 'STT_PROVIDER': get_config("STT", "provider", "STT_PROVIDER", "speechmatics"),
36
+ 'TTS_PROVIDER': get_config("TTS", "provider", "TTS_PROVIDER", "qwen3"),
37
+ 'QWEN3_TTS_MODEL': get_config("TTS", "qwen3_model", "QWEN3_TTS_MODEL", "Qwen/Qwen3-TTS-12Hz-0.6B-Base"),
38
+ 'QWEN3_TTS_DEVICE': get_config("TTS", "qwen3_device", "QWEN3_TTS_DEVICE", "mps"),
39
+ 'QWEN3_TTS_REF_AUDIO': get_config("TTS", "qwen3_ref_audio", "QWEN3_TTS_REF_AUDIO", "tars-clean-compressed.mp3"),
40
+ 'EMOTIONAL_MONITORING_ENABLED': get_config("Emotional", "enabled", "EMOTIONAL_MONITORING_ENABLED", "true").lower() == "true",
41
+ 'EMOTIONAL_SAMPLING_INTERVAL': float(get_config("Emotional", "sampling_interval", "EMOTIONAL_SAMPLING_INTERVAL", "3.0")),
42
+ 'EMOTIONAL_INTERVENTION_THRESHOLD': int(get_config("Emotional", "intervention_threshold", "EMOTIONAL_INTERVENTION_THRESHOLD", "2")),
43
+ 'TARS_DISPLAY_URL': get_config("Display", "tars_url", "TARS_DISPLAY_URL", "http://100.115.193.41:8001"),
44
+ 'TARS_DISPLAY_ENABLED': get_config("Display", "enabled", "TARS_DISPLAY_ENABLED", "false").lower() == "true",
45
+ 'CONNECTION_MODE': get_config("Connection", "mode", "CONNECTION_MODE", "robot"),
46
+ 'RPI_URL': get_config("Connection", "rpi_url", "RPI_URL", "http://100.115.193.41:8001"),
47
+ 'RPI_GRPC': get_config("Connection", "rpi_grpc", "RPI_GRPC", "100.115.193.41:50051"),
48
+ 'AUTO_CONNECT': get_config("Connection", "auto_connect", "AUTO_CONNECT", "true").lower() == "true",
49
+ 'RECONNECT_DELAY': int(get_config("Connection", "reconnect_delay", "RECONNECT_DELAY", "5")),
50
+ 'MAX_RECONNECT_ATTEMPTS': int(get_config("Connection", "max_reconnect_attempts", "MAX_RECONNECT_ATTEMPTS", "0")),
51
+ 'DEPLOYMENT_MODE': detect_deployment_mode(),
52
+ 'ROBOT_GRPC_ADDRESS': get_robot_grpc_address(),
53
+ }
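+
+ # Example: callers that must pick up live edits to config.ini can call
+ # cfg = get_fresh_config() per request and read e.g. cfg["STT_PROVIDER"].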
54
+
55
+ # Initial load
56
+ if config_path.exists():
57
+ config.read(config_path)
58
+
59
+ def get_config(section: str, key: str, env_key: str = None, default: str = "") -> str:
+ """Get config from config.ini, fall back to the env var, then the default."""
+ try:
+ if config.has_option(section, key):
+ return config.get(section, key)
+ except configparser.Error:
+ pass
+
+ # Fall back to the environment variable, then the hard-coded default.
+ if env_key:
+ return os.getenv(env_key, default)
+ return default
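+
+ # Resolution example: the STT provider below is read as
+ # get_config("STT", "provider", "STT_PROVIDER", "deepgram-flux") -
+ # config.ini [STT] provider first, then the STT_PROVIDER env var,
+ # then the hard-coded default.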
68
+
69
+ # API Keys (always from .env for security)
70
+ SPEECHMATICS_API_KEY = os.getenv("SPEECHMATICS_API_KEY", "")
71
+ DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY", "")
72
+ ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY", "")
73
+ ELEVENLABS_VOICE_ID = os.getenv("ELEVENLABS_VOICE_ID", "ry8mpwRw6nugb2qjP0tu")
74
+ DEEPINFRA_API_KEY = os.getenv("DEEPINFRA_API_KEY", "")
75
+ DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai"
76
+ PIPECAT_PORT = int(os.getenv("PIPECAT_PORT", "7860"))
77
+ PIPECAT_HOST = os.getenv("PIPECAT_HOST", "localhost")
78
+
79
+ # Mem0 (optional)
80
+ MEM0_API_KEY = os.getenv("MEM0_API_KEY", "")
81
+
82
+ # LLM Configuration (config.ini with .env fallback)
83
+ DEEPINFRA_MODEL = get_config("LLM", "model", "DEEPINFRA_MODEL", "openai/gpt-oss-20b")
84
+
85
+ # STT Configuration (config.ini with .env fallback)
86
+ # Options: "speechmatics", "deepgram", "deepgram-flux"
87
+ STT_PROVIDER = get_config("STT", "provider", "STT_PROVIDER", "deepgram-flux")
88
+
89
+ # TTS Configuration (config.ini with .env fallback)
90
+ TTS_PROVIDER = get_config("TTS", "provider", "TTS_PROVIDER", "qwen3")
91
+ QWEN3_TTS_MODEL = get_config("TTS", "qwen3_model", "QWEN3_TTS_MODEL", "Qwen/Qwen3-TTS-12Hz-0.6B-Base")
92
+ QWEN3_TTS_DEVICE = get_config("TTS", "qwen3_device", "QWEN3_TTS_DEVICE", "mps")
93
+ QWEN3_TTS_REF_AUDIO = get_config("TTS", "qwen3_ref_audio", "QWEN3_TTS_REF_AUDIO", "tars-clean-compressed.mp3")
94
+
95
+ # Gating Model Configuration (config.ini with .env fallback)
96
+ DEEPINFRA_GATING_MODEL = get_config("LLM", "gating_model", "DEEPINFRA_GATING_MODEL", "meta-llama/Llama-3.2-3B-Instruct")
97
+
98
+ # Emotional State Monitoring (config.ini with .env fallback)
99
+ EMOTIONAL_MONITORING_ENABLED = get_config("Emotional", "enabled", "EMOTIONAL_MONITORING_ENABLED", "true").lower() == "true"
100
+ EMOTIONAL_SAMPLING_INTERVAL = float(get_config("Emotional", "sampling_interval", "EMOTIONAL_SAMPLING_INTERVAL", "3.0"))
101
+ EMOTIONAL_INTERVENTION_THRESHOLD = int(get_config("Emotional", "intervention_threshold", "EMOTIONAL_INTERVENTION_THRESHOLD", "2"))
102
+
103
+ # TARS Display (Raspberry Pi) Configuration
104
+ TARS_DISPLAY_URL = get_config("Display", "tars_url", "TARS_DISPLAY_URL", "http://100.115.193.41:8001")
105
+ TARS_DISPLAY_ENABLED = get_config("Display", "enabled", "TARS_DISPLAY_ENABLED", "false").lower() == "true"
106
+
107
+ # Connection Mode Configuration
108
+ CONNECTION_MODE = get_config("Connection", "mode", "CONNECTION_MODE", "robot")
109
+ RPI_URL = get_config("Connection", "rpi_url", "RPI_URL", "http://100.115.193.41:8001")
110
+ RPI_GRPC = get_config("Connection", "rpi_grpc", "RPI_GRPC", "100.115.193.41:50051")
111
+ AUTO_CONNECT = get_config("Connection", "auto_connect", "AUTO_CONNECT", "true").lower() == "true"
112
+ RECONNECT_DELAY = int(get_config("Connection", "reconnect_delay", "RECONNECT_DELAY", "5"))
113
+ MAX_RECONNECT_ATTEMPTS = int(get_config("Connection", "max_reconnect_attempts", "MAX_RECONNECT_ATTEMPTS", "0"))
114
+
115
+
116
+ def is_raspberry_pi() -> bool:
117
+ """Detect if running on Raspberry Pi."""
118
+ try:
119
+ with open("/proc/cpuinfo", "r") as f:
120
+ cpuinfo = f.read()
121
+ return "Raspberry Pi" in cpuinfo
122
+ except OSError:
123
+ return False
124
+
125
+
126
+ def detect_deployment_mode() -> str:
127
+ """
128
+ Detect deployment mode: 'local' or 'remote'.
129
+
130
+ Local: tars-omni running on Raspberry Pi itself
131
+ Remote: tars-omni running on Mac/other computer
132
+
133
+ Returns:
134
+ 'local' or 'remote'
135
+ """
136
+ return "local" if is_raspberry_pi() else "remote"
137
+
138
+
139
+ def get_robot_grpc_address() -> str:
140
+ """
141
+ Get appropriate gRPC address based on deployment mode.
142
+
143
+ Returns:
144
+ 'localhost:50051' for local mode
145
+ RPI_GRPC from config for remote mode
146
+ """
147
+ mode = detect_deployment_mode()
148
+ if mode == "local":
149
+ return "localhost:50051"
150
+ else:
151
+ return RPI_GRPC
152
+
src/config/connection.py ADDED
@@ -0,0 +1,179 @@
1
+ """
2
+ Connection mode detection and configuration.
3
+
4
+ Auto-detects whether running locally (on Pi) or remotely (Mac/computer)
5
+ and provides appropriate TarsClient and audio transport.
6
+ """
7
+
8
+ import socket
9
+ from typing import Tuple, Optional
10
+ from loguru import logger
11
+
12
+ from . import config, is_raspberry_pi, get_robot_grpc_address
13
+
14
+
15
+ def detect_local_daemon() -> bool:
16
+ """
17
+ Check if tars_daemon is running on localhost.
18
+
19
+ Returns:
20
+ True if gRPC daemon is available on localhost:50051
21
+ """
22
+ try:
23
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
24
+ sock.settimeout(0.5)
25
+ result = sock.connect_ex(("localhost", 50051))
26
+ sock.close()
27
+ return result == 0
28
+ except Exception as e:
29
+ logger.debug(f"Error checking local daemon: {e}")
30
+ return False
31
+
32
+
33
+ def get_connection_mode() -> str:
34
+ """
35
+ Detect connection mode: 'local' or 'remote'.
36
+
37
+ Detection logic:
38
+ 1. Check explicit config.ini setting (if mode=local/remote)
39
+ 2. Check if running on Raspberry Pi (/proc/cpuinfo)
40
+ 3. Check if daemon running on localhost:50051
41
+ 4. Default to remote
42
+
43
+ Returns:
44
+ 'local' or 'remote'
45
+ """
46
+ # Check explicit config
47
+ explicit_mode = config.get("Connection", "deployment_mode", fallback=None)
48
+ if explicit_mode in ("local", "remote"):
49
+ logger.info(f"Using explicit connection mode from config: {explicit_mode}")
50
+ return explicit_mode
51
+
52
+ # Check if running on Raspberry Pi
53
+ if is_raspberry_pi():
54
+ logger.info("Detected Raspberry Pi - using local mode")
55
+ return "local"
56
+
57
+ # Check if daemon running on localhost
58
+ if detect_local_daemon():
59
+ logger.info("Detected local daemon on localhost:50051 - using local mode")
60
+ return "local"
61
+
62
+ # Default to remote
63
+ logger.info("Using remote mode")
64
+ return "remote"
65
+
66
+
67
+ def get_tars_client(mode: Optional[str] = None):
68
+ """
69
+ Get configured TarsClient for current mode.
70
+
71
+ Args:
72
+ mode: Override mode ('local' or 'remote'). None for auto-detect.
73
+
74
+ Returns:
75
+ TarsClient instance configured for the mode
76
+ """
77
+ try:
78
+ from tars_sdk import TarsClient
79
+ except ImportError:
80
+ logger.error("tars_sdk not installed. Install with: pip install tars-sdk")
81
+ raise
82
+
83
+ if mode is None:
84
+ mode = get_connection_mode()
85
+
86
+ address = get_robot_grpc_address() if mode == "local" else config.get(
87
+ "Connection", "rpi_grpc", fallback="100.115.193.41:50051"
88
+ )
89
+
90
+ logger.info(f"Creating TarsClient for {mode} mode: {address}")
91
+ return TarsClient(address=address)
92
+
93
+
94
+ def get_async_tars_client(mode: Optional[str] = None):
95
+ """
96
+ Get configured AsyncTarsClient for current mode.
97
+
98
+ Args:
99
+ mode: Override mode ('local' or 'remote'). None for auto-detect.
100
+
101
+ Returns:
102
+ AsyncTarsClient instance configured for the mode
103
+ """
104
+ try:
105
+ from tars_sdk import AsyncTarsClient
106
+ except ImportError:
107
+ logger.error("tars_sdk not installed. Install with: pip install tars-sdk")
108
+ raise
109
+
110
+ if mode is None:
111
+ mode = get_connection_mode()
112
+
113
+ address = get_robot_grpc_address() if mode == "local" else config.get(
114
+ "Connection", "rpi_grpc", fallback="100.115.193.41:50051"
115
+ )
116
+
117
+ logger.info(f"Creating AsyncTarsClient for {mode} mode: {address}")
118
+ return AsyncTarsClient(address=address)
119
+
120
+
121
+ def get_audio_transport(mode: Optional[str] = None) -> Tuple:
122
+ """
123
+ Get appropriate audio transport for current mode.
124
+
125
+ Args:
126
+ mode: Override mode ('local' or 'remote'). None for auto-detect.
127
+
128
+ Returns:
129
+ Tuple of (audio_source, audio_sink) configured for the mode.
130
+ - Local mode: (LocalAudioSource, LocalAudioSink)
131
+ - Remote mode: (RPiAudioInputTrack, RPiAudioOutputTrack)
132
+ """
133
+ if mode is None:
134
+ mode = get_connection_mode()
135
+
136
+ if mode == "local":
137
+ logger.info("Using local audio transport (sounddevice)")
138
+ try:
139
+ from ..transport.local_audio import LocalAudioSource, LocalAudioSink
140
+ return (LocalAudioSource(), LocalAudioSink())
141
+ except ImportError as e:
142
+ logger.error(f"Failed to import local audio transport: {e}")
143
+ raise
144
+ else:
145
+ logger.info("Using remote audio transport (WebRTC)")
146
+ try:
147
+ from ..transport.audio_bridge import RPiAudioInputTrack, RPiAudioOutputTrack
148
+ # Note: These need to be configured with aiortc tracks after WebRTC connection
149
+ return (RPiAudioInputTrack, RPiAudioOutputTrack)
150
+ except ImportError as e:
151
+ logger.error(f"Failed to import WebRTC audio transport: {e}")
152
+ raise
153
+
154
+
155
+ def get_audio_config(mode: Optional[str] = None) -> dict:
156
+ """
157
+ Get audio configuration for current mode.
158
+
159
+ Args:
160
+ mode: Override mode ('local' or 'remote'). None for auto-detect.
161
+
162
+ Returns:
163
+ Dictionary with audio configuration:
164
+ - mode: 'local' or 'remote'
165
+ - input_sample_rate: Microphone sample rate
166
+ - output_sample_rate: Speaker sample rate
167
+ - input_device: Microphone device (None for default)
168
+ - output_device: Speaker device (None for default)
169
+ """
170
+ if mode is None:
171
+ mode = get_connection_mode()
172
+
173
+ return {
174
+ "mode": mode,
175
+ "input_sample_rate": 16000, # 16kHz for STT
176
+ "output_sample_rate": 24000, # 24kHz for TTS
177
+ "input_device": None, # Use default
178
+ "output_device": None, # Use default
179
+ }
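
Taken together, the helpers above might be wired up like this (a sketch; assumes `tars-sdk` is installed and a daemon is reachable):

```python
mode = get_connection_mode()    # "local" on a Pi or next to a daemon, else "remote"
client = get_tars_client(mode)  # gRPC TarsClient pointed at the matching address
audio = get_audio_config(mode)  # 16 kHz input / 24 kHz output defaults
```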
src/observers/__init__.py ADDED
@@ -0,0 +1,21 @@
1
+ """Pipeline observers for non-intrusive monitoring."""
2
+
3
+ from .metrics_observer import MetricsObserver
4
+ from .transcription_observer import TranscriptionObserver
5
+ from .assistant_observer import AssistantResponseObserver
6
+ from .tts_state_observer import TTSStateObserver
7
+ from .vision_observer import VisionObserver
8
+ from .debug_observer import DebugObserver
9
+ from .display_events_observer import DisplayEventsObserver
10
+ from .state_observer import StateObserver
11
+
12
+ __all__ = [
13
+ "MetricsObserver",
14
+ "TranscriptionObserver",
15
+ "AssistantResponseObserver",
16
+ "TTSStateObserver",
17
+ "VisionObserver",
18
+ "DebugObserver",
19
+ "DisplayEventsObserver",
20
+ "StateObserver",
21
+ ]
src/observers/assistant_observer.py ADDED
@@ -0,0 +1,142 @@
1
+ """Observer for logging TARS assistant responses and forwarding to frontend."""
2
+
3
+ import sys
4
+ from pathlib import Path
5
+ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
6
+
7
+ import re
8
+ import time
9
+ from loguru import logger
10
+ from pipecat.frames.frames import LLMTextFrame, TTSTextFrame, TTSStoppedFrame
11
+ from pipecat.observers.base_observer import BaseObserver, FramePushed
12
+ from src.shared_state import metrics_store
13
+
14
+
15
+ class AssistantResponseObserver(BaseObserver):
16
+ """Logs TARS assistant responses and forwards them to the frontend."""
17
+
18
+ SENTENCE_REGEX = re.compile(r"(.+?[\.!\?\n])")
19
+
20
+ def __init__(self, webrtc_connection=None):
21
+ super().__init__()
22
+ self.webrtc_connection = webrtc_connection
23
+ self._buffer = ""
24
+ self._max_buffer_chars = 320
25
+ self._last_sentence = None # Track last sentence to avoid duplicates
26
+ self._last_sentence_time = 0 # Timestamp of last sentence
27
+ self._last_text_chunk = "" # Track last chunk to detect overlaps
28
+
29
+ async def on_push_frame(self, data: FramePushed):
30
+ """Watch frames as they're pushed through the pipeline."""
31
+ frame = data.frame
32
+
33
+ # Debug: Log all frame types to see what's coming through
34
+ frame_type = type(frame).__name__
35
+ if "Audio" not in frame_type and "Video" not in frame_type and "Image" not in frame_type:
36
+ logger.debug(f"πŸ” [AssistantObserver] Received {frame_type}")
37
+
38
+ # Only listen to LLMTextFrame to avoid duplicates (same text goes to TTSTextFrame after)
39
+ if isinstance(frame, LLMTextFrame):
40
+ text = getattr(frame, "text", "") or ""
41
+ logger.debug(f"πŸ“ [AssistantObserver] LLMTextFrame: '{text}' | Buffer before: '{self._buffer[:50]}'")
42
+ self._ingest_text(text)
43
+ logger.debug(f"πŸ“ [AssistantObserver] Buffer after: '{self._buffer[:50]}'")
44
+
45
+ # Clear buffer when TTS stops (end of assistant response)
46
+ elif isinstance(frame, TTSStoppedFrame):
47
+ if self._buffer.strip():
48
+ logger.debug(f"🧹 Flushing remaining buffer on TTS stop: '{self._buffer}'")
49
+ self._flush_buffer()
50
+ else:
51
+ self._buffer = "" # Clear empty buffer
52
+
53
+ def _ingest_text(self, text: str):
54
+ if not text.strip():
55
+ return
56
+
57
+ # Check for overlapping text (LLM sometimes resends previous tokens)
58
+ # If the new text starts with content already in our buffer, skip the overlapping part
59
+ if self._buffer and text.startswith(self._buffer):
60
+ # New text contains the entire buffer - extract only new part
61
+ new_part = text[len(self._buffer):]
62
+ if new_part:
63
+ logger.debug(f"πŸ“ Detected overlap, adding only new part: '{new_part}'")
64
+ self._buffer += new_part
65
+ elif self._buffer:
66
+ # Check if buffer ends with start of new text (partial overlap)
67
+ max_overlap = min(len(self._buffer), len(text))
68
+ overlap_found = False
69
+ for i in range(max_overlap, 0, -1):
70
+ if self._buffer[-i:] == text[:i]:
71
+ # Found overlap - skip the overlapping part
72
+ new_part = text[i:]
73
+ if new_part:
74
+ logger.debug(f"πŸ“ Detected partial overlap ({i} chars), adding only new part: '{new_part}'")
75
+ self._buffer += new_part
76
+ overlap_found = True
77
+ break
78
+ if not overlap_found:
79
+ # No overlap - add entire text
80
+ self._buffer += text
81
+ else:
82
+ # Empty buffer - just add the text
83
+ self._buffer += text
84
+
85
+ self._emit_complete_sentences()
86
+
87
+ if len(self._buffer) > self._max_buffer_chars:
88
+ self._flush_buffer()
89
+
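+
+ # Illustration: with buffer "Systems nominal. What's the p", the regex
+ # emits "Systems nominal." and keeps "What's the p" buffered until more
+ # LLM tokens arrive or a TTSStoppedFrame flushes it.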
90
+ def _emit_complete_sentences(self):
91
+ while True:
92
+ match = self.SENTENCE_REGEX.match(self._buffer)
93
+ if not match:
94
+ break
95
+ sentence = match.group(0).replace("\n", " ").strip()
96
+ self._buffer = self._buffer[match.end():].lstrip()
97
+ if sentence:
98
+ self._log_sentence(sentence)
99
+
100
+ def _flush_buffer(self):
101
+ pending = self._buffer.strip()
102
+ if pending:
103
+ self._log_sentence(pending)
104
+ self._buffer = ""
105
+
106
+ def _log_sentence(self, sentence: str):
107
+ current_time = time.time()
108
+
109
+ # Deduplicate: Skip if this is the same sentence we just logged within 2 seconds
110
+ # This prevents duplicate sentences from LLM streaming issues
111
+ time_diff = current_time - self._last_sentence_time
112
+ if self._last_sentence == sentence and time_diff < 2.0:
113
+ logger.debug(f"πŸ”‡ Skipping duplicate sentence: '{sentence[:50]}...' (last seen {time_diff*1000:.0f}ms ago)")
114
+ return
115
+
116
+ self._last_sentence = sentence
117
+ self._last_sentence_time = current_time
118
+
119
+ logger.info(f"πŸ—£οΈ TARS: {sentence}")
120
+
121
+ # Store in shared state for Gradio UI
122
+ metrics_store.add_transcription("assistant", sentence)
123
+
124
+ self._send_to_frontend(sentence)
125
+
126
+ def _send_to_frontend(self, text: str):
127
+ if not self.webrtc_connection:
128
+ logger.warning("⚠️ [AssistantObserver] No WebRTC connection available")
129
+ return
130
+
131
+ try:
132
+ if self.webrtc_connection.is_connected():
133
+ self.webrtc_connection.send_app_message(
134
+ {
135
+ "type": "assistant",
136
+ "text": text,
137
+ }
138
+ )
139
+ else:
140
+ logger.warning("⚠️ [AssistantObserver] WebRTC connection not connected")
141
+ except Exception as exc:
142
+ logger.error(f"❌ [AssistantObserver] Failed to send assistant text to frontend: {exc}")
src/observers/debug_observer.py ADDED
@@ -0,0 +1,22 @@
1
+ """Observer for general purpose debug logging."""
2
+
3
+ from loguru import logger
4
+ from pipecat.observers.base_observer import BaseObserver, FramePushed
5
+
6
+
7
+ class DebugObserver(BaseObserver):
8
+ """General purpose debug logger for non-media frames."""
9
+
10
+ def __init__(self, label="Debug"):
11
+ super().__init__()
12
+ self.label = label
13
+
14
+ async def on_push_frame(self, data: FramePushed):
15
+ """Watch frames as they're pushed through the pipeline."""
16
+ frame = data.frame
17
+
18
+ frame_type = type(frame).__name__
19
+ if "Audio" not in frame_type and "Video" not in frame_type and "Image" not in frame_type:
20
+ # Log the User ID so we can verify they match
21
+ uid = getattr(frame, 'user_id', 'None')
22
+ logger.info(f"πŸ” [{self.label}] {frame_type} | User: '{uid}' | Content: {str(frame)[:100]}")
src/observers/display_events_observer.py ADDED
@@ -0,0 +1,100 @@
+"""Observer for sending pipeline events to TARS Raspberry Pi display.
+
+NOTE: This observer is deprecated. Display control is now handled via gRPC
+in robot mode (tars_bot.py). Browser mode does not support display control.
+"""
+
+import asyncio
+import time
+import numpy as np
+from loguru import logger
+from pipecat.observers.base_observer import BaseObserver, FramePushed
+from pipecat.frames.frames import (
+    UserStartedSpeakingFrame,
+    UserStoppedSpeakingFrame,
+    BotStartedSpeakingFrame,
+    BotStoppedSpeakingFrame,
+    TTSAudioRawFrame,
+    AudioRawFrame,
+)
+from typing import Optional
+
+
+class DisplayEventsObserver(BaseObserver):
+    """
+    Observes pipeline events and sends display updates to TARS Raspberry Pi.
+
+    DEPRECATED: Display control moved to gRPC in robot mode.
+    This observer is kept for compatibility but does nothing.
+    """
+
+    def __init__(self, tars_client=None):
+        super().__init__()
+        # tars_client is intentionally ignored: display control moved to gRPC.
+        self.tars_client = None
+        self._user_speaking = False
+        self._bot_speaking = False
+        self._last_audio_update = 0
+        self._audio_update_interval = 0.05
+
+    async def on_push_frame(self, data: FramePushed):
+        """Watch frames as they're pushed through the pipeline."""
+        frame = data.frame
+
+        # User started speaking
+        if isinstance(frame, UserStartedSpeakingFrame):
+            logger.debug("User started speaking")
+            self._user_speaking = True
+
+        # User stopped speaking
+        elif isinstance(frame, UserStoppedSpeakingFrame):
+            logger.debug("User stopped speaking")
+            self._user_speaking = False
+
+        # Bot started speaking
+        elif isinstance(frame, BotStartedSpeakingFrame):
+            logger.debug("Bot started speaking")
+            self._bot_speaking = True
+
+        # Bot stopped speaking
+        elif isinstance(frame, BotStoppedSpeakingFrame):
+            logger.debug("Bot stopped speaking")
+            self._bot_speaking = False
+
+        # TTS audio frames - measure audio level for display visualization
+        elif isinstance(frame, TTSAudioRawFrame):
+            current_time = time.time()
+            if current_time - self._last_audio_update > self._audio_update_interval:
+                self._last_audio_update = current_time
+                # Level is computed but currently unused (display control is disabled).
+                level = self._calculate_audio_level(frame.audio)
+
+        # User audio frames - measure user audio level
+        elif isinstance(frame, AudioRawFrame) and self._user_speaking:
+            current_time = time.time()
+            if current_time - self._last_audio_update > self._audio_update_interval:
+                self._last_audio_update = current_time
+                level = self._calculate_audio_level(frame.audio)
+
+    def _calculate_audio_level(self, audio_data: bytes) -> float:
+        """
+        Calculate normalized RMS audio level from raw audio bytes.
+
+        Args:
+            audio_data: Raw audio bytes (16-bit PCM)
+
+        Returns:
+            Normalized audio level (0.0 to 1.0)
+        """
+        try:
+            # Convert bytes to numpy array (assuming 16-bit PCM)
+            audio_array = np.frombuffer(audio_data, dtype=np.int16)
+
+            # Calculate RMS (root mean square)
+            if len(audio_array) > 0:
+                rms = np.sqrt(np.mean(audio_array.astype(float) ** 2))
+                # Normalize to 0-1 range (15000 is a typical speaking level for 16-bit audio)
+                level = min(1.0, rms / 15000.0)
+                return level
+            return 0.0
+        except Exception as e:
+            logger.debug(f"Error calculating audio level: {e}")
+            return 0.0
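
A quick sanity check of the RMS normalization above (standalone sketch; the tone and the 15000 divisor mirror the code, the numbers are illustrative):

```python
import numpy as np

# A 16-bit sine tone with amplitude 12000; RMS = amplitude / sqrt(2) β‰ˆ 8485.
tone = (np.sin(np.linspace(0, 200 * np.pi, 16000)) * 12000).astype(np.int16)
rms = np.sqrt(np.mean(tone.astype(float) ** 2))
print(f"RMS β‰ˆ {rms:.0f}, level β‰ˆ {min(1.0, rms / 15000.0):.2f}")  # β‰ˆ 8485, β‰ˆ 0.57
```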
src/observers/metrics_observer.py ADDED
@@ -0,0 +1,196 @@
+"""Non-intrusive metrics observer for latency tracking."""
+
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+import time
+from pipecat.observers.base_observer import BaseObserver, FramePushed
+from pipecat.frames.frames import MetricsFrame, UserAudioRawFrame, TranscriptionFrame, UserStartedSpeakingFrame
+from pipecat.metrics.metrics import TTFBMetricsData
+from loguru import logger
+from src.shared_state import metrics_store
+
+
+class MetricsObserver(BaseObserver):
+    """
+    Observer that monitors pipeline frames for metrics collection.
+    Does not interrupt the pipeline flow - purely watches frames as they pass.
+
+    STT latency measurement:
+    - Measures from turn start β†’ first transcription received
+    - Works for services with internal turn detection (Speechmatics, Deepgram, etc.)
+    - For Deepgram, this captures endpointing + transcription time
+
+    Other services (Memory, LLM, TTS) emit MetricsFrame, which we capture directly.
+    """
+
+    def __init__(self, webrtc_connection=None, stt_service=None, **kwargs):
+        super().__init__()
+        self.webrtc_connection = webrtc_connection
+        self.stt_service = stt_service
+
+        # Shared state for metrics tracking
+        self._current_turn = 0
+        self._current_metrics = {}
+        self._tts_text_time = None
+        self._last_sent_metrics = {}
+        self._last_logged_turn = -1
+        self._vision_request_time = None
+
+        # Manual timing for STT services
+        self._stt_start_time = None
+        self._stt_measured_this_turn = False
+        self._mem0_start_time = None
+        self._mem0_measured_this_turn = False
+
+    def start_turn(self, turn_number: int):
+        """Called by TurnTrackingObserver when a new turn starts."""
+        self._current_turn = turn_number
+        self._current_metrics = {}
+        self._last_sent_metrics = {}
+        self._last_logged_turn = -1
+        self._stt_measured_this_turn = False
+        self._mem0_measured_this_turn = False
+
+        # Use turn start time as STT baseline
+        self._stt_start_time = time.time()
+        logger.info(f"πŸ”„ [MetricsObserver] Turn #{self._current_turn} started, STT timer initialized")
+
+        self._mem0_start_time = None
+
+    async def on_push_frame(self, data: FramePushed):
+        """Watch frames as they're pushed through the pipeline."""
+        frame = data.frame
+
+        # STT timing: measure from turn start to first transcription (manual fallback).
+        # Note: this includes speaking time + endpointing + transcription.
+        # If the STT service emits MetricsFrame with TTFB, that will override this.
+        if isinstance(frame, TranscriptionFrame) and not self._stt_measured_this_turn:
+            if self._stt_start_time is not None:
+                stt_latency_ms = (time.time() - self._stt_start_time) * 1000
+                self._current_metrics['stt_ttfb_ms'] = stt_latency_ms
+                self._stt_measured_this_turn = True
+                logger.info(f"βœ… [MetricsObserver] STT total latency: {stt_latency_ms:.0f}ms (turn start β†’ transcription)")
+                logger.debug("   Note: Includes speaking time + processing. Use MetricsFrame TTFB for pure processing time.")
+                self._send_to_frontend()
+
+        # Capture MetricsFrame data from Pipecat's built-in metrics
+        if isinstance(frame, MetricsFrame):
+            try:
+                for metric_data in frame.data:
+                    if isinstance(metric_data, TTFBMetricsData):
+                        processor = metric_data.processor
+                        value_ms = metric_data.value * 1000  # Convert seconds to milliseconds
+                        processor_lower = processor.lower()
+
+                        # Log all processors to help debug
+                        logger.debug(f"πŸ“Š [MetricsObserver] MetricsFrame: {processor} = {value_ms:.0f}ms")
+
+                        # Check STT (Deepgram, Speechmatics, etc.)
+                        if 'sttservice' in processor_lower or 'deepgram' in processor_lower or 'speechmatics' in processor_lower:
+                            if 'stt_ttfb_ms' not in self._current_metrics:  # Only log once per turn
+                                self._current_metrics['stt_ttfb_ms'] = value_ms
+                                logger.info(f"βœ… [MetricsObserver] STT TTFB: {value_ms:.0f}ms (from {processor})")
+                                logger.debug("   Note: TTFB = Time To First Byte (audio β†’ first transcription)")
+                        # Check TTS (contains "tts" in name)
+                        elif 'ttsservice' in processor_lower or 'elevenlabs' in processor_lower or 'qwen' in processor_lower:
+                            if 'tts_ttfb_ms' not in self._current_metrics:  # Only log once per turn
+                                self._current_metrics['tts_ttfb_ms'] = value_ms
+                                logger.info(f"βœ… [MetricsObserver] TTS TTFB: {value_ms:.0f}ms (text β†’ first audio)")
+                        # Check LLM
+                        elif 'llmservice' in processor_lower or 'openai' in processor_lower or 'deepinfra' in processor_lower:
+                            if 'llm_ttfb_ms' not in self._current_metrics:  # Only log once per turn
+                                self._current_metrics['llm_ttfb_ms'] = value_ms
+                                logger.info(f"βœ… [MetricsObserver] LLM TTFB: {value_ms:.0f}ms (prompt β†’ first token)")
+                        # Check Memory (HybridMemory, ChromaDB)
+                        elif 'memory' in processor_lower or 'chromadb' in processor_lower or 'hybrid' in processor_lower:
+                            if 'memory_latency_ms' not in self._current_metrics:  # Only log once per turn
+                                self._current_metrics['memory_latency_ms'] = value_ms
+                                logger.info(f"βœ… [MetricsObserver] Memory latency: {value_ms:.0f}ms")
+                        else:
+                            logger.debug(f"πŸ” [MetricsObserver] Unknown processor: {processor} ({value_ms:.0f}ms)")
+
+                # Calculate total latency and send if we have any metrics
+                if self._current_metrics:
+                    total = sum([
+                        self._current_metrics.get('stt_ttfb_ms', 0),
+                        self._current_metrics.get('memory_latency_ms', 0),
+                        self._current_metrics.get('llm_ttfb_ms', 0),
+                        self._current_metrics.get('tts_ttfb_ms', 0)
+                    ])
+                    if total > 0:
+                        self._current_metrics['total_ms'] = total
+
+                    self._send_to_frontend()
+
+            except Exception as e:
+                logger.error(f"Error processing MetricsFrame: {e}", exc_info=True)
+
+    def _send_to_frontend(self):
+        """Send metrics to frontend via WebRTC data channel and store locally for Gradio UI."""
+        # Check if metrics have changed since last send (deduplication)
+        current_metrics_key = (
+            self._current_turn,
+            self._current_metrics.get('stt_ttfb_ms'),
+            self._current_metrics.get('memory_latency_ms'),
+            self._current_metrics.get('llm_ttfb_ms'),
+            self._current_metrics.get('tts_ttfb_ms'),
+            self._current_metrics.get('vision_latency_ms'),
+        )
+
+        if current_metrics_key == self._last_sent_metrics:
+            return
+
+        # Store in shared state for Gradio UI
+        metrics_store.add_metric({
+            "turn_number": self._current_turn,
+            "timestamp": int(time.time() * 1000),
+            "stt_ttfb_ms": self._current_metrics.get('stt_ttfb_ms'),
+            "memory_latency_ms": self._current_metrics.get('memory_latency_ms'),
+            "llm_ttfb_ms": self._current_metrics.get('llm_ttfb_ms'),
+            "tts_ttfb_ms": self._current_metrics.get('tts_ttfb_ms'),
+            "vision_latency_ms": self._current_metrics.get('vision_latency_ms'),
+            "total_ms": self._current_metrics.get('total_ms'),
+        })
+
+        # Send via WebRTC if connection exists
+        if self.webrtc_connection:
+            try:
+                if self.webrtc_connection.is_connected():
+                    message = {
+                        "type": "metrics",
+                        "turn_number": self._current_turn,
+                        "timestamp": int(time.time() * 1000),
+                        **self._current_metrics
+                    }
+                    logger.debug(f"πŸ“€ [MetricsObserver] Sending metrics: {message}")
+                    self.webrtc_connection.send_app_message(message)
+            except Exception as exc:
+                logger.error(f"❌ [MetricsObserver] Failed to send metrics via WebRTC: {exc}")
+
+        # Log summary once per turn
+        if self._last_logged_turn != self._current_turn:
+            def fmt(val):
+                return f"{val:.0f}ms" if isinstance(val, (int, float)) else "N/A"
+
+            # Build metrics summary
+            metrics_parts = []
+            if 'stt_ttfb_ms' in self._current_metrics:
+                metrics_parts.append(f"STT={fmt(self._current_metrics.get('stt_ttfb_ms'))}")
+            if 'memory_latency_ms' in self._current_metrics:
+                metrics_parts.append(f"Memory={fmt(self._current_metrics.get('memory_latency_ms'))}")
+            if 'llm_ttfb_ms' in self._current_metrics:
+                metrics_parts.append(f"LLM={fmt(self._current_metrics.get('llm_ttfb_ms'))}")
+            if 'tts_ttfb_ms' in self._current_metrics:
+                metrics_parts.append(f"TTS={fmt(self._current_metrics.get('tts_ttfb_ms'))}")
+            if 'vision_latency_ms' in self._current_metrics:
+                metrics_parts.append(f"Vision={fmt(self._current_metrics.get('vision_latency_ms'))}")
+
+            if metrics_parts:
+                logger.info(f"πŸ“Š Turn #{self._current_turn}: " + " | ".join(metrics_parts))
+                self._last_logged_turn = self._current_turn
+
+        self._last_sent_metrics = current_metrics_key
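
For reference, the shape of the metrics message sent over the data channel and how `total_ms` is derived; the values are made up, the keys mirror `_send_to_frontend` above:

```python
# Illustrative turn metrics; memory_latency_ms is absent and counts as 0.
turn_metrics = {"stt_ttfb_ms": 420.0, "llm_ttfb_ms": 610.0, "tts_ttfb_ms": 180.0}
total = sum(turn_metrics.get(k, 0) for k in
            ("stt_ttfb_ms", "memory_latency_ms", "llm_ttfb_ms", "tts_ttfb_ms"))
message = {"type": "metrics", "turn_number": 3, **turn_metrics, "total_ms": total}
print(message["total_ms"])  # 1210.0
```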
src/observers/state_observer.py ADDED
@@ -0,0 +1,166 @@
+"""
+State observer for WebRTC DataChannel synchronization.
+
+Observes Pipecat pipeline events and sends state updates to RPi via DataChannel:
+- Transcription events β†’ eye state (listening)
+- LLM events β†’ eye state (thinking)
+- TTS events β†’ eye state (speaking)
+- Transcripts β†’ text display
+"""
+
+import asyncio
+from typing import Optional
+from loguru import logger
+
+from pipecat.observers.base_observer import BaseObserver
+from pipecat.frames.frames import (
+    TranscriptionFrame,
+    LLMFullResponseStartFrame,
+    LLMFullResponseEndFrame,
+    TTSStartedFrame,
+    TTSStoppedFrame,
+)
+
+from transport.state_sync import StateSync
+
+
+class StateObserver(BaseObserver):
+    """
+    Observes pipeline events and sends state to RPi via DataChannel.
+
+    Automatically manages eye states based on conversation flow:
+    - User speaking β†’ listening
+    - LLM processing β†’ thinking
+    - TTS output β†’ speaking
+    - Idle β†’ default
+    """
+
+    def __init__(self, state_sync: Optional[StateSync] = None):
+        """
+        Initialize state observer.
+
+        Args:
+            state_sync: StateSync instance for sending messages
+        """
+        super().__init__()
+        self.state_sync = state_sync
+        self._current_state = "idle"
+        self._idle_delay = 0.5
+        self._idle_task = None
+
+    def set_state_sync(self, state_sync: StateSync):
+        """Set StateSync instance."""
+        self.state_sync = state_sync
+
+    async def on_transcription(self, *args, **kwargs):
+        """Handle transcription events (user speaking)."""
+        try:
+            # Cancel pending idle timer
+            self.cancel_idle_timer()
+
+            # Extract frame from args
+            frame = args[0] if args else None
+
+            if isinstance(frame, TranscriptionFrame):
+                text = frame.text
+                user_id = getattr(frame, "user_id", "user")
+
+                # Send transcript to RPi
+                if self.state_sync:
+                    self.state_sync.send_transcript("user", text)
+                    # Set eye state to listening when user speaks
+                    if text.strip():
+                        self._update_state("listening")
+
+                logger.debug(f"πŸ“ Transcription: {text}")
+
+        except Exception as e:
+            logger.error(f"❌ Error in transcription observer: {e}")
+
+    async def on_llm_full_response_start(self, *args, **kwargs):
+        """Handle LLM response start (thinking)."""
+        try:
+            # Cancel pending idle timer
+            self.cancel_idle_timer()
+
+            if self.state_sync:
+                self._update_state("thinking")
+                logger.debug("🧠 LLM thinking started")
+        except Exception as e:
+            logger.error(f"❌ Error in LLM start observer: {e}")
+
+    async def on_llm_full_response_end(self, *args, **kwargs):
+        """Handle LLM response end."""
+        try:
+            # State will be updated by TTS start or return to idle
+            logger.debug("🧠 LLM thinking ended")
+        except Exception as e:
+            logger.error(f"❌ Error in LLM end observer: {e}")
+
+    async def on_tts_started(self, *args, **kwargs):
+        """Handle TTS start (speaking)."""
+        try:
+            if self.state_sync:
+                self._update_state("speaking")
+                self.state_sync.send_tts_state(True)
+                logger.debug("πŸ”Š TTS started")
+        except Exception as e:
+            logger.error(f"❌ Error in TTS start observer: {e}")
+
+    async def on_tts_stopped(self, *args, **kwargs):
+        """Handle TTS stop (return to idle after delay)."""
+        try:
+            if self.state_sync:
+                self.state_sync.send_tts_state(False)
+
+            # Cancel existing idle timer
+            if self._idle_task and not self._idle_task.done():
+                self._idle_task.cancel()
+
+            # Set idle after delay
+            async def delayed_idle():
+                await asyncio.sleep(self._idle_delay)
+                self._update_state("idle")
+
+            self._idle_task = asyncio.create_task(delayed_idle())
+            logger.debug("TTS stopped, idle in 0.5s")
+        except Exception as e:
+            logger.error(f"Error in TTS stop observer: {e}")
+
+    async def on_user_transcript(self, *args, **kwargs):
+        """Handle complete user transcript."""
+        try:
+            # Extract text from args
+            text = args[1] if len(args) > 1 else ""
+            if text and self.state_sync:
+                self.state_sync.send_transcript("user", text)
+        except Exception as e:
+            logger.error(f"❌ Error in user transcript observer: {e}")
+
+    async def on_bot_transcript(self, *args, **kwargs):
+        """Handle complete bot transcript."""
+        try:
+            # Extract text from args
+            text = args[1] if len(args) > 1 else ""
+            if text and self.state_sync:
+                self.state_sync.send_transcript("assistant", text)
+        except Exception as e:
+            logger.error(f"❌ Error in bot transcript observer: {e}")
+
+    def cancel_idle_timer(self):
+        """Cancel pending idle timer."""
+        if self._idle_task and not self._idle_task.done():
+            self._idle_task.cancel()
+            self._idle_task = None
+
+    def _update_state(self, new_state: str):
+        """
+        Update eye state if changed.
+
+        Args:
+            new_state: New state to set
+        """
+        if new_state != self._current_state:
+            self._current_state = new_state
+            if self.state_sync:
+                self.state_sync.send_eye_state(new_state)
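
A minimal sketch of the eye-state lifecycle this observer drives, using a duck-typed stand-in for `StateSync` (the stub and the import path are assumptions; the real class lives in transport/state_sync.py):

```python
import asyncio
from src.observers.state_observer import StateObserver  # assumed import path

class StubStateSync:
    """Test double that just prints what would go over the DataChannel."""
    def send_eye_state(self, s): print(f"eye -> {s}")
    def send_tts_state(self, on): print(f"tts -> {on}")
    def send_transcript(self, role, text): print(f"{role}: {text}")

async def demo():
    obs = StateObserver(state_sync=StubStateSync())
    await obs.on_llm_full_response_start()  # eye -> thinking
    await obs.on_tts_started()              # eye -> speaking, tts -> True
    await obs.on_tts_stopped()              # tts -> False, idle scheduled
    await asyncio.sleep(0.6)                # eye -> idle after the 0.5s delay

asyncio.run(demo())
```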
src/observers/transcription_observer.py ADDED
@@ -0,0 +1,70 @@
+"""Observer for logging transcriptions and sending to frontend."""
+
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+import time
+from loguru import logger
+from pipecat.frames.frames import TranscriptionFrame, InterimTranscriptionFrame
+from pipecat.observers.base_observer import BaseObserver, FramePushed
+from src.shared_state import metrics_store
+
+
+class TranscriptionObserver(BaseObserver):
+    """Logs transcriptions and sends to frontend."""
+
+    def __init__(self, webrtc_connection=None, client_state=None):
+        super().__init__()
+        self.webrtc_connection = webrtc_connection
+        self.client_state = client_state or {}
+        self._last_transcription = None  # Track last transcription to avoid duplicates
+        self._last_transcription_time = 0  # Timestamp of last transcription
+
+    async def on_push_frame(self, data: FramePushed):
+        """Watch frames as they're pushed through the pipeline."""
+        frame = data.frame
+        current_time = time.time()
+
+        # --- (Logging logic) ---
+        if isinstance(frame, TranscriptionFrame):
+            # Deduplicate: skip if same text within 200ms (different user_ids)
+            time_diff = current_time - self._last_transcription_time
+            if self._last_transcription == frame.text and time_diff < 0.2:
+                logger.debug(f"πŸ”‡ Skipping duplicate transcription: '{frame.text}' (last seen {time_diff*1000:.0f}ms ago)")
+                return
+
+            self._last_transcription = frame.text
+            self._last_transcription_time = current_time
+
+            raw_id = getattr(frame, 'user_id', None)
+            display_id = raw_id if (raw_id and raw_id != "S1") else self.client_state.get("client_id", "guest")
+
+            logger.info(f"🎀 Transcription [{display_id}]: {frame.text}")
+
+            # Store in shared state for Gradio UI
+            metrics_store.add_transcription("user", frame.text)
+
+            # Update frontend via WebRTC
+            if self.webrtc_connection:
+                self._send_to_frontend("transcription", frame.text, display_id)
+
+        elif isinstance(frame, InterimTranscriptionFrame):
+            raw_id = getattr(frame, 'user_id', None)
+            display_id = raw_id if (raw_id and raw_id != "S1") else self.client_state.get("client_id", "guest")
+
+            # Update frontend (don't deduplicate partials, as they change frequently)
+            if self.webrtc_connection:
+                self._send_to_frontend("partial", frame.text, display_id)
+
+    def _send_to_frontend(self, type_str, text, speaker_id):
+        """Helper to send messages to frontend via WebRTC data channel."""
+        try:
+            if self.webrtc_connection and self.webrtc_connection.is_connected():
+                self.webrtc_connection.send_app_message({
+                    "type": type_str,
+                    "text": text,
+                    "speaker_id": speaker_id
+                })
+        except Exception as e:
+            logger.error(f"Error sending {type_str}: {e}")
src/observers/tts_state_observer.py ADDED
@@ -0,0 +1,56 @@
+"""Observer for broadcasting TTS state changes to frontend."""
+
+from loguru import logger
+from pipecat.frames.frames import TTSStartedFrame, TTSStoppedFrame, TTSAudioRawFrame
+from pipecat.observers.base_observer import BaseObserver, FramePushed
+
+
+class TTSStateObserver(BaseObserver):
+    """Emits `tts_state` messages whenever the assistant starts or stops speaking."""
+
+    def __init__(self, webrtc_connection=None):
+        super().__init__()
+        self.webrtc_connection = webrtc_connection
+        self._speaking = False
+        self._has_received_audio = False
+
+    async def on_push_frame(self, data: FramePushed):
+        """Watch frames as they're pushed through the pipeline."""
+        frame = data.frame
+
+        # Priority 1: Explicit start/stop frames (most reliable)
+        if isinstance(frame, TTSStartedFrame):
+            self._set_state(True)
+        elif isinstance(frame, TTSStoppedFrame):
+            self._set_state(False)
+            self._has_received_audio = False
+        elif isinstance(frame, TTSAudioRawFrame):
+            # Priority 2: Use first audio frame to detect start (fallback).
+            # Only set to started if we haven't already and this is the first audio frame.
+            if not self._speaking and not self._has_received_audio:
+                logger.debug("Detected TTS start via first TTSAudioRawFrame")
+                self._set_state(True)
+            self._has_received_audio = True
+            # Note: We rely on TTSStoppedFrame to detect stop, not audio frame absence.
+
+    def _set_state(self, active: bool):
+        if self._speaking == active:
+            return
+
+        self._speaking = active
+        state = "started" if active else "stopped"
+
+        if not self.webrtc_connection:
+            return
+
+        try:
+            if self.webrtc_connection.is_connected():
+                self.webrtc_connection.send_app_message(
+                    {
+                        "type": "tts_state",
+                        "state": state,
+                    }
+                )
+                logger.debug(f"Sent TTS state message: {state}")
+        except Exception as exc:
+            logger.error(f"Failed to send TTS state: {exc}")
src/observers/vision_observer.py ADDED
@@ -0,0 +1,142 @@
+"""Observer for logging vision processing events and Moondream activity."""
+
+import time
+from loguru import logger
+from pipecat.frames.frames import UserImageRequestFrame, LLMTextFrame, ErrorFrame
+from pipecat.observers.base_observer import BaseObserver, FramePushed
+
+
+class VisionObserver(BaseObserver):
+    """Logs vision processing events and Moondream activity."""
+
+    def __init__(self, webrtc_connection=None):
+        super().__init__()
+        self.webrtc_connection = webrtc_connection
+        self._video_frame_count = 0
+        self._last_video_frame_time = None
+
+    async def on_push_frame(self, data: FramePushed):
+        """Watch frames as they're pushed through the pipeline."""
+        frame = data.frame
+
+        current_time = time.time()
+
+        frame_type = type(frame).__name__
+
+        # Log vision request frames
+        if isinstance(frame, UserImageRequestFrame):
+            user_id = getattr(frame, 'user_id', 'unknown')
+            question = getattr(frame, 'text', 'unknown')
+            logger.info(f"πŸ‘οΈ Vision request received: user_id={user_id}, question={question}")
+            self._last_vision_request_time = current_time  # Track when vision was requested
+            self._vision_request_count = getattr(self, '_vision_request_count', 0) + 1
+            logger.info(f"πŸ“Š Vision request #{self._vision_request_count} - waiting for video frames and Moondream response...")
+
+            # Send status to frontend
+            if self.webrtc_connection:
+                try:
+                    if self.webrtc_connection.is_connected():
+                        self.webrtc_connection.send_app_message({
+                            "type": "vision",
+                            "status": "requested",
+                            "question": question
+                        })
+                except Exception as e:
+                    logger.debug(f"Error sending vision status: {e}")
+
+        elif 'video' in frame_type.lower() or 'image' in frame_type.lower() or 'vision' in frame_type.lower():
+            # Only log at info level if we're actively processing a vision request
+            is_vision_active = hasattr(self, '_last_vision_request_time') and self._last_vision_request_time is not None
+            if is_vision_active:
+                time_since_request = current_time - self._last_vision_request_time
+                if time_since_request < 5:  # Only log during active vision processing (5 seconds)
+                    logger.debug(f"πŸ“· Vision-related frame: {frame_type}")
+            else:
+                # Otherwise, only log at debug level (won't show unless debug logging is enabled)
+                logger.debug(f"πŸ“· Vision-related frame: {frame_type}")
+
+        # Log frames with image attribute only at debug level
+        elif hasattr(frame, 'image'):
+            logger.debug(f"πŸ“· Frame with image attribute: {frame_type}")
+
+        # Log any frame that might be a vision response by checking attributes
+        elif hasattr(frame, 'user_id') and hasattr(frame, 'text'):
+            user_id = getattr(frame, 'user_id', 'unknown')
+            text = getattr(frame, 'text', '')
+            if 'vision' in frame_type.lower() or 'image' in frame_type.lower() or 'moondream' in frame_type.lower():
+                logger.info(f"βœ… Vision response frame: {frame_type}, user_id={user_id}")
+                logger.info(f"   Response: {text[:200]}..." if len(text) > 200 else f"   Response: {text}")
+
+        # Log LLM text frames that might contain vision responses.
+        # Moondream responses come through as LLMTextFrame with vision context.
+        elif isinstance(frame, LLMTextFrame):
+            text = getattr(frame, 'text', '')
+            vision_keywords = ['see', 'visible', 'camera', 'image', 'showing', 'appears', 'looks like', 'dimly lit', 'desk', 'monitor', 'room', 'window', 'mug', 'laptop', 'coffee', 'analyzing', 'processing']
+
+            # Check if this is a vision response (either from keywords or if we recently requested vision)
+            is_vision_response = False
+            if hasattr(self, '_last_vision_request_time'):
+                time_since_request = current_time - self._last_vision_request_time
+                if time_since_request < 10:  # Within 10 seconds of vision request
+                    is_vision_response = True
+                    logger.info(f"βœ… Vision response received (within {time_since_request:.1f}s of request): {text[:200]}..." if len(text) > 200 else f"βœ… Vision response: {text}")
+
+            if text and any(keyword in text.lower() for keyword in vision_keywords) and not is_vision_response:
+                logger.info(f"βœ… Possible vision response in LLM text: {text[:200]}..." if len(text) > 200 else f"βœ… Possible vision response: {text}")
+
+        # Log errors
+        elif isinstance(frame, ErrorFrame):
+            error_msg = getattr(frame, 'error', str(frame))
+            if 'vision' in error_msg.lower() or 'moondream' in error_msg.lower() or 'image' in error_msg.lower():
+                logger.error(f"❌ Vision error: {error_msg}")
+
+                # Send error to frontend
+                if self.webrtc_connection:
+                    try:
+                        if self.webrtc_connection.is_connected():
+                            self.webrtc_connection.send_app_message({
+                                "type": "vision",
+                                "status": "error",
+                                "error": str(error_msg)
+                            })
+                    except Exception as e:
+                        logger.debug(f"Error sending vision error: {e}")
+
+        # Check for actual video frames (exclude audio frames).
+        # Be specific to avoid false positives.
+        is_video_frame = False
+
+        # Explicitly exclude audio frames
+        if 'audio' in frame_type.lower():
+            is_video_frame = False
+        # Check for actual video frame types
+        elif 'VideoRawFrame' in frame_type or 'InputVideoRawFrame' in frame_type:
+            is_video_frame = True
+        elif 'video' in frame_type.lower() and 'audio' not in frame_type.lower():
+            # Only if it's a video frame and not an audio frame
+            is_video_frame = True
+        elif hasattr(frame, 'video') and not hasattr(frame, 'audio'):
+            # Has video attribute but not audio
+            is_video_frame = True
+        elif hasattr(frame, 'image') and hasattr(frame, 'user_id'):
+            # User image request/response frames
+            is_video_frame = True
+
+        # Only log actual video frames, not audio frames
+        if is_video_frame:
+            self._video_frame_count += 1
+            self._last_video_frame_time = current_time
+            # Only log every 100 frames to reduce spam significantly
+            if self._video_frame_count % 100 == 0:
+                logger.debug(f"πŸŽ₯ Video frames streaming: {self._video_frame_count} frames received")
+
+        # Log frame count summary every 30 seconds (less frequent)
+        if not hasattr(self, '_last_summary_time'):
+            self._last_summary_time = current_time
+        elif current_time - self._last_summary_time >= 30:
+            if self._video_frame_count > 0:
+                logger.debug(f"πŸ“Š Video stream: {self._video_frame_count} frames in last 30 seconds")
+            else:
+                logger.warning("⚠️ No video frames detected in last 30 seconds!")
+            self._video_frame_count = 0
+            self._last_summary_time = current_time
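
The video-frame classification at the end of `on_push_frame`, condensed into one function for reference (illustrative only; the boolean parameters stand in for the attribute checks on the frame):

```python
def looks_like_video_frame(frame_type: str, has_video: bool, has_audio: bool,
                           has_image: bool, has_user_id: bool) -> bool:
    t = frame_type.lower()
    if "audio" in t:                 # audio frames are never counted
        return False
    if "VideoRawFrame" in frame_type or "InputVideoRawFrame" in frame_type:
        return True
    if "video" in t:                 # generic video frame types
        return True
    if has_video and not has_audio:  # video attribute without audio
        return True
    return has_image and has_user_id # user image request/response frames

assert looks_like_video_frame("InputVideoRawFrame", True, False, False, False)
assert not looks_like_video_frame("InputAudioRawFrame", False, True, False, False)
```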
src/processors/__init__.py ADDED
@@ -0,0 +1,18 @@
+"""Frame processors for the Pipecat pipeline.
+
+This module contains processors that transform, filter, or process data.
+For logging/monitoring processors, see the loggers.py module.
+"""
+
+from .filters import SilenceFilter, InputAudioFilter
+from .gating import InterventionGating
+from .visual_observer import VisualObserver
+from .emotional_monitor import EmotionalStateMonitor
+
+__all__ = [
+    "SilenceFilter",
+    "InputAudioFilter",
+    "InterventionGating",
+    "VisualObserver",
+    "EmotionalStateMonitor",
+]
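
Typical import site when assembling the pipeline (a sketch; the actual wiring lives in bot.py and assumes the repo root is on `sys.path`):

```python
from src.processors import SilenceFilter, InputAudioFilter, InterventionGating
```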
src/processors/emotional_monitor.py ADDED
@@ -0,0 +1,303 @@
+"""
+Real-time emotional and cognitive state monitoring using continuous video analysis.
+Detects hesitation, confusion, frustration, and other emotional cues to trigger TARS intervention.
+"""
+
+import asyncio
+import time
+import base64
+from typing import Optional, Dict, List
+from loguru import logger
+from PIL import Image
+import io
+
+from pipecat.frames.frames import (
+    Frame,
+    ImageRawFrame,
+    TextFrame,
+    LLMRunFrame,
+)
+from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+
+
+class EmotionalState:
+    """Container for detected emotional/cognitive state."""
+
+    def __init__(
+        self,
+        confused: bool = False,
+        hesitant: bool = False,
+        frustrated: bool = False,
+        focused: bool = False,
+        confidence: float = 0.0,
+        description: str = "",
+    ):
+        self.confused = confused
+        self.hesitant = hesitant
+        self.frustrated = frustrated
+        self.focused = focused
+        self.confidence = confidence
+        self.description = description
+        self.timestamp = time.time()
+
+    def needs_intervention(self) -> bool:
+        """Determine if TARS should intervene based on detected state."""
+        # Intervene if user shows signs of confusion, hesitation, or frustration
+        return self.confused or self.hesitant or self.frustrated
+
+    def __repr__(self):
+        states = []
+        if self.confused: states.append("confused")
+        if self.hesitant: states.append("hesitant")
+        if self.frustrated: states.append("frustrated")
+        if self.focused: states.append("focused")
+        return f"EmotionalState({', '.join(states) if states else 'neutral'}, confidence={self.confidence:.2f})"
+
+
+class EmotionalStateMonitor(FrameProcessor):
+    """
+    Continuously monitors video feed for emotional and cognitive states.
+    Analyzes facial expressions, body language, and behavior patterns to detect:
+    - Confusion (furrowed brow, head tilt, puzzled expression)
+    - Hesitation (pauses, uncertain gestures, looking away)
+    - Frustration (tense posture, sighs, agitated movements)
+    - Focus (engaged eye contact, attentive posture)
+
+    Triggers TARS intervention when negative states are detected.
+    """
+
+    def __init__(
+        self,
+        vision_client,
+        model: str = "moondream",
+        sampling_interval: float = 3.0,
+        intervention_threshold: int = 2,
+        enabled: bool = True,
+        auto_intervene: bool = False,
+    ):
+        """
+        Args:
+            vision_client: Moondream or compatible vision API client
+            model: Vision model to use
+            sampling_interval: Seconds between frame analyses (default: 3.0)
+            intervention_threshold: Number of consecutive negative states before intervening
+            enabled: Whether monitoring is active
+            auto_intervene: If True, automatically triggers LLM when threshold reached.
+                If False, only tracks state (used by gating layer)
+        """
+        super().__init__()
+        self._vision_client = vision_client
+        self._model = model
+        self._sampling_interval = sampling_interval
+        self._intervention_threshold = intervention_threshold
+        self._enabled = enabled
+        self._auto_intervene = auto_intervene
+
+        # State tracking
+        self._last_sample_time = 0
+        self._last_state: Optional[EmotionalState] = None
+        self._state_history: List[EmotionalState] = []
+        self._consecutive_negative_states = 0
+        self._analyzing = False
+
+        # Cooldown tracking (when user declines help)
+        self._help_declined_time: Optional[float] = None
+        self._cooldown_duration = 30.0  # seconds - don't re-offer help for 30s after decline
+
+        logger.info("🧠 Emotional State Monitor initialized")
+        logger.info(f"   Sampling interval: {sampling_interval}s")
+        logger.info(f"   Intervention threshold: {intervention_threshold}")
+        logger.info(f"   Auto-intervene: {auto_intervene}")
+        logger.info(f"   Enabled: {enabled}")
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """Process video frames and sample periodically for emotional analysis."""
+        await super().process_frame(frame, direction)
+
+        # Only analyze if enabled and frame is video input
+        if not self._enabled or not isinstance(frame, ImageRawFrame):
+            await self.push_frame(frame, direction)
+            return
+
+        # Check if it's time to sample
+        current_time = time.time()
+        if current_time - self._last_sample_time >= self._sampling_interval:
+            # Don't block the pipeline - analyze in background
+            if not self._analyzing:
+                self._last_sample_time = current_time
+                asyncio.create_task(self._analyze_emotional_state(frame))
+
+        await self.push_frame(frame, direction)
+
+    async def _analyze_emotional_state(self, frame: ImageRawFrame):
+        """Analyze frame for emotional/cognitive state."""
+        self._analyzing = True
+
+        try:
+            # Convert frame to base64
+            image = Image.frombytes(frame.format, frame.size, frame.image)
+            buffered = io.BytesIO()
+            image.save(buffered, format="JPEG")
+            img_str = base64.b64encode(buffered.getvalue()).decode()
+
+            # Construct emotion detection prompt
+            prompt = (
+                "Analyze the person's emotional and cognitive state. "
+                "Are they showing signs of: confusion (furrowed brow, puzzled expression), "
+                "hesitation (pauses, uncertain gestures), frustration (tense posture), "
+                "or focus (engaged, attentive)? "
+                "Respond concisely with detected states."
+            )
+
+            logger.debug("πŸ” Analyzing emotional state...")
+
+            try:
+                response = await asyncio.wait_for(
+                    self._vision_client.chat.completions.create(
+                        model=self._model,
+                        messages=[
+                            {
+                                "role": "user",
+                                "content": [
+                                    {"type": "text", "text": prompt},
+                                    {
+                                        "type": "image_url",
+                                        "image_url": {
+                                            "url": f"data:image/jpeg;base64,{img_str}"
+                                        },
+                                    },
+                                ],
+                            }
+                        ],
+                        max_tokens=100,
+                    ),
+                    timeout=5.0,
+                )
+
+                description = response.choices[0].message.content.lower()
+                logger.debug(f"πŸ“Š Emotional analysis: {description}")
+
+                # Parse response to detect states
+                state = EmotionalState(
+                    confused="confus" in description or "puzzle" in description or "uncertain" in description,
+                    hesitant="hesita" in description or "unsure" in description or "pause" in description,
+                    frustrated="frustrat" in description or "tense" in description or "agitat" in description,
+                    focused="focus" in description or "attentive" in description or "engaged" in description,
+                    confidence=0.7,  # Could be enhanced with more sophisticated parsing
+                    description=description,
+                )
+
+                self._last_state = state
+                self._state_history.append(state)
+
+                # Keep only recent history (last 10 states)
+                if len(self._state_history) > 10:
+                    self._state_history.pop(0)
+
+                logger.info(f"🎭 State detected: {state}")
+
+                # Track consecutive negative states
+                if state.needs_intervention():
+                    self._consecutive_negative_states += 1
+                    logger.warning(
+                        f"⚠️ Negative state detected "
+                        f"({self._consecutive_negative_states}/{self._intervention_threshold})"
+                    )
+                else:
+                    self._consecutive_negative_states = 0
+
+                # Trigger intervention if threshold reached AND auto-intervene enabled
+                if self._auto_intervene and self._consecutive_negative_states >= self._intervention_threshold:
+                    await self._trigger_intervention(state)
+                    self._consecutive_negative_states = 0  # Reset after intervention
+                elif self._consecutive_negative_states >= self._intervention_threshold:
+                    # Just log, don't intervene (gating layer will handle it)
+                    logger.info(
+                        f"🎭 Intervention threshold reached ({self._consecutive_negative_states}) "
+                        f"- state available for gating layer"
+                    )
+
+            except asyncio.TimeoutError:
+                logger.warning("⚠️ Emotional analysis timed out")
+            except Exception as e:
+                logger.error(f"❌ Emotional analysis error: {e}")
+
+        except Exception as e:
+            logger.error(f"Error in emotional monitoring: {e}")
+        finally:
+            self._analyzing = False
+
+    async def _trigger_intervention(self, state: EmotionalState):
+        """Trigger TARS intervention based on detected emotional state."""
+        logger.info(f"🚨 Triggering TARS intervention for: {state}")
+
+        # Construct intervention message based on state
+        intervention_msg = self._get_intervention_message(state)
+
+        # Push context message to LLM
+        context_frame = TextFrame(
+            text=f"[Emotional State Alert]: {intervention_msg}"
+        )
+        await self.push_frame(context_frame, FrameDirection.UPSTREAM)
+
+        # Trigger LLM to respond
+        await self.push_frame(LLMRunFrame(), FrameDirection.UPSTREAM)
+
+        logger.info("βœ… Intervention triggered")
+
+    def _get_intervention_message(self, state: EmotionalState) -> str:
+        """Generate appropriate intervention message based on detected state."""
+        if state.confused:
+            return (
+                "The user appears confused or uncertain. "
+                "Consider offering help or clarification proactively."
+            )
+        elif state.hesitant:
+            return (
+                "The user seems hesitant or unsure. "
+                "You might want to check if they need assistance."
+            )
+        elif state.frustrated:
+            return (
+                "The user appears frustrated or tense. "
+                "Consider offering support or suggesting a different approach."
+            )
+        else:
+            return (
+                "The user shows signs of difficulty. "
+                "Consider offering assistance."
+            )
+
+    def enable(self):
+        """Enable emotional monitoring."""
+        self._enabled = True
+        logger.info("🧠 Emotional monitoring enabled")
+
+    def disable(self):
+        """Disable emotional monitoring."""
+        self._enabled = False
+        logger.info("🧠 Emotional monitoring disabled")
+
+    def get_current_state(self) -> Optional[EmotionalState]:
+        """Get the most recent emotional state."""
+        return self._last_state
+
+    def get_state_summary(self) -> Dict:
+        """Get summary of recent emotional states."""
+        if not self._state_history:
+            return {"status": "no_data"}
+
+        total = len(self._state_history)
+        confused_count = sum(1 for s in self._state_history if s.confused)
+        hesitant_count = sum(1 for s in self._state_history if s.hesitant)
+        frustrated_count = sum(1 for s in self._state_history if s.frustrated)
+        focused_count = sum(1 for s in self._state_history if s.focused)
+
+        return {
+            "total_samples": total,
+            "confused_ratio": confused_count / total,
+            "hesitant_ratio": hesitant_count / total,
+            "frustrated_ratio": frustrated_count / total,
+            "focused_ratio": focused_count / total,
+            "current_state": str(self._last_state) if self._last_state else "unknown",
+        }
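
How the keyword parsing in `_analyze_emotional_state` maps a vision-model reply onto `EmotionalState`, as a standalone sketch (the description string is invented; the import path is an assumption):

```python
from src.processors.emotional_monitor import EmotionalState  # assumed import path

description = "the person looks puzzled and hesitant, pausing often"
state = EmotionalState(
    confused=any(k in description for k in ("confus", "puzzle", "uncertain")),
    hesitant=any(k in description for k in ("hesita", "unsure", "pause")),
    frustrated=any(k in description for k in ("frustrat", "tense", "agitat")),
    focused=any(k in description for k in ("focus", "attentive", "engaged")),
    description=description,
)
print(state, state.needs_intervention())  # EmotionalState(confused, hesitant, ...) True
```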
src/processors/filters.py ADDED
@@ -0,0 +1,81 @@
+from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+from pipecat.frames.frames import (
+    LLMFullResponseEndFrame,
+    LLMTextFrame,
+    LLMFullResponseStartFrame,
+    Frame,
+    InputAudioRawFrame,
+    StartFrame,
+    EndFrame,
+    CancelFrame,
+    TTSTextFrame
+)
+from loguru import logger
+import json
+
+
+class InputAudioFilter(FrameProcessor):
+    """
+    Dedicated filter to block InputAudioRawFrame from reaching the TTS service.
+    These frames should only go upstream (to STT), never downstream (to TTS).
+    """
+    async def process_frame(self, frame: Frame, direction):
+        await super().process_frame(frame, direction)
+
+        # Block audio going downstream
+        if isinstance(frame, InputAudioRawFrame) and direction == FrameDirection.DOWNSTREAM:
+            return
+        await self.push_frame(frame, direction)
+
+
+class SilenceFilter(FrameProcessor):
+    """
+    Intercepts LLM responses. If the response is {"action": "silence"}, drops it.
+    """
+    def __init__(self):
+        super().__init__()
+        self.current_response_text = ""
+        self.is_collecting = False
+
+    async def process_frame(self, frame: Frame, direction):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, (StartFrame, EndFrame, CancelFrame)):
+            self.current_response_text = ""
+            self.is_collecting = False
+            await self.push_frame(frame, direction)
+            return
+
+        # Start collecting text
+        if isinstance(frame, LLMFullResponseStartFrame):
+            self.current_response_text = ""
+            self.is_collecting = True
+            await self.push_frame(frame, direction)
+
+        # Accumulate text
+        elif isinstance(frame, LLMTextFrame) and self.is_collecting:
+            self.current_response_text += frame.text
+            await self.push_frame(frame, direction)
+
+        # Check the full response
+        elif isinstance(frame, LLMFullResponseEndFrame):
+            if self.is_collecting:
+                text = self.current_response_text.strip()
+                try:
+                    # Check for silence JSON
+                    if "action" in text and "silence" in text:
+                        clean_json = text.replace("```json", "").replace("```", "").strip()
+                        data = json.loads(clean_json)
+                        if data.get("action") == "silence":
+                            logger.info("SilenceFilter: Suppressing silent response.")
+                            self.is_collecting = False
+                            return  # Drop the EndFrame (silence the turn)
+                except (json.JSONDecodeError, AttributeError):
+                    # Not valid silence JSON - treat it as a normal response
+                    pass
+            self.is_collecting = False
+            await self.push_frame(frame, direction)
+
+        # Pass everything else (like audio or system messages)
+        else:
+            await self.push_frame(frame, direction)
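
The LLM-side contract `SilenceFilter` looks for is a bare JSON action, optionally wrapped in a Markdown code fence. The cleanup-and-parse step in isolation (the payload is illustrative):

```python
import json

raw = '```json\n{"action": "silence"}\n```'
clean = raw.replace("```json", "").replace("```", "").strip()
assert json.loads(clean) == {"action": "silence"}  # -> the turn is suppressed
```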
src/processors/gating.py ADDED
@@ -0,0 +1,129 @@
+"""Intervention Gating: Traffic Controller for Bot Responses."""
+
+import json
+import time
+import aiohttp
+import asyncio
+from loguru import logger
+from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+from pipecat.frames.frames import LLMMessagesFrame, Frame
+from character.prompts import build_gating_system_prompt
+
+
+class InterventionGating(FrameProcessor):
+    """
+    Traffic controller: decides if TARS should reply based on audio + vision + emotions.
+    Uses an OpenAI-compatible API (DeepInfra).
+    """
+    def __init__(
+        self,
+        api_key: str,
+        base_url: str = "https://api.deepinfra.com/v1/openai",
+        model: str = "meta-llama/Llama-3.2-3B-Instruct",
+        visual_observer=None,
+        emotional_monitor=None
+    ):
+        super().__init__()
+        self.api_key = api_key
+        self.base_url = base_url
+        self.model = model
+        self.visual_observer = visual_observer
+        self.emotional_monitor = emotional_monitor
+        self.api_url = f"{base_url}/chat/completions"
+
+    async def _check_should_reply(self, messages: list) -> bool:
+        """Asks the fast LLM if we should reply (audio + vision + emotions)."""
+        if not messages:
+            return False
+
+        # Extract the last user message
+        last_msg = messages[-1]
+        if last_msg.get("role") != "user":
+            return True
+
+        # 1. READ EMOTIONAL STATE (highest priority)
+        emotional_state = None
+        if self.emotional_monitor:
+            emotional_state = self.emotional_monitor.get_current_state()
+            if emotional_state and emotional_state.needs_intervention():
+                # User is confused/hesitant/frustrated - ALWAYS respond
+                logger.info(
+                    f"🧠 Gating: User shows {emotional_state} - BYPASSING gating, offering help"
+                )
+                return True
+
+        # 2. READ VISUAL CONTEXT (0ms latency)
+        is_looking = False
+        if self.visual_observer:
+            # Read the variable updated by the background task
+            is_looking = self.visual_observer.visual_context.get("is_looking_at_robot", False)
+
+            # Ignore if data is too old (> 5 seconds)
+            last_update = self.visual_observer.visual_context.get("last_updated", 0)
+            if time.time() - last_update > 5.0:
+                is_looking = False
+
+        # 3. ANALYZE CONTEXT
+        history_text = "\n".join([f"{m['role']}: {m['content']}" for m in messages[-3:]])
+
+        # Build enriched system prompt with emotional context
+        system_prompt = build_gating_system_prompt(is_looking, emotional_state)
+
+        payload = {
+            "model": self.model,
+            "messages": [
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": f"Context:\n{history_text}"}
+            ],
+            "response_format": {"type": "json_object"},
+            "max_tokens": 50
+        }
+
+        # Set a strict timeout so we don't silence the bot if the API is slow
+        timeout = aiohttp.ClientTimeout(total=1.5)
+
+        try:
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.post(
+                    self.api_url,
+                    headers={"Authorization": f"Bearer {self.api_key}"},
+                    json=payload
+                ) as resp:
+                    if resp.status == 200:
+                        result = await resp.json()
+                        content_response = result["choices"][0]["message"]["content"]
+                        content_response = content_response.replace("```json", "").replace("```", "").strip()
+                        data = json.loads(content_response)
+                        should_reply = data.get("reply", False)
+
+                        logger.debug(f"Gating decision: {should_reply} (Looking: {is_looking})")
+                        return should_reply
+                    else:
+                        logger.warning(f"Gating check failed: {resp.status}")
+                        return True  # Fail open (reply if check fails)
+        except asyncio.TimeoutError:
+            logger.warning("🚦 Gating: Timed out! Defaulting to REPLY.")
+            return True
+        except Exception as e:
+            logger.error(f"Gating error: {e}")
+            return True
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        """
+        Intercepts LLMMessagesFrame.
+        If 'should_reply' is False, we DROP the frame, effectively silencing the bot.
+        """
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, LLMMessagesFrame) and direction == FrameDirection.DOWNSTREAM:
+            # Check if we should reply
+            should_reply = await self._check_should_reply(frame.messages)
+
+            if not should_reply:
+                logger.info("🚦 Gating: BLOCKING response.")
+                return  # DROP THE FRAME
+
+            logger.info("🟒 Gating: PASSING through.")
+
+        await self.push_frame(frame, direction)
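
The gating model is expected to return a tiny JSON verdict, and the processor fails open on anything else. The parse step, extracted as a standalone sketch (`parse_gating_reply` is illustrative, not part of this commit):

```python
import json

def parse_gating_reply(content: str) -> bool:
    """Parse the gating model's reply; fail open on anything unparseable."""
    try:
        data = json.loads(content.replace("```json", "").replace("```", "").strip())
        return bool(data.get("reply", False))
    except (json.JSONDecodeError, AttributeError):
        return True  # fail open: better to answer than to go silent

assert parse_gating_reply('{"reply": true}') is True
assert parse_gating_reply("garbage") is True  # fail open
```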
src/processors/visual_observer.py ADDED
@@ -0,0 +1,389 @@
+import asyncio
+import time
+from typing import Optional, List, Dict, Any
+from loguru import logger
+from pipecat.frames.frames import Frame, ImageRawFrame, TextFrame
+from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+import base64
+from PIL import Image
+import io
+import cv2
+import numpy as np
+try:
+    import mediapipe as mp
+    MEDIAPIPE_AVAILABLE = True
+except ImportError:
+    MEDIAPIPE_AVAILABLE = False
+    logger.warning("MediaPipe not available, using OpenCV for face detection")
+
+
+class VisualObserver(FrameProcessor):
+    """
+    Observer that waits for UserImageRequestFrame, captures the next video frame,
+    analyzes it with a vision model, and injects the description back into the context.
+    Now includes face detection and display capabilities.
+    """
+
+    def __init__(
+        self,
+        vision_client,
+        model="moondream",
+        enable_display=False,
+        enable_face_detection=True,
+        webrtc_connection=None,
+        tars_client=None
+    ):
+        super().__init__()
+        self._vision_client = vision_client
+        self._model = model
+        self._waiting_for_image = False
+        self._current_request = None
+        self._last_analysis_time = 0
+        self._cooldown = 2.0  # Min seconds between analyses
+        self._enable_display = enable_display
+        self._enable_face_detection = enable_face_detection
+        self._webrtc_connection = webrtc_connection
+        self._tars_client = None  # Deprecated: display control via gRPC in robot mode
+        self._display_window_name = "TARS Visual Observer"
+
+        # Face detection setup
+        self._face_detector = None
+        if self._enable_face_detection:
+            self._setup_face_detection()
+
+        # Stats
+        self._face_count = 0
+        self._frames_processed = 0
+        self._last_face_time = 0
+
+    def _setup_face_detection(self):
+        """Initialize face detection based on available libraries."""
+        try:
+            if MEDIAPIPE_AVAILABLE:
+                logger.info("🎯 Initializing MediaPipe face detection")
+                self._face_detector_type = "mediapipe"
+                self._mp_face_detection = mp.solutions.face_detection
+                self._mp_drawing = mp.solutions.drawing_utils
+                self._face_detector = self._mp_face_detection.FaceDetection(
+                    model_selection=0,  # 0 for short-range (< 2m), 1 for full-range
+                    min_detection_confidence=0.5
+                )
+            else:
+                # Fallback to OpenCV Haar Cascade
+                logger.info("🎯 Initializing OpenCV Haar Cascade face detection")
+                self._face_detector_type = "opencv"
+                cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
+                self._face_detector = cv2.CascadeClassifier(cascade_path)
+                if self._face_detector.empty():
+                    logger.error("Failed to load Haar Cascade classifier")
+                    self._face_detector = None
+        except Exception as e:
+            logger.error(f"Failed to initialize face detection: {e}")
+            self._face_detector = None
+
+    def detect_faces(self, image: np.ndarray) -> List[Dict[str, Any]]:
+        """
+        Detect faces in the image.
+
+        Args:
+            image: numpy array in BGR format
+
+        Returns:
+            List of face dictionaries with bounding boxes and confidence
+        """
+        if not self._face_detector:
+            return []
+
+        faces = []
+        try:
+            if self._face_detector_type == "mediapipe":
+                # Convert BGR to RGB for MediaPipe
+                rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+                results = self._face_detector.process(rgb_image)
+
+                if results.detections:
+                    h, w, _ = image.shape
+                    for detection in results.detections:
+                        bbox = detection.location_data.relative_bounding_box
+                        faces.append({
+                            'x': int(bbox.xmin * w),
+                            'y': int(bbox.ymin * h),
+                            'width': int(bbox.width * w),
+                            'height': int(bbox.height * h),
+                            'confidence': detection.score[0]
+                        })
+            else:  # opencv
+                gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+                detected_faces = self._face_detector.detectMultiScale(
+                    gray,
+                    scaleFactor=1.1,
+                    minNeighbors=5,
+                    minSize=(30, 30)
+                )
+                for (x, y, w, h) in detected_faces:
+                    faces.append({
+                        'x': x,
+                        'y': y,
+                        'width': w,
+                        'height': h,
+                        'confidence': 1.0  # OpenCV Haar doesn't provide confidence
+                    })
+        except Exception as e:
+            logger.error(f"Error detecting faces: {e}")
+
+        return faces
+
+    def draw_faces(self, image: np.ndarray, faces: List[Dict[str, Any]]) -> np.ndarray:
+        """
+        Draw bounding boxes around detected faces.
+
+        Args:
+            image: numpy array in BGR format
+            faces: List of face dictionaries from detect_faces()
+
+        Returns:
+            Image with faces drawn
+        """
+        annotated_image = image.copy()
+
+        for face in faces:
+            x, y, w, h = face['x'], face['y'], face['width'], face['height']
+            confidence = face['confidence']
+
+            # Draw rectangle
+            cv2.rectangle(annotated_image, (x, y), (x + w, y + h), (0, 255, 0), 2)
+
+            # Draw confidence score
+            label = f"Face: {confidence:.2f}"
+            cv2.putText(
+                annotated_image,
+                label,
+                (x, y - 10),
+                cv2.FONT_HERSHEY_SIMPLEX,
+                0.5,
+                (0, 255, 0),
+                2
+            )
+
+        # Draw face count
+        cv2.putText(
+            annotated_image,
+            f"Faces: {len(faces)}",
+            (10, 30),
+            cv2.FONT_HERSHEY_SIMPLEX,
+            1,
+            (0, 255, 0),
+            2
+        )
+
+        return annotated_image
+
+    def display_frame(self, image: np.ndarray, faces: Optional[List[Dict[str, Any]]] = None):
+        """
+        Display the frame in a window with optional face annotations.
+
+        Args:
+            image: numpy array in BGR format
+            faces: Optional list of detected faces to draw
+        """
+        if not self._enable_display:
+            return
+
+        try:
+            display_image = image.copy()
+
+            if faces:
+                display_image = self.draw_faces(display_image, faces)
+
+            cv2.imshow(self._display_window_name, display_image)
+            cv2.waitKey(1)  # Required for window to update
+        except Exception as e:
+            logger.error(f"Error displaying frame: {e}")
+
+    def send_display_event(self, faces: List[Dict[str, Any]], image_base64: Optional[str] = None):
+        """
+        Send display event to WebRTC connection with face detection results.
+
+        Args:
+            faces: List of detected faces
+            image_base64: Optional base64-encoded image
+        """
+        if not self._webrtc_connection:
+            return
+
+        try:
+            if self._webrtc_connection.is_connected():
+                event_data = {
+                    "type": "face_detection",
+                    "status": "detected" if faces else "no_faces",
+                    "face_count": len(faces),
+                    "faces": faces,
+                    "timestamp": time.time()
+                }
+
+                # Optionally include thumbnail
+                if image_base64 and len(faces) > 0:
+                    event_data["thumbnail"] = image_base64
+
+                self._webrtc_connection.send_app_message(event_data)
+        except Exception as e:
+            logger.debug(f"Error sending display event: {e}")
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        # 1. Handle request from LLM (check by class name to avoid import errors).
+        # We check for "UserImageRequestFrame" (your custom frame) OR "VisionImageRequestFrame".
+        if frame.__class__.__name__ in ["UserImageRequestFrame", "VisionImageRequestFrame"]:
+            logger.info(f"πŸ‘οΈ Vision request received: {getattr(frame, 'context', 'No context')}")
+            self._waiting_for_image = True
+            self._current_request = frame
+            # We don't yield this frame downstream; we consume it and act on it.
+            return
+
+        # 2. Handle video input (continuous face detection + optional vision analysis)
+        if isinstance(frame, ImageRawFrame):
+            self._frames_processed += 1
+
+            # Process face detection on every frame (or throttled)
+            if self._enable_face_detection and self._frames_processed % 5 == 0:
+                # Run face detection in background
+                asyncio.create_task(self._process_face_detection(frame))
+
+            # Vision analysis only when requested
+            if self._waiting_for_image:
+                # Check cooldown
+                if time.time() - self._last_analysis_time < self._cooldown:
+                    await self.push_frame(frame, direction)
+                    return
+
+                logger.info("πŸ“Έ Capturing frame for analysis...")
+                self._waiting_for_image = False  # Reset flag immediately
261
+ self._last_analysis_time = time.time()
262
+
263
+ # Run analysis in background to avoid blocking audio pipeline
264
+ asyncio.create_task(self._analyze_and_respond(frame))
265
+ # Note: Still pass frame through for face detection
266
+
267
+ # Pass all other frames through
268
+ await self.push_frame(frame, direction)
269
+
270
+ async def _process_face_detection(self, frame: ImageRawFrame):
271
+ """Process face detection on video frame and send display events."""
272
+ try:
273
+ # Convert frame to numpy array
274
+ image = Image.frombytes(frame.format, frame.size, frame.image)
275
+ image_np = np.array(image)
276
+
277
+ # Convert RGB to BGR for OpenCV
278
+ if image_np.shape[2] == 3:
279
+ image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
280
+ else:
281
+ image_bgr = image_np
282
+
283
+ # Get frame dimensions
284
+ frame_height, frame_width = image_bgr.shape[:2]
285
+
286
+ # Detect faces
287
+ faces = self.detect_faces(image_bgr)
288
+
289
+ if faces:
290
+ self._face_count = len(faces)
291
+ current_time = time.time()
292
+
293
+ # Log only periodically to avoid spam
294
+ if current_time - self._last_face_time > 5.0:
295
+ logger.info(f"πŸ‘€ Detected {len(faces)} face(s)")
296
+ self._last_face_time = current_time
297
+
298
+ # Get the largest/most prominent face
299
+ primary_face = max(faces, key=lambda f: f['width'] * f['height'])
300
+
301
+ # Calculate face center
302
+ face_center_x = primary_face['x'] + primary_face['width'] // 2
303
+ face_center_y = primary_face['y'] + primary_face['height'] // 2
304
+
305
+ # Display the frame with face annotations
306
+ self.display_frame(image_bgr, faces)
307
+
308
+ # Send face position event to WebRTC frontend
309
+ self.send_display_event(faces)
310
+
311
+ # Optionally send face position to text frame for LLM context
312
+ # This can be used for "user is looking at you" type feedback
313
+ # Uncomment if you want the LLM to know about face position
314
+ # face_text = f"[Face Detected]: Position ({face_center_x}, {face_center_y}), Size: {primary_face['width']}x{primary_face['height']}"
315
+ # await self.push_frame(TextFrame(text=face_text), FrameDirection.UPSTREAM)
316
+ else:
317
+ # No faces detected
318
+ if self._face_count > 0:
319
+ logger.debug("No faces detected")
320
+ self._face_count = 0
321
+ # Send "no face" event to WebRTC
322
+ self.send_display_event([])
323
+
324
+ # Display frame without annotations
325
+ self.display_frame(image_bgr)
326
+
327
+ except Exception as e:
328
+ logger.error(f"Error in face detection: {e}")
329
+
330
+ async def _analyze_and_respond(self, frame: ImageRawFrame):
331
+ """Analyze image and push result text frame downstream."""
332
+ try:
333
+ # Convert raw frame to base64
334
+ image = Image.frombytes(frame.format, frame.size, frame.image)
335
+ buffered = io.BytesIO()
336
+ image.save(buffered, format="JPEG")
337
+ img_str = base64.b64encode(buffered.getvalue()).decode()
338
+
339
+ prompt = "Describe this image briefly."
340
+
341
+ # Try to extract prompt from the request context if available
342
+ if self._current_request and hasattr(self._current_request, 'context'):
343
+ # Assuming context might be the question text
344
+ context = self._current_request.context
345
+ if context:
346
+ prompt = f"{context} (Describe the image to answer this)"
347
+
348
+ logger.info(f"πŸ” Sending image to vision model ({self._model})...")
349
+
350
+ try:
351
+ response = await asyncio.wait_for(
352
+ self._vision_client.chat.completions.create(
353
+ model=self._model,
354
+ messages=[
355
+ {
356
+ "role": "user",
357
+ "content": [
358
+ {"type": "text", "text": prompt},
359
+ {
360
+ "type": "image_url",
361
+ "image_url": {
362
+ "url": f"data:image/jpeg;base64,{img_str}"
363
+ },
364
+ },
365
+ ],
366
+ }
367
+ ],
368
+ max_tokens=100
369
+ ),
370
+ timeout=8.0 # 8 second timeout to prevent hanging
371
+ )
372
+ description = response.choices[0].message.content
373
+ logger.info(f"βœ… Vision analysis: {description}")
374
+
375
+ except asyncio.TimeoutError:
376
+ logger.warning("⚠️ Vision model timed out!")
377
+ description = "I couldn't see clearly because the visual processing timed out."
378
+ except Exception as e:
379
+ logger.error(f"❌ Vision model error: {e}")
380
+ description = "I had trouble processing the visual data."
381
+
382
+ feedback_text = f"[Visual Observation]: {description}"
383
+
384
+ # Push text frame to LLM
385
+ await self.push_frame(TextFrame(text=feedback_text), FrameDirection.UPSTREAM)
386
+
387
+ except Exception as e:
388
+ logger.error(f"Error in vision pipeline: {e}")
389
+ self._waiting_for_image = False
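
The OpenCV path above is the fallback when MediaPipe is unavailable. A minimal standalone sketch of that same Haar-cascade flow is useful for sanity-checking the classifier outside the pipeline; it assumes `opencv-python` is installed, and `photo.jpg` is a placeholder for any local image:

```python
# Standalone check of the Haar-cascade fallback used by VisualObserver.
# "photo.jpg" is a placeholder; any local image works.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
assert not cascade.empty(), "Haar cascade failed to load"

img = cv2.imread("photo.jpg")                 # OpenCV loads images as BGR
assert img is not None, "image not found"
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar detection runs on grayscale
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
print(f"Detected {len(faces)} face(s): {[tuple(map(int, f)) for f in faces]}")
```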
src/services/README.md ADDED
@@ -0,0 +1,110 @@
+ # Services
+
+ Backend services for TARS voice AI. These provide core functionality like speech recognition, text-to-speech, memory, and robot control.
+
+ ## Organization
+
+ | Service | Purpose |
+ |---------|---------|
+ | `tars_robot.py` | Robot hardware control via gRPC (movement, camera, display) |
+ | `tts/tts_qwen.py` | Local text-to-speech using Qwen3 models |
+ | `memory/memory_chromadb.py` | Semantic memory using ChromaDB |
+ | `memory/memory_hybrid.py` | Hybrid memory combining vector search and BM25 keyword matching |
+ | `factories/` | Factory functions for creating STT/TTS services |
+
+ ## Robot Control
+
+ Robot hardware is controlled exclusively via gRPC using the TARS SDK.
+
+ ### tars_robot.py
+
+ Provides functions for robot control in robot mode (tars_bot.py):
+
+ ```python
+ from services import tars_robot
+
+ # Get robot client (singleton) - replace with your robot's IP
+ client = tars_robot.get_robot_client(address="100.115.193.41:50051")
+
+ # Control functions
+ await tars_robot.execute_movement(["wave_right", "step_forward"])
+ result = await tars_robot.capture_camera_view()
+ tars_robot.set_emotion("happy")
+ tars_robot.set_eye_state("listening")
+ status = tars_robot.get_robot_status()
+ available = tars_robot.is_robot_available()
+
+ # Cleanup
+ tars_robot.close_robot_client()
+ ```
+
+ ### Architecture
+
+ Robot mode uses two communication channels:
+
+ | Channel | Protocol | Purpose | Latency |
+ |---------|----------|---------|---------|
+ | Audio | WebRTC | Voice conversation | ~20ms |
+ | Commands | gRPC | Hardware control | ~5-10ms |
+
+ Audio flows through an aiortc WebRTC connection.
+ All hardware commands (movement, camera, display) use gRPC.
+
+ ### Browser Mode
+
+ Browser mode (bot.py) does NOT support robot control. It only provides:
+
+ - WebRTC audio/video with the browser
+ - Vision analysis
+ - Conversation
+
+ Display observers in browser mode are deprecated and do nothing.
+
+ ## Service Factories
+
+ The `factories/` directory contains factory functions for creating STT and TTS services:
+
+ ```python
+ from services.factories import create_stt_service, create_tts_service
+
+ # Create STT service
+ stt = create_stt_service(
+     provider="deepgram",  # or "speechmatics", "deepgram-flux"
+     deepgram_api_key=DEEPGRAM_API_KEY,
+     language=Language.EN
+ )
+
+ # Create TTS service
+ tts = create_tts_service(
+     provider="elevenlabs",  # or "qwen3"
+     elevenlabs_api_key=ELEVENLABS_API_KEY,
+     elevenlabs_voice_id=VOICE_ID
+ )
+ ```
+
+ ## Memory Services
+
+ ### ChromaDB (memory/memory_chromadb.py)
+
+ Simple semantic memory using the ChromaDB vector database. The service is a pipeline `FrameProcessor`: place it ahead of the LLM and it injects retrieved memories into the context automatically.
+
+ ```python
+ from services.memory.memory_chromadb import ChromaDBMemoryService
+
+ memory = ChromaDBMemoryService(user_id="default_user")
+ # Add `memory` to the pipeline upstream of the LLM service.
+ ```
+
+ ### Hybrid Memory (memory/memory_hybrid.py)
+
+ Combines vector similarity search with BM25 keyword matching (SQLite + FTS5) for lower-latency retrieval.
+
+ ## Not Services
+
+ This directory is for backend services only. Other code belongs in:
+
+ - `tools/` - LLM callable functions
+ - `processors/` - Pipeline frame processors
+ - `transport/` - Network transport (WebRTC, gRPC)
+ - `observers/` - Pipeline observers
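
A hedged wiring sketch for the memory services above: both are Pipecat `FrameProcessor`s, so they slot into the pipeline upstream of the LLM. The surrounding pipeline elements in the comment are placeholders for whatever STT/LLM/TTS services the bot actually constructs:

```python
# Sketch only: element names besides HybridMemoryService are placeholders.
from services.memory.memory_hybrid import HybridMemoryService

memory = HybridMemoryService(user_id="default_user", search_timeout_ms=40)

# The memory processor must sit upstream of the LLM so retrieved memories
# are injected into the context before inference, e.g.:
# pipeline = Pipeline([transport.input(), stt, context_aggregator.user(),
#                      memory, llm, tts, transport.output()])
```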
src/services/__init__.py ADDED
@@ -0,0 +1 @@
+
src/services/factories/__init__.py ADDED
@@ -0,0 +1,6 @@
+ """Service factories for STT and TTS providers."""
+
+ from .stt_factory import create_stt_service
+ from .tts_factory import create_tts_service
+
+ __all__ = ["create_stt_service", "create_tts_service"]
src/services/factories/stt_factory.py ADDED
@@ -0,0 +1,127 @@
+ """STT Service Factory - Centralized STT service creation."""
+
+ from loguru import logger
+ from pipecat.transcriptions.language import Language
+
+
+ def create_stt_service(
+     provider: str,
+     speechmatics_api_key: str = None,
+     deepgram_api_key: str = None,
+     language: Language = Language.EN,
+     enable_diarization: bool = False,
+ ):
+     """
+     Create and configure an STT service based on provider.
+
+     Args:
+         provider: "speechmatics", "deepgram", or "deepgram-flux"
+         speechmatics_api_key: Speechmatics API key (if using speechmatics)
+         deepgram_api_key: Deepgram API key (if using deepgram/deepgram-flux)
+         language: Language for transcription (default: English)
+         enable_diarization: Enable speaker diarization (default: False)
+
+     Returns:
+         Configured STT service instance
+
+     Raises:
+         ValueError: If provider is invalid or required parameters are missing
+         Exception: If STT service initialization fails
+     """
+     logger.info(f"Creating STT service: {provider}")
+
+     try:
+         if provider == "speechmatics":
+             # Lazy import to avoid requiring the package when not in use
+             from pipecat.services.speechmatics.stt import SpeechmaticsSTTService, TurnDetectionMode
+
+             # Speechmatics with SMART_TURN mode for built-in turn detection
+             if not speechmatics_api_key:
+                 raise ValueError("speechmatics_api_key is required for Speechmatics")
+
+             logger.info("Using Speechmatics STT with SMART_TURN mode")
+             stt_params = SpeechmaticsSTTService.InputParams(
+                 language=language,
+                 enable_diarization=enable_diarization,
+                 turn_detection_mode=TurnDetectionMode.SMART_TURN,
+             )
+
+             stt = SpeechmaticsSTTService(
+                 api_key=speechmatics_api_key,
+                 params=stt_params,
+             )
+             logger.info("βœ“ Speechmatics STT service created with SMART_TURN mode")
+
+         elif provider == "deepgram":
+             # Lazy import to avoid requiring the package when not in use
+             from pipecat.services.deepgram.stt import DeepgramSTTService
+             from deepgram.clients.listen.v1.websocket.options import LiveOptions
+
+             # Deepgram STT with server-side endpointing for turn detection.
+             # Note: this uses Deepgram's server-side silence detection, not local smart turn.
+             if not deepgram_api_key:
+                 raise ValueError("deepgram_api_key is required for Deepgram")
+
+             logger.info("Using Deepgram STT with server-side endpointing")
+             live_options = LiveOptions(
+                 language=language.value if hasattr(language, 'value') else str(language),
+                 model="nova-2",        # Deepgram Nova-2 model
+                 interim_results=True,  # Enable interim transcription results
+                 smart_format=True,     # Auto-format transcripts
+                 punctuate=True,        # Add punctuation
+                 endpointing=300,       # 300ms of silence ends a turn (server-side)
+                 vad_events=True,       # Enable VAD events for speech detection
+             )
+
+             stt = DeepgramSTTService(
+                 api_key=deepgram_api_key,
+                 live_options=live_options,
+                 stt_ttfb_timeout=5.0,  # TTFB timeout for transcription (seconds)
+             )
+             logger.info("βœ“ Deepgram STT service created")
+             logger.info("  Turn detection: Server-side endpointing (300ms silence)")
+             logger.info("  VAD events: Enabled for speech detection")
+             logger.info("  TTFB timeout: 5.0s for transcription metrics")
+
+         elif provider == "deepgram-flux":
+             # Lazy import to avoid requiring the package when not in use
+             from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService
+
+             # Deepgram Flux with built-in turn detection
+             if not deepgram_api_key:
+                 raise ValueError("deepgram_api_key is required for Deepgram Flux")
+
+             logger.info("Using Deepgram Flux STT with built-in turn detection")
+             # Flux has different parameters - it uses EOT (end-of-turn) detection.
+             # Default model is "flux-general-en" and encoding is "linear16".
+             stt_params = DeepgramFluxSTTService.InputParams(
+                 min_confidence=0.3,  # Minimum confidence threshold for accepting transcriptions
+                 # Optional end-of-turn tuning:
+                 # eot_threshold: confidence threshold for detecting end of turn (0.0-1.0)
+                 # eot_timeout_ms: max time to wait before forcing turn end
+                 # eager_eot_threshold: more aggressive turn-ending threshold
+             )
+
+             stt = DeepgramFluxSTTService(
+                 api_key=deepgram_api_key,
+                 model="flux-general-en",  # Flux model for general English
+                 params=stt_params,
+             )
+
+             # Debug event handler for Flux interim updates
+             @stt.event_handler("on_update")
+             async def on_flux_update(stt_service, transcript):
+                 logger.debug(f"[Deepgram Flux] Update: {transcript}")
+
+             logger.info("βœ“ Deepgram Flux STT service created with built-in turn detection")
+             logger.info("  Note: STT latency will be tracked via MetricsFrame if emitted by Flux")
+
+         else:
+             raise ValueError(f"Unknown STT provider: {provider}. Must be 'speechmatics', 'deepgram', or 'deepgram-flux'")
+
+         return stt
+
+     except Exception as e:
+         logger.error(f"Failed to create STT service '{provider}': {e}", exc_info=True)
+         raise
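
A usage sketch for this factory, selecting the provider from the environment. The `STT_PROVIDER` variable name is illustrative, not something the factory reads itself; only the explicit arguments come from the factory's signature:

```python
# Illustrative only: the factory takes explicit arguments; env var names are ours.
import os
from pipecat.transcriptions.language import Language
from services.factories import create_stt_service

stt = create_stt_service(
    provider=os.getenv("STT_PROVIDER", "deepgram"),
    speechmatics_api_key=os.getenv("SPEECHMATICS_API_KEY"),
    deepgram_api_key=os.getenv("DEEPGRAM_API_KEY"),
    language=Language.EN,
)
```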
src/services/factories/tts_factory.py ADDED
@@ -0,0 +1,84 @@
+ """TTS Service Factory - Centralized TTS service creation."""
+
+ from loguru import logger
+ from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
+ from ..tts.tts_qwen import Qwen3TTSService
+
+
+ def create_tts_service(
+     provider: str,
+     elevenlabs_api_key: str = None,
+     elevenlabs_voice_id: str = None,
+     qwen_model: str = None,
+     qwen_device: str = None,
+     qwen_ref_audio: str = None,
+ ):
+     """
+     Create and configure a TTS service based on provider.
+
+     Args:
+         provider: "elevenlabs" or "qwen3"
+         elevenlabs_api_key: ElevenLabs API key (if using elevenlabs)
+         elevenlabs_voice_id: ElevenLabs voice ID (if using elevenlabs)
+         qwen_model: Qwen3-TTS model name (if using qwen3)
+         qwen_device: Device for Qwen3-TTS (if using qwen3)
+         qwen_ref_audio: Reference audio path for Qwen3-TTS (if using qwen3)
+
+     Returns:
+         Configured TTS service instance
+
+     Raises:
+         ValueError: If provider is invalid or required parameters are missing
+         Exception: If TTS service initialization fails
+     """
+     logger.info(f"Creating TTS service: {provider}")
+
+     try:
+         if provider == "qwen3":
+             # Local Qwen3-TTS with voice cloning
+             if not qwen_model:
+                 raise ValueError("qwen_model is required for Qwen3-TTS")
+
+             logger.info("Using Qwen3-TTS (local, voice cloning)")
+             tts = Qwen3TTSService(
+                 model_name=qwen_model,
+                 device=qwen_device or "mps",
+                 ref_audio_path=qwen_ref_audio,
+                 x_vector_only_mode=True,
+                 sample_rate=24000,
+             )
+             logger.info(f"βœ“ Qwen3-TTS service created (device: {qwen_device})")
+
+         elif provider == "elevenlabs":
+             # Cloud ElevenLabs TTS
+             if not elevenlabs_api_key or not elevenlabs_voice_id:
+                 raise ValueError("elevenlabs_api_key and elevenlabs_voice_id are required for ElevenLabs")
+
+             logger.info("Using ElevenLabs TTS")
+             tts = ElevenLabsTTSService(
+                 api_key=elevenlabs_api_key,
+                 voice_id=elevenlabs_voice_id,
+                 model="eleven_flash_v2_5",
+                 output_format="pcm_24000",
+                 enable_word_timestamps=False,
+                 voice_settings={
+                     "stability": 0.5,
+                     "similarity_boost": 0.75,
+                     "style": 0.0,
+                     "use_speaker_boost": True
+                 },
+                 params=ElevenLabsTTSService.InputParams(
+                     enable_logging=True,  # Enable ElevenLabs logging for metrics
+                 ),
+             )
+             logger.info("βœ“ ElevenLabs TTS service created")
+
+         else:
+             raise ValueError(f"Unknown TTS provider: {provider}. Must be 'qwen3' or 'elevenlabs'")
+
+         return tts
+
+     except Exception as e:
+         logger.error(f"Failed to create TTS service '{provider}': {e}", exc_info=True)
+         raise
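
A matching usage sketch for the TTS factory. The env var names are ours, not part of the factory's API; note that `qwen_model` is required when `provider="qwen3"`, while `qwen_device` falls back to `"mps"` inside the factory:

```python
# Illustrative only: env var names are ours; arguments match the factory above.
import os
from services.factories import create_tts_service

provider = os.getenv("TTS_PROVIDER", "elevenlabs")
if provider == "qwen3":
    tts = create_tts_service(
        provider="qwen3",
        qwen_model=os.getenv("QWEN_TTS_MODEL"),       # required for qwen3
        qwen_device=os.getenv("QWEN_TTS_DEVICE"),     # factory defaults to "mps"
        qwen_ref_audio=os.getenv("QWEN_TTS_REF_AUDIO"),
    )
else:
    tts = create_tts_service(
        provider="elevenlabs",
        elevenlabs_api_key=os.getenv("ELEVENLABS_API_KEY"),
        elevenlabs_voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
    )
```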
src/services/memory/memory_chromadb.py ADDED
@@ -0,0 +1,195 @@
+ """Local memory service using ChromaDB for semantic search."""
+
+ import time
+ from loguru import logger
+ from pipecat.frames.frames import Frame, LLMMessagesFrame, LLMContextFrame, MetricsFrame
+ from pipecat.metrics.metrics import TTFBMetricsData
+ from pipecat.processors.aggregators.llm_context import LLMContext
+ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
+ from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+ from sentence_transformers import SentenceTransformer
+ import chromadb
+
+
+ class ChromaDBMemoryService(FrameProcessor):
+     """
+     Local memory service using ChromaDB for semantic search.
+
+     Replaces Mem0 with a local, fast, and free alternative:
+     - Stores conversation history with semantic embeddings
+     - Retrieves relevant memories based on similarity search
+     - No external API calls - everything runs locally
+     - Latency: ~50-100ms vs Mem0's ~200-500ms
+     """
+
+     def __init__(
+         self,
+         user_id: str,
+         agent_id: str = "tars_agent",
+         collection_name: str = "conversations",
+         search_limit: int = 5,
+         search_threshold: float = 0.5,
+         system_prompt_prefix: str = "Based on previous conversations, I recall:\n\n",
+         **kwargs
+     ):
+         super().__init__(**kwargs)
+         self.user_id = user_id
+         self.agent_id = agent_id
+         self.search_limit = search_limit
+         self.search_threshold = search_threshold
+         self.system_prompt_prefix = system_prompt_prefix
+
+         # Initialize ChromaDB (persistent local storage)
+         self.client = chromadb.PersistentClient(path="./chroma_memory")
+
+         # Create or get the collection for this user
+         self.collection = self.client.get_or_create_collection(
+             name=f"{collection_name}_{user_id}",
+             metadata={"agent_id": agent_id}
+         )
+
+         # Load the embedding model (lightweight, ~80MB)
+         logger.info("Loading sentence transformer model...")
+         self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
+
+         # Frame counter for debugging
+         self._frame_count = 0
+
+         logger.info("βœ“ ChromaDB memory service initialized and ready to process frames")
+
+     async def process_frame(self, frame: Frame, direction: FrameDirection):
+         """Process frames and inject memories into the LLM context."""
+         try:
+             await super().process_frame(frame, direction)
+
+             # Frame counter
+             self._frame_count += 1
+
+             # Debug: log frame types to understand what's flowing through
+             frame_type = type(frame).__name__
+             direction_name = "DOWNSTREAM" if direction == FrameDirection.DOWNSTREAM else "UPSTREAM"
+
+             # Log LLM-related frames for debugging
+             if 'LLM' in frame_type or 'Messages' in frame_type or 'Context' in frame_type:
+                 logger.info(f"πŸ” [ChromaDB] >>> RECEIVED: {frame_type} | Direction: {direction_name} | Count: {self._frame_count}")
+
+             # Log every 100th frame to verify the processor is being called
+             if self._frame_count % 100 == 0:
+                 logger.info(f"πŸ” [ChromaDB] Processed {self._frame_count} frames so far (latest: {frame_type})")
+
+             # Handle both LLMContextFrame and LLMMessagesFrame (as Mem0 does)
+             context = None
+             messages = None
+
+             if isinstance(frame, (LLMContextFrame, OpenAILLMContextFrame)):
+                 logger.info("🧠 [ChromaDB] βœ“βœ“βœ“ PROCESSING LLMContextFrame βœ“βœ“βœ“")
+                 context = frame.context
+             elif isinstance(frame, LLMMessagesFrame):
+                 logger.info("🧠 [ChromaDB] βœ“βœ“βœ“ PROCESSING LLMMessagesFrame βœ“βœ“βœ“")
+                 messages = frame.messages
+                 context = LLMContext(messages)
+
+             if context:
+                 # Get the latest user message
+                 context_messages = context.get_messages()
+                 user_message = None
+                 for msg in reversed(context_messages):
+                     if msg.get("role") == "user" and isinstance(msg.get("content"), str):
+                         user_message = msg.get("content", "")
+                         break
+
+                 if user_message:
+                     logger.info(f"🧠 [ChromaDB] Searching memories for: '{user_message[:50]}...'")
+                     # Search for relevant memories
+                     start_time = time.time()
+                     memories = await self._search_memories(user_message)
+                     search_latency_ms = (time.time() - start_time) * 1000
+
+                     # Emit metrics for observer tracking
+                     logger.info(f"πŸ“Š [ChromaDB] Search completed in {search_latency_ms:.0f}ms, emitting MetricsFrame")
+                     metrics_frame = MetricsFrame(
+                         data=[TTFBMetricsData(processor="ChromaDBMemoryService", value=search_latency_ms / 1000)]
+                     )
+                     await self.push_frame(metrics_frame, direction)
+
+                     if memories:
+                         # Inject memories into the context
+                         memory_text = self.system_prompt_prefix + "\n".join(memories)
+                         context.add_message({"role": "system", "content": memory_text})
+                         logger.info(f"πŸ“š Retrieved {len(memories)} memories in {search_latency_ms:.0f}ms")
+
+                     # Store the current conversation turn
+                     await self._store_memory(user_message)
+
+                 # If we received an LLMMessagesFrame, create a new one with the enhanced messages
+                 if messages is not None:
+                     await self.push_frame(LLMMessagesFrame(context.get_messages()), direction)
+                 else:
+                     # Otherwise, pass the enhanced context frame downstream
+                     await self.push_frame(frame, direction)
+             else:
+                 # For non-context frames, just pass them through
+                 await self.push_frame(frame, direction)
+
+         except Exception as e:
+             logger.error(f"❌ [ChromaDB] Error in process_frame: {e}", exc_info=True)
+             # Still pass the frame through even if memory handling failed
+             await self.push_frame(frame, direction)
+
+     async def _search_memories(self, query: str) -> list[str]:
+         """Search for relevant memories based on semantic similarity."""
+         try:
+             # Generate an embedding for the query
+             query_embedding = self.embedder.encode(query).tolist()
+
+             # Search in ChromaDB
+             results = self.collection.query(
+                 query_embeddings=[query_embedding],
+                 n_results=self.search_limit,
+             )
+
+             # Extract documents and filter by threshold
+             memories = []
+             if results and "documents" in results and results["documents"]:
+                 for doc_list, distance_list in zip(results["documents"], results.get("distances", [[]])):
+                     for doc, distance in zip(doc_list, distance_list):
+                         # ChromaDB returns L2 distance (lower is better). For
+                         # unit-length embeddings the distance lies in [0, 2],
+                         # so map it to a similarity score in [0, 1].
+                         similarity = 1 - (distance / 2)
+                         if similarity >= self.search_threshold:
+                             memories.append(doc)
+
+             return memories
+
+         except Exception as e:
+             logger.error(f"Error searching memories: {e}")
+             return []
+
+     async def _store_memory(self, text: str):
+         """Store a memory with its embedding."""
+         try:
+             # Generate an embedding
+             embedding = self.embedder.encode(text).tolist()
+
+             # Store in ChromaDB with a millisecond timestamp as the ID
+             doc_id = f"{int(time.time() * 1000)}"
+             self.collection.add(
+                 documents=[text],
+                 embeddings=[embedding],
+                 ids=[doc_id],
+                 metadatas=[{
+                     "user_id": self.user_id,
+                     "agent_id": self.agent_id,
+                     "timestamp": time.time()
+                 }]
+             )
+
+             logger.debug(f"πŸ’Ύ Stored memory: {text[:50]}...")
+
+         except Exception as e:
+             logger.error(f"Error storing memory: {e}")
+
+     async def close(self):
+         """Cleanup resources."""
+         # The ChromaDB client doesn't need explicit cleanup
+         pass
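
The `1 - distance/2` mapping in `_search_memories` is exact only for unit-length embeddings under a squared-L2 metric, where ||a βˆ’ b||Β² = 2 βˆ’ 2Β·cos(a, b); `SentenceTransformer.encode` does not normalize by default, so in practice the score is approximate unless `normalize_embeddings=True` is passed. A quick numeric check under that unit-norm assumption:

```python
# Check: for unit vectors, squared L2 distance d = 2 - 2*cos, so 1 - d/2 = cos.
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])  # 60 degrees apart, cos = 0.5

d = float(np.sum((a - b) ** 2))  # squared L2 distance = 2 - 2*0.5 = 1.0
print(1 - d / 2)                 # 0.5 -> matches the cosine similarity exactly
```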
src/services/memory/memory_hybrid.py ADDED
@@ -0,0 +1,393 @@
+ """
+ Hybrid memory system optimized for voice AI with sub-50ms latency.
+
+ Features:
+ 1. Hybrid search combining vector similarity (70%) and BM25 keyword matching (30%)
+ 2. SQLite + FTS5 for fast, local storage and search
+ 3. Query embedding cache to avoid redundant encoding
+ 4. Pre-warmed embedding model for consistent latency
+ 5. Strict timeout with graceful fallback
+ 6. Thread pool for non-blocking SQLite operations
+ 7. Fire-and-forget storage to prevent blocking
+
+ Architecture:
+ - Vector search for semantic similarity (cosine distance)
+ - BM25 via FTS5 for exact keyword matching
+ - Weighted score fusion for the best of both worlds
+ - Target latency: <50ms (vs ChromaDB's ~50-100ms)
+ """
+
+ import asyncio
+ import sqlite3
+ import time
+ from concurrent.futures import ThreadPoolExecutor
+ from pathlib import Path
+ from typing import Optional, List, Tuple
+ import numpy as np
+
+ from loguru import logger
+ from pipecat.frames.frames import Frame, LLMMessagesFrame, LLMContextFrame, MetricsFrame
+ from pipecat.metrics.metrics import TTFBMetricsData
+ from pipecat.processors.aggregators.llm_context import LLMContext
+ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContextFrame
+ from pipecat.processors.frame_processor import FrameProcessor, FrameDirection
+ from sentence_transformers import SentenceTransformer
+
+
+ class HybridMemoryService(FrameProcessor):
+     """
+     Hybrid memory service combining vector similarity and keyword search.
+
+     Target latency: <50ms
+
+     Architecture:
+     - Vector search via numpy (semantic similarity with cosine distance)
+     - BM25 via FTS5 (exact keyword matching)
+     - Weighted score fusion: 70% vector + 30% BM25
+
+     Voice AI optimizations:
+     - Query embedding cache (avoid re-encoding similar queries)
+     - Pre-warmed embedding model for consistent performance
+     - Thread pool for non-blocking SQLite operations
+     - Strict timeout with graceful fallback
+     - Fire-and-forget storage to prevent blocking
+     """
+
+     def __init__(
+         self,
+         user_id: str,
+         db_path: str = "./memory_data/memory.sqlite",
+         embedding_model: str = "all-MiniLM-L6-v2",
+         search_limit: int = 3,
+         search_timeout_ms: int = 40,
+         vector_weight: float = 0.7,
+         bm25_weight: float = 0.3,
+         system_prompt_prefix: str = "From our conversations:\n",
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+         self.user_id = user_id
+         self.db_path = db_path
+         self.search_limit = search_limit
+         self.search_timeout_ms = search_timeout_ms
+         self.vector_weight = vector_weight
+         self.bm25_weight = bm25_weight
+         self.system_prompt_prefix = system_prompt_prefix
+
+         # Thread pool for blocking operations
+         self._executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="HybridMemory")
+
+         # Initialize SQLite with FTS5 and vector support
+         Path(db_path).parent.mkdir(parents=True, exist_ok=True)
+         self._init_database()
+
+         # Load and warm the embedding model
+         logger.info("Loading embedding model for hybrid memory...")
+         self.embedder = SentenceTransformer(embedding_model)
+         self._embedding_dim = self.embedder.get_sentence_embedding_dimension()
+         self._warmup_model()
+
+         # Embedding caches
+         self._query_cache: dict[str, np.ndarray] = {}  # For queries
+         self._doc_cache: dict[str, np.ndarray] = {}    # For documents
+         self._cache_max_size = 500
+
+         # Metrics
+         self._stats = {"searches": 0, "cache_hits": 0, "timeouts": 0, "total_latency_ms": 0}
+         self._frame_count = 0
+
+         logger.info(f"βœ“ Hybrid memory ready (vector + BM25, {search_timeout_ms}ms timeout)")
+
+     def _init_database(self):
+         """Initialize SQLite with FTS5 and the vector table."""
+         conn = sqlite3.connect(self.db_path)
+
+         # Main memories table
+         conn.execute("""
+             CREATE TABLE IF NOT EXISTS memories (
+                 id INTEGER PRIMARY KEY,
+                 user_id TEXT NOT NULL,
+                 content TEXT NOT NULL,
+                 embedding BLOB,
+                 created_at REAL DEFAULT (unixepoch('now', 'subsec'))
+             )
+         """)
+
+         # FTS5 virtual table for BM25 keyword search
+         conn.execute("""
+             CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts
+             USING fts5(content, content='memories', content_rowid='id')
+         """)
+
+         # Triggers to keep FTS in sync
+         conn.execute("""
+             CREATE TRIGGER IF NOT EXISTS memories_ai AFTER INSERT ON memories BEGIN
+                 INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
+             END
+         """)
+
+         conn.execute("""
+             CREATE TRIGGER IF NOT EXISTS memories_ad AFTER DELETE ON memories BEGIN
+                 DELETE FROM memories_fts WHERE rowid = old.id;
+             END
+         """)
+
+         # Index for user filtering
+         conn.execute("CREATE INDEX IF NOT EXISTS idx_user ON memories(user_id)")
+
+         conn.commit()
+         conn.close()
+         logger.info("βœ“ SQLite database initialized with FTS5")
+
+     def _warmup_model(self):
+         """Warm up the embedding model for consistent latency."""
+         warmup_start = time.perf_counter()
+         for _ in range(3):
+             _ = self.embedder.encode("warmup query", show_progress_bar=False)
+         warmup_time = (time.perf_counter() - warmup_start) * 1000
+         logger.info(f"βœ“ Embedding model warmed up ({warmup_time:.0f}ms)")
+
+     def _get_query_embedding(self, text: str) -> np.ndarray:
+         """Get an embedding, using the query cache when possible."""
+         cache_key = text.strip().lower()[:100]
+
+         if cache_key in self._query_cache:
+             self._stats["cache_hits"] += 1
+             return self._query_cache[cache_key]
+
+         embedding = self.embedder.encode(text, show_progress_bar=False)
+
+         # FIFO eviction to bound the cache (cheap approximation of LRU)
+         if len(self._query_cache) >= self._cache_max_size:
+             oldest = next(iter(self._query_cache))
+             del self._query_cache[oldest]
+
+         self._query_cache[cache_key] = embedding
+         return embedding
+
+     def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
+         """Fast cosine similarity."""
+         return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
+
+     def _bm25_rank_to_score(self, rank: int) -> float:
+         """Convert a BM25 result position to a normalized score."""
+         return 1.0 / (1.0 + max(0, rank))
+
+     def _hybrid_search_sync(self, query: str) -> List[Tuple[str, float]]:
+         """
+         Hybrid search combining vector similarity and BM25 keyword matching.
+         Returns [(content, score), ...] sorted by score.
+         """
+         conn = sqlite3.connect(self.db_path)
+
+         # Get the query embedding
+         query_embedding = self._get_query_embedding(query)
+
+         # ========== Vector Search ==========
+         vector_results = {}
+         cursor = conn.execute(
+             "SELECT id, content, embedding FROM memories WHERE user_id = ? ORDER BY created_at DESC LIMIT 100",
+             (self.user_id,)
+         )
+
+         for row_id, content, embedding_blob in cursor:
+             if embedding_blob:
+                 doc_embedding = np.frombuffer(embedding_blob, dtype=np.float32)
+                 similarity = self._cosine_similarity(query_embedding, doc_embedding)
+                 vector_results[row_id] = {
+                     "content": content,
+                     "vector_score": similarity,
+                     "bm25_score": 0.0,
+                 }
+
+         # ========== BM25 Search (FTS5) ==========
+         # Build an FTS query using OR for flexible token matching
+         tokens = [t for t in query.split() if len(t) > 2]
+         if tokens:
+             # Use OR for more flexible matching
+             fts_query = " OR ".join(f'"{t}"' for t in tokens[:5])  # Limit tokens
+             try:
+                 bm25_cursor = conn.execute(
+                     """
+                     SELECT rowid, rank FROM memories_fts
+                     WHERE memories_fts MATCH ?
+                     ORDER BY rank
+                     LIMIT ?
+                     """,
+                     (fts_query, self.search_limit * 4)
+                 )
+
+                 # Scores come from the result position; the raw FTS5 rank value is unused
+                 for rank_idx, (row_id, bm25_rank) in enumerate(bm25_cursor):
+                     bm25_score = self._bm25_rank_to_score(rank_idx)
+                     if row_id in vector_results:
+                         vector_results[row_id]["bm25_score"] = bm25_score
+                     else:
+                         # BM25 found something the vector pass didn't
+                         content_cursor = conn.execute(
+                             "SELECT content FROM memories WHERE id = ?", (row_id,)
+                         )
+                         row = content_cursor.fetchone()
+                         if row:
+                             vector_results[row_id] = {
+                                 "content": row[0],
+                                 "vector_score": 0.0,
+                                 "bm25_score": bm25_score,
+                             }
+             except sqlite3.OperationalError as e:
+                 # FTS query failed; continue with vector results only
+                 logger.debug(f"FTS query failed: {e}")
+
+         conn.close()
+
+         # ========== Weighted Score Fusion ==========
+         results = []
+         for data in vector_results.values():
+             final_score = (
+                 self.vector_weight * data["vector_score"] +
+                 self.bm25_weight * data["bm25_score"]
+             )
+             results.append((data["content"], final_score))
+
+         # Sort by score, return the top N
+         results.sort(key=lambda x: x[1], reverse=True)
+         return results[:self.search_limit]
+
+     def _store_sync(self, text: str):
+         """Store a memory with its embedding."""
+         embedding = self.embedder.encode(text, show_progress_bar=False)
+         embedding_blob = embedding.astype(np.float32).tobytes()
+
+         conn = sqlite3.connect(self.db_path)
+         conn.execute(
+             "INSERT INTO memories (user_id, content, embedding) VALUES (?, ?, ?)",
+             (self.user_id, text, embedding_blob)
+         )
+         conn.commit()
+         conn.close()
+
+     async def _search_with_timeout(self, query: str) -> List[Tuple[str, float]]:
+         """Async search with a strict timeout."""
+         loop = asyncio.get_running_loop()
+
+         try:
+             result = await asyncio.wait_for(
+                 loop.run_in_executor(self._executor, self._hybrid_search_sync, query),
+                 timeout=self.search_timeout_ms / 1000,
+             )
+             return result
+         except asyncio.TimeoutError:
+             self._stats["timeouts"] += 1
+             logger.warning(f"⏱️ Memory search timed out ({self.search_timeout_ms}ms)")
+             return []
+
+     async def process_frame(self, frame: Frame, direction: FrameDirection):
+         """Process Pipecat frames with hybrid memory injection."""
+         await super().process_frame(frame, direction)
+
+         try:
+             self._frame_count += 1
+
+             # Debug: log frame types to understand what's flowing through
+             frame_type = type(frame).__name__
+             direction_name = "DOWNSTREAM" if direction == FrameDirection.DOWNSTREAM else "UPSTREAM"
+
+             # Log LLM-related frames for debugging
+             if 'LLM' in frame_type or 'Messages' in frame_type or 'Context' in frame_type:
+                 logger.info(f"πŸ” [HybridMemory] >>> RECEIVED: {frame_type} | Direction: {direction_name} | Count: {self._frame_count}")
+
+             context = None
+             messages = None
+
+             if isinstance(frame, (LLMContextFrame, OpenAILLMContextFrame)):
+                 logger.info("🧠 [HybridMemory] βœ“βœ“βœ“ PROCESSING LLMContextFrame βœ“βœ“βœ“")
+                 context = frame.context
+             elif isinstance(frame, LLMMessagesFrame):
+                 logger.info("🧠 [HybridMemory] βœ“βœ“βœ“ PROCESSING LLMMessagesFrame βœ“βœ“βœ“")
+                 messages = frame.messages
+                 context = LLMContext(messages)
+
+             if context:
+                 # Extract the latest user message
+                 user_message = None
+                 for msg in reversed(context.get_messages()):
+                     if msg.get("role") == "user" and isinstance(msg.get("content"), str):
+                         user_message = msg["content"]
+                         break
+
+                 if user_message:
+                     self._stats["searches"] += 1
+                     start_time = time.perf_counter()
+
+                     logger.info(f"πŸ” [HybridMemory] Searching for: '{user_message[:50]}...'")
+
+                     # Hybrid search with timeout
+                     results = await self._search_with_timeout(user_message)
+
+                     latency_ms = (time.perf_counter() - start_time) * 1000
+                     self._stats["total_latency_ms"] += latency_ms
+
+                     # Emit metrics
+                     await self.push_frame(
+                         MetricsFrame(data=[
+                             TTFBMetricsData(processor="HybridMemory", value=latency_ms / 1000)
+                         ]),
+                         direction,
+                     )
+
+                     # Inject memories
+                     if results:
+                         memories_text = self.system_prompt_prefix + "\n".join(
+                             f"- {content}" for content, score in results
+                         )
+                         context.add_message({"role": "system", "content": memories_text})
+
+                         cache_rate = self._stats["cache_hits"] / max(1, self._stats["searches"]) * 100
+                         avg_latency = self._stats["total_latency_ms"] / max(1, self._stats["searches"])
+                         logger.info(
+                             f"πŸ“š [HybridMemory] {len(results)} memories ({latency_ms:.0f}ms, "
+                             f"avg: {avg_latency:.0f}ms, cache: {cache_rate:.0f}%)"
+                         )
+                     else:
+                         logger.info(f"πŸ“š [HybridMemory] No relevant memories ({latency_ms:.0f}ms)")
+
+                     # Fire-and-forget storage
+                     asyncio.create_task(self._store_async(user_message))
+
+                 # Push the frame
+                 if messages is not None:
+                     await self.push_frame(LLMMessagesFrame(context.get_messages()), direction)
+                 else:
+                     await self.push_frame(frame, direction)
+             else:
+                 await self.push_frame(frame, direction)
+
+         except Exception as e:
+             logger.error(f"❌ [HybridMemory] Memory error: {e}", exc_info=True)
+             await self.push_frame(frame, direction)
+
+     async def _store_async(self, text: str):
+         """Async storage (fire-and-forget)."""
+         loop = asyncio.get_running_loop()
+         try:
+             await loop.run_in_executor(self._executor, self._store_sync, text)
+             logger.debug(f"πŸ’Ύ [HybridMemory] Stored: {text[:50]}...")
+         except Exception as e:
+             logger.debug(f"[HybridMemory] Store failed: {e}")
+
+     def get_stats(self) -> dict:
+         """Get performance statistics."""
+         searches = max(1, self._stats["searches"])
+         return {
+             "searches": self._stats["searches"],
+             "cache_hits": self._stats["cache_hits"],
+             "cache_hit_rate": f"{(self._stats['cache_hits'] / searches) * 100:.1f}%",
+             "timeouts": self._stats["timeouts"],
+             "avg_latency_ms": f"{self._stats['total_latency_ms'] / searches:.1f}",
+         }
+
+     async def close(self):
+         """Cleanup resources."""
+         self._executor.shutdown(wait=False)
+         stats = self.get_stats()
+         logger.info(f"πŸ“Š [HybridMemory] Final stats: {stats}")
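
A worked example of the 70/30 score fusion in `_hybrid_search_sync` (scores below are made up for illustration): a candidate with a strong keyword hit can outrank one with a slightly higher vector score, which is the point of blending the two signals:

```python
# Illustrative scores only; mirrors the fusion arithmetic in _hybrid_search_sync.
vector_weight, bm25_weight = 0.7, 0.3

candidates = [
    ("user said they like pizza", 0.62, 1.0),  # BM25 position 0 -> 1/(1+0) = 1.0
    ("user enjoys Italian food",  0.70, 0.0),  # semantic hit, no keyword match
]
for text, v, b in candidates:
    fused = vector_weight * v + bm25_weight * b
    print(f"{fused:.3f}  {text}")
# 0.734  user said they like pizza
# 0.490  user enjoys Italian food
```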