---
title: TARS Conversation App
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: static
short_description: Real-time AI voice assistant for TARS
pinned: false
---
# TARS Conversation App
Real-time voice AI with transcription, vision, and intelligent conversation using Speechmatics/Deepgram, Qwen3-TTS/ElevenLabs, DeepInfra LLM, and Moondream.
## Features
- Dual Operation Modes
  - WebRTC Mode (`src/bot.py`) - Browser-based voice AI with real-time metrics dashboard
  - Robot Mode (`src/tars_bot.py`) - Connect to a Raspberry Pi TARS robot via WebRTC and gRPC
- Real-time Transcription - Speechmatics or Deepgram with smart turn detection
- Dual TTS Options - Qwen3-TTS (local, free, voice cloning) or ElevenLabs (cloud)
- LLM Integration - Any model via DeepInfra
- Vision Analysis - Moondream for image understanding
- Smart Gating Layer - AI-powered decision system for natural conversation flow
- Hybrid Memory - SQLite-based hybrid search (70% vector + 30% BM25)
- Emotional Monitoring - Real-time detection of confusion, hesitation, and frustration
- Gradio Dashboard - Live TTFB metrics, latency charts, and conversation transcription
- WebRTC Transport - Low-latency peer-to-peer audio
- gRPC Robot Control - Hardware control with 5-10ms latency (robot mode only)
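The 70% vector + 30% BM25 blend behind the hybrid memory can be illustrated with a small sketch. This is not the app's actual implementation; the weights come from the feature list above, while the min-max normalization, function names, and sample data are assumptions:

```python
def hybrid_score(vector_sim: float, bm25_score: float, max_bm25: float) -> float:
    """Blend a cosine similarity (assumed already in [0, 1]) with a
    min-max-normalized BM25 score using the 70/30 weighting."""
    bm25_norm = bm25_score / max_bm25 if max_bm25 > 0 else 0.0
    return 0.7 * vector_sim + 0.3 * bm25_norm

# Rank candidate memories by blended score (illustrative data)
candidates = [
    {"text": "user likes dry humor", "vec": 0.92, "bm25": 4.1},
    {"text": "user's dog is named Rex", "vec": 0.55, "bm25": 7.8},
]
max_bm25 = max(c["bm25"] for c in candidates)
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["vec"], c["bm25"], max_bm25),
    reverse=True,
)
```

Blending rewards results that score well on either signal: the first candidate wins here on semantic similarity even though the second has the higher raw BM25 score.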
## Project Structure
```
tars-conversation-app/
├── src/bot.py               # WebRTC mode - Browser voice AI
├── src/tars_bot.py          # Robot mode - Raspberry Pi hardware
├── src/pipecat_service.py   # FastAPI backend (WebRTC signaling)
├── config.py                # Configuration management
├── config.ini               # User configuration file
├── requirements.txt         # Python dependencies
│
├── src/                     # Backend
│   ├── observers/           # Pipeline observers (metrics, transcription)
│   ├── processors/          # Pipeline processors (silence filter, gating)
│   ├── services/            # Services (STT, TTS, Memory, Robot)
│   ├── tools/               # LLM callable functions
│   ├── transport/           # WebRTC transport (aiortc)
│   ├── character/           # TARS personality and prompts
│   └── shared_state.py      # Shared metrics storage
│
├── ui/                      # Frontend
│   └── app.py               # Gradio dashboard (metrics + transcription)
│
├── tests/                   # Tests
│   └── gradio/
│       └── test_gradio.py   # UI integration test
│
└── character/               # TARS character data
    ├── TARS.json            # Character definition
    └── persona.ini          # Personality parameters
```
## Operation Modes
### WebRTC Mode (`src/bot.py`)
- Use case: Browser-based voice AI conversations
- Transport: SmallWebRTC (browser ↔ Pipecat)
- Features: Full pipeline with STT, LLM, TTS, Memory
- UI: Gradio dashboard for metrics and transcription
- Best for: Development, testing, remote conversations
### Robot Mode (`src/tars_bot.py`)
- Use case: Physical TARS robot on Raspberry Pi
- Transport: aiortc (RPi ↔ Pipecat) + gRPC (commands)
- Features: Same pipeline + robot control (eyes, gestures, movement)
- Hardware: Requires TARS robot with servos and display
- Best for: Physical robot interactions, demos
## Quick Start
### Installation on TARS Robot (Recommended)
Install directly from HuggingFace Space via the TARS dashboard:
- Open the TARS dashboard at `http://your-pi:8000`
- Go to the App Store tab
- Enter Space ID: `latishab/tars-conversation-app`
- Click Install from HuggingFace
- Configure API keys in `.env.local`
- Click Start
- Access the metrics dashboard at `http://your-pi:7860`
The app will:
- Auto-install dependencies
- Set up virtual environment
- Configure for robot mode
- Start Gradio dashboard
### Easy Installation (Manual)
For first-time setup on Raspberry Pi:
```bash
# Clone and install
git clone https://github.com/latishab/tars-conversation-app.git
cd tars-conversation-app
bash install.sh
```
The installer handles:
- System dependencies (portaudio, ffmpeg)
- Python virtual environment
- All Python packages
- Configuration file setup
### Manual Installation

#### 1. Install Dependencies

```bash
# Python dependencies
pip install -r requirements.txt

# For robot mode, install TARS SDK
pip install tars-robot[sdk]
```
#### 2. Configure Environment
```bash
# Copy and edit environment file with your API keys
cp env.example .env.local

# Copy and edit configuration file
cp config.ini.example config.ini
```
Required API Keys (in `.env.local`):

- `SPEECHMATICS_API_KEY` or `DEEPGRAM_API_KEY` - For speech-to-text
- `DEEPINFRA_API_KEY` - For LLM
- `ELEVENLABS_API_KEY` - Optional (if using ElevenLabs TTS)
Settings (in `config.ini`):

```ini
[LLM]
model = meta-llama/Llama-3.3-70B-Instruct

[STT]
provider = deepgram  # or speechmatics

[TTS]
provider = qwen3  # or elevenlabs

[Memory]
type = hybrid  # SQLite-based hybrid search (vector + BM25)
```
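`config.py` presumably loads these settings with Python's standard `configparser`; a minimal sketch of that pattern, using an inline string in place of the real `config.ini` file (the fallback values here are assumptions, not the app's actual defaults):

```python
import configparser

# inline_comment_prefixes lets values like "deepgram  # or speechmatics" parse cleanly
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read_string("""
[LLM]
model = meta-llama/Llama-3.3-70B-Instruct

[STT]
provider = deepgram  # or speechmatics

[TTS]
provider = qwen3  # or elevenlabs
""")  # in the real app this would be config.read("config.ini")

stt_provider = config.get("STT", "provider", fallback="deepgram")
tts_provider = config.get("TTS", "provider", fallback="qwen3")
llm_model = config.get("LLM", "model", fallback="meta-llama/Llama-3.3-70B-Instruct")
```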
#### 3. Run
**WebRTC Mode (Browser)**

```bash
# Terminal 1: Python backend
python src/pipecat_service.py

# Terminal 2: Gradio UI (optional)
python ui/app.py
```
Then:
- Open WebRTC client in browser (connect to pipecat_service)
- Open Gradio dashboard at http://localhost:7861 (for metrics)
- Start talking
**Robot Mode (Raspberry Pi)**
Prerequisites:
- Raspberry Pi TARS robot running `tars_daemon.py`
- Network connection (LAN or Tailscale)
- TARS SDK installed
Configuration in `config.ini`:

```ini
[Connection]
mode = robot
rpi_url = http://<your-rpi-ip>:8001
rpi_grpc = <your-rpi-ip>:50051
auto_connect = true

[Display]
enabled = true
```
Deployment detection:
- Remote (Mac/computer): Uses configured addresses
- Local (on RPi): Auto-detects localhost:50051
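The deployment detection could work roughly like this; a hedged sketch that assumes detection is based on whether a gRPC daemon is listening locally (the real logic in `src/tars_bot.py` may differ, and the default address is illustrative):

```python
import socket

def resolve_grpc_target(configured: str = "<your-rpi-ip>:50051") -> str:
    """Prefer localhost:50051 when something is listening on that port
    locally (i.e. we are presumably running on the RPi itself);
    otherwise fall back to the address from config.ini."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.2)
        if s.connect_ex(("127.0.0.1", 50051)) == 0:
            return "localhost:50051"
    return configured

target = resolve_grpc_target("192.168.1.50:50051")
```

`connect_ex` returns 0 only if the TCP connection succeeds, so on a remote machine with no local daemon the configured address is used.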
Run:

```bash
python src/tars_bot.py
```
## Gradio Dashboard
The Gradio UI (`ui/app.py`) provides real-time monitoring:
### Latency Dashboard
- Service configuration (STT, Memory, LLM, TTS)
- TTFB metrics with min/max/avg/last stats
- Line chart: Latency trends over time
- Bar chart: Stacked latency breakdown
- Metrics table: Last 15 turns
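The min/max/avg/last stats above can be derived from the raw per-turn TTFB samples; a minimal sketch of that computation (the dashboard's actual code in `ui/app.py` may differ, and the function name is hypothetical):

```python
def ttfb_stats(samples: list[float]) -> dict[str, float]:
    """Summarize per-turn time-to-first-byte measurements (seconds)."""
    if not samples:
        return {}
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": sum(samples) / len(samples),
        "last": samples[-1],
    }

stats = ttfb_stats([0.42, 0.38, 0.51])
```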
### Conversation Tab
- Live user and assistant transcriptions
- Auto-updates every second
### Connection Tab
- Architecture documentation
- Usage instructions
## Architecture
### WebRTC Mode Data Flow
```
Browser (WebRTC client)
        ↓ (audio)
SmallWebRTC Transport
        ↓
Pipeline: STT → Memory → LLM → TTS
        ↓
Observers (metrics, transcription, assistant)
        ↓
shared_state.py
        ↓
Gradio UI (http://localhost:7861)
```
### Robot Mode Data Flow
```
RPi Mic → WebRTC → Pipecat Pipeline → WebRTC → RPi Speaker
(audio)                  ↓                       (audio)
            STT → Memory → LLM → TTS
                         ↓
          LLM Tools (set_emotion, do_gesture)
                         ↓
                 gRPC → RPi Hardware
                (eyes, servos, display)
```
Communication channels (Robot Mode):
| Channel | Protocol | Purpose | Latency |
|---|---|---|---|
| Audio | WebRTC (aiortc) | Voice conversation | ~20ms |
| Commands | gRPC | Hardware control | ~5-10ms |
| State | DataChannel | Battery, movement status | ~10ms |
## Development
See `docs/DEVELOPING_APPS.md` for a comprehensive guide on creating TARS SDK apps.
### Adding Metrics
- Emit a `MetricsFrame` in your service/processor
- `MetricsObserver` will capture it automatically
- Metrics appear in the Gradio dashboard
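The flow above follows the observer pattern; here is a framework-agnostic sketch of the idea. The class and method names mirror the ones mentioned in this README but are illustrative stand-ins, not Pipecat's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class MetricsFrame:
    """Illustrative stand-in for a pipeline metrics frame."""
    service: str
    ttfb: float  # time to first byte, seconds

@dataclass
class MetricsObserver:
    """Watches every frame that passes through the pipeline and
    appends metrics to a shared store the dashboard can read."""
    store: list = field(default_factory=list)

    def on_frame(self, frame) -> None:
        if isinstance(frame, MetricsFrame):
            self.store.append({"service": frame.service, "ttfb": frame.ttfb})

observer = MetricsObserver()
observer.on_frame(MetricsFrame(service="tts", ttfb=0.31))
observer.on_frame("not-a-metrics-frame")  # silently ignored
```

The key property is that services only emit frames; the observer decides which ones to record, so adding a metric never requires touching the dashboard.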
### Adding Tools
- Create a function in `src/tools/`
- Create a schema with `create_*_schema()`
- Register it in `src/bot.py` or `src/tars_bot.py`
- The LLM can now call your tool
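These steps can be sketched with a plain function plus a JSON-style schema of the kind most LLM tool-calling APIs accept. `do_gesture` is a tool name mentioned in this README, but the function body, schema shape, and registration dict here are illustrative, not the app's actual `create_*_schema()` output:

```python
def do_gesture(gesture: str) -> dict:
    """Hypothetical tool body: in robot mode this would forward the
    gesture to the RPi over gRPC. Here it just echoes a result."""
    return {"status": "ok", "gesture": gesture}

def create_do_gesture_schema() -> dict:
    """Schema telling the LLM when and how to call the tool."""
    return {
        "name": "do_gesture",
        "description": "Perform a physical gesture on the TARS robot.",
        "parameters": {
            "type": "object",
            "properties": {
                "gesture": {"type": "string", "enum": ["wave", "nod", "shrug"]},
            },
            "required": ["gesture"],
        },
    }

# Registration pairs each schema name with its callable
TOOLS = {create_do_gesture_schema()["name"]: do_gesture}
```

At runtime the bot would look up the callable by the name the LLM returns and invoke it with the parsed arguments.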
### Modifying UI
- Edit `ui/app.py`
- Gradio hot-reloads automatically
- Access `metrics_store` for data
## Uninstalling
```bash
bash uninstall.sh
```
Removes virtual environment and optionally data/config files.
## Troubleshooting
### No metrics in Gradio UI
- Ensure the bot is running (`src/bot.py` or `src/tars_bot.py`)
- Check that the WebRTC client is connected
- Verify at least one conversation turn has completed
### Robot mode connection issues
- Check the RPi is reachable: `ping <rpi-ip>`
- Verify `tars_daemon` is running on the RPi
- Check that gRPC port 50051 is open
- Review the addresses in `config.ini`
### Import errors
```bash
pip install -r requirements.txt
pip install gradio plotly  # For UI
```
### Audio issues (robot mode)
- Check the RPi mic/speaker with `arecord`/`aplay`
- Verify the WebRTC connection in the logs
- Test with `tests/test_hardware.py`
## Contributing
Contributions welcome.
- Fork the repository
- Create a feature branch
- Make your changes
- Test with `python tests/gradio/test_gradio.py`
- Commit with clear messages (see CLAUDE.md for style)
- Push to your fork
- Open a Pull Request
**Code Style:**
- Python: Follow PEP 8
- Add comments for complex logic
- Update docs for new features
- See CLAUDE.md for guidelines (concise, technical, no fluff)
## License
MIT License - see LICENSE file for details