poetry run pytest - Run test suite
poetry run pytest unit_tests/test_booking_interception.py - Run single test file
poetry run pytest --cov=app - Run tests with coverage
poetry run black . - Code formatting
poetry run flake8 - Linting
poetry run mypy . - Type checking

Utility Scripts

python scripts/refresh_google_token.py - Refresh Google OAuth credentials
python scripts/debug_chat.py - Interactive CLI for testing conversations

Docker

docker build -t voicecal-ai:latest . - Build Docker image
docker run -p 7860:7860 voicecal-ai:latest - Run container

Architecture Overview

Core Components

app/core/agent.py - ChatCalAgent: ReAct agent with Google Calendar tools, user info tracking, and loop detection
app/core/tools.py - CalendarTools: check_availability and create_appointment implementations
app/core/llm.py + app/core/llm_anthropic.py - LLM abstraction; primary is Anthropic claude-sonnet-4-20250514; Groq and Gemini are fallbacks
app/core/custom_parser.py - VerbatimOutputParser: extracts raw tool responses from ReAct output (critical for preserving HTML formatting)
app/core/session_factory.py - Selects Redis (dev) or JWT (HuggingFace) backend at runtime
app/personality/prompts.py - 400+ line system prompt that drives the entire booking workflow

API Structure

app/api/main.py - FastAPI application with ~20 endpoints
app/api/chat_widget.py - Embeddable chat UI (HTML/JS served inline)
app/api/models.py - Pydantic models for API requests/responses

Configuration

app/config.py - Centralized Pydantic BaseSettings; loads from env vars then .env
Session backend configurable: SESSION_BACKEND=redis (default) or SESSION_BACKEND=jwt
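As a rough illustration of the load order described above, the sketch below resolves a setting from the process environment first, then from .env content, then from a default. It uses only the standard library rather than Pydantic. parse_dotenv and resolve_setting are hypothetical helpers, not names from app/config.py, and the parser handles only simple KEY=VALUE lines.

```python
from typing import Mapping


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_setting(name: str, environ: Mapping[str, str],
                    dotenv: Mapping[str, str], default: str) -> str:
    """Environment variables win over .env entries, which win over the default."""
    if name in environ:
        return environ[name]
    if name in dotenv:
        return dotenv[name]
    return default


dotenv = parse_dotenv("# local overrides\nSESSION_BACKEND=jwt\n")
# No env var set: the .env value is used.
print(resolve_setting("SESSION_BACKEND", {}, dotenv, "redis"))
# Env var set: it takes precedence over .env.
print(resolve_setting("SESSION_BACKEND", {"SESSION_BACKEND": "redis"}, dotenv, "redis"))
```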

Key Architectural Patterns

Agent Workflow

The agent is a LlamaIndex ReActAgent. On each turn:

A CalendarLLMWrapper dynamically injects a system prompt that lists the user info still missing along with the known user context
The agent reasons through tool calls (check_availability, create_appointment)
VerbatimOutputParser extracts raw tool output verbatim, never summarized, to preserve the HTML confirmation markup
Booking success is detected from markers such as <div id="booking-success"> and celebration phrases ("thanks", "all set"); on a match, the agent returns a pre-written farewell without an LLM call
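The farewell shortcut in the last step could look roughly like the sketch below. It assumes one particular reading of that step: the HTML marker is looked for in the last tool output and the celebration phrase in the user's next message, and both must be present. The function names and the farewell text are hypothetical, not taken from app/core/agent.py.

```python
import re
from typing import Optional

# Marker and phrases come from the description above; the farewell text is made up.
BOOKING_SUCCESS_MARKERS = ('<div id="booking-success">',)
CELEBRATION_PHRASES = ("thanks", "all set")
FAREWELL = "You're all set. Talk soon!"


def booking_succeeded(tool_output: str) -> bool:
    """True if the tool output carries a known booking-success marker."""
    return any(marker in tool_output for marker in BOOKING_SUCCESS_MARKERS)


def is_celebration(user_message: str) -> bool:
    """True if the message contains a celebration phrase as a whole word or phrase."""
    text = user_message.lower()
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in CELEBRATION_PHRASES)


def maybe_farewell(last_tool_output: str, user_message: str) -> Optional[str]:
    """Return the canned farewell, or None if the LLM should handle the turn."""
    if booking_succeeded(last_tool_output) and is_celebration(user_message):
        return FAREWELL
    return None
```

Returning None signals that the turn should go through the normal agent path.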

Email Auto-Provision

The email address is taken from the landing page URL (/chat-widget?email=...), stored in the session's user_data, and loaded into agent.user_info on init. The system prompt strictly forbids asking for the email again.
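A minimal sketch of that flow, using only the standard library, is shown below. The query parameter name email and the user_data and user_info names come from the text; the helper functions and the dictionary shapes are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def email_from_widget_url(url: str) -> Optional[str]:
    """Extract the email query parameter from a /chat-widget URL, if present."""
    values = parse_qs(urlsplit(url).query).get("email")
    if not values:
        return None
    email = values[0].strip()
    return email or None


def store_email(session: dict, url: str) -> None:
    """Copy the email from the URL into the session's user_data."""
    email = email_from_widget_url(url)
    if email is not None:
        session.setdefault("user_data", {})["email"] = email


def init_user_info(session: dict) -> dict:
    """Seed the agent's user_info from the session on init."""
    return dict(session.get("user_data", {}))


session = {}
store_email(session, "/chat-widget?email=ada%40example.com")
user_info = init_user_info(session)
print(user_info)
```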

Meeting Conflict Handling

When create_appointment detects a conflict, it stores the full booking details in conversation_state.pending_operation so the agent can retry with an alternate time without re-asking for topic/duration.
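The retry pattern can be sketched as follows. Only pending_operation and create_appointment are names from the text; the ConversationState dataclass, the retry helper, and the conflict check (an exact match against a set of busy start times) are simplifying assumptions, not the real Google Calendar logic.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ConversationState:
    # Holds the full booking details after a conflict, otherwise None.
    pending_operation: Optional[dict] = None


def create_appointment(state: ConversationState, busy_starts: Set[str], *,
                       topic: str, duration_minutes: int, start: str) -> str:
    """Book the slot, or remember the request if the start time is taken."""
    details = {"topic": topic, "duration_minutes": duration_minutes, "start": start}
    if start in busy_starts:  # simplified stand-in for real conflict detection
        state.pending_operation = details
        return "conflict"
    busy_starts.add(start)
    state.pending_operation = None
    return "booked"


def retry_with_new_time(state: ConversationState, busy_starts: Set[str],
                        new_start: str) -> str:
    """Reuse the stored topic and duration; only the start time changes."""
    if state.pending_operation is None:
        raise ValueError("no pending booking to retry")
    pending = dict(state.pending_operation)
    pending["start"] = new_start
    return create_appointment(state, busy_starts, **pending)
```

Because the stored details include topic and duration, the retry needs only the new start time from the user.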

Custom Meeting ID Format

The format is MMDD-HHMM-DURm (e.g., 0731-1400-60m). It encodes the date, time, and duration in a human-readable string that is stored alongside the Google Calendar event ID.
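A small sketch of building and parsing an ID in this format is shown below. The function names are hypothetical, and the sketch assumes a 24-hour clock and a duration in whole minutes; the format itself carries no year, so the parser returns only the fields it can recover.

```python
import re
from datetime import datetime

_MEETING_ID = re.compile(r"^(\d{2})(\d{2})-(\d{2})(\d{2})-(\d+)m$")


def make_meeting_id(start: datetime, duration_minutes: int) -> str:
    """Format a start time and duration as MMDD-HHMM-DURm."""
    return f"{start:%m%d-%H%M}-{duration_minutes}m"


def parse_meeting_id(meeting_id: str) -> dict:
    """Split an MMDD-HHMM-DURm string back into its numeric fields."""
    match = _MEETING_ID.match(meeting_id)
    if match is None:
        raise ValueError(f"not a valid meeting ID: {meeting_id!r}")
    month, day, hour, minute, duration = (int(part) for part in match.groups())
    return {"month": month, "day": day, "hour": hour,
            "minute": minute, "duration_minutes": duration}


print(make_meeting_id(datetime(2025, 7, 31, 14, 0), 60))  # 0731-1400-60m
```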

Loop Detection

The agent tracks its last 5 responses and stops after 2 consecutive similar responses (compared after normalizing missing-info content).
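One way such a check could work is sketched below. The history size of 5 and the limit of 2 come from the text; the LoopDetector class, the normalization (lowercasing and collapsing whitespace), and the 0.9 similarity threshold are assumptions, since the source does not say how similarity or missing-info normalization is computed.

```python
import re
from collections import deque
from difflib import SequenceMatcher


class LoopDetector:
    def __init__(self, history_size: int = 5, max_similar: int = 2,
                 threshold: float = 0.9) -> None:
        self.history = deque(maxlen=history_size)  # keeps only the latest responses
        self.max_similar = max_similar
        self.threshold = threshold

    @staticmethod
    def _normalize(text: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return re.sub(r"\s+", " ", text.strip().lower())

    def _similar(self, a: str, b: str) -> bool:
        return SequenceMatcher(None, a, b).ratio() >= self.threshold

    def record(self, response: str) -> bool:
        """Store a response; return True if the agent should stop."""
        self.history.append(self._normalize(response))
        items = list(self.history)
        run = 1  # length of the trailing run of mutually similar neighbours
        for prev, cur in zip(reversed(items[:-1]), reversed(items[1:])):
            if not self._similar(prev, cur):
                break
            run += 1
        return run >= self.max_similar
```

Calling record after every agent response and stopping when it returns True gives the behaviour described above.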

Key Features

LLM Integration

Primary: Anthropic Claude Sonnet 4 (claude-sonnet-4-20250514)
Fallbacks: Groq Llama-3.1-8b-instant, Google Gemini
Mock LLM available for testing via USE_MOCK_LLM=true env var
HTML-formatted responses pass through verbatim

Calendar Integration

Google Calendar OAuth2 authentication
Meeting booking with conflict detection
Google Meet video link integration
Email invitations sent to attendees (SMTP via Gmail)
Business hours configurable via env vars (default: 9 AM–5 PM EST)

Session Management

Redis sessions for stateful development
JWT sessions for stateless HuggingFace deployment (no external dependencies)
Configurable session timeout (default: 10 minutes)
Conversation objects are always in-memory regardless of backend

Speech-to-Text / Text-to-Speech

STT file upload: POST /api/stt/transcribe — Groq Whisper API (whisper-large-v3-turbo); auto-converts MP4+Opus → WebM
TTS: POST /tts/synthesize — Groq PlayAI with 27 available voices; audio cached in-memory (10 most recent)
WebSocket STT: configured for wss://pgits-stt-gpu-service-v3.hf.space/ws at 16 kHz (behind a feature flag, not active by default)
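The in-memory TTS cache mentioned above (10 most recent items) could be sketched as below. The RecentAudioCache class and the (voice, text) cache key are hypothetical, and the sketch assumes that a cache hit refreshes an entry; the source does not say whether eviction is by insertion order or by last use.

```python
from collections import OrderedDict
from typing import Optional, Tuple

CacheKey = Tuple[str, str]  # (voice, text), an assumed key shape


class RecentAudioCache:
    def __init__(self, max_items: int = 10) -> None:
        self.max_items = max_items
        self._items: "OrderedDict[CacheKey, bytes]" = OrderedDict()

    def get(self, key: CacheKey) -> Optional[bytes]:
        """Return cached audio and mark it as most recently used."""
        audio = self._items.get(key)
        if audio is not None:
            self._items.move_to_end(key)
        return audio

    def put(self, key: CacheKey, audio: bytes) -> None:
        """Store audio, evicting the least recently used entries beyond the limit."""
        self._items[key] = audio
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def __len__(self) -> int:
        return len(self._items)
```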

Deployment

HuggingFace Spaces

Entry point: app.py (forces SESSION_BACKEND=jwt, port 7860)
Logs written to stdout and /tmp/app.log
SSH debugging: ssh -i ~/.ssh/id_ed25519 pgits-voicecal-ai@ssh.hf.space (via Dev Mode)
HF deploys from main branch

Environment Variables

Required:

GROQ_API_KEY — Groq LLM and Whisper STT
ANTHROPIC_API_KEY — Primary LLM
GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET — OAuth2 credentials
SECRET_KEY — Session signing
MY_PHONE_NUMBER, MY_EMAIL_ADDRESS — Peter's contact info

Optional:

GEMINI_API_KEY — Fallback LLM
TESTING_MODE=true — Bypass email validation for development
SESSION_BACKEND=jwt — Stateless mode for HuggingFace
USE_MOCK_LLM=true — Mock LLM for unit tests
BUSINESS_START_HOUR, BUSINESS_END_HOUR — Override default 9–17
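To illustrate how the business-hours overrides might be consumed, the sketch below reads BUSINESS_START_HOUR and BUSINESS_END_HOUR with the documented defaults of 9 and 17 and checks whether a meeting fits. The helper names are hypothetical. The sketch also assumes whole hours, a naive local datetime (no time zone handling), and that a meeting must end by the closing hour.

```python
import os
from datetime import datetime, timedelta
from typing import Mapping, Optional, Tuple


def business_hours(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Read the opening and closing hours, falling back to 9 and 17."""
    env = os.environ if env is None else env
    start = int(env.get("BUSINESS_START_HOUR", "9"))
    end = int(env.get("BUSINESS_END_HOUR", "17"))
    if not 0 <= start < end <= 24:
        raise ValueError(f"invalid business hours: {start}-{end}")
    return start, end


def within_business_hours(start: datetime, duration_minutes: int,
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """True if the whole meeting falls inside business hours on its start day."""
    open_hour, close_hour = business_hours(env)
    opens = start.replace(hour=open_hour, minute=0, second=0, microsecond=0)
    closes = opens + timedelta(hours=close_hour - open_hour)
    ends = start + timedelta(minutes=duration_minutes)
    return opens <= start and ends <= closes
```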

Version Management

Semantic versioning in pyproject.toml and version.txt
Update both files with every commit
Version displayed in UI footer
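Since both files must change together, a small consistency check can catch a missed update. The sketch below is one possible approach, not part of the repository. It assumes the version appears in pyproject.toml as a line of the form version = "X.Y.Z" and takes the first such line, which is a simplification; the project name in the example is made up.

```python
import re


def version_from_pyproject(pyproject_text: str) -> str:
    """Return the first version = "..." value found in the pyproject text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"\s*$', pyproject_text, re.MULTILINE)
    if match is None:
        raise ValueError("no version line found in pyproject text")
    return match.group(1)


def versions_in_sync(pyproject_text: str, version_txt: str) -> bool:
    """Compare the pyproject version with the contents of version.txt."""
    return version_from_pyproject(pyproject_text) == version_txt.strip()


example = '[tool.poetry]\nname = "example-project"\nversion = "1.2.3"\n'
print(versions_in_sync(example, "1.2.3\n"))  # True
```

In practice the two strings would be read from pyproject.toml and version.txt before calling versions_in_sync.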

Development Notes

Tool response preservation is critical: never summarize tool output; use VerbatimOutputParser to return as-is
System prompt is the source of truth for agent behavior — edit app/personality/prompts.py for workflow changes
HTML responses require allow_html=true in agent configuration
testing_mode bypasses email validation (allows non-Peter emails)
Decision: the booking summary step was removed, so the raw HTML confirmation passes straight through to the frontend
Google credentials are stored in the credentials/ directory and synced to HF Secrets after the OAuth flow