
title: ChatKit Backend
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860

ChatKit Python Backend

This FastAPI service implements an advanced multi-agent support system using the OpenAI Agents SDK and OpenAI ChatKit. It provides specialized support agents for different platforms (Kimi, DeepSeek, Google) with integrated RAG and persistent memory.

🏗️ Architecture Overview

The system is built on a modular, agentic architecture designed for high-performance customer support.

Component Diagram

graph TD
    Client[ChatKit Frontend] <--> API[FastAPI Orchestrator]
    
    subgraph "Agent Layer"
        API <--> KimiAgent[Kimi Agent]
        API <--> DeepSeekAgent[DeepSeek Agent]
        API <--> GoogleAgent[Google Agent]
        API <--> Summ[Summarizer Agent]
    end
    
    subgraph "Tools & Intelligence"
        KimiAgent & DeepSeekAgent & GoogleAgent --> RAG[LlamaIndex RAG Tool]
        KimiAgent & DeepSeekAgent & GoogleAgent --> Facts[Fact Recording Tool]
        RAG --> Chroma[ChromaDB Vector Store]
        RAG --> WebsiteData[Website Knowledge Base]
    end
    
    subgraph "Persistence"
        API --> SQLiteThreads[SQLite: Chat Threads]
        API --> SQLiteState[SQLite: User State]
    end
    
    subgraph "External Providers"
        KimiAgent --> Groq[Groq API]
        DeepSeekAgent --> OpenRouter[OpenRouter API]
        GoogleAgent --> OR_Gemini[OpenRouter/Gemini]
    end

Key Components

FastAPI Orchestrator: Handles request routing, SSE streaming, and handoffs between Agents and ChatKit.
OpenAI Agents SDK: Provides the logic for agent loops, tool calling, and handoffs.
ChatKit Server: Manages the ChatKit protocol, ensuring real-time UI updates (widgets, thoughts, tool results).
Vector RAG Engine: Uses LlamaIndex and ChromaDB to query school-specific services from scraped website data.
Multi-Modal Input: Uses Groq-hosted Whisper to transcribe audio.
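As a rough illustration of the SSE streaming the orchestrator handles, the sketch below frames payloads as Server-Sent Events using only the standard library. The event names (`token`, `done`) and the `{"delta": ...}` payload shape are assumptions made for this example; they are not the ChatKit wire format.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one payload as a single Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line, so one "data:" prefix is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a final 'done' frame (hypothetical names)."""
    for token in tokens:
        yield format_sse({"delta": token}, event="token")
    yield format_sse({}, event="done")
```

A generator like `stream_tokens` could be handed to a streaming HTTP response with the `text/event-stream` media type; how the real service wires this up is not shown in this README.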

🤖 Model Configuration

The system assigns a different model to each agent to balance performance, cost, and reasoning capability.

Agent / Service	Model Name	Primary Provider	API Class
Kimi Agent	`moonshotai/kimi-k2-instruct-0905`	Groq	`OpenAIResponsesModel`
DeepSeek Agent	`deepseek/deepseek-chat`	OpenRouter	`OpenAIChatCompletionsModel`
Google Agent	`google/gemini-2.5-flash`	OpenRouter	`OpenAIChatCompletionsModel`
Summarizer Agent	`meta-llama/llama-4-scout-17b-16e-instruct`	Groq	`OpenAIResponsesModel`
Audio Transcription	`whisper-large-v3-turbo`	Groq	Native Deepgram/Whisper
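One way to keep this mapping in code is a small lookup table. The sketch below copies the model names, providers, and API class names from the table above; the `ModelConfig` dataclass, the agent keys, and `config_for` are hypothetical names chosen for illustration, and the class names are stored as plain strings rather than imported from the Agents SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str      # model identifier sent to the provider
    provider: str   # "groq" or "openrouter"
    api_class: str  # name of the Agents SDK model class, kept as a string here


# Values taken from the table above; the dictionary keys are hypothetical.
MODEL_CONFIG = {
    "kimi": ModelConfig(
        "moonshotai/kimi-k2-instruct-0905", "groq", "OpenAIResponsesModel"),
    "deepseek": ModelConfig(
        "deepseek/deepseek-chat", "openrouter", "OpenAIChatCompletionsModel"),
    "google": ModelConfig(
        "google/gemini-2.5-flash", "openrouter", "OpenAIChatCompletionsModel"),
    "summarizer": ModelConfig(
        "meta-llama/llama-4-scout-17b-16e-instruct", "groq", "OpenAIResponsesModel"),
}


def config_for(agent: str) -> ModelConfig:
    """Look up an agent's model settings, ignoring case."""
    try:
        return MODEL_CONFIG[agent.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown agent {agent!r}; expected one of {sorted(MODEL_CONFIG)}"
        ) from None
```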

📋 Assumptions & Limitations

Assumptions

API Availability: The system assumes stable connections to Groq and OpenRouter.
Static Knowledge: The RAG system assumes the vector database is pre-built from the school's website data (accessible in the website/ folder).
Single Instance: Currently architected for single-instance deployment (SQLite persistence).
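A minimal sketch of single-instance SQLite persistence, assuming the WAL journal mode this README mentions under Limitations. The function names and the `threads.db` file name are hypothetical; the real service's schema and connection handling are not shown here.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path, timeout=5.0)
    # The PRAGMA returns the journal mode actually in effect after the call.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"Could not enable WAL mode (got {mode!r})")
    return conn


def demo() -> str:
    """Create a throwaway database and report its journal mode."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_wal_connection(os.path.join(tmp, "threads.db"))
        try:
            return conn.execute("PRAGMA journal_mode").fetchone()[0]
        finally:
            conn.close()
```

WAL lets readers proceed while a single writer commits, which suits one process serving many chat threads, but writes are still serialized.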

Limitations

Responses API Compatibility: Only Groq natively supports the Responses API that OpenAIResponsesModel relies on for advanced thread state. OpenRouter models use OpenAIChatCompletionsModel with a custom persistence bridge.
Concurrency: SQLite is configured in WAL mode, but extremely high concurrent traffic would require a transition to PostgreSQL.
Context Limits: The Summarizer Agent reduces context bloat, but very long multi-turn sessions are still bounded by the provider's context window (e.g., 128k tokens for DeepSeek/Gemini).
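To make the context-limit trade-off concrete, here is a hedged sketch of one way a summarizer could be triggered: once the estimated history size crosses a fraction of the context window, older turns are replaced by a summary. The four-characters-per-token estimate, the 0.8 threshold, and every name below are assumptions for illustration, not the Summarizer Agent's actual logic.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude estimate of about four characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def compact_history(
    messages: List[str],
    context_window: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 4,
    threshold: float = 0.8,
) -> List[str]:
    """Replace older turns with one summary once the history nears the window."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    used = sum(rough_token_count(m) for m in messages)
    # Leave the history alone while it fits or is too short to split.
    if used <= threshold * context_window or len(messages) <= keep_recent:
        return list(messages)
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

In a real deployment `summarize` would call the summarizer model; here it is any callable that maps a list of turns to a single string.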

💰 Cost Estimation (Rough Calculation)

Calculations assume 1,000 user queries averaging 2,000 tokens per turn (1,500 input / 500 output).

Model (Provider)	Input Cost (per 1k queries)	Output Cost (per 1k queries)	Estimated Total / 1,000 Queries
DeepSeek Chat (OpenRouter)	$0.021	$0.035	$0.056
Gemini 2.5 Flash (OpenRouter)	$0.150	$0.200	$0.350
Kimi (Groq)	Free Tier / $0	Free Tier / $0	$0.000

Note: Costs are based on current OpenRouter pricing (Jan 2025) and Groq's high-speed free tier. Among the paid options, DeepSeek remains the most cost-effective provider for reasoning-intensive tasks.
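The arithmetic behind an estimate like this can be sketched as follows, assuming providers quote prices per million tokens. The function name and the prices in the usage example are placeholders, not actual OpenRouter or Groq rates.

```python
def estimate_cost(
    queries: int,
    input_tokens_per_query: int,
    output_tokens_per_query: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the total cost in dollars for a batch of queries."""
    input_cost = queries * input_tokens_per_query / 1_000_000 * input_price_per_million
    output_cost = queries * output_tokens_per_query / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices: $1.00 per million input tokens, $2.00 per million output tokens.
example_total = estimate_cost(1_000, 1_500, 500, 1.00, 2.00)
```

With the workload above (1,000 queries, 1,500 input and 500 output tokens each), the batch consumes 1.5M input tokens and 0.5M output tokens, so the placeholder prices give 1.5 × $1.00 + 0.5 × $2.00 = $2.50.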

🚀 Getting Started

To enable the realtime assistant, install both the ChatKit Python package and the OpenAI SDK, then set the OPENAI_API_KEY environment variable.

uv sync
export OPENAI_API_KEY=sk-proj-...
uv run uvicorn app.main:app --reload
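Because the service depends on OPENAI_API_KEY being set, a startup check can fail fast with a clear message instead of erroring on the first request. This is a minimal sketch; `require_env` is a hypothetical helper, not part of this repository.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the named environment variable or raise if it is missing or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at import time of the application module would surface a missing key as soon as uvicorn loads the app.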