---
title: ChatKit Backend
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# ChatKit Python Backend
This FastAPI service implements an advanced multi-agent support system using the **OpenAI Agents SDK** and **OpenAI ChatKit**. It provides specialized support agents for different platforms (Kimi, DeepSeek, Google) with integrated RAG and persistent memory.
## 🏗️ Architecture Overview
The system is built on a modular, agentic architecture designed for high-performance customer support.
### Component Diagram
```mermaid
graph TD
Client[ChatKit Frontend] <--> API[FastAPI Orchestrator]
subgraph "Agent Layer"
API <--> KimiAgent[Kimi Agent]
API <--> DeepSeekAgent[DeepSeek Agent]
API <--> GoogleAgent[Google Agent]
API <--> Summ[Summarizer Agent]
end
subgraph "Tools & Intelligence"
KimiAgent & DeepSeekAgent & GoogleAgent --> RAG[LlamaIndex RAG Tool]
KimiAgent & DeepSeekAgent & GoogleAgent --> Facts[Fact Recording Tool]
RAG --> Chroma[ChromaDB Vector Store]
RAG --> WebsiteData[Website Knowledge Base]
end
subgraph "Persistence"
API --> SQLiteThreads[SQLite: Chat Threads]
API --> SQLiteState[SQLite: User State]
end
subgraph "External Providers"
KimiAgent --> Groq[Groq API]
DeepSeekAgent --> OpenRouter[OpenRouter API]
GoogleAgent --> OR_Gemini[OpenRouter/Gemini]
end
```
### Key Components
1. **FastAPI Orchestrator**: Handles request routing, SSE streaming, and handoffs between Agents and ChatKit.
2. **OpenAI Agents SDK**: Provides the logic for agent loops, tool calling, and handoffs.
3. **ChatKit Server**: Manages the ChatKit protocol, ensuring real-time UI updates (widgets, thoughts, tool results).
4. **Vector RAG Engine**: Uses LlamaIndex and ChromaDB to query school-specific services from scraped website data.
5. **Multi-Modal Input**: Uses Groq-hosted Whisper for audio transcription.
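The SSE streaming handled by the orchestrator can be illustrated with a minimal sketch of how Server-Sent Events frames are built. The function names (`format_sse`, `stream_tokens`), the event names, and the payload shape are hypothetical and chosen for illustration; ChatKit defines its own event schema, which this sketch does not reproduce.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Serialize one payload as a Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on one data: line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' frame."""
    for token in tokens:
        # "token" and "done" are illustrative event names, not ChatKit's.
        yield format_sse({"delta": token}, event="token")
    yield format_sse({"finished": True}, event="done")
```

In a FastAPI route, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")` so that the client receives frames as they are produced.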
---
## 🤖 Model Configuration
The system mixes models from several providers to balance performance, cost, and reasoning capability.
| Agent / Service | Model Name | Primary Provider | API Class |
| :--- | :--- | :--- | :--- |
| **Kimi Agent** | `moonshotai/kimi-k2-instruct-0905` | **Groq** | `OpenAIResponsesModel` |
| **DeepSeek Agent** | `deepseek/deepseek-chat` | **OpenRouter** | `OpenAIChatCompletionsModel` |
| **Google Agent** | `google/gemini-2.5-flash` | **OpenRouter** | `OpenAIChatCompletionsModel` |
| **Summarizer Agent** | `meta-llama/llama-4-scout-17b-16e-instruct` | **Groq** | `OpenAIResponsesModel` |
| **Audio Transcription** | `whisper-large-v3-turbo` | **Groq** | Groq audio transcription API (Whisper) |
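One way to keep this mapping in code is a small read-only registry. The sketch below copies the model names, providers, and API classes from the table; the `ModelConfig` class, the dictionary keys, and `get_model_config` are hypothetical names introduced for illustration and are not taken from the project's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str      # model identifier as listed in the table
    provider: str   # "groq" or "openrouter"
    api_class: str  # name of the Agents SDK model class used


# Values copied from the table above; the keys are illustrative.
MODEL_REGISTRY = {
    "kimi": ModelConfig(
        "moonshotai/kimi-k2-instruct-0905", "groq", "OpenAIResponsesModel"
    ),
    "deepseek": ModelConfig(
        "deepseek/deepseek-chat", "openrouter", "OpenAIChatCompletionsModel"
    ),
    "google": ModelConfig(
        "google/gemini-2.5-flash", "openrouter", "OpenAIChatCompletionsModel"
    ),
    "summarizer": ModelConfig(
        "meta-llama/llama-4-scout-17b-16e-instruct", "groq", "OpenAIResponsesModel"
    ),
}


def get_model_config(agent: str) -> ModelConfig:
    """Look up an agent's model settings, ignoring case."""
    try:
        return MODEL_REGISTRY[agent.lower()]
    except KeyError:
        raise ValueError(f"Unknown agent: {agent!r}") from None
```

Keeping the table and the registry in one place makes it harder for the documented models to drift from the ones actually configured.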
---
## 📋 Assumptions & Limitations
### Assumptions
- **API Availability**: The system assumes stable connections to Groq and OpenRouter.
- **Static Knowledge**: The RAG system assumes the vector database is pre-built from the school's website data (accessible in the `website/` folder).
- **Single Instance**: Currently architected for single-instance deployment (SQLite persistence).
### Limitations
- **Responses API Compatibility**: Of the providers used here, only Groq works with `OpenAIResponsesModel`, which the system relies on for advanced thread state. OpenRouter models run through `OpenAIChatCompletionsModel` with a custom persistence bridge.
- **Concurrency**: SQLite is configured in WAL mode, but extremely high concurrent traffic would require a transition to PostgreSQL.
- **Context Limits**: The Summarizer Agent reduces context bloat, but very long multi-turn sessions are still bounded by the provider's context window (e.g., on the order of 128k tokens for DeepSeek/Gemini; exact limits vary by model and provider).
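The WAL configuration mentioned under Concurrency can be sketched with Python's standard `sqlite3` module. The function name, the `threads` table, and its columns are hypothetical; the project's actual schema is not shown in this README.

```python
import sqlite3


def open_thread_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode with an illustrative threads table."""
    # timeout makes a blocked writer wait instead of failing immediately.
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL is a common pairing with WAL: fewer fsyncs than FULL.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threads ("
        "id TEXT PRIMARY KEY, user_id TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn
```

WAL improves read concurrency, but SQLite still allows only one writer at a time, which is why sustained write-heavy traffic would call for a server database such as PostgreSQL.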
---
## 💰 Cost Estimation (Rough Calculation)
*Calculations based on 1,000 user queries with an average of 2,000 tokens per turn (1,500 input / 500 output).*
| Model (Provider) | Input Cost (per 1,000 queries) | Output Cost (per 1,000 queries) | **Estimated Total / 1,000 Queries** |
| :--- | :--- | :--- | :--- |
| **DeepSeek Chat** (OpenRouter) | $0.021 | $0.035 | **$0.056** |
| **Gemini 2.5 Flash** (OpenRouter) | $0.150 | $0.200 | **$0.350** |
| **Kimi** (Groq) | Free Tier / $0 | Free Tier / $0 | **$0.000** |
*Note: Costs are based on OpenRouter pricing as of Jan 2025 and Groq's free tier. Among the paid options listed, DeepSeek is the most cost-effective for reasoning-intensive tasks.*
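The totals can be reproduced with a few lines of arithmetic. This sketch reads the input and output columns as dollars per 1,000 queries, which is the interpretation under which they sum to the total column; the dictionary keys and the function name are hypothetical.

```python
# Per-1,000-query figures (USD) copied from the table above,
# stored as (input cost, output cost). Keys are illustrative.
COSTS_PER_1K_QUERIES = {
    "deepseek-chat": (0.021, 0.035),
    "gemini-2.5-flash": (0.150, 0.200),
    "kimi": (0.0, 0.0),
}


def estimate_cost(model: str, queries: int) -> float:
    """Scale the table's per-1,000-query total to an arbitrary query count."""
    input_cost, output_cost = COSTS_PER_1K_QUERIES[model]
    return (input_cost + output_cost) * queries / 1000
```

For example, `estimate_cost("deepseek-chat", 1000)` reproduces the table's DeepSeek total, and passing a larger query count scales it linearly. The estimate ignores retries, tool calls, and summarizer traffic, so real costs are likely to be higher.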
---
## 🚀 Getting Started
To enable the realtime assistant, install the dependencies (including the ChatKit Python package and the OpenAI SDK) and set the `OPENAI_API_KEY` environment variable.
```bash
uv sync
export OPENAI_API_KEY=sk-proj-...
uv run uvicorn app.main:app --reload
```
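A startup check can surface a missing key before the first request fails. The helper below is a hypothetical sketch, not part of the project; only `OPENAI_API_KEY` is named in this README, so any other variables the providers need would have to be added to the list by hand.

```python
import os
from typing import List, Mapping, Optional, Sequence


def missing_env_vars(
    required: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Environment looks complete.")
```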