	---
	title: ChatKit Backend
	emoji: 🤖
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: false
	app_port: 7860
	---

	# ChatKit Python Backend

	This FastAPI service implements a multi-agent support system using the OpenAI Agents SDK and OpenAI ChatKit. It provides support agents backed by different model families (Kimi, DeepSeek, Google), each with integrated retrieval-augmented generation (RAG) and persistent memory.

	## 🏗️ Architecture Overview

	The system is modular: a FastAPI orchestrator routes each request to one of several agents, the support agents share retrieval and fact-recording tools, and conversation state is persisted in SQLite.

	### Component Diagram

	```mermaid
	graph TD
	Client[ChatKit Frontend] <--> API[FastAPI Orchestrator]

	subgraph "Agent Layer"
	API <--> KimiAgent[Kimi Agent]
	API <--> DeepSeekAgent[DeepSeek Agent]
	API <--> GoogleAgent[Google Agent]
	API <--> Summ[Summarizer Agent]
	end

	subgraph "Tools & Intelligence"
	KimiAgent & DeepSeekAgent & GoogleAgent --> RAG[LlamaIndex RAG Tool]
	KimiAgent & DeepSeekAgent & GoogleAgent --> Facts[Fact Recording Tool]
	RAG --> Chroma[ChromaDB Vector Store]
	RAG --> WebsiteData[Website Knowledge Base]
	end

	subgraph "Persistence"
	API --> SQLiteThreads[SQLite: Chat Threads]
	API --> SQLiteState[SQLite: User State]
	end

	subgraph "External Providers"
	KimiAgent --> Groq[Groq API]
	DeepSeekAgent --> OpenRouter[OpenRouter API]
	GoogleAgent --> OR_Gemini[OpenRouter/Gemini]
	end
	```

	### Key Components

	1. FastAPI Orchestrator: Handles request routing, SSE streaming, and handoffs between Agents and ChatKit.
	2. OpenAI Agents SDK: Provides the logic for agent loops, tool calling, and handoffs.
	3. ChatKit Server: Manages the ChatKit protocol, ensuring real-time UI updates (widgets, thoughts, tool results).
	4. Vector RAG Engine: Uses LlamaIndex and ChromaDB to query school-specific services from scraped website data.
	5. Multi-Modal Input: Uses Groq-hosted Whisper to transcribe audio.
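
The SSE streaming mentioned in item 1 can be illustrated with a minimal, standard-library-only sketch of how a Server-Sent Events frame is encoded. This is not the ChatKit wire protocol: the `format_sse` and `stream_tokens` helpers and the `token` / `done` event names are hypothetical and chosen only for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame as text."""
    lines = []
    if event is not None:
        # Optional event name; clients can listen for it specifically.
        lines.append(f"event: {event}")
    # JSON-encode the payload so it fits on a single "data:" line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a final (hypothetical) 'done' frame."""
    for token in tokens:
        yield format_sse({"delta": token}, event="token")
    yield format_sse({}, event="done")
```

In a FastAPI app, a generator like `stream_tokens` would typically be wrapped in a streaming response with the `text/event-stream` media type; whether this service does so directly or delegates to the ChatKit server is not stated here.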

	---

	## 🤖 Model Configuration

	Each agent is bound to its own model and provider to balance performance, cost, and reasoning capability.

	| Agent / Service | Model Name | Primary Provider | API Class |
	| :--- | :--- | :--- | :--- |
	| Kimi Agent | `moonshotai/kimi-k2-instruct-0905` | Groq | `OpenAIResponsesModel` |
	| DeepSeek Agent | `deepseek/deepseek-chat` | OpenRouter | `OpenAIChatCompletionsModel` |
	| Google Agent | `google/gemini-2.5-flash` | OpenRouter | `OpenAIChatCompletionsModel` |
	| Summarizer Agent | `meta-llama/llama-4-scout-17b-16e-instruct` | Groq | `OpenAIResponsesModel` |
	| Audio Transcription | `whisper-large-v3-turbo` | Groq | Native Deepgram/Whisper |
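
One way to keep the table above in a single place in code is a small registry keyed by agent name. The sketch below is an assumption about how such a mapping could look, not the repository's actual configuration: `ModelConfig`, `MODEL_REGISTRY`, `get_model_config`, and the short agent keys are hypothetical names, while the model identifiers, providers, and API class names are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    model: str       # model identifier as passed to the provider
    provider: str    # upstream API provider
    api_class: str   # name of the Agents SDK model class listed in the table


# Hypothetical registry; values mirror the table above.
MODEL_REGISTRY = {
    "kimi": ModelConfig(
        "moonshotai/kimi-k2-instruct-0905", "Groq", "OpenAIResponsesModel"),
    "deepseek": ModelConfig(
        "deepseek/deepseek-chat", "OpenRouter", "OpenAIChatCompletionsModel"),
    "google": ModelConfig(
        "google/gemini-2.5-flash", "OpenRouter", "OpenAIChatCompletionsModel"),
    "summarizer": ModelConfig(
        "meta-llama/llama-4-scout-17b-16e-instruct", "Groq", "OpenAIResponsesModel"),
}


def get_model_config(agent: str) -> ModelConfig:
    """Look up an agent's model settings (case-insensitive)."""
    try:
        return MODEL_REGISTRY[agent.lower()]
    except KeyError:
        raise ValueError(f"Unknown agent: {agent!r}") from None
```

A caller would use `get_model_config("deepseek").provider` to decide which API client to construct; an unknown agent name fails fast with a `ValueError` instead of silently picking a default model.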

	---

	## 📋 Assumptions & Limitations

	### Assumptions
	- API Availability: The system assumes stable connections to Groq and OpenRouter.
	- Static Knowledge: The RAG system assumes the vector database is pre-built from the school's website data (accessible in the `website/` folder).
	- Single Instance: Currently architected for single-instance deployment (SQLite persistence).

	### Limitations
	- Responses API Compatibility: Of the providers used here, only Groq natively supports the Responses API that `OpenAIResponsesModel` requires for advanced thread state. OpenRouter models use `OpenAIChatCompletionsModel` with a custom persistence bridge instead.
	- Concurrency: SQLite is configured in WAL mode, but extremely high concurrent traffic would require a transition to PostgreSQL.
	- Context Limits: The Summarizer Agent mitigates context bloat, but very long multi-turn sessions are still bounded by the provider's context window (e.g., 128k tokens for DeepSeek).
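
The WAL configuration mentioned under Concurrency can be sketched with Python's built-in `sqlite3` module. This is a minimal example under stated assumptions, not the service's actual persistence code: the `open_wal_connection` helper and the 5-second busy timeout are hypothetical choices.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a file-backed SQLite database and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode actually in effect.
    # In-memory databases cannot use WAL, so verify the result.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"WAL mode not enabled (journal_mode={mode!r})")
    # Wait up to 5 seconds for a lock instead of failing immediately
    # (illustrative value, not taken from the project).
    conn.execute("PRAGMA busy_timeout=5000;")
    return conn
```

WAL lets readers proceed while a write is in progress, but SQLite still allows only one writer at a time, which is why sustained high write concurrency would call for a server database such as PostgreSQL.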

	---

	## 💰 Cost Estimation (Rough Calculation)

	Calculations assume 1,000 user queries averaging 2,000 tokens per turn (1,500 input / 500 output), i.e. 1.5 million input tokens and 0.5 million output tokens in total.

	| Model (Provider) | Input Cost (per 1,000 queries) | Output Cost (per 1,000 queries) | Estimated Total (per 1,000 queries) |
	| :--- | :--- | :--- | :--- |
	| DeepSeek Chat (OpenRouter) | $0.021 | $0.035 | $0.056 |
	| Gemini 2.5 Flash (OpenRouter) | $0.150 | $0.200 | $0.350 |
	| Kimi (Groq) | Free Tier / $0 | Free Tier / $0 | $0.000 |

	Note: Costs are based on OpenRouter pricing as of Jan 2025 and on Groq's free tier. Among the paid options listed, DeepSeek is the most cost-effective for reasoning-intensive tasks.
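
To redo this estimate with up-to-date prices, the arithmetic can be sketched as below. The `estimate_cost` helper is hypothetical, and the per-million-token prices in the usage note are placeholders rather than real provider rates; only the workload (1,000 queries, 1,500 input and 500 output tokens each) comes from the text above.

```python
def estimate_cost(
    queries: int,
    input_tokens_per_query: int,
    output_tokens_per_query: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> dict:
    """Estimate token volume and cost (USD) for a batch of queries."""
    input_tokens = queries * input_tokens_per_query
    output_tokens = queries * output_tokens_per_query
    # Providers usually quote prices per one million tokens.
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }
```

For example, `estimate_cost(1000, 1500, 500, 1.0, 2.0)` uses placeholder prices of $1 and $2 per million tokens; substitute the provider's current rates to refresh a row of the table.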

	---

	## 🚀 Getting Started

	To enable the real-time assistant, install both the ChatKit Python package and the OpenAI SDK, then set the `OPENAI_API_KEY` environment variable.

	```bash
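	# Install the project's dependencies into the uv-managed environment
	uv sync
	# Make the API key available to the server process
	export OPENAI_API_KEY=sk-proj-...
	# Start the FastAPI app with auto-reload for local development
	uv run uvicorn app.main:app --reload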
	```