# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant

This document details the architecture of **Krishna Vamsi Dhulipalla’s** personal AI assistant, implemented with **LangGraph** for orchestrated state management and tool execution. The system is designed for **retrieval-augmented, memory-grounded, and multi-turn conversational intelligence**, integrating **OpenAI GPT-4o**, **Hugging Face embeddings**, and **cross-encoder reranking**.
---

## 🧱 Core Components

### 1. **Models & Their Roles**

| Purpose                    | Model Name                               | Role Description                                 |
| -------------------------- | ---------------------------------------- | ------------------------------------------------ |
| **Main Chat Model**        | `gpt-4o`                                 | Handles conversation, tool calls, and reasoning  |
| **Retriever Embeddings**   | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search     |
| **Cross-Encoder Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2`   | Reranks retrieval results for semantic relevance |
| **BM25 Retriever**         | (LangChain `BM25Retriever`)              | Keyword-based search complementing vector search |

All models are bound to LangGraph **StateGraph** nodes for structured execution.

---
## 🔍 Retrieval System

### ✅ **Hybrid Retrieval**

- **FAISS vector search** with normalized embeddings
- **BM25Retriever** for lexical keyword matching
- Combined using **Reciprocal Rank Fusion (RRF)**
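Normalizing embeddings is what makes FAISS inner-product search equivalent to cosine-similarity search. A minimal pure-Python sketch of that property (illustrative helper names, not the project's code):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# After normalization, inner-product ranking matches cosine ranking:
# two vectors pointing the same way score 1.0 regardless of magnitude.
q = l2_normalize([3.0, 4.0])
d = l2_normalize([6.0, 8.0])  # same direction, twice the magnitude
assert abs(dot(q, d) - 1.0) < 1e-9
```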
### 📊 **Reranking & Diversity**

1. Initial retrieval with FAISS & BM25 (top-K per retriever)
2. Fusion via RRF scoring
3. **Cross-encoder reranking** (top-N candidates)
4. **Maximal Marginal Relevance (MMR)** selection for diversity
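The fusion and diversity steps above can be sketched in pure Python. `rrf_fuse` and `mmr_select` are illustrative stand-ins, not the project's code; `k=60` and `lam=0.7` are common defaults in the literature, not confirmed settings of this system:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def mmr_select(candidates, relevance, similarity, n=3, lam=0.7):
    """Maximal Marginal Relevance: greedily trade relevance against
    redundancy with documents already selected."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < n:
        best = max(
            pool,
            key=lambda d: lam * relevance[d]
            - (1 - lam) * max((similarity(d, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

# A doc ranked highly by both retrievers beats a doc ranked first by only one:
faiss_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([faiss_hits, bm25_hits])  # doc_b first (ranks 2 and 1)
```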
### 🔎 Retriever Tool (`@tool retriever`)

- Returns top passages with minimal duplication
- Invoked per the system prompt to fetch accurate facts about Krishna
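"Minimal duplication" can be approximated with a token-overlap filter over the ranked passages. `dedup_passages` below is a hypothetical sketch of that idea, not the tool's actual implementation:

```python
def dedup_passages(passages, threshold=0.8):
    """Keep passages in rank order, dropping any whose token-set Jaccard
    overlap with an already-kept passage exceeds the threshold."""
    kept = []
    for text in passages:
        tokens = set(text.lower().split())
        duplicate = any(
            len(tokens & set(k.lower().split()))
            / max(len(tokens | set(k.lower().split())), 1) > threshold
            for k in kept
        )
        if not duplicate:
            kept.append(text)
    return kept
```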
---

## 🧠 Memory System

### Long-Term Memory

- **FAISS-based memory vector store** persisted at `backend/data/memory_faiss`
- Stores conversation summaries per thread ID

### Memory Search Tool (`@tool memory_search`)

- Retrieves relevant conversation snippets by semantic similarity
- Supports **thread-scoped** search for contextual continuity
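Thread-scoped semantic search amounts to filtering stored entries by thread ID before ranking by similarity. A pure-Python sketch under that assumption (the store layout and field names here are hypothetical):

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def memory_search(store, query_vec, thread_id=None, top_k=2):
    """Rank stored snippets by cosine similarity, optionally restricted
    to a single conversation thread for contextual continuity."""
    entries = [e for e in store if thread_id is None or e["thread_id"] == thread_id]
    entries.sort(key=lambda e: cosine(e["vector"], query_vec), reverse=True)
    return [e["text"] for e in entries[:top_k]]
```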
### Memory Write Node

- After each AI response, stores a `[Q]: ... [A]: ...` summary
- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end
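The write-and-autosave behavior can be modeled as a small buffer that flushes on a turn counter or on thread end. This is an illustrative stand-in (`MemoryWriter` and its in-memory `persisted` list are hypothetical; the real node writes to the FAISS memory store):

```python
class MemoryWriter:
    """Buffers [Q]/[A] summaries and flushes every `autosave_every`
    turns (standing in for MEM_AUTOSAVE_EVERY) or on thread end."""

    def __init__(self, autosave_every=3):
        self.autosave_every = autosave_every
        self.buffer = []
        self.persisted = []  # stand-in for the FAISS memory vector store

    def record_turn(self, question, answer):
        self.buffer.append(f"[Q]: {question} [A]: {answer}")
        if len(self.buffer) >= self.autosave_every:
            self.flush()

    def flush(self):
        """Persist buffered summaries; also called on thread end."""
        self.persisted.extend(self.buffer)
        self.buffer.clear()
```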
---

## 🧭 Orchestration Flow (LangGraph)

```mermaid
graph TD
    A[START] --> B[agent node]
    B -->|tool call| C[tools node]
    B -->|no tool| D[memory_write]
    C --> B
    D --> E[END]
```
### **Nodes**

- **agent**: Calls the main LLM with the conversation window + system prompt
- **tools**: Executes the retriever or memory search tools
- **memory_write**: Persists summaries to long-term memory

### **Conditional Edges**

- **agent** → `tools` if a tool call is detected
- **agent** → `memory_write` if no tool call
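The routing above can be mimicked with a plain control loop: the agent runs, tool calls route through the tools node and back, and a plain answer routes to memory write before END. This is a pure-Python stand-in for illustration, not the LangGraph `StateGraph` API:

```python
def run_graph(agent, tools, memory_write, state):
    """Minimal loop mirroring the graph: agent -> tools -> agent while
    tool calls are emitted, then agent -> memory_write -> END."""
    while True:
        state = agent(state)
        if state.get("tool_call"):       # conditional edge: agent -> tools
            state = tools(state)         # tools -> agent (loop back)
        else:                            # conditional edge: agent -> memory_write
            return memory_write(state)   # memory_write -> END
```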
---

## 💬 System Prompt

The assistant:

- Uses the retriever and memory search tools to gather facts about Krishna
- Avoids fabrication and requests clarification when needed
- Responds humorously when off-topic but steers back to Krishna’s expertise
- Formats responses with Markdown, headings, and bullet points

An embedded **Krishna’s Bio** provides static grounding context.
## 🌐 API & Streaming

- **Backend**: FastAPI (`backend/api.py`)
  - `/chat` SSE endpoint streams tokens in real time
  - Passes `thread_id` & `is_final` to LangGraph for stateful conversations
- **Frontend**: React + Tailwind (custom chat UI)
  - Threaded conversation storage in browser `localStorage`
  - Real-time token rendering via `EventSource`
  - Features: new chat, clear chat, delete thread, suggestions
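The wire format consumed by `EventSource` is the standard Server-Sent Events framing: optional `event:` line, one `data:` line per payload line, and a blank line terminating each frame. A small formatter sketching that framing (`sse_event` is a hypothetical helper, not from `backend/api.py`):

```python
def sse_event(data, event=None):
    """Format one SSE frame: optional `event:` field, one `data:` line
    per line of payload, blank line as the frame terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"
```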
---

## 🧩 Design Improvements

- **LangGraph StateGraph** ensures explicit control of message flow
- **Thread-scoped memory** enables multi-session personalization
- The **hybrid RRF + cross-encoder + MMR** retrieval pipeline improves relevance & diversity
- **SSE streaming** provides low-latency feedback
- **Decoupled retrieval and memory** as separate tools for modularity