# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant
This document details the architecture of **Krishna Vamsi Dhulipalla’s** personal AI assistant, implemented with **LangGraph** for orchestrated state management and tool execution. The system is designed for **retrieval-augmented, memory-grounded, and multi-turn conversational intelligence**, integrating **OpenAI GPT-4o**, **Hugging Face embeddings**, and **cross-encoder reranking**.
---
## 🧱 Core Components
### 1. **Models & Their Roles**
| Purpose | Model Name | Role Description |
| -------------------------- | ---------------------------------------- | ------------------------------------------------ |
| **Main Chat Model** | `gpt-4o` | Handles conversation, tool calls, and reasoning |
| **Retriever Embeddings** | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |
| **Cross-Encoder Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |
| **BM25 Retriever** | (LangChain BM25Retriever) | Keyword-based search complementing vector search |
All models are bound to LangGraph **StateGraph** nodes for structured execution.
---
## 🔍 Retrieval System
### ✅ **Hybrid Retrieval**
- **FAISS Vector Search** with normalized embeddings
- **BM25Retriever** for lexical keyword matching
- Combined using **Reciprocal Rank Fusion (RRF)**
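RRF needs only the rank positions from each retriever, which makes fusing the FAISS and BM25 result lists straightforward. A minimal sketch (the `k = 60` constant is the conventional RRF default, not necessarily the value this system uses):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists into one ordering.

    Each document's fused score is the sum of 1 / (k + rank)
    over every list it appears in.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a FAISS ranking with a BM25 ranking
faiss_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([faiss_hits, bm25_hits]))
# → ['d1', 'd3', 'd2', 'd4']
```

Documents ranked highly by both retrievers (like `d1` and `d3` above) rise to the top without any score normalization between the two systems.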
### 📊 **Reranking & Diversity**
1. Initial retrieval with FAISS & BM25 (top-K per retriever)
2. Fusion via RRF scoring
3. **Cross-Encoder reranking** (top-N candidates)
4. **Maximal Marginal Relevance (MMR)** selection for diversity
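Step 4 can be illustrated with a toy MMR selector over unit-normalized embedding vectors. This is a sketch, not the project's implementation; the relevance/diversity trade-off `lam` is an assumed default:

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=3, lam=0.7):
    """Maximal Marginal Relevance: trade off relevance to the query
    against similarity to documents already selected.

    Assumes all vectors are unit-normalized, so dot product = cosine.
    """
    selected, candidates = [], list(range(len(doc_vecs)))
    sim_q = doc_vecs @ query_vec    # relevance of each doc to the query
    sim_dd = doc_vecs @ doc_vecs.T  # pairwise doc-doc similarity
    while candidates and len(selected) < k:
        if not selected:
            best = max(candidates, key=lambda i: sim_q[i])
        else:
            best = max(
                candidates,
                key=lambda i: lam * sim_q[i]
                - (1 - lam) * max(sim_dd[i][j] for j in selected),
            )
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already-selected passage is penalized enough that a less similar but distinct passage wins, which is exactly the diversity behavior MMR is used for here.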
### 🔎 Retriever Tool (`@tool retriever`)
- Returns top passages with minimal duplication
- Invoked per the system prompt to fetch accurate facts about Krishna
---
## 🧠 Memory System
### Long-Term Memory
- **FAISS-based memory vector store** stored at `backend/data/memory_faiss`
- Stores conversation summaries per thread ID
### Memory Search Tool (`@tool memory_search`)
- Retrieves relevant conversation snippets by semantic similarity
- Supports **thread-scoped** search for contextual continuity
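Thread-scoped semantic lookup can be sketched without FAISS as a cosine-similarity scan; vectors are assumed unit-normalized, and the field names here are illustrative, not the project's schema:

```python
import numpy as np

def memory_search(query_vec, memories, thread_id=None, top_k=3):
    """Return stored snippets ranked by cosine similarity,
    optionally restricted to a single conversation thread."""
    pool = [m for m in memories if thread_id is None or m["thread_id"] == thread_id]
    scored = sorted(
        pool,
        key=lambda m: float(m["vec"] @ query_vec),  # unit vectors: dot = cosine
        reverse=True,
    )
    return [m["text"] for m in scored[:top_k]]
```

Filtering on `thread_id` before scoring is what keeps one conversation's recalled context from leaking into another.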
### Memory Write Node
- After each AI response, stores `[Q]: ... [A]: ...` summary
- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end
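The write path reduces to buffering `[Q]/[A]` summaries and flushing on a turn counter or thread end. A sketch, where the autosave default and the in-memory `saved` list stand in for the on-disk FAISS memory store:

```python
class MemoryWriter:
    """Buffers [Q]/[A] summaries; autosaves every N turns or on thread end."""

    def __init__(self, autosave_every=5):  # assumed default for MEM_AUTOSAVE_EVERY
        self.buffer, self.saved, self.turns = [], [], 0
        self.autosave_every = autosave_every

    def record(self, thread_id, question, answer, final=False):
        self.buffer.append((thread_id, f"[Q]: {question} [A]: {answer}"))
        self.turns += 1
        if final or self.turns % self.autosave_every == 0:
            self.flush()

    def flush(self):
        # Stand-in for writing the buffered summaries to the FAISS store
        self.saved.extend(self.buffer)
        self.buffer.clear()
```

Batching writes this way avoids hitting the vector store on every single turn while still guaranteeing persistence when a thread ends.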
---
## 🧭 Orchestration Flow (LangGraph)
```mermaid
graph TD
A[START] --> B[agent node]
B -->|tool call| C[tools node]
B -->|no tool| D[memory_write]
C --> B
D --> E[END]
```
### **Nodes**:
- **agent**: Calls main LLM with conversation window + system prompt
- **tools**: Executes retriever or memory search tools
- **memory_write**: Persists summaries to long-term memory
### **Conditional Edges**:
- From **agent** → `tools` if tool call detected
- From **agent** → `memory_write` if no tool call
---
## 💬 System Prompt
The assistant:
- Uses retriever and memory search tools to gather facts about Krishna
- Avoids fabrication and requests clarification when needed
- Responds humorously when off-topic but steers back to Krishna’s expertise
- Formats with Markdown, headings, and bullet points
An embedded **Krishna's Bio** block provides static grounding context.
---
## 🌐 API & Streaming
- **Backend**: FastAPI (`backend/api.py`)
- `/chat` SSE endpoint streams tokens in real-time
- Passes `thread_id` & `is_final` to LangGraph for stateful conversations
- **Frontend**: React + Tailwind (custom chat UI)
- Threaded conversation storage in browser `localStorage`
- Real-time token rendering via `EventSource`
- Features: new chat, clear chat, delete thread, suggestions
---
## 🧩 Design Improvements
- **LangGraph StateGraph** ensures explicit control of message flow
- **Thread-scoped memory** enables multi-session personalization
- **Hybrid RRF + Cross-Encoder + MMR** retrieval pipeline improves relevance & diversity
- **SSE streaming** for low-latency feedback
- **Decoupled retrieval** and **memory** as separate tools for modularity