# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant
This document details the architecture of Krishna Vamsi Dhulipalla's personal AI assistant, implemented with LangGraph for orchestrated state management and tool execution. The system is designed for retrieval-augmented, memory-grounded, and multi-turn conversational intelligence, integrating OpenAI GPT-4o, Hugging Face embeddings, and cross-encoder reranking.
## 🧱 Core Components

### 1. Models & Their Roles
| Purpose | Model Name | Role Description |
|---|---|---|
| Main Chat Model | `gpt-4o` | Handles conversation, tool calls, and reasoning |
| Retriever Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |
| Cross-Encoder Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |
| BM25 Retriever | (LangChain `BM25Retriever`) | Keyword-based search complementing vector search |
All models are bound to LangGraph StateGraph nodes for structured execution.
## 🔍 Retrieval System

### ✅ Hybrid Retrieval
- FAISS Vector Search with normalized embeddings
- BM25Retriever for lexical keyword matching
- Combined using Reciprocal Rank Fusion (RRF)
### 🔁 Reranking & Diversity
1. Initial retrieval with FAISS & BM25 (top-K per retriever)
2. Fusion via RRF scoring
3. Cross-Encoder reranking (top-N candidates)
4. Maximal Marginal Relevance (MMR) selection for diversity
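The fusion step can be sketched in pure Python. This is a minimal illustration of Reciprocal Rank Fusion, not the project's actual code; the constant `k = 60` is the value commonly used in the RRF literature, and the document does not state which constant the system uses.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked doc-ID lists (best first) into one fused ranking.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked highly by both FAISS and BM25 rise to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort doc IDs by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)


faiss_hits = ["doc_a", "doc_b", "doc_c"]   # vector-search ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword-search ranking
fused = rrf_fuse([faiss_hits, bm25_hits])
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` wins because it places in the top two of both rankings, which is exactly the behavior that makes RRF a robust parameter-free fusion method.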
### 📌 Retriever Tool (`@tool retriever`)
- Returns top passages with minimal duplication
- Invoked per the system prompt to fetch accurate facts about Krishna
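The "minimal duplication" behavior might look like the sketch below. The normalization key (lowercased, whitespace-collapsed text) is an assumption; the actual tool may use a different duplicate criterion.

```python
def dedupe_passages(passages):
    """Drop near-verbatim duplicate passages while preserving rank order.

    The normalization used here is an assumed heuristic, not the
    project's documented criterion.
    """
    seen = set()
    unique = []
    for text in passages:
        key = " ".join(text.lower().split())  # case/whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


dedupe_passages(["Krishna studies ML", "krishna  studies ml", "He builds RAG systems"])
# → ['Krishna studies ML', 'He builds RAG systems']
```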
## 🧠 Memory System

### Long-Term Memory
- FAISS-based memory vector store persisted at `backend/data/memory_faiss`
- Stores conversation summaries per thread ID
### Memory Search Tool (`@tool memory_search`)
- Retrieves relevant conversation snippets by semantic similarity
- Supports thread-scoped search for contextual continuity
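A toy stand-in for thread-scoped memory search: the real tool scores stored summaries by embedding similarity against the FAISS memory store, whereas the term-overlap scoring here is a deliberate simplification to keep the sketch self-contained.

```python
def memory_search(snippets, query_terms, thread_id=None, top_k=3):
    """Return up to top_k stored summaries relevant to the query.

    snippets: list of {"thread_id": str, "text": str} records.
    thread_id: when given, restrict search to one conversation thread,
    mirroring the tool's thread-scoped mode.
    """
    # Thread scoping: filter before scoring so other threads never leak in
    candidates = [s for s in snippets
                  if thread_id is None or s["thread_id"] == thread_id]
    # Simplified relevance: count query terms present in the snippet
    scored = sorted(
        candidates,
        key=lambda s: sum(t in s["text"].lower() for t in query_terms),
        reverse=True,
    )
    return [s["text"] for s in scored[:top_k]]
```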
### Memory Write Node
- After each AI response, stores a `[Q]: ... [A]: ...` summary
- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end
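The write path can be sketched as two small helpers. The `[Q]: ... [A]: ...` format comes from the document; the truncation length and the autosave default of 4 turns are assumed values, since the actual `MEM_AUTOSAVE_EVERY` setting is not given.

```python
def summarize_turn(question, answer, max_len=500):
    """Format one exchange as the '[Q]: ... [A]: ...' summary string
    that the memory-write node persists. max_len is an assumption."""
    summary = f"[Q]: {question} [A]: {answer}"
    return summary[:max_len]


def should_autosave(turn_count, autosave_every=4, thread_ended=False):
    """Decide when to flush summaries to the FAISS memory store:
    after every `autosave_every` turns (stand-in for MEM_AUTOSAVE_EVERY,
    default assumed) or when the thread ends."""
    return thread_ended or (turn_count % autosave_every == 0)


summarize_turn("Where does Krishna work?", "He is an AI engineer.")
# → '[Q]: Where does Krishna work? [A]: He is an AI engineer.'
```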
## 🧭 Orchestration Flow (LangGraph)

```mermaid
graph TD
    A[START] --> B[agent node]
    B -->|tool call| C[tools node]
    B -->|no tool| D[memory_write]
    C --> B
    D --> E[END]
```
Nodes:
- `agent`: Calls the main LLM with the conversation window + system prompt
- `tools`: Executes retriever or memory search tools
- `memory_write`: Persists summaries to long-term memory
Conditional Edges:
- From `agent` → `tools` if a tool call is detected
- From `agent` → `memory_write` if no tool call
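The conditional-edge predicate reduces to checking whether the last AI message requested a tool. The sketch below uses a hypothetical `Msg` stand-in for the chat-message class rather than LangGraph's own types, so it stays self-contained; the routing logic itself matches the edges described above.

```python
class Msg:
    """Minimal stand-in for an LLM chat message carrying tool calls."""
    def __init__(self, content, tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls or []


def route_after_agent(state):
    """Conditional-edge function: route to the tools node when the last
    AI message requested a tool, otherwise persist memory and end."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else "memory_write"


state = {"messages": [Msg("", tool_calls=[{"name": "retriever"}])]}
route_after_agent(state)  # → 'tools'
```

In LangGraph this function would be registered with `add_conditional_edges` on the `agent` node, mapping its return value to the `tools` and `memory_write` nodes.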
## 💬 System Prompt
The assistant:
- Uses retriever and memory search tools to gather facts about Krishna
- Avoids fabrication and requests clarification when needed
- Responds humorously when off-topic but steers back to Krishna's expertise
- Formats with Markdown, headings, and bullet points
An embedded bio of Krishna provides static grounding context.
## 🌐 API & Streaming

- Backend: FastAPI (`backend/api.py`)
  - `/chat` SSE endpoint streams tokens in real time
  - Passes `thread_id` & `is_final` to LangGraph for stateful conversations
- Frontend: React + Tailwind (custom chat UI)
  - Threaded conversation storage in browser `localStorage`
  - Real-time token rendering via `EventSource`
  - Features: new chat, clear chat, delete thread, suggestions
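On the wire, each streamed token is one Server-Sent Events frame that the browser's `EventSource` consumes. The helper below formats such frames; the field names follow the SSE specification, while the `token` event name is an illustrative assumption rather than the app's documented protocol.

```python
def sse_event(data, event=None):
    """Format one Server-Sent Events frame (as sent over /chat).

    SSE frames are plain text: optional 'event:' line, a 'data:' line,
    and a blank line terminating the frame.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"


sse_event("Hello")           # → 'data: Hello\n\n'
sse_event("Hi", "token")     # → 'event: token\ndata: Hi\n\n'
```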
## 🧩 Design Improvements
- LangGraph StateGraph ensures explicit control of message flow
- Thread-scoped memory enables multi-session personalization
- Hybrid RRF + Cross-Encoder + MMR retrieval pipeline improves relevance & diversity
- SSE streaming for low-latency feedback
- Decoupled retrieval and memory as separate tools for modularity