# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant
This document details the architecture of Krishna Vamsi Dhulipalla's personal AI assistant, implemented with LangGraph for orchestrated state management and tool execution. The system is designed for retrieval-augmented, memory-grounded, and multi-turn conversational intelligence, integrating OpenAI GPT-4o, Hugging Face embeddings, and cross-encoder reranking.
## 🧱 Core Components

### 1. Models & Their Roles
| Purpose | Model Name | Role Description |
|---|---|---|
| Main Chat Model | `gpt-4o` | Handles conversation, tool calls, and reasoning |
| Retriever Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |
| Cross-Encoder Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |
| BM25 Retriever | (LangChain `BM25Retriever`) | Keyword-based search complementing vector search |
All models are bound to LangGraph StateGraph nodes for structured execution.
## 🔍 Retrieval System

### ✅ Hybrid Retrieval
- FAISS Vector Search with normalized embeddings
- BM25Retriever for lexical keyword matching
- Combined using Reciprocal Rank Fusion (RRF)
### 🔁 Reranking & Diversity
1. Initial retrieval with FAISS & BM25 (top-K per retriever)
2. Fusion via RRF scoring
3. Cross-Encoder reranking (top-N candidates)
4. Maximal Marginal Relevance (MMR) selection for diversity
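The fusion step can be sketched in pure Python. This is a minimal illustration of Reciprocal Rank Fusion, not the project's actual code; the constant `k = 60` is the value commonly used in the RRF literature, and the document does not state which constant the system uses.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked doc-ID lists (best first) into one fused ranking.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked highly by both FAISS and BM25 rise to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort doc IDs by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)


faiss_hits = ["doc_a", "doc_b", "doc_c"]   # vector-search ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword-search ranking
fused = rrf_fuse([faiss_hits, bm25_hits])
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` wins because it places in the top two of both rankings, which is exactly the behavior that makes RRF a robust parameter-free fusion method.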
### 📌 Retriever Tool (`@tool retriever`)
- Returns top passages with minimal duplication
- Invoked per the system prompt to fetch accurate facts about Krishna
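The "minimal duplication" behavior might look like the sketch below. The normalization key (lowercased, whitespace-collapsed text) is an assumption; the actual tool may use a different duplicate criterion.

```python
def dedupe_passages(passages):
    """Drop near-verbatim duplicate passages while preserving rank order.

    The normalization used here is an assumed heuristic, not the
    project's documented criterion.
    """
    seen = set()
    unique = []
    for text in passages:
        key = " ".join(text.lower().split())  # case/whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


dedupe_passages(["Krishna studies ML", "krishna  studies ml", "He builds RAG systems"])
# → ['Krishna studies ML', 'He builds RAG systems']
```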
## 🧠 Memory System

### Long-Term Memory
- FAISS-based memory vector store persisted at `backend/data/memory_faiss`
- Stores conversation summaries per thread ID
### Memory Search Tool (`@tool memory_search`)
- Retrieves relevant conversation snippets by semantic similarity
- Supports thread-scoped search for contextual continuity
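A toy stand-in for thread-scoped memory search: the real tool scores stored summaries by embedding similarity against the FAISS memory store, whereas the term-overlap scoring here is a deliberate simplification to keep the sketch self-contained.

```python
def memory_search(snippets, query_terms, thread_id=None, top_k=3):
    """Return up to top_k stored summaries relevant to the query.

    snippets: list of {"thread_id": str, "text": str} records.
    thread_id: when given, restrict search to one conversation thread,
    mirroring the tool's thread-scoped mode.
    """
    # Thread scoping: filter before scoring so other threads never leak in
    candidates = [s for s in snippets
                  if thread_id is None or s["thread_id"] == thread_id]
    # Simplified relevance: count query terms present in the snippet
    scored = sorted(
        candidates,
        key=lambda s: sum(t in s["text"].lower() for t in query_terms),
        reverse=True,
    )
    return [s["text"] for s in scored[:top_k]]
```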
### Memory Write Node
- After each AI response, stores a `[Q]: ... [A]: ...` summary
- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end
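The write path can be sketched as two small helpers. The `[Q]: ... [A]: ...` format comes from the document; the truncation length and the autosave default of 4 turns are assumed values, since the actual `MEM_AUTOSAVE_EVERY` setting is not given.

```python
def summarize_turn(question, answer, max_len=500):
    """Format one exchange as the '[Q]: ... [A]: ...' summary string
    that the memory-write node persists. max_len is an assumption."""
    summary = f"[Q]: {question} [A]: {answer}"
    return summary[:max_len]


def should_autosave(turn_count, autosave_every=4, thread_ended=False):
    """Decide when to flush summaries to the FAISS memory store:
    after every `autosave_every` turns (stand-in for MEM_AUTOSAVE_EVERY,
    default assumed) or when the thread ends."""
    return thread_ended or (turn_count % autosave_every == 0)


summarize_turn("Where does Krishna work?", "He is an AI engineer.")
# → '[Q]: Where does Krishna work? [A]: He is an AI engineer.'
```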
## 🧭 Orchestration Flow (LangGraph)

```mermaid
graph TD
    A[START] --> B[agent node]
    B -->|tool call| C[tools node]
    B -->|no tool| D[memory_write]
    C --> B
    D --> E[END]
```
Nodes:
- `agent`: Calls the main LLM with the conversation window + system prompt
- `tools`: Executes retriever or memory search tools
- `memory_write`: Persists summaries to long-term memory
Conditional Edges:
- From `agent` → `tools` if a tool call is detected
- From `agent` → `memory_write` if no tool call
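The conditional-edge predicate reduces to checking whether the last AI message requested a tool. The sketch below uses a hypothetical `Msg` stand-in for the chat-message class rather than LangGraph's own types, so it stays self-contained; the routing logic itself matches the edges described above.

```python
class Msg:
    """Minimal stand-in for an LLM chat message carrying tool calls."""
    def __init__(self, content, tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls or []


def route_after_agent(state):
    """Conditional-edge function: route to the tools node when the last
    AI message requested a tool, otherwise persist memory and end."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else "memory_write"


state = {"messages": [Msg("", tool_calls=[{"name": "retriever"}])]}
route_after_agent(state)  # → 'tools'
```

In LangGraph this function would be registered with `add_conditional_edges` on the `agent` node, mapping its return value to the `tools` and `memory_write` nodes.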
## 💬 System Prompt
The assistant:
- Uses retriever and memory search tools to gather facts about Krishna
- Avoids fabrication and requests clarification when needed
- Responds humorously when off-topic but steers back to Krishna's expertise
- Formats with Markdown, headings, and bullet points
An embedded bio of Krishna provides static grounding context.
## 🌐 API & Streaming

- Backend: FastAPI (`backend/api.py`)
  - `/chat` SSE endpoint streams tokens in real time
  - Passes `thread_id` & `is_final` to LangGraph for stateful conversations
- Frontend: React + Tailwind (custom chat UI)
  - Threaded conversation storage in browser `localStorage`
  - Real-time token rendering via `EventSource`
  - Features: new chat, clear chat, delete thread, suggestions
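On the wire, each streamed token is one Server-Sent Events frame that the browser's `EventSource` consumes. The helper below formats such frames; the field names follow the SSE specification, while the `token` event name is an illustrative assumption rather than the app's documented protocol.

```python
def sse_event(data, event=None):
    """Format one Server-Sent Events frame (as sent over /chat).

    SSE frames are plain text: optional 'event:' line, a 'data:' line,
    and a blank line terminating the frame.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"


sse_event("Hello")           # → 'data: Hello\n\n'
sse_event("Hi", "token")     # → 'event: token\ndata: Hi\n\n'
```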
## 🧩 Design Improvements
- LangGraph StateGraph ensures explicit control of message flow
- Thread-scoped memory enables multi-session personalization
- Hybrid RRF + Cross-Encoder + MMR retrieval pipeline improves relevance & diversity
- SSE streaming for low-latency feedback
- Decoupled retrieval and memory as separate tools for modularity