# Architecture Overview
This document explains the architecture of the RAG (Retrieval-Augmented Generation) chatbot for the Agentic AI eBook.

## System Overview

The system follows a standard RAG pattern: documents are chunked and embedded into a vector database during ingestion; at query time, relevant chunks are retrieved and used to generate grounded answers.

### Key Components

1. **Ingestion Pipeline** (`app/ingest.py`) - Processes the PDF, creates chunks, generates embeddings, and stores them in Pinecone
2. **Vector Store** (`app/vectorstore.py`) - Wrapper around Pinecone for storing and retrieving vectors
3. **RAG Pipeline** (`app/rag_pipeline.py`) - LangGraph-based pipeline for query processing
4. **Streamlit UI** (`streamlit_app/app.py`) - Web interface for user interactions

---

## Architecture Diagram
```
INGESTION FLOW

  PDF File --> Extract Text --> Clean Text --> Chunk (500 tokens,
               by Page                                50 overlap)
                                                          |
                                                          v
  Pinecone       <-- Upsert    <-- Embeddings (MiniLM-L6-v2,
  Vector Store       Vectors                   384 dims)

QUERY FLOW

  User Query
      |
      v
  LANGGRAPH PIPELINE
      Embed Query --> Retrieve Top-K Chunks --> Calculate Confidence
                              |                         |
                              v                         v
                         Generate Answer
                           If OpenAI key: LLM generation (grounded prompt)
                           Else:          extractive mode (return chunks)
      |
      v
  RESPONSE
      {
        "final_answer": "...",
        "retrieved_chunks": [...],
        "confidence": 0.92
      }
      |
      v
  STREAMLIT UI
      Chat Interface           Retrieved Chunks Panel
        - Question box           - Chunk text
        - Answer card            - Page numbers
        - Confidence             - Relevance scores
```
---

## Design Decisions

### 1. Chunking Strategy

We use **500 tokens** as the target chunk size with **50-100 token overlap**. This provides:

- Enough context for meaningful retrieval
- Overlap that prevents information spanning chunk boundaries from being lost
- Consistent chunk sizes across different text densities, via token counting with tiktoken

**Chunk ID Format**: `pdfpage_{page}_chunk_{index}` - This makes it easy to trace retrieved content back to the source PDF page for verification.
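A minimal sketch of this strategy (tokenizer-agnostic; the project counts tokens with tiktoken, and `chunk_page_tokens` is an illustrative name, not necessarily the repo's actual helper):

```python
def chunk_page_tokens(tokens, page, chunk_size=500, overlap=50):
    """Split one page's token list into overlapping chunks with traceable IDs."""
    chunks = []
    step = chunk_size - overlap
    for index, start in enumerate(range(0, len(tokens), step)):
        chunks.append({
            "id": f"pdfpage_{page}_chunk_{index}",   # traceable back to the page
            "tokens": tokens[start:start + chunk_size],
        })
        if start + chunk_size >= len(tokens):        # last window reached the end
            break
    return chunks
```

With tiktoken, `tokens` would be the encoded IDs of the cleaned page text, decoded back to text after slicing.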
### 2. Embedding Model Choice

We use **sentence-transformers/all-MiniLM-L6-v2**:

- Open source and free (no API costs)
- Small model (384 dimensions), so inference is fast and storage costs stay low
- Good quality for semantic similarity tasks
- Can run entirely on CPU

Trade-off: larger models such as OpenAI's text-embedding-ada-002 (1536 dims) may provide better retrieval quality, but MiniLM offers an excellent cost/performance ratio for this use case.
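The retrieval math these embeddings feed is plain cosine similarity. A sketch with stand-in 384-dim vectors (random here; a real pipeline would get them from `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)`):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors with MiniLM's dimensionality; only the shape is realistic.
rng = np.random.default_rng(0)
query_vec = rng.normal(size=384)
chunk_vec = rng.normal(size=384)

score = cosine_similarity(query_vec, chunk_vec)
```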
### 3. LangGraph Pipeline

The RAG pipeline uses LangGraph for orchestration because:

- Clear separation of pipeline stages (embed → retrieve → generate)
- Easy to add/modify nodes (e.g., reranking, query expansion)
- Built-in state management
- Aligns with modern LLM application patterns
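The staged flow can be sketched without the framework as plain functions passing a shared state dict — the same shape LangGraph nodes have, with graph edges replaced by a list (all names and placeholder values are illustrative, not the repo's code):

```python
def embed_query(state):
    # Placeholder: the real node would encode state["query"] with MiniLM.
    state["query_embedding"] = [0.0] * 384
    return state

def retrieve(state):
    # Placeholder: the real node would query Pinecone for top-k chunks.
    state["retrieved_chunks"] = [{"text": "Agents plan and act.", "score": 0.84}]
    return state

def generate(state):
    # Placeholder: LLM or extractive answer built from the retrieved chunks.
    state["final_answer"] = state["retrieved_chunks"][0]["text"]
    return state

PIPELINE = [embed_query, retrieve, generate]  # LangGraph walks graph edges instead

def run(query):
    state = {"query": query}
    for node in PIPELINE:
        state = node(state)
    return state
```

Adding a reranking or query-expansion stage then amounts to inserting one more node, which is the main payoff of the graph-based design.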
### 4. Dual-Mode Answer Generation

The system supports two modes:

**LLM Generation Mode** (with OpenAI key):

- Uses GPT-3.5-turbo for natural language generation
- System prompt strictly instructs the model to use only the provided chunks
- Produces more readable, synthesized answers

**Extractive Fallback Mode** (no API key):

- Returns relevant chunks directly with minimal formatting
- Always works, even offline
- Ensures the app is functional without paid APIs

This design choice ensures the application is **always functional** regardless of API availability.
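A sketch of the mode switch (`call_llm` is a hypothetical stub standing in for the GPT-3.5-turbo call, and the chunk fields are illustrative):

```python
import os

def call_llm(prompt):
    # Stub for the actual chat-completion request; never reached without a key.
    raise NotImplementedError

def generate_answer(question, chunks):
    """LLM generation when a key is configured, extractive fallback otherwise."""
    if os.environ.get("OPENAI_API_KEY"):
        # Grounded prompt: the model may use only the retrieved context.
        context = "\n\n".join(c["text"] for c in chunks)
        prompt = (
            "Answer using ONLY the context below. If the answer is not "
            "in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return call_llm(prompt)
    # Fallback: return the top chunks directly, tagged with their pages.
    return "\n\n".join(f"[p.{c['page']}] {c['text']}" for c in chunks[:3])
```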
### 5. Confidence Score Computation

Confidence is computed from retrieval similarity scores:

```python
# Normalize cosine similarities from [-1, 1] to [0, 1]
normalized_scores = [(score + 1) / 2 for score in scores]

# Use the maximum normalized score as the confidence
confidence = max(normalized_scores)
```

This gives users an intuitive sense of how well the retrieved chunks match their query.
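Wrapped as a function (the name `compute_confidence` matches the helper listed for `app/utils.py`; the exact signature is assumed):

```python
def compute_confidence(scores):
    """Map the best retrieval cosine score from [-1, 1] to [0, 1]."""
    if not scores:
        return 0.0
    return max((score + 1) / 2 for score in scores)

compute_confidence([0.84, 0.61, 0.55])  # ≈ 0.92, as in the example response
```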
---

## File Structure

```
rag-eAgenticAI/
├── app/
│   ├── __init__.py              # Package exports
│   ├── ingest.py                # PDF → chunks → embeddings → Pinecone
│   ├── vectorstore.py           # Pinecone wrapper (create, upsert, query)
│   ├── rag_pipeline.py          # LangGraph pipeline + answer generation
│   └── utils.py                 # Chunking, cleaning, confidence calculation
│
├── streamlit_app/
│   ├── app.py                   # Main Streamlit application
│   └── assets/                  # Static assets (images, CSS)
│
├── samples/
│   ├── sample_queries.txt       # Example questions to test
│   └── expected_responses.md    # Expected JSON response format
│
├── infra/
│   └── hf_space_readme_template.md  # Hugging Face Spaces config
│
├── data/                        # PDF files and generated chunks (gitignored)
│
├── README.md                    # Main documentation
├── architecture.md              # This file
├── requirements.txt             # Python dependencies
├── LICENSE                      # MIT License
└── .gitignore                   # Git ignore rules
```
---

## Data Flow Summary

1. **Ingestion** (run once):
   - PDF → pdfplumber → raw text by page
   - Text → clean_text() → cleaned text
   - Cleaned text → chunk_text() → chunks with metadata
   - Chunks → SentenceTransformer → embeddings
   - Embeddings → Pinecone upsert → stored vectors
2. **Query** (each user question):
   - Question → SentenceTransformer → query embedding
   - Query embedding → Pinecone query → top-k chunks
   - Chunks + scores → compute_confidence() → confidence score
   - Chunks + question → LLM/extractive → final answer
   - Answer + chunks + confidence → JSON response → Streamlit UI
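The query steps end in the JSON contract the Streamlit UI consumes. A minimal sketch of assembling it (`build_response` is a hypothetical helper, and the values are illustrative):

```python
def build_response(final_answer, retrieved_chunks, confidence):
    """Assemble the response payload shown in the architecture diagram."""
    return {
        "final_answer": final_answer,
        "retrieved_chunks": retrieved_chunks,
        "confidence": round(confidence, 2),
    }

response = build_response(
    "Agentic AI systems plan and act autonomously.",
    [{"id": "pdfpage_4_chunk_1", "text": "...", "score": 0.84}],
    0.92,
)
```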