metadata
title: RAG Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

RAG Chatbot with Advanced Retrieval

A question-answering system that lets you upload documents and ask questions about them. The system retrieves the relevant passages from your documents and generates answers grounded in them.

How It Works

When You Upload a Document

1. Upload File (PDF/DOCX/TXT)
        ↓
2. Extract Text
        ↓
3. Split into Chunks (512 tokens each)
        ↓
4. Convert to Embeddings (384D vectors)
        ↓
5. Store in Vector Database (Qdrant)
        ↓
6. Save Metadata in MongoDB

What happens: Your document is broken into small chunks. Each chunk is converted into a numerical vector that captures its meaning, and the vectors are stored in a database for fast searching.
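
As a rough illustration of step 3, the sketch below splits text into fixed-size chunks. It is a simplification under stated assumptions: it treats whitespace-separated words as tokens, whereas the real pipeline presumably counts tokens with a model tokenizer, and the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 512) -> list[str]:
    """Split text into chunks of at most `chunk_size` tokens.

    Assumption: a "token" here is a whitespace-separated word,
    which only approximates a real tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    chunks = []
    # Walk the token list in steps of chunk_size; the last chunk may be shorter.
    for start in range(0, len(tokens), chunk_size):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(1200))
    for i, chunk in enumerate(split_into_chunks(sample)):
        print(i, len(chunk.split()))
```

With 1200 words and the default size, this yields two full chunks and one shorter final chunk.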

When You Ask a Question

1. Type Your Question
        ↓
2. Check Cache (answered before?)
        ↓
3. Search Documents (if RAG is ON)
   - BM25: Find keyword matches
   - Vector: Find similar meanings
        ↓
4. Rerank Results (pick top 5 most relevant)
        ↓
5. Build Context from Chunks
        ↓
6. Generate Answer with LLM
        ↓
7. Stream Response to You

What happens: The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.
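
Step 5 (building context from chunks) can be pictured as simple string assembly. The sketch below is an assumption about what such a step might look like, not the project's actual prompt: the function name `build_prompt`, the numbering scheme, and the instruction wording are all hypothetical.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    Hypothetical format: each chunk gets a numeric label so the answer
    can be traced back to its source chunk.
    """
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What is stored in Redis?",
                       ["Redis stores conversation history.",
                        "MongoDB stores document metadata."]))
```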

Key Components

Document Processing

DocumentProcessor - Main coordinator for document uploads

  • Validates file type and size
  • Calls the right loader for PDF, DOCX, or TXT files
  • Manages the entire processing pipeline

Embedder - Converts text to vectors

  • Uses FastEmbed with BAAI/bge-small-en-v1.5 model
  • Generates 384-dimensional vectors for semantic search
  • Each chunk becomes a searchable vector

Qdrant Vector Store - Stores embeddings

  • Fast similarity search across millions of vectors
  • Returns most relevant chunks for any query
  • Handles all vector operations
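
To make "similarity search" concrete, here is a brute-force sketch in plain Python. It assumes cosine similarity as the distance metric, which this README does not state, and it scans every vector, whereas a vector database such as Qdrant uses an index to avoid a full scan. The names `cosine_similarity` and `top_k` are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2D vectors; real embeddings in this project are 384-dimensional.
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], store, k=2))
```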

Question Answering

HybridRetriever - Finds relevant information

  • BM25: Traditional keyword search (good for exact matches)
  • Vector Search: Semantic search (understands meaning)
  • Combines both for better results
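
This README does not say how the two result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of the actual `HybridRetriever`. The constant `k = 60` is the conventional default for RRF, and the chunk IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores 1 / (k + rank) per list it appears in, so chunks
    ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


if __name__ == "__main__":
    bm25_hits = ["c1", "c2", "c3"]      # keyword ranking (hypothetical IDs)
    vector_hits = ["c2", "c4", "c1"]    # semantic ranking (hypothetical IDs)
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize BM25 scores against vector similarities, which live on different scales.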

Reranker - Improves search quality

  • Uses FlashRank model to score relevance
  • Keeps the best 5 chunks out of 20 candidates
  • Ensures only the most relevant context is used
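
The "20 candidates in, 5 chunks out" step can be sketched as scoring and truncation. The scorer below is a deliberately crude word-overlap function standing in for the FlashRank model. It is not FlashRank's API, and both `rerank` and `word_overlap` are hypothetical names.

```python
from typing import Callable


def word_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query: str, candidates: list[str],
           score_fn: Callable[[str, str], float] = word_overlap,
           top_k: int = 5) -> list[str]:
    """Score every candidate against the query and keep the top_k best."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    candidates = [f"passage {i}" for i in range(19)] + ["redis cache stores history"]
    print(rerank("redis cache", candidates))
```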

Generator - Creates answers

  • Uses Groq LLM (llama-3.1-70b)
  • Streams responses in real-time
  • Bases answers on retrieved context when RAG is ON
  • Uses general knowledge when RAG is OFF

Semantic Cache - Speeds up responses

  • Remembers previous questions and answers
  • Returns the cached response when the same question is asked again
  • Separate caches for RAG ON vs RAG OFF
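
A minimal in-memory sketch of this behavior follows. It makes two simplifying assumptions: questions are matched by exact text after normalizing case and whitespace (a semantic cache may instead match by embedding similarity), and a Python dict stands in for Redis. The class name `ResponseCache` is hypothetical.

```python
from typing import Optional


class ResponseCache:
    """Caches answers keyed by (RAG mode, normalized question)."""

    def __init__(self) -> None:
        self._store: dict[tuple[bool, str], str] = {}

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variations still hit.
        return " ".join(question.lower().split())

    def get(self, question: str, rag_enabled: bool) -> Optional[str]:
        return self._store.get((rag_enabled, self._normalize(question)))

    def put(self, question: str, rag_enabled: bool, answer: str) -> None:
        self._store[(rag_enabled, self._normalize(question))] = answer


if __name__ == "__main__":
    cache = ResponseCache()
    cache.put("What is RAG?", True, "Answer grounded in documents")
    print(cache.get("  what is rag? ", True))   # cache hit
    print(cache.get("What is RAG?", False))     # miss: RAG OFF has its own cache
```

Including the RAG flag in the key is what keeps the two modes apart: an answer produced from your documents is never returned for a general-knowledge query, and vice versa.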

Memory & Storage

Conversation Memory - Remembers chat history

  • Stores last 10 messages in Redis
  • Enables follow-up questions
  • Each session has independent history
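
The "last 10 messages per session" behavior can be sketched with a bounded deque. This is an in-memory stand-in for the Redis-backed store, written under that assumption. The class name `ConversationMemory` matches the component name above, but its methods here are hypothetical.

```python
from collections import defaultdict, deque


class ConversationMemory:
    """Keeps only the most recent messages for each session."""

    def __init__(self, max_messages: int = 10) -> None:
        # Each session gets its own deque; old messages drop off automatically.
        self._sessions = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        # .get avoids creating an empty entry for unknown sessions.
        return list(self._sessions.get(session_id, ()))


if __name__ == "__main__":
    memory = ConversationMemory()
    for i in range(12):
        memory.add("session-1", "user", f"msg {i}")
    print(len(memory.history("session-1")))
    print(memory.history("session-1")[0]["content"])
```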

MongoDB - Document metadata

  • Tracks uploaded documents
  • Stores file info, upload time, chunk count
  • Links to vectors in Qdrant

Redis - Fast caching

  • Stores conversation history
  • Caches LLM responses
  • In-memory for instant access

Technology Stack

  • LangChain 0.3.13 - RAG framework
  • Groq API - Fast LLM (llama-3.1-70b)
  • FastEmbed - Embedding generation
  • FlashRank - Result reranking
  • Qdrant - Vector database
  • MongoDB - Document storage
  • Redis - Caching layer
  • FastAPI - Web framework

Quick Start

Installation

# Clone and install
git clone https://github.com/Abeshith/RAG.git
cd RAG
pip install -r requirements.txt

Configuration

Create a .env file:

GROQ_API_KEY=your_groq_key
MONGODB_URI=your_mongodb_uri
REDIS_URL=your_redis_url
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_key
JWT_SECRET_KEY=your_secret_key
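
Before starting the server, it can help to confirm that every variable is set. The check below is a hypothetical helper, not part of the project, and it assumes all six variables listed above are required.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the .env example above.
REQUIRED_VARS = [
    "GROQ_API_KEY",
    "MONGODB_URI",
    "REDIS_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "JWT_SECRET_KEY",
]


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```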

Run

uvicorn app.main:app --host 0.0.0.0 --port 7860

Open: http://localhost:7860

Usage

  1. Upload Documents: Click upload and select a PDF, DOCX, or TXT file
  2. Ask Questions: Type your question in the chat box
  3. Toggle RAG:
    • ON = answers from your documents
    • OFF = general knowledge answers
  4. View Sources: See which document chunks were used

API Endpoints

GET    /health/            - Check system status
POST   /chat/stream        - Send question, get streaming answer
POST   /documents/upload   - Upload new document
GET    /documents/         - List all documents
GET    /documents/stats    - Get document statistics
DELETE /documents/{id}     - Delete specific document
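
As a quick smoke test against a running instance, the health endpoint can be called from Python's standard library. This sketch assumes the server is reachable at `http://localhost:7860`, as in the Run section. It returns only the HTTP status code because the response body format is not documented here. The helper names `endpoint` and `check_health` are hypothetical.

```python
import urllib.request
from urllib.parse import urljoin


def endpoint(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path with exactly one slash between."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def check_health(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> int:
    """Call GET /health/ and return the HTTP status code."""
    with urllib.request.urlopen(endpoint(base_url, "/health/"), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Requires the server to be running; raises URLError otherwise.
    print(check_health())
```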

Docker Deployment

docker build -t rag-chatbot .
docker run -p 7860:7860 --env-file .env rag-chatbot