title: RAG Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

RAG Chatbot with Advanced Retrieval

A question-answering system that lets you upload documents and ask questions about them. The system retrieves the relevant passages from your documents and generates answers grounded in that content.

How It Works

When You Upload a Document

1. Upload File (PDF/DOCX/TXT)
        ↓
2. Extract Text
        ↓
3. Split into Chunks (512 tokens each)
        ↓
4. Convert to Embeddings (384D vectors)
        ↓
5. Store in Vector Database (Qdrant)
        ↓
6. Save Metadata in MongoDB

What happens: Your document is broken into small chunks. Each chunk is converted into a numerical vector that captures its meaning, and the vectors are stored in a vector database for fast searching.
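The chunking step above can be sketched as follows. This is a minimal illustration, not the project's code: it approximates tokens with whitespace-separated words, and `chunk_text` and its `overlap` parameter are hypothetical names. The real pipeline may use a model tokenizer and a different overlap.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    tokens = text.split()          # assumption: one word is roughly one token
    step = chunk_size - overlap    # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break                  # the last window already reached the end
    return chunks
```

A small overlap is often used so that a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks.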

When You Ask a Question

1. Type Your Question
        ↓
2. Check Cache (answered before?)
        ↓
3. Search Documents (if RAG is ON)
   - BM25: Find keyword matches
   - Vector: Find similar meanings
        ↓
4. Rerank Results (pick top 5 most relevant)
        ↓
5. Build Context from Chunks
        ↓
6. Generate Answer with LLM
        ↓
7. Stream Response to You

What happens: The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.
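Step 5 (building context) can be sketched as a small prompt builder. The chunk fields (`text`, `source`), the function name, and the prompt wording are all hypothetical; the project's actual prompt template is not shown in this README.

```python
def build_prompt(question: str, chunks: list[dict], max_chunks: int = 5) -> str:
    """Number the top chunks and place them ahead of the question."""
    context_lines = []
    for i, chunk in enumerate(chunks[:max_chunks], start=1):
        # Keep the source name so the answer can point back to a document.
        context_lines.append(f"[{i}] ({chunk['source']}) {chunk['text']}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks makes it straightforward to show which ones were used when sources are displayed.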

Key Components

Document Processing

DocumentProcessor - Main coordinator for document uploads

Validates file type and size
Calls the right loader for PDF, DOCX, or TXT files
Manages the entire processing pipeline
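A minimal sketch of the validation step, assuming a 10 MB size limit (the README does not state the real limit). `validate_upload` and both constants are hypothetical names.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumed limit, not taken from the project


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise ValueError if the file is rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the size limit")
    return ext
```

The returned extension can then be used to pick the matching loader.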

Embedder - Converts text to vectors

Uses FastEmbed with BAAI/bge-small-en-v1.5 model
Generates 384-dimensional vectors for semantic search
Each chunk becomes a searchable vector

Qdrant Vector Store - Stores embeddings

Runs fast similarity search and scales to millions of vectors
Returns most relevant chunks for any query
Handles all vector operations
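To illustrate what similarity search over embeddings means, here is a brute-force sketch using cosine similarity on toy 3-dimensional vectors instead of 384-dimensional ones. It is not how Qdrant works internally (a vector database uses an approximate index rather than scanning every vector), and the choice of cosine similarity is an assumption about the collection's configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```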

Question Answering

HybridRetriever - Finds relevant information

BM25: Traditional keyword search (good for exact matches)
Vector Search: Semantic search (understands meaning)
Combines both for better results
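The README does not say how the two result lists are combined. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's actual method. The function name is hypothetical, and `k=60` is the conventional default for RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs; items ranked high in several lists win."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
```

RRF uses only rank positions, so BM25 scores and vector similarities never need to be put on a common scale.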

Reranker - Improves search quality

Uses FlashRank model to score relevance
Selects the 5 best chunks from 20 candidates
Ensures only the most relevant context is used
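The rerank step reduces to "score each candidate against the query, keep the top 5". The sketch below uses a toy word-overlap score as a stand-in for the FlashRank model. `overlap_score` and `rerank` are hypothetical names, and a real reranker scores relevance with a trained model rather than by counting shared words.

```python
from typing import Callable


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: int = 5,
) -> list[str]:
    """Order candidates by score (highest first) and keep the best top_n."""
    ranked = sorted(candidates, key=lambda passage: score_fn(query, passage), reverse=True)
    return ranked[:top_n]
```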

Generator - Creates answers

Uses Groq LLM (llama-3.1-70b)
Streams responses in real-time
Bases answers on retrieved context when RAG is ON
Uses general knowledge when RAG is OFF

Semantic Cache - Speeds up responses

Remembers previous questions and answers
Returns the cached response if the same question is asked again
Separate caches for RAG ON vs RAG OFF
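A minimal sketch of the caching behavior described above, using an in-memory dictionary in place of Redis. It only matches questions that are identical after normalizing case and whitespace; a semantic cache would typically also compare question embeddings, which this sketch leaves out. The class and method names are hypothetical.

```python
from typing import Optional


class ResponseCache:
    """Caches answers per (RAG mode, normalized question)."""

    def __init__(self) -> None:
        self._store: dict[tuple[bool, str], str] = {}

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variations still hit.
        return " ".join(question.lower().split())

    def get(self, question: str, rag_enabled: bool) -> Optional[str]:
        return self._store.get((rag_enabled, self._normalize(question)))

    def set(self, question: str, rag_enabled: bool, answer: str) -> None:
        self._store[(rag_enabled, self._normalize(question))] = answer
```

Including the RAG mode in the key keeps a document-grounded answer from being returned for a general-knowledge request, and vice versa.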

Memory & Storage

Conversation Memory - Remembers chat history

Stores the last 10 messages in Redis
Enables follow-up questions
Each session has independent history
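The "last 10 messages per session" behavior can be sketched with a bounded deque. This is an in-memory stand-in for illustration only; the project stores history in Redis, and the class and method names here are hypothetical.

```python
from collections import defaultdict, deque

MAX_MESSAGES = 10  # matches the window size stated above


class ConversationMemory:
    """Keeps the most recent messages for each session independently."""

    def __init__(self, max_messages: int = MAX_MESSAGES) -> None:
        # Each session gets its own deque; old messages fall off automatically.
        self._sessions = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        return list(self._sessions[session_id])
```

With Redis, a similar window can be kept by pushing each message onto a list and trimming it to ten entries (for example with LPUSH followed by LTRIM).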

MongoDB - Document metadata

Tracks uploaded documents
Stores file info, upload time, chunk count
Links to vectors in Qdrant

Redis - Fast caching

Stores conversation history
Caches LLM responses
Keeps data in memory for low-latency access

Technology Stack

LangChain 0.3.13 - RAG framework
Groq API - Fast LLM (llama-3.1-70b)
FastEmbed - Embedding generation
FlashRank - Result reranking
Qdrant - Vector database
MongoDB - Document storage
Redis - Caching layer
FastAPI - Web framework

Quick Start

Installation

# Clone and install
git clone https://github.com/Abeshith/RAG.git
cd RAG
pip install -r requirements.txt

Configuration

Create a .env file:

GROQ_API_KEY=your_groq_key
MONGODB_URI=your_mongodb_uri
REDIS_URL=your_redis_url
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_key
JWT_SECRET_KEY=your_secret_key
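A startup check along these lines can catch a missing setting early. It is a sketch that assumes the variables above are already exported into the process environment. `load_settings` is a hypothetical helper, and the application's real configuration code may differ.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = (
    "GROQ_API_KEY",
    "MONGODB_URI",
    "REDIS_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "JWT_SECRET_KEY",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```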

Run

uvicorn app.main:app --host 0.0.0.0 --port 7860

Open: http://localhost:7860

Usage

Upload Documents: Click upload and select a PDF, DOCX, or TXT file
Ask Questions: Type your question in the chat box
Toggle RAG:
- ON = answers from your documents
- OFF = general knowledge answers
View Sources: See which document chunks were used

API Endpoints

GET    /health/            - Check system status
POST   /chat/stream        - Send question, get streaming answer
POST   /documents/upload   - Upload new document
GET    /documents/         - List all documents
GET    /documents/stats    - Get document statistics
DELETE /documents/{id}     - Delete specific document
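As an illustration, requests for two of these endpoints can be built with the standard library. The JSON field names (`question`, `use_rag`) are guesses, since the README does not document the request schema, and the endpoints may also require an authentication header. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /health/ to check that the service is up."""
    return urllib.request.Request(f"{base_url}/health/", method="GET")


def build_chat_request(
    question: str, use_rag: bool = True, base_url: str = BASE_URL
) -> urllib.request.Request:
    """POST /chat/stream with a JSON body (field names are assumptions)."""
    payload = json.dumps({"question": question, "use_rag": use_rag}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/stream",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the server is running, a request can be sent with `urllib.request.urlopen(build_health_request())`.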

Docker Deployment

docker build -t rag-chatbot .
docker run -p 7860:7860 --env-file .env rag-chatbot