title: RAG Chatbot
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
RAG Chatbot with Advanced Retrieval
A question-answering system that lets you upload documents and ask questions about them. The system retrieves relevant passages from your documents and generates answers grounded in those passages.
How It Works
When You Upload a Document
1. Upload File (PDF/DOCX/TXT)
↓
2. Extract Text
↓
3. Split into Chunks (512 tokens each)
↓
4. Convert to Embeddings (384D vectors)
↓
5. Store in Vector Database (Qdrant)
↓
6. Save Metadata in MongoDB
What happens: Your document is split into small chunks. Each chunk is converted into a numerical vector that captures its meaning, then stored in a vector database for fast searching.
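The chunking step can be illustrated with a minimal sketch. It assumes whitespace-separated words as a stand-in for tokens; the function name `split_into_chunks` is hypothetical, and the project's own splitter may use a model tokenizer and overlapping chunks.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 512) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size tokens.

    Assumption: a "token" is a whitespace-separated word. The real
    pipeline may count tokens differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


chunks = split_into_chunks("word " * 1100)
print(len(chunks))  # 3 chunks: 512 + 512 + 76 tokens
```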
When You Ask a Question
1. Type Your Question
↓
2. Check Cache (answered before?)
↓
3. Search Documents (if RAG is ON)
   - BM25: Find keyword matches
   - Vector: Find similar meanings
↓
4. Rerank Results (pick top 5 most relevant)
↓
5. Build Context from Chunks
↓
6. Generate Answer with LLM
↓
7. Stream Response to You
What happens: The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.
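Steps 5 and 6 can be sketched as follows. The prompt wording and the function name `build_prompt` are illustrative assumptions, not the project's actual template.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Join retrieved chunks into a numbered context block, then append the question."""
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What is Qdrant?",
    ["Qdrant is a vector database.", "Redis is an in-memory store."],
)
print(prompt)
```

Numbering the chunks makes it possible to refer back to a specific source when showing which chunks were used.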
Key Components
Document Processing
DocumentProcessor - Main coordinator for document uploads
- Validates file type and size
- Calls the right loader for PDF, DOCX, or TXT files
- Manages the entire processing pipeline
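A minimal sketch of the validation step, assuming a 10 MB size limit (this README does not state the real limit) and a hypothetical helper name `validate_upload`:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumed limit, not taken from the project


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercased extension, or raise ValueError if the file is rejected."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the size limit")
    return extension


print(validate_upload("Report.PDF", 1024))  # .pdf
```

The returned extension can then be used to pick the matching loader for PDF, DOCX, or TXT.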
Embedder - Converts text to vectors
- Uses FastEmbed with BAAI/bge-small-en-v1.5 model
- Generates 384-dimensional vectors for semantic search
- Each chunk becomes a searchable vector
Qdrant Vector Store - Stores embeddings
- Fast similarity search across millions of vectors
- Returns most relevant chunks for any query
- Handles all vector operations
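To show what a similarity search computes, here is a brute-force sketch in plain Python. It is only an illustration: a vector database such as Qdrant uses an index instead of scanning every vector, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 5
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = top_k([1.0, 0.0], stored, k=2)
print([index for index, _ in result])  # [0, 2]
```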
Question Answering
HybridRetriever - Finds relevant information
- BM25: Traditional keyword search (good for exact matches)
- Vector Search: Semantic search (understands meaning)
- Combines both for better results
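This README does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the project's method; `k = 60` is the conventional default for RRF.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; each appearance adds 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_results = ["a", "b", "c"]    # keyword ranking
vector_results = ["c", "a", "d"]  # semantic ranking
fused = reciprocal_rank_fusion([bm25_results, vector_results])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that rank well in both lists ("a" and "c") rise above documents found by only one retriever.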
Reranker - Improves search quality
- Uses FlashRank model to score relevance
- Keeps the 5 best chunks out of 20 candidates
- Ensures only the most relevant context is used
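The top-5-of-20 selection can be sketched with a pluggable scoring function. The `overlap_score` function below is a toy stand-in for the FlashRank model, and both function names are hypothetical.

```python
from typing import Callable, List


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the passage."""
    query_terms = set(query.lower().split())
    passage_terms = set(passage.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & passage_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: int = 5,
) -> List[str]:
    """Sort candidates by score_fn, highest first, and keep the top_n."""
    ranked = sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]


candidates = [f"filler text {i}" for i in range(19)] + ["vector database search"]
best = rerank("vector search", candidates)
print(len(best), best[0])  # 5 vector database search
```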
Generator - Creates answers
- Uses Groq LLM (llama-3.1-70b)
- Streams responses in real-time
- Bases answers on retrieved context when RAG is ON
- Uses general knowledge when RAG is OFF
Semantic Cache - Speeds up responses
- Remembers previous questions and answers
- Returns cached response if same question asked again
- Separate caches for RAG ON vs RAG OFF
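A minimal in-memory sketch of a cache with separate entries for RAG ON and RAG OFF. It matches questions after normalizing case and whitespace; a semantic cache may instead match on embedding similarity, and the project stores its cache in Redis. The class name `ResponseCache` is hypothetical.

```python
import hashlib
from typing import Dict, Optional


class ResponseCache:
    """Cache answers keyed by RAG mode and normalized question text."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(question: str, rag_enabled: bool) -> str:
        # Lowercase and collapse whitespace so trivial variations share a key
        normalized = " ".join(question.lower().split())
        mode = "rag_on" if rag_enabled else "rag_off"
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        return f"{mode}:{digest}"

    def get(self, question: str, rag_enabled: bool) -> Optional[str]:
        return self._store.get(self._key(question, rag_enabled))

    def set(self, question: str, rag_enabled: bool, answer: str) -> None:
        self._store[self._key(question, rag_enabled)] = answer


cache = ResponseCache()
cache.set("What is RAG?", True, "Retrieval-augmented generation.")
print(cache.get("  what is  rag? ", True))   # Retrieval-augmented generation.
print(cache.get("What is RAG?", False))      # None
```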
Memory & Storage
Conversation Memory - Remembers chat history
- Stores last 10 messages in Redis
- Enables follow-up questions
- Each session has independent history
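The 10-message window can be sketched in memory with a bounded deque per session. This is an illustration only: the project keeps history in Redis, where a capped list (for example `RPUSH` followed by `LTRIM`) would play the same role. The class name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

MAX_MESSAGES = 10


class ConversationMemory:
    """Keep only the most recent messages for each session."""

    def __init__(self, max_messages: int = MAX_MESSAGES) -> None:
        self._sessions: Dict[str, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        # A full deque drops its oldest entry automatically on append
        self._sessions[session_id].append((role, content))

    def history(self, session_id: str) -> List[Tuple[str, str]]:
        return list(self._sessions.get(session_id, ()))


memory = ConversationMemory()
for i in range(12):
    memory.add("session-1", "user", f"message {i}")
print(len(memory.history("session-1")))  # 10
print(memory.history("session-1")[0])    # ('user', 'message 2')
```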
MongoDB - Document metadata
- Tracks uploaded documents
- Stores file info, upload time, chunk count
- Links to vectors in Qdrant
Redis - Fast caching
- Stores conversation history
- Caches LLM responses
- In-memory for instant access
Technology Stack
- LangChain 0.3.13 - RAG framework
- Groq API - Fast LLM (llama-3.1-70b)
- FastEmbed - Embedding generation
- FlashRank - Result reranking
- Qdrant - Vector database
- MongoDB - Document storage
- Redis - Caching layer
- FastAPI - Web framework
Quick Start
Installation
# Clone and install
git clone https://github.com/Abeshith/RAG.git
cd RAG
pip install -r requirements.txt
Configuration
Create a .env file in the project root:
GROQ_API_KEY=your_groq_key
MONGODB_URI=your_mongodb_uri
REDIS_URL=your_redis_url
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_key
JWT_SECRET_KEY=your_secret_key
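A startup check for these variables might look like the sketch below. The function name `load_settings` is hypothetical, and the application may load its configuration differently.

```python
import os
from typing import Dict, Mapping

REQUIRED_SETTINGS = (
    "GROQ_API_KEY",
    "MONGODB_URI",
    "REDIS_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "JWT_SECRET_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise RuntimeError listing the missing ones."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


example_env = {name: "placeholder" for name in REQUIRED_SETTINGS}
settings = load_settings(example_env)
print(sorted(settings))
```

Failing fast with the full list of missing names is easier to debug than a connection error later on.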
Run
uvicorn app.main:app --host 0.0.0.0 --port 7860
Open: http://localhost:7860
Usage
- Upload Documents: Click upload, select PDF/DOCX/TXT file
- Ask Questions: Type question in chat box
- Toggle RAG:
  - ON = answers from your documents
  - OFF = general knowledge answers
- View Sources: See which document chunks were used
API Endpoints
GET /health/ - Check system status
POST /chat/stream - Send question, get streaming answer
POST /documents/upload - Upload new document
GET /documents/ - List all documents
GET /documents/stats - Get document statistics
DELETE /documents/{id} - Delete specific document
Docker Deployment
docker build -t rag-chatbot .
docker run -p 7860:7860 --env-file .env rag-chatbot