Spaces:

Abeshith
/

rag-chatbot

Sleeping

App Files Files Community

rag-chatbot / README.md

Abeshith

Simplify README with clear flow and user-friendly explanations

7c3a93a 5 months ago

preview code

raw

history blame contribute delete

4.74 kB

	---
	title: RAG Chatbot
	emoji: 🤖
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# RAG Chatbot with Advanced Retrieval

	A question-answering system that lets you upload documents and ask questions about them. The system retrieves relevant information from your documents and generates accurate answers.

	## How It Works

	### When You Upload a Document

	```
	1. Upload File (PDF/DOCX/TXT)
	↓
	2. Extract Text
	↓
	3. Split into Chunks (512 tokens each)
	↓
	4. Convert to Embeddings (384D vectors)
	↓
	5. Store in Vector Database (Qdrant)
	↓
	6. Save Metadata in MongoDB
	```

	What happens: Your document is broken into small chunks, each chunk is converted into a numerical vector that captures its meaning, and stored in a database for fast searching.

	### When You Ask a Question

	```
	1. Type Your Question
	↓
	2. Check Cache (answered before?)
	↓
	3. Search Documents (if RAG is ON)
	- BM25: Find keyword matches
	- Vector: Find similar meanings
	↓
	4. Rerank Results (pick top 5 most relevant)
	↓
	5. Build Context from Chunks
	↓
	6. Generate Answer with LLM
	↓
	7. Stream Response to You
	```

	What happens: The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.

	## Key Components

	### Document Processing

	DocumentProcessor - Main coordinator for document uploads
	- Validates file type and size
	- Calls the right loader for PDF, DOCX, or TXT files
	- Manages the entire processing pipeline

	Embedder - Converts text to vectors
	- Uses FastEmbed with BAAI/bge-small-en-v1.5 model
	- Generates 384-dimensional vectors for semantic search
	- Each chunk becomes a searchable vector

	Qdrant Vector Store - Stores embeddings
	- Fast similarity search across millions of vectors
	- Returns most relevant chunks for any query
	- Handles all vector operations

	### Question Answering

	HybridRetriever - Finds relevant information
	- BM25: Traditional keyword search (good for exact matches)
	- Vector Search: Semantic search (understands meaning)
	- Combines both for better results

	Reranker - Improves search quality
	- Uses FlashRank model to score relevance
	- Filters the best 5 chunks from 20 candidates
	- Ensures only the most relevant context is used

	Generator - Creates answers
	- Uses Groq LLM (llama-3.1-70b)
	- Streams responses in real-time
	- Bases answers on retrieved context when RAG is ON
	- Uses general knowledge when RAG is OFF

	Semantic Cache - Speeds up responses
	- Remembers previous questions and answers
	- Returns cached response if same question asked again
	- Separate caches for RAG ON vs RAG OFF

	### Memory & Storage

	Conversation Memory - Remembers chat history
	- Stores last 10 messages in Redis
	- Enables follow-up questions
	- Each session has independent history

	MongoDB - Document metadata
	- Tracks uploaded documents
	- Stores file info, upload time, chunk count
	- Links to vectors in Qdrant

	Redis - Fast caching
	- Stores conversation history
	- Caches LLM responses
	- In-memory for instant access

	## Technology Stack

	- LangChain 0.3.13 - RAG framework
	- Groq API - Fast LLM (llama-3.1-70b)
	- FastEmbed - Embedding generation
	- FlashRank - Result reranking
	- Qdrant - Vector database
	- MongoDB - Document storage
	- Redis - Caching layer
	- FastAPI - Web framework

	## Quick Start

	### Installation

	```bash
	# Clone and install
	git clone https://github.com/Abeshith/RAG.git
	cd RAG
	pip install -r requirements.txt
	```

	### Configuration

	Create `.env` file:

	```env
	GROQ_API_KEY=your_groq_key
	MONGODB_URI=your_mongodb_uri
	REDIS_URL=your_redis_url
	QDRANT_URL=your_qdrant_url
	QDRANT_API_KEY=your_qdrant_key
	JWT_SECRET_KEY=your_secret_key
	```

	### Run

	```bash
	uvicorn app.main:app --host 0.0.0.0 --port 7860
	```

	Open: http://localhost:7860

	## Usage

	1. Upload Documents: Click upload, select PDF/DOCX/TXT file
	2. Ask Questions: Type question in chat box
	3. Toggle RAG:
	- ON = answers from your documents
	- OFF = general knowledge answers
	4. View Sources: See which document chunks were used

	## API Endpoints

	```
	GET /health/ - Check system status
	POST /chat/stream - Send question, get streaming answer
	POST /documents/upload - Upload new document
	GET /documents/ - List all documents
	GET /documents/stats - Get document statistics
	DELETE /documents/{id} - Delete specific document
	```

	## Docker Deployment

	```bash
	docker build -t rag-chatbot .
	docker run -p 7860:7860 --env-file .env rag-chatbot
	```