Spaces:

Abeshith
/

rag-chatbot

Sleeping

File size: 4,735 Bytes

cad96c2
 
 
 
 
 
 
 
 
 
64d7fdf
 
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
64d7fdf
 
7c3a93a
 
 
 
 
 
 
 
 
 
 
64d7fdf
 
7c3a93a
64d7fdf
7c3a93a
64d7fdf
 
7c3a93a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
64d7fdf
 
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
 
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
 
64d7fdf
 
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
 
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a
 
 
 
 
 
 
 
64d7fdf
7c3a93a
64d7fdf
7c3a93a

---
title: RAG Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# RAG Chatbot with Advanced Retrieval

A question-answering system that lets you upload documents and ask questions about them. The system retrieves relevant information from your documents and generates accurate answers.

## How It Works

### When You Upload a Document

```
1. Upload File (PDF/DOCX/TXT)
        ↓
2. Extract Text
        ↓
3. Split into Chunks (512 tokens each)
        ↓
4. Convert to Embeddings (384D vectors)
        ↓
5. Store in Vector Database (Qdrant)
        ↓
6. Save Metadata in MongoDB
```

**What happens:** Your document is broken into small chunks, each chunk is converted into a numerical vector that captures its meaning, and stored in a database for fast searching.

### When You Ask a Question

```
1. Type Your Question
        ↓
2. Check Cache (answered before?)
        ↓
3. Search Documents (if RAG is ON)
   - BM25: Find keyword matches
   - Vector: Find similar meanings
        ↓
4. Rerank Results (pick top 5 most relevant)
        ↓
5. Build Context from Chunks
        ↓
6. Generate Answer with LLM
        ↓
7. Stream Response to You
```

**What happens:** The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.

## Key Components

### Document Processing

**DocumentProcessor** - Main coordinator for document uploads
- Validates file type and size
- Calls the right loader for PDF, DOCX, or TXT files
- Manages the entire processing pipeline

**Embedder** - Converts text to vectors
- Uses FastEmbed with BAAI/bge-small-en-v1.5 model
- Generates 384-dimensional vectors for semantic search
- Each chunk becomes a searchable vector

**Qdrant Vector Store** - Stores embeddings
- Fast similarity search across millions of vectors
- Returns most relevant chunks for any query
- Handles all vector operations

### Question Answering

**HybridRetriever** - Finds relevant information
- **BM25**: Traditional keyword search (good for exact matches)
- **Vector Search**: Semantic search (understands meaning)
- Combines both for better results

**Reranker** - Improves search quality
- Uses FlashRank model to score relevance
- Filters the best 5 chunks from 20 candidates
- Ensures only the most relevant context is used

**Generator** - Creates answers
- Uses Groq LLM (llama-3.1-70b)
- Streams responses in real-time
- Bases answers on retrieved context when RAG is ON
- Uses general knowledge when RAG is OFF

**Semantic Cache** - Speeds up responses
- Remembers previous questions and answers
- Returns cached response if same question asked again
- Separate caches for RAG ON vs RAG OFF

### Memory & Storage

**Conversation Memory** - Remembers chat history
- Stores last 10 messages in Redis
- Enables follow-up questions
- Each session has independent history

**MongoDB** - Document metadata
- Tracks uploaded documents
- Stores file info, upload time, chunk count
- Links to vectors in Qdrant

**Redis** - Fast caching
- Stores conversation history
- Caches LLM responses
- In-memory for instant access

## Technology Stack

- **LangChain 0.3.13** - RAG framework
- **Groq API** - Fast LLM (llama-3.1-70b)
- **FastEmbed** - Embedding generation
- **FlashRank** - Result reranking
- **Qdrant** - Vector database
- **MongoDB** - Document storage
- **Redis** - Caching layer
- **FastAPI** - Web framework

## Quick Start

### Installation

```bash
# Clone and install
git clone https://github.com/Abeshith/RAG.git
cd RAG
pip install -r requirements.txt
```

### Configuration

Create `.env` file:

```env
GROQ_API_KEY=your_groq_key
MONGODB_URI=your_mongodb_uri
REDIS_URL=your_redis_url
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_key
JWT_SECRET_KEY=your_secret_key
```

### Run

```bash
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

Open: http://localhost:7860

## Usage

1. **Upload Documents**: Click upload, select PDF/DOCX/TXT file
2. **Ask Questions**: Type question in chat box
3. **Toggle RAG**: 
   - ON = answers from your documents
   - OFF = general knowledge answers
4. **View Sources**: See which document chunks were used

## API Endpoints

```
GET  /health/                    - Check system status
POST /chat/stream                - Send question, get streaming answer
POST /documents/upload           - Upload new document
GET  /documents/                 - List all documents
GET  /documents/stats            - Get document statistics
DELETE /documents/{id}           - Delete specific document
```

## Docker Deployment

```bash
docker build -t rag-chatbot .
docker run -p 7860:7860 --env-file .env rag-chatbot
```