riteshraut commited on
Commit
edea2d6
·
1 Parent(s): f3c5275

feat/included the re ranker

Browse files
Files changed (3) hide show
  1. README.md +499 -91
  2. app.py +61 -17
  3. rag_processor.py +1 -0
README.md CHANGED
@@ -1,142 +1,550 @@
1
  ---
2
- title: CogniChat - Chat with Your Documents
3
- emoji: 🤖
4
- colorFrom: blue
5
- colorTo: purple
6
- sdk: docker
7
- pinned: false
8
- license: mit
9
- app_port: 7860
10
  ---
11
 
12
- # CogniChat - Chat with Your Documents 🤖
13
 
14
- An intelligent document chat application that allows you to upload documents and have conversations with them using advanced RAG (Retrieval Augmented Generation) technology.
15
 
16
- ## Features
17
 
18
- - **Multi-format Document Support**: Upload PDF, TXT, DOCX, and image files
19
- - **Advanced RAG Pipeline**: Hybrid search with BM25 and FAISS retrievers
20
- - **Conversational Memory**: Maintains chat history for contextual conversations
21
- - **Text-to-Speech**: Listen to AI responses with built-in TTS
22
- - **Streaming Responses**: Real-time response generation
23
- - **Modern UI**: Clean, responsive interface with dark mode
24
 
25
- ## How to Use
26
 
27
- 1. Upload your documents using drag & drop or file selection
28
- 2. Wait for processing (may take a few minutes for large documents)
29
- 3. Start chatting with your documents!
30
- 4. Use the play button to listen to responses
31
 
32
- ## Technology Stack
33
 
34
- - **Backend**: Flask, LangChain, FAISS
35
- - **AI Models**: Groq API with Llama 3.1
36
- - **Embeddings**: HuggingFace all-miniLM-L6-v2
37
- - **Frontend**: Vanilla JavaScript, TailwindCSS
38
- - **Document Processing**: Unstructured, PyPDF, python-docx
39
 
40
- ## Quick Start
41
 
42
- ### 1. Set up Environment
43
 
44
- #### For Hugging Face Spaces:
45
- 1. Go to your Space Settings → Repository Secrets
46
- 2. Add a new secret:
47
- - **Name**: `GROQ_API_KEY`
48
- - **Value**: Your actual GROQ API key from [console.groq.com/keys](https://console.groq.com/keys)
49
- 3. Restart your Space after adding the secret
50
 
51
- #### For Local Development:
52
- ```bash
53
- # Copy the environment template
54
- cp .env.example .env
55
 
56
- # Edit .env and add your GROQ API key
57
- # Replace 'your_groq_api_key_here' with your actual GROQ API key
58
- # Get your API key from: https://console.groq.com/keys
59
  ```
60
 
61
- **Important**: The API key must be set correctly for the chat functionality to work!
62
 
63
- ### 2. Run with Docker (Recommended)
64
  ```bash
65
- # Build the Docker image
66
- docker build -t cognichat .
 
67
 
68
- # Run the container
69
  docker run -p 7860:7860 --env-file .env cognichat
70
  ```
71
 
72
- ### 3. Run Locally
 
73
  ```bash
74
  # Install dependencies
75
  pip install -r requirements.txt
76
 
77
- # Set your GROQ API key
78
- export GROQ_API_KEY=your_groq_api_key_here
79
 
80
  # Run the application
81
  python app.py
82
  ```
83
 
84
- Visit `http://localhost:7860` to use the application.
85
 
86
- ## Recent Fixes (October 2025)
87
 
88
- **Fixed Docker Permission Issues**:
89
- - Resolved cache directory permission problems
90
- - Application now runs as non-root user for security
91
- - Improved error handling and fallback mechanisms
92
 
93
- **Fixed HF Spaces Upload Directory Issue**:
94
- - **CRITICAL**: Changed upload folder to `/tmp/uploads` for HF Spaces compatibility
95
- - Automatically detects HF Spaces environment and uses writable directories
96
- - Added comprehensive error handling for file save operations
97
- - Fixed 400 chat errors caused by read-only directory access
98
 
99
- ✅ **Enhanced Model Loading**:
100
- - Multiple fallback strategies for embedding model initialization
101
- - Better cache management for HuggingFace models
102
- - Improved startup reliability
103
 
104
- ## Troubleshooting
105
 
106
- ### Permission Errors
107
- If you encounter permission errors, ensure:
108
- 1. Docker containers run with proper user permissions
109
- 2. Cache directories are writable
110
- 3. Environment variables are set correctly
111
 
112
- ### Model Loading Issues
113
- The app includes multiple fallback mechanisms:
114
- 1. Primary: `sentence-transformers/all-miniLM-L6-v2`
115
- 2. Fallback: `all-miniLM-L6-v2`
116
- 3. Final fallback: Default model without cache specification
117
 
118
- ### API Key Issues
119
- Make sure your GROQ API key is:
120
- 1. Valid and active
121
- 2. Set in the `.env` file or environment variables
122
- 3. Has sufficient credits/quota
123
 
124
- ## Development
125
 
126
- For development and testing:
127
  ```bash
128
- # Test embedding model loading
129
  python test_embeddings.py
130
 
131
- # Run with debug mode
132
  export FLASK_DEBUG=1
 
133
  python app.py
134
  ```
135
 
136
- ## Environment Variables
137
 
138
- - `GROQ_API_KEY`: Your Groq API key (required)
139
- - `HF_HOME`: HuggingFace cache directory
140
- - `PORT`: Application port (default: 7860)
141
 
142
- Developed by [Ritesh](https://github.com/RautRitesh) and [Alish-0x](https://github.com/Alish-0x)
 
1
+ # 🤖 CogniChat - Intelligent Document Chat System
2
+
3
+ <div align="center">
4
+
5
+ ![License](https://img.shields.io/badge/license-MIT-blue.svg)
6
+ ![Python](https://img.shields.io/badge/python-3.9+-brightgreen.svg)
7
+ ![Docker](https://img.shields.io/badge/docker-ready-blue.svg)
8
+ ![HuggingFace](https://img.shields.io/badge/🤗-Spaces-yellow.svg)
9
+
10
+ **Transform your documents into interactive conversations powered by advanced RAG technology**
11
+
12
+ [Features](#-features) • [Quick Start](#-quick-start) • [Architecture](#-architecture) • [Deployment](#-deployment) • [API](#-api-reference)
13
+
14
+ </div>
15
+
16
  ---
17
+
18
+ ## 📋 Table of Contents
19
+
20
+ - [Overview](#-overview)
21
+ - [Features](#-features)
22
+ - [Architecture](#-architecture)
23
+ - [Technology Stack](#-technology-stack)
24
+ - [Quick Start](#-quick-start)
25
+ - [Deployment](#-deployment)
26
+ - [Configuration](#-configuration)
27
+ - [API Reference](#-api-reference)
28
+ - [Troubleshooting](#-troubleshooting)
29
+ - [Contributing](#-contributing)
30
+ - [License](#-license)
31
+
32
  ---
33
 
34
+ ## 🎯 Overview
35
 
36
+ CogniChat is a production-ready, intelligent document chat application that leverages **Retrieval Augmented Generation (RAG)** to enable natural conversations with your documents. Built with enterprise-grade technologies, it provides accurate, context-aware responses from your document corpus.
37
 
38
+ ### Why CogniChat?
39
 
40
 
41
+ - **🔉 Audio Overview of Your Document**: Simply ask a question and listen to the audio response. Now your documents can speak to you.
42
+ - **🎯 Accurate Retrieval**: Hybrid search combining BM25 and FAISS for optimal results
43
+ - **💬 Conversational Memory**: Maintains context across multiple interactions
44
+ - **📄 Multi-Format Support**: Handles PDF, DOCX, TXT, and image files
45
+ - **🚀 Production Ready**: Docker support, comprehensive error handling, and security best practices
46
+ - **🎨 Modern UI**: Responsive design with dark mode and real-time streaming
47
 
48
+ ---
 
 
 
49
 
50
+ ## ✨ Features
51
 
52
+ ### Core Capabilities
53
 
54
+ | Feature | Description |
55
+ |---------|-------------|
56
+ | **Multi-Format Processing** | Upload and process PDF, DOCX, TXT, and image files |
57
+ | **Hybrid Search** | Combines BM25 (keyword) and FAISS (semantic) for superior retrieval |
58
+ | **Conversational AI** | Powered by Groq's Llama 3.1 for intelligent responses |
59
+ | **Memory Management** | Maintains chat history for contextual conversations |
60
+ | **Text-to-Speech** | Built-in TTS for audio playback of responses |
61
+ | **Streaming Responses** | Real-time token streaming for better UX |
62
+ | **Document Chunking** | Intelligent text splitting for optimal context windows |
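The **Memory Management** row above follows the usual per-session history pattern: one history object per session id, created lazily on first use. A stdlib-only sketch of the idea (all names here are illustrative, not the application's actual API):

```python
# Minimal per-session chat memory: one history object per session id,
# created on first access. A toy stand-in for LangChain's
# ChatMessageHistory registry used by the app.

class ChatHistory:
    """Holds (role, text) turns for one session."""
    def __init__(self):
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))

_histories = {}

def get_session_history(session_id):
    """Return the history for a session, creating it on first use."""
    if session_id not in _histories:
        _histories[session_id] = ChatHistory()
    return _histories[session_id]
```

Because the same object is returned for a given session id, follow-up questions see the earlier turns of the conversation.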
63
 
64
+ ### Advanced Features
65
 
66
+ - **Semantic Embeddings**: HuggingFace `all-miniLM-L6-v2` for accurate vector representations
67
+ - **Reranking**: Contextual compression for improved relevance
68
+ - **Error Handling**: Comprehensive fallback mechanisms and error recovery
69
+ - **Security**: Non-root Docker execution and environment-based secrets
70
+ - **Scalability**: Optimized for both local and cloud deployments
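The reranking step named above can be sketched independently of any model: score each (query, chunk) pair and keep the best `top_n`. Here the cross-encoder is replaced by a toy word-overlap scorer so the logic is runnable anywhere; only the sort-and-truncate structure mirrors the app:

```python
# Rerank retrieved chunks by relevance and keep the top_n best.
# A real deployment scores pairs with a cross-encoder; the word-overlap
# scorer below is a stand-in for illustration only.

def overlap_score(query: str, chunk: str) -> float:
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def rerank(query: str, chunks: list[str], top_n: int = 3) -> list[str]:
    return sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)[:top_n]
```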
 
71
 
72
+ ---
73
+
74
+ ## 🏗 Architecture
75
+
76
+ ### RAG Pipeline Overview
77
+
78
+ ```mermaid
79
+ graph TB
80
+ A[Document Upload] --> B[Document Processing]
81
+ B --> C[Text Extraction]
82
+ C --> D[Chunking Strategy]
83
+ D --> E[Embedding Generation]
84
+ E --> F[Vector Store FAISS]
85
+
86
+ G[User Query] --> H[Query Embedding]
87
+ H --> I[Hybrid Retrieval]
88
+
89
+ F --> I
90
+ J[BM25 Index] --> I
91
+
92
+ I --> K[Reranking]
93
+ K --> L[Context Assembly]
94
+ L --> M[LLM Groq Llama 3.1]
95
+ M --> N[Response Generation]
96
+ N --> O[Streaming Output]
97
+
98
+ P[Chat History] --> M
99
+ N --> P
100
+
101
+ style A fill:#e1f5ff
102
+ style G fill:#e1f5ff
103
+ style F fill:#ffe1f5
104
+ style J fill:#ffe1f5
105
+ style M fill:#f5e1ff
106
+ style O fill:#e1ffe1
107
+ ```
108
 
109
+ ### System Architecture
110
+
111
+ ```mermaid
112
+ graph LR
113
+ A[Client Browser] -->|HTTP/WebSocket| B[Flask Server]
114
+ B --> C[Document Processor]
115
+ B --> D[RAG Engine]
116
+ B --> E[TTS Service]
117
+
118
+ C --> F[(File Storage)]
119
+ D --> G[(FAISS Vector DB)]
120
+ D --> H[(BM25 Index)]
121
+ D --> I[Groq API]
122
+
123
+ J[HuggingFace Models] --> D
124
+
125
+ style B fill:#4a90e2
126
+ style D fill:#e24a90
127
+ style I fill:#90e24a
128
  ```
129
 
130
+ ### Data Flow
131
+
132
+ 1. **Document Ingestion**: Files are uploaded and validated
133
+ 2. **Processing Pipeline**: Text extraction → Chunking → Embedding
134
+ 3. **Indexing**: Dual indexing (FAISS + BM25) for hybrid search
135
+ 4. **Query Processing**: User queries are embedded and searched
136
+ 5. **Retrieval**: Top-k relevant chunks retrieved using hybrid approach
137
+ 6. **Generation**: LLM generates contextual responses with citations
138
+ 7. **Streaming**: Responses streamed back to client in real-time
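Steps 5-7 above can be compressed into a toy pipeline to make the flow concrete. Embedding search is stubbed with word overlap and generation with a placeholder string; only the retrieve → assemble context → generate → stream shape is the point:

```python
# Toy version of the retrieval → generation → streaming flow above.
# The retriever and LLM are stubs; the data flow itself is real.

def retrieve(query, chunks, k=2):
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)[:k]

def generate(query, chunks):
    context = "\n---\n".join(retrieve(query, chunks))
    # A real system would call the LLM here with `context` plus chat history.
    return f"[stub answer to {query!r} using {len(context)} chars of context]"

def stream(text):
    # Step 7: yield the response piece by piece, as the UI consumes it.
    for token in text.split():
        yield token
```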
139
+
140
+ ---
141
+
142
+ ## 🛠 Technology Stack
143
+
144
+ ### Backend
145
+
146
+ | Component | Technology | Purpose |
147
+ |-----------|-----------|---------|
148
+ | **Framework** | Flask 2.3+ | Web application framework |
149
+ | **RAG** | LangChain | RAG pipeline orchestration |
150
+ | **Vector DB** | FAISS | Fast similarity search |
151
+ | **Keyword Search** | BM25 | Sparse retrieval |
152
+ | **LLM** | Groq Llama 3.1 | Response generation |
153
+ | **Embeddings** | HuggingFace Transformers | Semantic embeddings |
154
+ | **Doc Processing** | Unstructured, PyPDF, python-docx | Multi-format parsing |
155
+
156
+ ### Frontend
157
+
158
+ | Component | Technology |
159
+ |-----------|-----------|
160
+ | **UI Framework** | TailwindCSS |
161
+ | **JavaScript** | Vanilla ES6+ |
162
+ | **Icons** | Font Awesome |
163
+ | **Markdown** | Marked.js |
164
+
165
+ ### Infrastructure
166
+
167
+ - **Containerization**: Docker + Docker Compose
168
+ - **Deployment**: HuggingFace Spaces, local, cloud-agnostic
169
+ - **Security**: Environment-based secrets, non-root execution
170
+
171
+ ---
172
+
173
+ ## 🚀 Quick Start
174
+
175
+ ### Prerequisites
176
+
177
+ - Python 3.9+
178
+ - Docker (optional, recommended)
179
+ - Groq API Key ([Get one here](https://console.groq.com/keys))
180
+
181
+ ### Installation Methods
182
+
183
+ #### 🐳 Method 1: Docker (Recommended)
184
 
 
185
  ```bash
186
+ # Clone the repository
187
+ git clone https://github.com/RautRitesh/Chat-with-docs
188
+ cd Chat-with-docs
189
 
190
+ # Create environment file
191
+ cp .env.example .env
192
+
193
+ # Add your Groq API key to .env
194
+ echo "GROQ_API_KEY=your_actual_api_key_here" >> .env
195
+
196
+ # Build and run with Docker Compose
197
+ docker-compose up -d
198
+
199
+ # Or build manually
200
+ docker build -t cognichat .
201
  docker run -p 7860:7860 --env-file .env cognichat
202
  ```
203
 
204
+ #### 🐍 Method 2: Local Python Environment
205
+
206
  ```bash
207
+ # Clone the repository
208
+ git clone https://github.com/RautRitesh/Chat-with-docs
209
+ cd Chat-with-docs
210
+
211
+ # Create virtual environment
212
+ python -m venv venv
213
+ source venv/bin/activate # On Windows: venv\Scripts\activate
214
+
215
  # Install dependencies
216
  pip install -r requirements.txt
217
 
218
+ # Set environment variables
219
+ export GROQ_API_KEY=your_actual_api_key_here
220
 
221
  # Run the application
222
  python app.py
223
  ```
224
 
225
+ #### 🤗 Method 3: HuggingFace Spaces
226
 
227
+ 1. Fork this repository
228
+ 2. Create a new Space on [HuggingFace](https://huggingface.co/spaces)
229
+ 3. Link your forked repository
230
+ 4. Add `GROQ_API_KEY` in Settings → Repository Secrets
231
+ 5. Space will auto-deploy!
232
 
233
+ ### First Steps
 
 
 
234
 
235
+ 1. Open `http://localhost:7860` in your browser
236
+ 2. Upload a document (PDF, DOCX, TXT, or image)
237
+ 3. Wait for processing (progress indicator will show status)
238
+ 4. Start chatting with your document!
239
+ 5. Use the 🔊 button to hear responses via TTS
240
 
241
+ ---
 
 
 
242
 
243
+ ## 📦 Deployment
244
 
245
+ ### Environment Variables
246
 
247
+ Create a `.env` file with the following variables:
248
 
249
+ ```bash
250
+ # Required
251
+ GROQ_API_KEY=your_groq_api_key_here
252
+
253
+ # Optional
254
+ PORT=7860
255
+ # For HF Spaces
+ HF_HOME=/tmp/huggingface_cache
256
+ # Set to 1 for development
+ FLASK_DEBUG=0
257
+ # 10MB default
+ MAX_UPLOAD_SIZE=10485760
258
+ ```
259
 
260
+ ### Docker Deployment
261
 
 
262
  ```bash
263
+ # Production build
264
+ docker build -t cognichat:latest .
265
+
266
+ # Run with resource limits
267
+ docker run -d \
268
+ --name cognichat \
269
+ -p 7860:7860 \
270
+ --env-file .env \
271
+ --memory="2g" \
272
+ --cpus="1.5" \
273
+ cognichat:latest
274
+ ```
275
+
276
+ ### Docker Compose
277
+
278
+ ```yaml
279
+ version: '3.8'
280
+
281
+ services:
282
+   cognichat:
283
+     build: .
284
+     ports:
285
+       - "7860:7860"
286
+     environment:
287
+       - GROQ_API_KEY=${GROQ_API_KEY}
288
+     volumes:
289
+       - ./data:/app/data
290
+     restart: unless-stopped
291
+ ```
292
+
293
+ ### HuggingFace Spaces Configuration
294
+
295
+ Add these files to your repository:
296
+
297
+ **app_port** in `README.md` header:
298
+ ```yaml
299
+ app_port: 7860
300
+ ```
301
+
302
+ **Repository Secrets**:
303
+ - `GROQ_API_KEY`: Your Groq API key
304
+
305
+ The application automatically detects HF Spaces environment and adjusts paths accordingly.
306
+
307
+ ---
308
+
309
+ ## ⚙️ Configuration
310
+
311
+ ### Document Processing Settings
312
+
313
+ ```python
314
+ # In app.py - Customize these settings
315
+ CHUNK_SIZE = 1000 # Characters per chunk
316
+ CHUNK_OVERLAP = 200 # Overlap between chunks
317
+ EMBEDDING_MODEL = "sentence-transformers/all-miniLM-L6-v2"
318
+ RETRIEVER_K = 5 # Number of chunks to retrieve
319
+ ```
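A sliding-window sketch of how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact. This is a character-based stand-in for LangChain's `RecursiveCharacterTextSplitter`, which additionally respects paragraph and sentence boundaries:

```python
# Character-window chunking: each chunk is `size` chars and shares
# `overlap` chars with its predecessor, so context is never cut mid-thought.
# Illustrative only; the app uses LangChain's recursive splitter.

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def split_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```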
320
+
321
+ ### Model Configuration
322
+
323
+ ```python
324
+ # LLM Settings
325
+ LLM_PROVIDER = "groq"
326
+ MODEL_NAME = "llama-3.1-70b-versatile"
327
+ TEMPERATURE = 0.7
328
+ MAX_TOKENS = 2048
329
+ ```
330
+
331
+ ### Search Configuration
332
+
333
+ ```python
334
+ # Hybrid Search Weights
335
+ FAISS_WEIGHT = 0.6 # Semantic search weight
336
+ BM25_WEIGHT = 0.4 # Keyword search weight
337
+ ```
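A sketch of how two weights like these can combine ranked result lists, using weighted reciprocal-rank fusion. This illustrates the idea behind the 0.6 / 0.4 split; LangChain's `EnsembleRetriever` may differ in detail:

```python
# Weighted fusion of two ranked result lists. Each retriever contributes
# weight / (rank + 1) per document, so documents ranked highly by both
# retrievers rise to the top of the merged list.

def fuse(semantic: list[str], keyword: list[str],
         w_sem: float = 0.6, w_kw: float = 0.4) -> list[str]:
    scores: dict[str, float] = {}
    for weight, ranking in ((w_sem, semantic), (w_kw, keyword)):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```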
338
+
339
+ ---
340
+
341
+ ## 📚 API Reference
342
+
343
+ ### Endpoints
344
+
345
+ #### Upload Document
346
+
347
+ ```http
348
+ POST /upload
349
+ Content-Type: multipart/form-data
350
+
351
+ {
352
+ "file": <binary>
353
+ }
354
+ ```
355
+
356
+ **Response**:
357
+ ```json
358
+ {
359
+ "status": "success",
360
+ "message": "Document processed successfully",
361
+ "filename": "example.pdf",
362
+ "chunks": 45
363
+ }
364
+ ```
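A stdlib-only client sketch for this endpoint. Only the `/upload` path and field shape come from above; building the multipart body by hand avoids any third-party HTTP library (the request itself is shown commented, not sent):

```python
# Build a multipart/form-data body for POST /upload using only the stdlib.
import uuid

def multipart_body(field: str, filename: str, data: bytes):
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# To send (sketch):
# import urllib.request
# body, ctype = multipart_body("file", "example.pdf", open("example.pdf", "rb").read())
# req = urllib.request.Request("http://localhost:7860/upload", data=body,
#                              headers={"Content-Type": ctype})
```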
365
+
366
+ #### Chat
367
+
368
+ ```http
369
+ POST /chat
370
+ Content-Type: application/json
371
+
372
+ {
373
+ "message": "What is the main topic?",
374
+ "stream": true
375
+ }
376
+ ```
377
+
378
+ **Response** (Streaming):
379
+ ```
380
+ data: {"token": "The", "done": false}
381
+ data: {"token": " main", "done": false}
382
+ data: {"token": " topic", "done": false}
383
+ data: {"done": true}
384
+ ```
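The streamed `data:` lines above can be consumed like this (a sketch; the `token`/`done` field names are taken from the sample payload):

```python
# Parse the server's "data: {...}" stream lines into tokens, stopping
# at the terminal {"done": true} event.
import json

def read_tokens(lines):
    for line in lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("done"):
            break
        yield event["token"]
```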
385
+
386
+ #### Clear Session
387
+
388
+ ```http
389
+ POST /clear
390
+ ```
391
+
392
+ **Response**:
393
+ ```json
394
+ {
395
+ "status": "success",
396
+ "message": "Session cleared"
397
+ }
398
+ ```
399
+
400
+ ---
401
+
402
+ ## 🔧 Troubleshooting
403
+
404
+ ### Common Issues
405
+
406
+ #### 1. Permission Errors in Docker
407
+
408
+ **Problem**: `Permission denied` when writing to cache directories
409
+
410
+ **Solution**:
411
+ ```bash
412
+ # Rebuild with proper permissions
413
+ docker build --no-cache -t cognichat .
414
+
415
+ # Or run with volume permissions
416
+ docker run -v $(pwd)/cache:/tmp/huggingface_cache \
417
+ --user $(id -u):$(id -g) \
418
+ cognichat
419
+ ```
420
+
421
+ #### 2. Model Loading Fails
422
+
423
+ **Problem**: Cannot download HuggingFace models
424
+
425
+ **Solution**:
426
+ ```bash
427
+ # Pre-download models
428
  python test_embeddings.py
429
 
430
+ # Or use HF_HOME environment variable
431
+ export HF_HOME=/path/to/writable/directory
432
+ ```
433
+
434
+ #### 3. Chat Returns 400 Error
435
+
436
+ **Problem**: Upload directory not writable (common in HF Spaces)
437
+
438
+ **Solution**: Application now automatically uses `/tmp/uploads` in HF Spaces environment. Ensure latest version is deployed.
439
+
440
+ #### 4. API Key Invalid
441
+
442
+ **Problem**: Groq API returns authentication error
443
+
444
+ **Solution**:
445
+ - Verify key at [Groq Console](https://console.groq.com/keys)
446
+ - Check `.env` file has correct format: `GROQ_API_KEY=gsk_...`
447
+ - Restart application after updating key
448
+
449
+ ### Debug Mode
450
+
451
+ Enable detailed logging:
452
+
453
+ ```bash
454
  export FLASK_DEBUG=1
455
+ export LANGCHAIN_VERBOSE=true
456
  python app.py
457
  ```
458
 
459
+ ---
460
+
461
+ ## 🧪 Testing
462
+
463
+ ```bash
464
+ # Run test suite
465
+ pytest tests/
466
+
467
+ # Test embedding model
468
+ python test_embeddings.py
469
+
470
+ # Test document processing
471
+ pytest tests/test_document_processor.py
472
+
473
+ # Integration tests
474
+ pytest tests/test_integration.py
475
+ ```
476
+
477
+ ---
478
+
479
+ ## 🤝 Contributing
480
+
481
+ We welcome contributions! Please follow these steps:
482
+
483
+ 1. Fork the repository
484
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
485
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
486
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
487
+ 5. Open a Pull Request
488
+
489
+ ### Development Guidelines
490
+
491
+ - Follow PEP 8 style guide
492
+ - Add tests for new features
493
+ - Update documentation
494
+ - Ensure Docker build succeeds
495
+
496
+ ---
497
+
498
+ ## 📝 Changelog
499
+
500
+ ### Version 2.0 (October 2025)
501
+
502
+ ✅ **Major Improvements**:
503
+ - Fixed Docker permission issues
504
+ - HuggingFace Spaces compatibility
505
+ - Enhanced error handling
506
+ - Multiple model loading fallbacks
507
+ - Improved security (non-root execution)
508
+
509
+ ✅ **Bug Fixes**:
510
+ - Upload directory write permissions
511
+ - Cache directory access
512
+ - Model initialization reliability
513
+
514
+ ### Version 1.0 (Initial Release)
515
+
516
+ - Basic RAG functionality
517
+ - PDF and DOCX support
518
+ - FAISS vector store
519
+ - Conversational memory
520
+
521
+ ---
522
+
523
+ ## 📄 License
524
+
525
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
526
+
527
+ ---
528
+
529
+ ## 🙏 Acknowledgments
530
+
531
+ - **LangChain** for RAG framework
532
+ - **Groq** for high-speed LLM inference
533
+ - **HuggingFace** for embeddings and hosting
534
+ - **FAISS** for efficient vector search
535
+
536
+ ---
537
+
538
+ ## 📞 Support
539
+
540
+ - **Issues**: [GitHub Issues](https://github.com/yourusername/cognichat/issues)
541
+ - **Discussions**: [GitHub Discussions](https://github.com/yourusername/cognichat/discussions)
542
+ - **Email**: riteshraut123321@gmail.com
543
+
544
+ ---
545
+
546
+ <div align="center">
547
 
548
+ **Made with ❤️ by the CogniChat Team**
 
 
549
 
550
+ </div>
app.py CHANGED
@@ -6,14 +6,13 @@ import uuid
6
  from flask import Flask, request, render_template, session, jsonify, Response, stream_with_context
7
  from werkzeug.utils import secure_filename
8
  from rag_processor import create_rag_chain
 
9
 
10
- # ============================ ADDITIONS START ============================
11
  from gtts import gTTS
12
  import io
13
- import re # <-- Import the regular expression module
14
- # ============================ ADDITIONS END ==============================
15
 
16
- # Document Loaders
17
  from langchain_community.document_loaders import (
18
  TextLoader,
19
  PyPDFLoader,
@@ -22,28 +21,56 @@ from langchain_community.document_loaders import (
22
 
23
  # Additional imports for robust PDF handling
24
  from langchain_core.documents import Document
25
- import fitz # PyMuPDF for alternative PDF processing
26
 
27
  # Text Splitter, Embeddings, Retrievers
28
  from langchain.text_splitter import RecursiveCharacterTextSplitter
29
- from langchain_community.embeddings import HuggingFaceEmbeddings
30
  from langchain_community.vectorstores import FAISS
31
- from langchain.retrievers import EnsembleRetriever
 
32
  from langchain_community.retrievers import BM25Retriever
33
  from langchain_community.chat_message_histories import ChatMessageHistory
 
 
 
34
 
35
- # --- Basic Flask App Setup ---
36
  app = Flask(__name__)
37
  app.config['SECRET_KEY'] = os.urandom(24)
38
 
39
- # Use /tmp directory for uploads in HF Spaces (writable), fallback to local uploads for development
 
40
  is_hf_spaces = bool(os.getenv("SPACE_ID") or os.getenv("SPACES_ZERO_GPU"))
41
  if is_hf_spaces:
42
  app.config['UPLOAD_FOLDER'] = '/tmp/uploads'
43
  else:
44
  app.config['UPLOAD_FOLDER'] = 'uploads'
45
 
46
- # Create upload directory with proper error handling
47
  try:
48
  os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
49
  print(f"✓ Upload folder ready: {app.config['UPLOAD_FOLDER']}")
@@ -54,21 +81,23 @@ except Exception as e:
54
  os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
55
  print(f"✓ Using fallback upload folder: {app.config['UPLOAD_FOLDER']}")
56
 
57
- # --- In-memory Storage & Global Model Loading ---
 
 
 
58
  rag_chains = {}
59
  message_histories = {}
60
 
61
- # Load the embedding model once when the application starts for efficiency.
62
  print("Loading embedding model...")
63
 
64
- # Set environment variables for HuggingFace cache (use home directory if available)
65
  cache_base = os.path.expanduser("~/.cache") if os.path.expanduser("~") != "~" else "/tmp/hf_cache"
66
  os.environ.setdefault('HF_HOME', f'{cache_base}/huggingface')
67
  os.environ.setdefault('HF_HUB_CACHE', f'{cache_base}/huggingface/hub')
68
  os.environ.setdefault('TRANSFORMERS_CACHE', f'{cache_base}/transformers')
69
  os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', f'{cache_base}/sentence_transformers')
70
 
71
- # Create cache directories with proper permissions
72
  cache_dirs = [
73
  os.environ['HF_HOME'],
74
  os.environ['HF_HUB_CACHE'],
@@ -103,6 +132,8 @@ for cache_dir in cache_dirs:
103
  except Exception as e:
104
  print(f"Warning: Could not create {cache_dir}: {e}")
105
 
 
 
106
  # Try loading embedding model with error handling and fallbacks
107
  try:
108
  print("Attempting to load embedding model...")
@@ -135,6 +166,13 @@ except Exception as e:
135
  print(f"Final attempt failed: {e3}")
136
  # Use a simpler fallback model or raise the error
137
  raise Exception(f"Could not load any embedding model. Last error: {e3}")
138
 
139
  def load_pdf_with_fallback(filepath):
140
  """
@@ -336,12 +374,19 @@ def upload_files():
336
  retrievers=[bm25_retriever, faiss_retriever],
337
  weights=[0.5, 0.5]
338
  )
339
 
340
  session_id = str(uuid.uuid4())
341
  print(f"Creating RAG chain for session {session_id}...")
342
 
343
  try:
344
- rag_chain = create_rag_chain(ensemble_retriever, get_session_history)
345
  rag_chains[session_id] = rag_chain
346
  print(f"✓ RAG chain created successfully for session {session_id} with {len(processed_files)} documents.")
347
  except Exception as rag_error:
@@ -443,7 +488,6 @@ def chat():
443
  print(f"Error during chat invocation: {e}")
444
  return Response("An error occurred while getting the answer.", status=500, mimetype='text/plain')
445
 
446
- # ============================ ADDITIONS START ============================
447
 
448
  def clean_markdown_for_tts(text: str) -> str:
449
  """Removes markdown formatting for cleaner text-to-speech output."""
@@ -484,7 +528,7 @@ def text_to_speech():
484
  except Exception as e:
485
  print(f"Error in TTS generation: {e}")
486
  return jsonify({'status': 'error', 'message': 'Failed to generate audio.'}), 500
487
- # ============================ ADDITIONS END ==============================
488
 
489
 
490
  @app.route('/debug', methods=['GET'])
 
6
  from flask import Flask, request, render_template, session, jsonify, Response, stream_with_context
7
  from werkzeug.utils import secure_filename
8
  from rag_processor import create_rag_chain
9
+ from typing import Sequence, Any
10
 
 
11
  from gtts import gTTS
12
  import io
13
+ import re
14
+
15
 
 
16
  from langchain_community.document_loaders import (
17
  TextLoader,
18
  PyPDFLoader,
 
21
 
22
  # Additional imports for robust PDF handling
23
  from langchain_core.documents import Document
24
+ import fitz
25
 
26
  # Text Splitter, Embeddings, Retrievers
27
  from langchain.text_splitter import RecursiveCharacterTextSplitter
28
+ from langchain_huggingface import HuggingFaceEmbeddings
29
  from langchain_community.vectorstores import FAISS
30
+ from langchain.retrievers import EnsembleRetriever, ContextualCompressionRetriever
31
+ from langchain.retrievers.document_compressors.base import BaseDocumentCompressor
32
  from langchain_community.retrievers import BM25Retriever
33
  from langchain_community.chat_message_histories import ChatMessageHistory
34
+ from sentence_transformers.cross_encoder import CrossEncoder
35
+ import numpy as np
36
+
37
 
 
38
  app = Flask(__name__)
39
  app.config['SECRET_KEY'] = os.urandom(24)
40
 
41
+
42
+ class LocalReranker(BaseDocumentCompressor):
43
+     model: Any
44
+     top_n: int = 5
45
+
46
+     class Config:
47
+         arbitrary_types_allowed = True
48
+
49
+     def compress_documents(
50
+         self,
51
+         documents: Sequence[Document],
52
+         query: str,
53
+         callbacks=None,
54
+     ) -> Sequence[Document]:
55
+         if not documents:
56
+             return []
57
+
58
+         pairs = [[query, doc.page_content] for doc in documents]
59
+         scores = self.model.predict(pairs, show_progress_bar=False)
60
+
61
+         doc_scores = list(zip(documents, scores))
62
+         sorted_doc_scores = sorted(doc_scores, key=lambda x: x[1], reverse=True)
63
+
64
+         return [doc for doc, score in sorted_doc_scores[:self.top_n]]
65
+
66
+
67
  is_hf_spaces = bool(os.getenv("SPACE_ID") or os.getenv("SPACES_ZERO_GPU"))
68
  if is_hf_spaces:
69
  app.config['UPLOAD_FOLDER'] = '/tmp/uploads'
70
  else:
71
  app.config['UPLOAD_FOLDER'] = 'uploads'
72
 
73
+
74
  try:
75
  os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
76
  print(f"✓ Upload folder ready: {app.config['UPLOAD_FOLDER']}")
 
81
  os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
82
  print(f"✓ Using fallback upload folder: {app.config['UPLOAD_FOLDER']}")
83
 
84
+
85
+
86
+
87
+
88
  rag_chains = {}
89
  message_histories = {}
90
 
 
91
  print("Loading embedding model...")
92
 
93
+
94
  cache_base = os.path.expanduser("~/.cache") if os.path.expanduser("~") != "~" else "/tmp/hf_cache"
95
  os.environ.setdefault('HF_HOME', f'{cache_base}/huggingface')
96
  os.environ.setdefault('HF_HUB_CACHE', f'{cache_base}/huggingface/hub')
97
  os.environ.setdefault('TRANSFORMERS_CACHE', f'{cache_base}/transformers')
98
  os.environ.setdefault('SENTENCE_TRANSFORMERS_HOME', f'{cache_base}/sentence_transformers')
99
 
100
+
101
  cache_dirs = [
102
  os.environ['HF_HOME'],
103
  os.environ['HF_HUB_CACHE'],
 
132
  except Exception as e:
133
  print(f"Warning: Could not create {cache_dir}: {e}")
134
 
135
+
136
+
137
  # Try loading embedding model with error handling and fallbacks
138
  try:
139
  print("Attempting to load embedding model...")
 
166
  print(f"Final attempt failed: {e3}")
167
  # Use a simpler fallback model or raise the error
168
  raise Exception(f"Could not load any embedding model. Last error: {e3}")
169
+
170
+
171
+
172
+ print("Loading local re-ranking model...")
173
+ RERANKER_MODEL = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1", device='cpu')
174
+ print("Re-ranking model loaded successfully.")
175
+
176
 
177
  def load_pdf_with_fallback(filepath):
178
  """
 
374
  retrievers=[bm25_retriever, faiss_retriever],
375
  weights=[0.5, 0.5]
376
  )
377
+ reranker = LocalReranker(model=RERANKER_MODEL, top_n=3)
378
+
379
+ compression_retriever = ContextualCompressionRetriever(
380
+     base_compressor=reranker,
381
+     base_retriever=ensemble_retriever
382
+ )
383
+
384
 
385
  session_id = str(uuid.uuid4())
386
  print(f"Creating RAG chain for session {session_id}...")
387
 
388
  try:
389
+ rag_chain = create_rag_chain(compression_retriever, get_session_history)
390
  rag_chains[session_id] = rag_chain
391
  print(f"✓ RAG chain created successfully for session {session_id} with {len(processed_files)} documents.")
392
  except Exception as rag_error:
 
488
  print(f"Error during chat invocation: {e}")
489
  return Response("An error occurred while getting the answer.", status=500, mimetype='text/plain')
490
 
 
491
 
492
  def clean_markdown_for_tts(text: str) -> str:
493
  """Removes markdown formatting for cleaner text-to-speech output."""
 
528
  except Exception as e:
529
  print(f"Error in TTS generation: {e}")
530
  return jsonify({'status': 'error', 'message': 'Failed to generate audio.'}), 500
531
+
532
 
533
 
534
  @app.route('/debug', methods=['GET'])
rag_processor.py CHANGED
@@ -83,6 +83,7 @@ Standalone Question:"""
83
  rag_template = """You are an expert assistant named `Cognichat`.Whenver user ask you about who you are , simply say you are `Cognichat`.
84
  You are developed by Ritesh and Alish.
85
  Your job is to provide accurate and helpful answers based ONLY on the provided context.
 
86
  If the information is not in the context, clearly state that you don't know the answer.
87
  Provide a clear and concise answer.
88
 
 
83
  rag_template = """You are an expert assistant named `Cognichat`.Whenver user ask you about who you are , simply say you are `Cognichat`.
84
  You are developed by Ritesh and Alish.
85
  Your job is to provide accurate and helpful answers based ONLY on the provided context.
86
+ Whatever the user asks, it is always about the document, so base your answer on the document only.
87
  If the information is not in the context, clearly state that you don't know the answer.
88
  Provide a clear and concise answer.
89