---
title: RAG System with PDF Documents
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---
# Conversational AI RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.
## Features
- Hybrid Search: Combines dense (FAISS) and sparse (BM25) retrieval for optimal results
- Advanced Guard Rails: Comprehensive safety and security measures
- Multiple Models: Support for Qwen 2.5 1.5B and distilgpt2 fallback
- PDF Processing: Intelligent document chunking and processing
- Real-time Monitoring: Performance metrics and system health checks
- Docker Support: Containerized deployment with Docker Compose
- Hugging Face Spaces Ready: Optimized for HF Spaces deployment
## Architecture

```
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│  Streamlit UI  │────▶│   RAG System   │────▶│  Guard Rails   │
└────────────────┘     └────────────────┘     └────────────────┘
                               │
                               ▼
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│ PDF Processor  │     │  FAISS Index   │     │ Language Model │
└────────────────┘     └────────────────┘     └────────────────┘
```
## Technology Stack

### Core Technologies

- Vector Database: FAISS for efficient similarity search
- Sparse Retrieval: BM25 for keyword-based search
- Embedding Model: all-MiniLM-L6-v2 for document embeddings
- Generative Model: Qwen 2.5 1.5B for answer generation
- UI Framework: Streamlit for the interactive interface
- Containerization: Docker for deployment
### Supporting Libraries

- Data Processing: Pandas, NumPy for data manipulation
- PDF Handling: PyPDF for document processing
- ML Utilities: Scikit-learn for preprocessing
- Logging: Loguru for structured logging
- Optimization: Accelerate for model optimization
## Quick Start

### Local Development

1. Clone and set up:

   ```bash
   git clone <repository-url>
   cd convAI
   pip install -r requirements.txt
   ```

2. Run the application:

   ```bash
   streamlit run app.py
   ```

3. Upload PDFs and start chatting!

### Docker Deployment

1. Build and run:

   ```bash
   docker-compose up --build
   ```

2. Access the app at http://localhost:8501
## Hugging Face Spaces Deployment

This application is optimized for deployment on Hugging Face Spaces. The system automatically:

- Uses `/tmp` directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for the HF Spaces environment
### HF Spaces Configuration

The application includes:

- Cache Management: all model caches are stored in `/tmp` directories
- Permission Handling: automatic fallback to writable directories
- Environment Detection: adapts to the HF Spaces runtime environment
- Resource Optimization: efficient memory and CPU usage
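The permission-handling fallback can be pictured with a small sketch. This is illustrative only: `pick_cache_dir` is a hypothetical helper, not the app's actual API.

```python
import os
import tempfile

def pick_cache_dir(preferred: str, fallback: str = "/tmp/cache") -> str:
    """Return `preferred` if it is writable, otherwise a writable fallback."""
    for candidate in (preferred, fallback):
        try:
            os.makedirs(candidate, exist_ok=True)
            # Probe writability by actually creating a file there.
            with tempfile.NamedTemporaryFile(dir=candidate):
                pass
            return candidate
        except OSError:
            continue
    return tempfile.mkdtemp()  # last resort: a guaranteed-writable temp dir

cache_dir = pick_cache_dir(os.path.expanduser("~/.cache/convai"))
```

In a Space, the home directory is often read-only, so the helper ends up returning the `/tmp`-based fallback.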
### Deploy to HF Spaces

1. Create a new Space on Hugging Face
2. Choose Docker as the SDK
3. Upload all files from this repository
4. The system will automatically:
   - Set up cache directories in `/tmp`
   - Download and cache models
   - Initialize the RAG system with guard rails
   - Start the Streamlit interface
### HF Spaces Environment Variables

The system automatically configures:

```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
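In code, this kind of setup amounts to exporting the variables before the Hugging Face libraries are imported, since they read the cache paths at import time. A minimal sketch (the `configure_caches` helper is illustrative, not necessarily what `app.py` actually does):

```python
import os

# Cache locations for HF Spaces deployments (values from the list above).
CACHE_VARS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}

def configure_caches() -> None:
    """Point cache env vars at writable /tmp paths before model libraries load."""
    for name, path in CACHE_VARS.items():
        os.environ.setdefault(name, path)  # respect values set by the platform
        try:
            os.makedirs(os.environ[name], exist_ok=True)
        except OSError:
            pass  # library code surfaces the error if the path is unusable

configure_caches()
```

Calling this at the top of the entry point, before `import transformers`, is what makes the caches land in writable directories.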
## Usage Guide

### Document Upload
- Automatic Loading: PDF documents in the container are loaded automatically
- Manual Upload: Use the sidebar to upload additional PDF documents
- Supported Formats: PDF files with text content
### Search Methods

- Hybrid: combines vector similarity and keyword matching (recommended)
- Dense: uses only vector similarity search
- Sparse: uses only keyword-based BM25 search
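To make the hybrid method concrete, here is a minimal, self-contained sketch of score fusion: min-max-normalize the dense (FAISS) and sparse (BM25) scores per query, then blend them with a weight `alpha`. The document IDs and scores are made up, and the real system's fusion logic may differ:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend two score dicts; docs missing from one retriever score 0 there."""
    d, s = normalize(dense), normalize(sparse)
    return {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
            for doc in set(d) | set(s)}

dense = {"doc1": 0.92, "doc2": 0.80, "doc3": 0.10}   # cosine-style scores
sparse = {"doc2": 7.5, "doc4": 3.0}                   # BM25-style scores
ranked = sorted(hybrid_scores(dense, sparse).items(),
                key=lambda kv: kv[1], reverse=True)
```

Here `doc2` ranks first because it scores well under both retrievers, which is exactly the behavior that makes hybrid search the recommended default.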
### Query Interface
- Natural Language: Ask questions in plain English
- Context Awareness: System uses retrieved documents for context
- Confidence Scores: See how confident the system is in its answers
- Source Citations: View which documents were used for the answer
## Configuration

### Environment Variables

```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct

# Chunk Sizes
CHUNK_SIZES=100,400

# Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
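The `CHUNK_SIZES=100,400` setting suggests the corpus is chunked at two granularities. A hedged sketch of what that might look like, with sizes interpreted as word counts (the real `pdf_processor.py` may measure tokens or characters and overlap differently):

```python
def chunk_words(text: str, size: int, overlap: int = 20) -> list[str]:
    """Split text into word chunks of up to `size` words, with overlap."""
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def chunk_at_all_sizes(text: str, sizes=(100, 400)):
    """Chunk the same text at every configured granularity."""
    return {size: chunk_words(text, size) for size in sizes}

doc = "lorem ipsum " * 300          # ~600 words of dummy text
chunks = chunk_at_all_sizes(doc)
```

Indexing both granularities lets small chunks give precise matches while large chunks preserve surrounding context for generation.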
### Performance Tuning
- Chunk Sizes: Adjust for different document types (smaller for technical docs, larger for narratives)
- Top-k Results: Increase for more comprehensive answers, decrease for faster responses
- Model Selection: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs
## Performance

### Optimization Features
- Parallel Processing: Documents are loaded concurrently for faster initialization
- Optimized Search: Hybrid retrieval combines the best of vector and keyword search
- Memory Efficient: Uses CPU-optimized models for deployment compatibility
- Caching: FAISS index and metadata are cached for faster subsequent queries
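The caching idea can be sketched as follows: fingerprint the corpus and rebuild the index only when the documents change. `build_index` stands in for the real FAISS build step, and pickle is used purely for illustration:

```python
import hashlib
import os
import pickle

def corpus_fingerprint(docs: list[str]) -> str:
    """Hash all document texts so any change invalidates the cache."""
    h = hashlib.sha256()
    for doc in docs:
        h.update(doc.encode("utf-8"))
    return h.hexdigest()

def load_or_build(docs, cache_path, build_index):
    """Reuse the cached index when the corpus fingerprint matches."""
    key = corpus_fingerprint(docs)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        if cached["key"] == key:          # cache hit: corpus unchanged
            return cached["index"]
    index = build_index(docs)             # cache miss: rebuild and store
    with open(cache_path, "wb") as f:
        pickle.dump({"key": key, "index": index}, f)
    return index
```

A real FAISS index would be saved with FAISS's own serialization rather than pickle, but the rebuild-only-on-change logic is the same.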
### Expected Performance
- Document Loading: ~2-5 seconds per PDF (depending on size)
- Query Response: ~1-3 seconds for typical questions
- Memory Usage: ~2-4GB RAM for typical document collections
- Storage: ~100MB per 1000 document chunks
## Development

### Project Structure

```
convAI/
├── app.py                # Main Streamlit application
├── rag_system.py         # Core RAG system implementation
├── pdf_processor.py      # PDF processing utilities
├── requirements.txt      # Python dependencies
├── Dockerfile            # Container configuration
├── docker-compose.yml    # Multi-container setup
├── README.md             # This file
├── DEPLOYMENT_GUIDE.md   # Detailed deployment instructions
├── test_deployment.py    # Deployment testing script
├── test_docker.py        # Docker testing script
└── src/
    └── streamlit_app.py  # Sample Streamlit app
```
### Testing

```bash
# Test deployment readiness
python test_deployment.py

# Test Docker configuration
python test_docker.py

# Run the app locally
streamlit run app.py
```
## Troubleshooting

### Common Issues
**Model Loading Errors**
- Check internet connectivity for model downloads
- Verify sufficient disk space
- Try the fallback model (distilgpt2)
**Memory Issues**
- Reduce chunk sizes
- Use smaller embedding models
- Limit the number of documents
**Performance Issues**
- Adjust top-k parameter
- Use sparse search for keyword-heavy queries
- Consider hardware upgrades
**Docker Issues**
- Check Docker installation
- Verify port availability
- Check container logs
### Getting Help
- Check the logs in your Space's "Logs" tab
- Review the deployment guide for common solutions
- Create an issue in the project repository
## Contributing
We welcome contributions! Please see our contributing guidelines for:
- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Hugging Face for providing the platform and models
- FAISS team for the efficient vector search library
- Streamlit team for the excellent web framework
- OpenAI for inspiring the RAG architecture
*Built with ❤️ for efficient document question-answering*

Ready to explore your documents? Start asking questions!