---
title: RAG System with PDF Documents
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---

# πŸ€– Conversational AI RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.

## πŸš€ Features

- **Hybrid Search**: Combines dense (FAISS) and sparse (BM25) retrieval so semantic and keyword matches complement each other
- **Advanced Guard Rails**: Comprehensive safety and security measures
- **Multiple Models**: Support for Qwen 2.5 1.5B and distilgpt2 fallback
- **PDF Processing**: Intelligent document chunking and processing
- **Real-time Monitoring**: Performance metrics and system health checks
- **Docker Support**: Containerized deployment with Docker Compose
- **Hugging Face Spaces Ready**: Optimized for HF Spaces deployment

## πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Streamlit UI  │───▢│   RAG System    │───▢│  Guard Rails    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PDF Processor  β”‚    β”‚   FAISS Index   β”‚    β”‚  Language Model β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## πŸ› οΈ Technology Stack

### Core Technologies
- **πŸ” Vector Database**: FAISS for efficient similarity search
- **πŸ“ Sparse Retrieval**: BM25 for keyword-based search
- **🧠 Embedding Model**: all-MiniLM-L6-v2 for document embeddings
- **πŸ€– Generative Model**: Qwen 2.5 1.5B for answer generation
- **🌐 UI Framework**: Streamlit for interactive interface
- **🐳 Containerization**: Docker for deployment

### Supporting Libraries
- **πŸ“Š Data Processing**: Pandas, NumPy for data manipulation
- **πŸ“„ PDF Handling**: PyPDF for document processing
- **πŸ”§ ML Utilities**: Scikit-learn for preprocessing
- **πŸ“ Logging**: Loguru for structured logging
- **⚑ Optimization**: Accelerate for model optimization

## πŸš€ Quick Start

### Local Development

1. **Clone and Setup**:
```bash
git clone <repository-url>
cd convAI
pip install -r requirements.txt
```

2. **Run the Application**:
```bash
streamlit run app.py
```

3. **Upload PDFs and Start Chatting**!

### Docker Deployment

1. **Build and Run**:
```bash
docker-compose up --build
```

2. **Access at**: http://localhost:8501

## 🌟 Hugging Face Spaces Deployment

This application is optimized for deployment on Hugging Face Spaces. The system automatically:

- Uses `/tmp` directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for HF Spaces environment

### HF Spaces Configuration

The application includes:
- **Cache Management**: All model caches stored in `/tmp` directories
- **Permission Handling**: Automatic fallback to writable directories
- **Environment Detection**: Adapts to HF Spaces runtime environment
- **Resource Optimization**: Efficient memory and CPU usage

### Deploy to HF Spaces

1. **Create a new Space** on Hugging Face
2. **Choose Docker** as the SDK
3. **Upload all files** from this repository
4. **The system will automatically**:
   - Set up cache directories in `/tmp`
   - Download and cache models
   - Initialize the RAG system with guard rails
   - Start the Streamlit interface

### HF Spaces Environment Variables

The system automatically configures:
```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
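Because libraries like `transformers` read these variables at import time, they have to be set before any model imports happen. A minimal sketch of how such a setup step could look (the variable names come from the list above; the `setdefault` behavior and directory creation are illustrative assumptions, not the app's exact code):

```python
import os

# Cache locations must be configured before importing transformers or
# sentence-transformers, which read these variables at import time.
CACHE_VARS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}

for name, path in CACHE_VARS.items():
    os.environ.setdefault(name, path)          # keep any value the platform already set
    os.makedirs(os.environ[name], exist_ok=True)  # ensure the directory is writable/present
```

Using `setdefault` means a value injected by the runtime (or by `docker-compose`) wins over the `/tmp` defaults.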

## πŸ“– Usage Guide

### Document Upload
- **Automatic Loading**: PDF documents in the container are loaded automatically
- **Manual Upload**: Use the sidebar to upload additional PDF documents
- **Supported Formats**: PDF files with text content

### Search Methods
- **πŸ”€ Hybrid**: Combines vector similarity and keyword matching (recommended)
- **🎯 Dense**: Uses only vector similarity search
- **πŸ“ Sparse**: Uses only keyword-based BM25 search

### Query Interface
- **Natural Language**: Ask questions in plain English
- **Context Awareness**: System uses retrieved documents for context
- **Confidence Scores**: See how confident the system is in its answers
- **Source Citations**: View which documents were used for the answer

## βš™οΈ Configuration

### Environment Variables
```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct

# Chunk Sizes
CHUNK_SIZES=100,400

# Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
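In Python, these variables would typically be read once at startup with defaults matching the values above. A sketch of that pattern (the constant names mirror the variables documented here; the exact parsing is an assumption about how the app might do it):

```python
import os

# Defaults mirror the documented values; environment variables override them.
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2")
GENERATIVE_MODEL = os.getenv("GENERATIVE_MODEL", "Qwen/Qwen2.5-1.5B-Instruct")
# CHUNK_SIZES is a comma-separated list of word counts, e.g. "100,400".
CHUNK_SIZES = [int(s) for s in os.getenv("CHUNK_SIZES", "100,400").split(",")]
VECTOR_STORE_PATH = os.getenv("VECTOR_STORE_PATH", "./vector_store")
```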

### Performance Tuning
- **Chunk Sizes**: Adjust for different document types (smaller for technical docs, larger for narratives)
- **Top-k Results**: Increase for more comprehensive answers, decrease for faster responses
- **Model Selection**: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs
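To make the chunk-size trade-off concrete, here is a minimal word-based chunker with overlap. It is an illustrative simplification — `pdf_processor.py` may use a different strategy — but it shows why indexing the same document at both sizes in `CHUNK_SIZES=100,400` is useful: small chunks pinpoint technical facts, large chunks keep narrative context intact.

```python
def chunk_text(text, chunk_size, overlap=20):
    """Split text into chunks of roughly chunk_size words, overlapping by
    `overlap` words so sentences at chunk boundaries are not lost.
    (Word-count chunking is a simplification for illustration.)"""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]

# Tiny example: size 3 with overlap 1 repeats one word across boundaries.
chunks = chunk_text("one two three four five six", chunk_size=3, overlap=1)
# -> ["one two three", "three four five", "five six"]
```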

## πŸ“Š Performance

### Optimization Features
- **Parallel Processing**: Documents are loaded concurrently for faster initialization
- **Optimized Search**: Hybrid retrieval combines the best of vector and keyword search
- **Memory Efficient**: Uses CPU-optimized models for deployment compatibility
- **Caching**: FAISS index and metadata are cached for faster subsequent queries
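The caching behavior above follows a standard load-or-build pattern. The sketch below shows that pattern in isolation; the real app persists a FAISS index plus metadata, but a JSON-serializable stand-in keeps the example runnable without `faiss` installed (the file name `index.json` is an assumption):

```python
import json
import os

def load_or_build_index(store_path, build_fn):
    """Return the cached index under store_path if one exists; otherwise
    build it with build_fn and persist it for subsequent runs."""
    cache_path = os.path.join(store_path, "index.json")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)      # cache hit: skip the expensive build
    index = build_fn()               # cache miss: build once...
    os.makedirs(store_path, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(index, f)          # ...and persist for next startup
    return index
```

With FAISS itself, the same shape applies with `faiss.write_index` / `faiss.read_index` in place of the JSON calls.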

### Expected Performance
- **Document Loading**: ~2-5 seconds per PDF (depending on size)
- **Query Response**: ~1-3 seconds for typical questions
- **Memory Usage**: ~2-4GB RAM for typical document collections
- **Storage**: ~100MB per 1000 document chunks

## πŸ”§ Development

### Project Structure
```
convAI/
β”œβ”€β”€ app.py                 # Main Streamlit application
β”œβ”€β”€ rag_system.py          # Core RAG system implementation
β”œβ”€β”€ pdf_processor.py       # PDF processing utilities
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ Dockerfile            # Container configuration
β”œβ”€β”€ docker-compose.yml    # Multi-container setup
β”œβ”€β”€ README.md             # This file
β”œβ”€β”€ DEPLOYMENT_GUIDE.md   # Detailed deployment instructions
β”œβ”€β”€ test_deployment.py    # Deployment testing script
β”œβ”€β”€ test_docker.py        # Docker testing script
└── src/
    └── streamlit_app.py  # Sample Streamlit app
```

### Testing
```bash
# Test deployment readiness
python test_deployment.py

# Test Docker configuration
python test_docker.py

# Run the app locally for a manual check
streamlit run app.py
```

## πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Check internet connectivity for model downloads
   - Verify sufficient disk space
   - Try the fallback model (distilgpt2)

2. **Memory Issues**
   - Reduce chunk sizes
   - Use smaller embedding models
   - Limit the number of documents

3. **Performance Issues**
   - Adjust top-k parameter
   - Use sparse search for keyword-heavy queries
   - Consider hardware upgrades

4. **Docker Issues**
   - Check Docker installation
   - Verify port availability
   - Check container logs
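The "try the fallback model" advice in item 1 corresponds to a try-in-order loading pattern. A hedged sketch of that pattern (the model names match this README; in the real app the callables would wrap `from_pretrained` calls, which this example fakes to stay runnable offline):

```python
def load_with_fallback(loaders):
    """Try candidate model loaders in priority order; return the first
    (name, model) pair that succeeds, raising only if every one fails.
    `loaders` maps a model name to a zero-argument loader callable."""
    errors = {}
    for name, load in loaders.items():
        try:
            return name, load()
        except Exception as exc:     # any load failure triggers the next fallback
            errors[name] = exc
    raise RuntimeError(f"All candidate models failed to load: {errors}")

def load_qwen():
    # Stand-in for a transformers from_pretrained call that fails offline.
    raise OSError("download failed")

name, model = load_with_fallback({
    "Qwen/Qwen2.5-1.5B-Instruct": load_qwen,
    "distilgpt2": lambda: "tiny-model",
})
```

Here the failed Qwen download falls through to distilgpt2, matching the troubleshooting advice above.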

### Getting Help
- Check the logs in your Space's "Logs" tab
- Review the deployment guide for common solutions
- Create an issue in the project repository

## 🀝 Contributing

We welcome contributions! Please see our contributing guidelines for:
- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- **Hugging Face** for providing the platform and models
- **FAISS** team for the efficient vector search library
- **Streamlit** team for the excellent web framework
- **Meta AI** researchers for the original RAG architecture

---

*Built with ❀️ for efficient document question-answering*

**Ready to explore your documents? Start asking questions! πŸš€**