---
title: RAG System with PDF Documents
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---
# πŸ€– Conversational AI RAG System
A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.
## πŸš€ Features
- **Hybrid Search**: Combines dense (FAISS) and sparse (BM25) retrieval for optimal results
- **Advanced Guard Rails**: Comprehensive safety and security measures
- **Multiple Models**: Qwen 2.5 1.5B as the primary generator, with distilgpt2 as a lightweight fallback
- **PDF Processing**: Intelligent document chunking and processing
- **Real-time Monitoring**: Performance metrics and system health checks
- **Docker Support**: Containerized deployment with Docker Compose
- **Hugging Face Spaces Ready**: Optimized for HF Spaces deployment
## πŸ—οΈ Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Streamlit UI   │───▢│   RAG System    │───▢│   Guard Rails   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PDF Processor  β”‚    β”‚   FAISS Index   β”‚    β”‚ Language Model  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## πŸ› οΈ Technology Stack
### Core Technologies
- **πŸ” Vector Database**: FAISS for efficient similarity search
- **πŸ“ Sparse Retrieval**: BM25 for keyword-based search
- **🧠 Embedding Model**: all-MiniLM-L6-v2 for document embeddings
- **πŸ€– Generative Model**: Qwen 2.5 1.5B for answer generation
- **🌐 UI Framework**: Streamlit for interactive interface
- **🐳 Containerization**: Docker for deployment
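The sparse half of the retriever can be sketched as a minimal Okapi BM25 scorer. This is a simplified stand-in for the library-backed BM25 the project uses; the corpus and tokenization below are purely illustrative:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(d) for d in corpus_tokens) / n_docs
    # Document frequency for each distinct query term
    df = {t: sum(1 for d in corpus_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if df[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "faiss builds a dense vector index".split(),
    "bm25 ranks documents by keyword overlap".split(),
]
print(bm25_scores("bm25 keyword".split(), docs))
```

Only the second document contains the query terms, so it receives the only nonzero score.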
### Supporting Libraries
- **πŸ“Š Data Processing**: Pandas, NumPy for data manipulation
- **πŸ“„ PDF Handling**: PyPDF for document processing
- **πŸ”§ ML Utilities**: Scikit-learn for preprocessing
- **πŸ“ Logging**: Loguru for structured logging
- **⚑ Optimization**: Accelerate for model optimization
## πŸš€ Quick Start
### Local Development
1. **Clone and Setup**:
```bash
git clone <repository-url>
cd convAI
pip install -r requirements.txt
```
2. **Run the Application**:
```bash
streamlit run app.py
```
3. **Upload PDFs and Start Chatting**!
### Docker Deployment
1. **Build and Run**:
```bash
docker-compose up --build
```
2. **Access at**: http://localhost:8501
## 🌟 Hugging Face Spaces Deployment
This application is optimized for deployment on Hugging Face Spaces. The system automatically:
- Uses `/tmp` directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for HF Spaces environment
### HF Spaces Configuration
The application includes:
- **Cache Management**: All model caches stored in `/tmp` directories
- **Permission Handling**: Automatic fallback to writable directories
- **Environment Detection**: Adapts to HF Spaces runtime environment
- **Resource Optimization**: Efficient memory and CPU usage
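The "automatic fallback to writable directories" can be sketched as a probe loop over candidate paths. The candidate paths here are illustrative assumptions, not the app's actual list:

```python
import os
import tempfile

def pick_writable_dir(candidates):
    """Return the first directory we can create and write into."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            probe = os.path.join(path, ".write_test")
            with open(probe, "w") as f:
                f.write("ok")
            os.remove(probe)
            return path
        except OSError:
            continue  # read-only filesystem or missing permissions
    # Last resort: a guaranteed-writable temp directory
    return tempfile.mkdtemp(prefix="convai_cache_")

cache_dir = pick_writable_dir(["/app/cache", "/tmp/huggingface"])
print(cache_dir)
```

On HF Spaces the container filesystem is largely read-only, so the probe typically lands on a `/tmp` path.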
### Deploy to HF Spaces
1. **Create a new Space** on Hugging Face
2. **Choose Docker** as the SDK
3. **Upload all files** from this repository
4. **The system will automatically**:
- Set up cache directories in `/tmp`
- Download and cache models
- Initialize the RAG system with guard rails
- Start the Streamlit interface
### HF Spaces Environment Variables
The system automatically configures:
```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
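Applied in Python, these defaults must be set before `transformers` or `huggingface_hub` is imported, since those libraries capture their cache paths at import time. A sketch (the real app may set them in the Dockerfile instead):

```python
import os

# Set cache locations before any Hugging Face import.
CACHE_DEFAULTS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}
for key, value in CACHE_DEFAULTS.items():
    os.environ.setdefault(key, value)  # don't clobber user overrides
    os.makedirs(os.environ[key], exist_ok=True)
```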
## πŸ“– Usage Guide
### Document Upload
- **Automatic Loading**: PDF documents in the container are loaded automatically
- **Manual Upload**: Use the sidebar to upload additional PDF documents
- **Supported Formats**: PDF files with text content
### Search Methods
- **πŸ”€ Hybrid**: Combines vector similarity and keyword matching (recommended)
- **🎯 Dense**: Uses only vector similarity search
- **πŸ“ Sparse**: Uses only keyword-based BM25 search
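One common way to fuse the two result lists is a weighted sum of min-max-normalized scores. This is a sketch of the hybrid step; the blend weight `alpha` is an assumed parameter, not necessarily what the app uses:

```python
def hybrid_scores(dense, sparse, alpha=0.7):
    """Blend dense and sparse scores after min-max normalization."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    d, s = norm(dense), norm(sparse)
    return [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]

# Scores for the same three chunks from each retriever
dense = [0.82, 0.40, 0.15]   # cosine similarities (FAISS side)
sparse = [1.2, 4.8, 0.0]     # BM25 scores
fused = hybrid_scores(dense, sparse)
best = max(range(len(fused)), key=fused.__getitem__)
print(best, fused)
```

Normalization is needed because cosine similarities and BM25 scores live on incompatible scales; without it the BM25 side would dominate.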
### Query Interface
- **Natural Language**: Ask questions in plain English
- **Context Awareness**: System uses retrieved documents for context
- **Confidence Scores**: See how confident the system is in its answers
- **Source Citations**: View which documents were used for the answer
## βš™οΈ Configuration
### Environment Variables
```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct
# Chunk Sizes
CHUNK_SIZES=100,400
# Vector Store Path
VECTOR_STORE_PATH=./vector_store
# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
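Reading these variables with sensible defaults might look like the sketch below; the key names mirror the block above, but the `load_config` helper itself is illustrative:

```python
import os

def load_config():
    """Read configuration from the environment, with defaults."""
    return {
        "embedding_model": os.environ.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        "generative_model": os.environ.get(
            "GENERATIVE_MODEL", "Qwen/Qwen2.5-1.5B-Instruct"
        ),
        # "100,400" -> [100, 400]
        "chunk_sizes": [
            int(s) for s in os.environ.get("CHUNK_SIZES", "100,400").split(",")
        ],
        "vector_store_path": os.environ.get("VECTOR_STORE_PATH", "./vector_store"),
    }

cfg = load_config()
print(cfg["chunk_sizes"])
```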
### Performance Tuning
- **Chunk Sizes**: Adjust for different document types (smaller for technical docs, larger for narratives)
- **Top-k Results**: Increase for more comprehensive answers, decrease for faster responses
- **Model Selection**: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs
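The effect of the chunk-size knob can be seen in a simple word-based sliding-window chunker. This is a sketch, not the project's actual `pdf_processor.py` logic; the overlap parameter is an assumption:

```python
def chunk_words(text, chunk_size=100, overlap=20):
    """Split text into word-based chunks with a sliding-window overlap."""
    words = text.split()
    if not words:
        return []
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks

doc = " ".join(f"w{i}" for i in range(250))
print([len(c.split()) for c in chunk_words(doc, chunk_size=100, overlap=20)])
```

Smaller `chunk_size` values yield more, finer-grained chunks (better for precise technical lookups); larger values keep more narrative context per chunk.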
## πŸ“Š Performance
### Optimization Features
- **Parallel Processing**: Documents are loaded concurrently for faster initialization
- **Optimized Search**: Hybrid retrieval combines the best of vector and keyword search
- **Memory Efficient**: Uses CPU-optimized models for deployment compatibility
- **Caching**: FAISS index and metadata are cached for faster subsequent queries
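The metadata side of that caching can be sketched as a save/load round-trip; pickle and the paths here are illustrative (the FAISS index itself would be persisted separately, e.g. with `faiss.write_index` / `faiss.read_index`):

```python
import os
import pickle

def save_metadata(chunks, path):
    """Persist chunk metadata next to the FAISS index."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(chunks, f)

def load_metadata(path):
    """Return cached metadata, or None to trigger a rebuild."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

chunks = [{"doc": "report.pdf", "page": 1, "text": "..."}]
save_metadata(chunks, "/tmp/vector_store/metadata.pkl")
print(load_metadata("/tmp/vector_store/metadata.pkl"))
```

A `None` return signals the startup code to re-process the PDFs and rebuild the index.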
### Expected Performance
- **Document Loading**: ~2-5 seconds per PDF (depending on size)
- **Query Response**: ~1-3 seconds for typical questions
- **Memory Usage**: ~2-4GB RAM for typical document collections
- **Storage**: ~100MB per 1000 document chunks
## πŸ”§ Development
### Project Structure
```
convAI/
β”œβ”€β”€ app.py               # Main Streamlit application
β”œβ”€β”€ rag_system.py        # Core RAG system implementation
β”œβ”€β”€ pdf_processor.py     # PDF processing utilities
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ Dockerfile           # Container configuration
β”œβ”€β”€ docker-compose.yml   # Multi-container setup
β”œβ”€β”€ README.md            # This file
β”œβ”€β”€ DEPLOYMENT_GUIDE.md  # Detailed deployment instructions
β”œβ”€β”€ test_deployment.py   # Deployment testing script
β”œβ”€β”€ test_docker.py       # Docker testing script
└── src/
    └── streamlit_app.py # Sample Streamlit app
```
### Testing
```bash
# Test deployment readiness
python test_deployment.py
# Test Docker configuration
python test_docker.py
# Run local tests
streamlit run app.py
```
## πŸ› Troubleshooting
### Common Issues
1. **Model Loading Errors**
- Check internet connectivity for model downloads
- Verify sufficient disk space
- Try the fallback model (distilgpt2)
2. **Memory Issues**
- Reduce chunk sizes
- Use smaller embedding models
- Limit the number of documents
3. **Performance Issues**
- Adjust top-k parameter
- Use sparse search for keyword-heavy queries
- Consider hardware upgrades
4. **Docker Issues**
- Check Docker installation
- Verify port availability
- Check container logs
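The fallback mentioned in item 1 follows a simple try-in-order pattern. The sketch below uses a stub loader in place of the real `transformers` loading call, so the failure path is illustrative:

```python
def load_with_fallback(model_names, load_fn):
    """Try each model in order; return the first that loads."""
    last_error = None
    for name in model_names:
        try:
            return name, load_fn(name)
        except Exception as err:  # download, memory, or permission failure
            last_error = err
    raise RuntimeError(f"All models failed; last error: {last_error}")

def fake_loader(name):
    # Stand-in for e.g. AutoModelForCausalLM.from_pretrained(name)
    if "Qwen" in name:
        raise MemoryError("not enough RAM for the primary model")
    return object()

name, model = load_with_fallback(
    ["Qwen/Qwen2.5-1.5B-Instruct", "distilgpt2"], fake_loader
)
print(name)
```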
### Getting Help
- Check the logs in your Space's "Logs" tab
- Review the deployment guide for common solutions
- Create an issue in the project repository
## 🀝 Contributing
We welcome contributions! Please see our contributing guidelines for:
- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports
## πŸ“„ License
This project is licensed under the MIT License - see the LICENSE file for details.
## πŸ™ Acknowledgments
- **Hugging Face** for providing the platform and models
- **FAISS** team for the efficient vector search library
- **Streamlit** team for the excellent web framework
- **OpenAI** for inspiring the RAG architecture
---
*Built with ❀️ for efficient document question-answering*
**Ready to explore your documents? Start asking questions! πŸš€**