---
title: RAG System with PDF Documents
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---

# πŸ€– Conversational AI RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.

## πŸš€ Features

- **Hybrid Search**: Combines dense (FAISS) and sparse (BM25) retrieval for optimal results
- **Advanced Guard Rails**: Comprehensive safety and security measures
- **Multiple Models**: Support for Qwen 2.5 1.5B and distilgpt2 fallback
- **PDF Processing**: Intelligent document chunking and processing
- **Real-time Monitoring**: Performance metrics and system health checks
- **Docker Support**: Containerized deployment with Docker Compose
- **Hugging Face Spaces Ready**: Optimized for HF Spaces deployment

## πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Streamlit UI   │───▢│   RAG System    │───▢│   Guard Rails   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PDF Processor  β”‚    β”‚   FAISS Index   β”‚    β”‚ Language Model  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## πŸ› οΈ Technology Stack

### Core Technologies

- **πŸ” Vector Database**: FAISS for efficient similarity search
- **πŸ“ Sparse Retrieval**: BM25 for keyword-based search
- **🧠 Embedding Model**: all-MiniLM-L6-v2 for document embeddings
- **πŸ€– Generative
Model**: Qwen 2.5 1.5B for answer generation
- **🌐 UI Framework**: Streamlit for interactive interface
- **🐳 Containerization**: Docker for deployment

### Supporting Libraries

- **πŸ“Š Data Processing**: Pandas, NumPy for data manipulation
- **πŸ“„ PDF Handling**: PyPDF for document processing
- **πŸ”§ ML Utilities**: Scikit-learn for preprocessing
- **πŸ“ Logging**: Loguru for structured logging
- **⚑ Optimization**: Accelerate for model optimization

## πŸš€ Quick Start

### Local Development

1. **Clone and Setup**:
   ```bash
   git clone <repository-url>
   cd convAI
   pip install -r requirements.txt
   ```
2. **Run the Application**:
   ```bash
   streamlit run app.py
   ```
3. **Upload PDFs and start chatting!**

### Docker Deployment

1. **Build and Run**:
   ```bash
   docker-compose up --build
   ```
2. **Access at**: http://localhost:8501

## 🌟 Hugging Face Spaces Deployment

This application is optimized for deployment on Hugging Face Spaces. The system automatically:

- Uses `/tmp` directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for the HF Spaces environment

### HF Spaces Configuration

The application includes:

- **Cache Management**: All model caches stored in `/tmp` directories
- **Permission Handling**: Automatic fallback to writable directories
- **Environment Detection**: Adapts to the HF Spaces runtime environment
- **Resource Optimization**: Efficient memory and CPU usage

### Deploy to HF Spaces

1. **Create a new Space** on Hugging Face
2. **Choose Docker** as the SDK
3. **Upload all files** from this repository
4.
   **The system will automatically**:
   - Set up cache directories in `/tmp`
   - Download and cache models
   - Initialize the RAG system with guard rails
   - Start the Streamlit interface

### HF Spaces Environment Variables

The system automatically configures:

```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```

## πŸ“– Usage Guide

### Document Upload

- **Automatic Loading**: PDF documents in the container are loaded automatically
- **Manual Upload**: Use the sidebar to upload additional PDF documents
- **Supported Formats**: PDF files with text content

### Search Methods

- **πŸ”€ Hybrid**: Combines vector similarity and keyword matching (recommended)
- **🎯 Dense**: Uses only vector similarity search
- **πŸ“ Sparse**: Uses only keyword-based BM25 search

### Query Interface

- **Natural Language**: Ask questions in plain English
- **Context Awareness**: System uses retrieved documents for context
- **Confidence Scores**: See how confident the system is in its answers
- **Source Citations**: View which documents were used for the answer

## βš™οΈ Configuration

### Environment Variables

```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct

# Chunk Sizes
CHUNK_SIZES=100,400

# Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```

### Performance Tuning

- **Chunk Sizes**: Adjust for different document types (smaller for technical docs, larger for narratives)
- **Top-k Results**: Increase for more comprehensive answers, decrease for faster responses
- **Model Selection**: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs

## πŸ“Š Performance

### Optimization Features

- **Parallel Processing**: Documents are loaded concurrently for faster initialization
- **Optimized Search**: Hybrid retrieval combines the best of
vector and keyword search
- **Memory Efficient**: Uses CPU-optimized models for deployment compatibility
- **Caching**: FAISS index and metadata are cached for faster subsequent queries

### Expected Performance

- **Document Loading**: ~2-5 seconds per PDF (depending on size)
- **Query Response**: ~1-3 seconds for typical questions
- **Memory Usage**: ~2-4 GB RAM for typical document collections
- **Storage**: ~100 MB per 1000 document chunks

## πŸ”§ Development

### Project Structure

```
convAI/
β”œβ”€β”€ app.py               # Main Streamlit application
β”œβ”€β”€ rag_system.py        # Core RAG system implementation
β”œβ”€β”€ pdf_processor.py     # PDF processing utilities
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ Dockerfile           # Container configuration
β”œβ”€β”€ docker-compose.yml   # Multi-container setup
β”œβ”€β”€ README.md            # This file
β”œβ”€β”€ DEPLOYMENT_GUIDE.md  # Detailed deployment instructions
β”œβ”€β”€ test_deployment.py   # Deployment testing script
β”œβ”€β”€ test_docker.py       # Docker testing script
└── src/
    └── streamlit_app.py # Sample Streamlit app
```

### Testing

```bash
# Test deployment readiness
python test_deployment.py

# Test Docker configuration
python test_docker.py

# Run local tests
streamlit run app.py
```

## πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Check internet connectivity for model downloads
   - Verify sufficient disk space
   - Try the fallback model (distilgpt2)
2. **Memory Issues**
   - Reduce chunk sizes
   - Use smaller embedding models
   - Limit the number of documents
3. **Performance Issues**
   - Adjust the top-k parameter
   - Use sparse search for keyword-heavy queries
   - Consider hardware upgrades
4. **Docker Issues**
   - Check the Docker installation
   - Verify port availability
   - Check container logs

### Getting Help

- Check the logs in your Space's "Logs" tab
- Review the deployment guide for common solutions
- Create an issue in the project repository

## 🀝 Contributing

We welcome contributions!
Please see our contributing guidelines for:

- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- **Hugging Face** for providing the platform and models
- The **FAISS** team for the efficient vector search library
- The **Streamlit** team for the excellent web framework
- **OpenAI** for inspiring the RAG architecture

---

*Built with ❀️ for efficient document question-answering*

**Ready to explore your documents? Start asking questions! πŸš€**
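### Appendix: Hybrid Score Fusion (Illustrative)

The hybrid search described above fuses dense (FAISS) similarity scores with sparse (BM25) keyword scores. The pure-Python sketch below illustrates one common way to do this fusion; it is an assumption-laden illustration, not this project's actual implementation: the function name, the min-max normalization, and the `alpha` weight are all hypothetical, and the real system relies on FAISS and a BM25 library rather than plain dicts.

```python
def hybrid_scores(dense, sparse, alpha=0.5):
    """Fuse dense and sparse retrieval scores into one ranking (illustrative).

    dense / sparse: dicts mapping document id -> raw retrieval score.
    alpha: weight for the dense component (hypothetical parameter).
    Returns document ids sorted by fused score, best first.
    """
    def normalize(scores):
        # Min-max normalize each score set to [0, 1] so the two
        # retrievers' incompatible scales can be combined.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {doc: (s - lo) / span for doc, s in scores.items()}

    nd, ns = normalize(dense), normalize(sparse)
    docs = set(nd) | set(ns)
    # Documents missing from one retriever contribute 0 for that component.
    fused = {d: alpha * nd.get(d, 0.0) + (1 - alpha) * ns.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Example: "b" ranks first because it scores well in both retrievers.
ranking = hybrid_scores(
    dense={"a": 0.9, "b": 0.8, "c": 0.1},
    sparse={"b": 12.0, "c": 11.0, "d": 2.0},
)
```

In this example the fused ordering rewards documents that both retrievers like, which is the intuition behind offering the hybrid mode as the recommended search method.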