---
title: RAG System with PDF Documents
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
app_port: 8501
---
# Conversational AI RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.
## Features
- Hybrid Search: Combines dense (FAISS) and sparse (BM25) retrieval for optimal results
- Advanced Guard Rails: Comprehensive safety and security measures
- Multiple Models: Support for Qwen 2.5 1.5B and distilgpt2 fallback
- PDF Processing: Intelligent document chunking and processing
- Real-time Monitoring: Performance metrics and system health checks
- Docker Support: Containerized deployment with Docker Compose
- Hugging Face Spaces Ready: Optimized for HF Spaces deployment
## Architecture

```
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│  Streamlit UI  │────▶│   RAG System   │────▶│  Guard Rails   │
└────────────────┘     └────────────────┘     └────────────────┘
                               │
                               ▼
┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│ PDF Processor  │     │  FAISS Index   │     │ Language Model │
└────────────────┘     └────────────────┘     └────────────────┘
```
## Technology Stack

### Core Technologies

- Vector Database: FAISS for efficient similarity search
- Sparse Retrieval: BM25 for keyword-based search
- Embedding Model: all-MiniLM-L6-v2 for document embeddings
- Generative Model: Qwen 2.5 1.5B for answer generation
- UI Framework: Streamlit for the interactive interface
- Containerization: Docker for deployment
### Supporting Libraries

- Data Processing: Pandas, NumPy for data manipulation
- PDF Handling: PyPDF for document processing
- ML Utilities: Scikit-learn for preprocessing
- Logging: Loguru for structured logging
- Optimization: Accelerate for model optimization
## Quick Start

### Local Development

1. Clone and set up:

   ```bash
   git clone <repository-url>
   cd convAI
   pip install -r requirements.txt
   ```

2. Run the application:

   ```bash
   streamlit run app.py
   ```

3. Upload PDFs and start chatting!

### Docker Deployment

1. Build and run:

   ```bash
   docker-compose up --build
   ```

2. Access the app at http://localhost:8501
## Hugging Face Spaces Deployment

This application is optimized for deployment on Hugging Face Spaces. The system automatically:

- Uses `/tmp` directories for cache storage (writable in HF Spaces)
- Configures environment variables for HF Spaces compatibility
- Handles permission issues automatically
- Optimizes model loading for the HF Spaces environment
### HF Spaces Configuration

The application includes:

- Cache Management: all model caches are stored in `/tmp` directories
- Permission Handling: automatic fallback to writable directories
- Environment Detection: adapts to the HF Spaces runtime environment
- Resource Optimization: efficient memory and CPU usage
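The permission-handling fallback can be pictured with a small sketch. This is illustrative only: `pick_cache_dir` is a hypothetical helper, not the app's actual API.

```python
import os
import tempfile

def pick_cache_dir(preferred: str, fallback: str = "/tmp/cache") -> str:
    """Return `preferred` if it is writable, otherwise a writable fallback."""
    for candidate in (preferred, fallback):
        try:
            os.makedirs(candidate, exist_ok=True)
            # Probe writability by actually creating a file there.
            with tempfile.NamedTemporaryFile(dir=candidate):
                pass
            return candidate
        except OSError:
            continue
    return tempfile.mkdtemp()  # last resort: a guaranteed-writable temp dir

cache_dir = pick_cache_dir(os.path.expanduser("~/.cache/convai"))
```

In a Space, the home directory is often read-only, so the helper ends up returning the `/tmp`-based fallback.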
### Deploy to HF Spaces

1. Create a new Space on Hugging Face
2. Choose Docker as the SDK
3. Upload all files from this repository
4. The system will automatically:
   - Set up cache directories in `/tmp`
   - Download and cache models
   - Initialize the RAG system with guard rails
   - Start the Streamlit interface
### HF Spaces Environment Variables

The system automatically configures:

```bash
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
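In code, this kind of setup amounts to exporting the variables before the Hugging Face libraries are imported, since they read the cache paths at import time. A minimal sketch (the `configure_caches` helper is illustrative, not necessarily what `app.py` actually does):

```python
import os

# Cache locations for HF Spaces deployments (values from the list above).
CACHE_VARS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}

def configure_caches() -> None:
    """Point cache env vars at writable /tmp paths before model libraries load."""
    for name, path in CACHE_VARS.items():
        os.environ.setdefault(name, path)  # respect values set by the platform
        try:
            os.makedirs(os.environ[name], exist_ok=True)
        except OSError:
            pass  # library code surfaces the error if the path is unusable

configure_caches()
```

Calling this at the top of the entry point, before `import transformers`, is what makes the caches land in writable directories.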
## Usage Guide

### Document Upload
- Automatic Loading: PDF documents in the container are loaded automatically
- Manual Upload: Use the sidebar to upload additional PDF documents
- Supported Formats: PDF files with text content
### Search Methods

- Hybrid: combines vector similarity and keyword matching (recommended)
- Dense: uses only vector similarity search
- Sparse: uses only keyword-based BM25 search
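To make the hybrid method concrete, here is a minimal, self-contained sketch of score fusion: min-max-normalize the dense (FAISS) and sparse (BM25) scores per query, then blend them with a weight `alpha`. The document IDs and scores are made up, and the real system's fusion logic may differ:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend two score dicts; docs missing from one retriever score 0 there."""
    d, s = normalize(dense), normalize(sparse)
    return {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
            for doc in set(d) | set(s)}

dense = {"doc1": 0.92, "doc2": 0.80, "doc3": 0.10}   # cosine-style scores
sparse = {"doc2": 7.5, "doc4": 3.0}                   # BM25-style scores
ranked = sorted(hybrid_scores(dense, sparse).items(),
                key=lambda kv: kv[1], reverse=True)
```

Here `doc2` ranks first because it scores well under both retrievers, which is exactly the behavior that makes hybrid search the recommended default.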
### Query Interface
- Natural Language: Ask questions in plain English
- Context Awareness: System uses retrieved documents for context
- Confidence Scores: See how confident the system is in its answers
- Source Citations: View which documents were used for the answer
## Configuration

### Environment Variables

```bash
# Model Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATIVE_MODEL=Qwen/Qwen2.5-1.5B-Instruct

# Chunk Sizes
CHUNK_SIZES=100,400

# Vector Store Path
VECTOR_STORE_PATH=./vector_store

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
```
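The `CHUNK_SIZES=100,400` setting suggests the corpus is chunked at two granularities. A hedged sketch of what that might look like, with sizes interpreted as word counts (the real `pdf_processor.py` may measure tokens or characters and overlap differently):

```python
def chunk_words(text: str, size: int, overlap: int = 20) -> list[str]:
    """Split text into word chunks of up to `size` words, with overlap."""
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def chunk_at_all_sizes(text: str, sizes=(100, 400)):
    """Chunk the same text at every configured granularity."""
    return {size: chunk_words(text, size) for size in sizes}

doc = "lorem ipsum " * 300          # ~600 words of dummy text
chunks = chunk_at_all_sizes(doc)
```

Indexing both granularities lets small chunks give precise matches while large chunks preserve surrounding context for generation.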
### Performance Tuning
- Chunk Sizes: Adjust for different document types (smaller for technical docs, larger for narratives)
- Top-k Results: Increase for more comprehensive answers, decrease for faster responses
- Model Selection: Choose between Qwen 2.5 1.5B and distilgpt2 based on performance needs
## Performance

### Optimization Features
- Parallel Processing: Documents are loaded concurrently for faster initialization
- Optimized Search: Hybrid retrieval combines the best of vector and keyword search
- Memory Efficient: Uses CPU-optimized models for deployment compatibility
- Caching: FAISS index and metadata are cached for faster subsequent queries
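The caching idea can be sketched as follows: fingerprint the corpus and rebuild the index only when the documents change. `build_index` stands in for the real FAISS build step, and pickle is used purely for illustration:

```python
import hashlib
import os
import pickle

def corpus_fingerprint(docs: list[str]) -> str:
    """Hash all document texts so any change invalidates the cache."""
    h = hashlib.sha256()
    for doc in docs:
        h.update(doc.encode("utf-8"))
    return h.hexdigest()

def load_or_build(docs, cache_path, build_index):
    """Reuse the cached index when the corpus fingerprint matches."""
    key = corpus_fingerprint(docs)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        if cached["key"] == key:          # cache hit: corpus unchanged
            return cached["index"]
    index = build_index(docs)             # cache miss: rebuild and store
    with open(cache_path, "wb") as f:
        pickle.dump({"key": key, "index": index}, f)
    return index
```

A real FAISS index would be saved with FAISS's own serialization rather than pickle, but the rebuild-only-on-change logic is the same.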
### Expected Performance
- Document Loading: ~2-5 seconds per PDF (depending on size)
- Query Response: ~1-3 seconds for typical questions
- Memory Usage: ~2-4GB RAM for typical document collections
- Storage: ~100MB per 1000 document chunks
## Development

### Project Structure

```
convAI/
├── app.py                # Main Streamlit application
├── rag_system.py         # Core RAG system implementation
├── pdf_processor.py      # PDF processing utilities
├── requirements.txt      # Python dependencies
├── Dockerfile            # Container configuration
├── docker-compose.yml    # Multi-container setup
├── README.md             # This file
├── DEPLOYMENT_GUIDE.md   # Detailed deployment instructions
├── test_deployment.py    # Deployment testing script
├── test_docker.py        # Docker testing script
└── src/
    └── streamlit_app.py  # Sample Streamlit app
```
### Testing

```bash
# Test deployment readiness
python test_deployment.py

# Test Docker configuration
python test_docker.py

# Run the app locally
streamlit run app.py
```
## Troubleshooting

### Common Issues
**Model Loading Errors**
- Check internet connectivity for model downloads
- Verify sufficient disk space
- Try the fallback model (distilgpt2)
**Memory Issues**
- Reduce chunk sizes
- Use smaller embedding models
- Limit the number of documents
**Performance Issues**
- Adjust top-k parameter
- Use sparse search for keyword-heavy queries
- Consider hardware upgrades
**Docker Issues**
- Check Docker installation
- Verify port availability
- Check container logs
### Getting Help
- Check the logs in your Space's "Logs" tab
- Review the deployment guide for common solutions
- Create an issue in the project repository
## Contributing
We welcome contributions! Please see our contributing guidelines for:
- Code style and standards
- Testing requirements
- Documentation updates
- Feature requests and bug reports
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Hugging Face for providing the platform and models
- FAISS team for the efficient vector search library
- Streamlit team for the excellent web framework
- OpenAI for inspiring the RAG architecture
*Built with ❤️ for efficient document question-answering*

Ready to explore your documents? Start asking questions!