# =============================================================================
# RAG System Dependencies for Hugging Face Spaces Deployment
# =============================================================================
# This file contains all the Python packages required for the RAG system
# to function properly in a Docker container environment.

# =============================================================================
# CORE WEB FRAMEWORK
# =============================================================================

# Streamlit - Modern web framework for data applications
# Provides the interactive web interface for the RAG system
streamlit==1.28.1

# =============================================================================
# DEEP LEARNING & AI FRAMEWORKS
# =============================================================================

# PyTorch - Deep learning framework for model inference
# Required for running the language models (Qwen, distilgpt2)
torch==2.1.0

# Transformers - Hugging Face library for pre-trained models
# Provides access to language models and tokenizers
transformers>=4.36.0

# =============================================================================
# EMBEDDING & VECTOR SEARCH
# =============================================================================

# Sentence Transformers - Library for sentence embeddings
# Used for converting text to vector representations
sentence-transformers==2.2.2

# FAISS CPU - Facebook AI Similarity Search for vector indexing
# Provides efficient similarity search for document retrieval
faiss-cpu==1.7.4

# =============================================================================
# MACHINE LEARNING & DATA PROCESSING
# =============================================================================

# Scikit-learn - Machine learning utilities
# Used for data preprocessing and BM25 implementation
scikit-learn==1.3.2

# Rank BM25 - Implementation of BM25 ranking algorithm
# Provides keyword-based sparse retrieval functionality
rank-bm25==0.2.2

# =============================================================================
# DOCUMENT PROCESSING
# =============================================================================

# PyPDF - Modern PDF processing library
# Used for extracting text and metadata from PDF documents
pypdf==3.17.1

# =============================================================================
# DATA MANIPULATION & ANALYSIS
# =============================================================================

# Pandas - Data manipulation and analysis library
# Used for data structure management and processing
pandas==2.1.3

# NumPy - Numerical computing library
# Provides mathematical operations and array handling
numpy==1.24.3

# =============================================================================
# UTILITIES & LOGGING
# =============================================================================

# Loguru - Advanced logging library
# Provides structured logging with better formatting and features
loguru==0.7.2

# TQDM - Progress bar library
# Shows progress for long-running operations
tqdm==4.66.1

# =============================================================================
# MODEL OPTIMIZATION & DEPLOYMENT
# =============================================================================

# Accelerate - Hugging Face library for model optimization
# Helps with model loading and inference optimization
accelerate==0.24.1

# Hugging Face Hub - Library for accessing Hugging Face models
# Provides utilities for downloading and managing models
huggingface-hub==0.19.4

# =============================================================================
# GUARD RAIL DEPENDENCIES
# =============================================================================

# Additional libraries for enhanced security and validation
# These are optional but recommended for production deployments