# 🚀 Hugging Face Spaces Deployment Guide

This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.

## 📋 Prerequisites

- Hugging Face account
- Git repository with the RAG system code
- Basic understanding of Docker containers

## 🎯 Quick Deployment

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose **"Docker"** as the SDK
4. Set the **Space name** (e.g., `my-rag-system`)
5. Choose **Public** or **Private** visibility
6. Click **"Create Space"**

### Step 2: Upload Files

Upload all files from this repository to your Space:

```
📁 Your Space Repository
├── 📄 app.py                   # Main Streamlit application
├── 📄 rag_system.py            # Core RAG system
├── 📄 pdf_processor.py         # PDF processing utilities
├── 📄 guard_rails.py           # Safety and security system
├── 📄 hf_spaces_config.py      # HF Spaces configuration
├── 📄 requirements.txt         # Python dependencies
├── 📄 Dockerfile               # Container configuration
├── 📄 README.md                # Project documentation
├── 📄 GUARD_RAILS_GUIDE.md     # Guard rails documentation
└── 📄 HF_SPACES_DEPLOYMENT.md  # This deployment guide
```

### Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

- **Cache directories** in `/tmp` (writable in HF Spaces)
- **Environment variables** for model loading
- **Resource limits** optimized for HF Spaces
- **Permission handling** for the containerized environment

## 🔧 Configuration Details

### Automatic Environment Detection

The system automatically detects HF Spaces using these indicators:

```python
# Environment indicators
'SPACE_ID' in os.environ
'SPACE_HOST' in os.environ
'HF_HUB_ENDPOINT' in os.environ
os.path.exists('/tmp/huggingface')
```

### Cache Directory Setup

```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
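The detection indicators and cache variables above can be combined into a single startup helper. The following is a minimal sketch; the function names are illustrative and not necessarily those used in `hf_spaces_config.py`:

```python
import os

def running_on_hf_spaces() -> bool:
    """Check the environment indicators that suggest an HF Spaces container."""
    return (
        'SPACE_ID' in os.environ
        or 'SPACE_HOST' in os.environ
        or 'HF_HUB_ENDPOINT' in os.environ
        or os.path.exists('/tmp/huggingface')
    )

def setup_hf_spaces_caches() -> None:
    """Point all model/cache paths at writable /tmp directories."""
    cache_dirs = {
        'HF_HOME': '/tmp/huggingface',
        'TRANSFORMERS_CACHE': '/tmp/huggingface/transformers',
        'TORCH_HOME': '/tmp/torch',
        'XDG_CACHE_HOME': '/tmp',
        'HF_HUB_CACHE': '/tmp/huggingface/hub',
    }
    for var, path in cache_dirs.items():
        os.makedirs(path, exist_ok=True)          # ensure the directory exists
        os.environ.setdefault(var, path)          # don't clobber explicit settings

if running_on_hf_spaces():
    setup_hf_spaces_caches()
```

Running this before any `transformers` or `torch` import ensures the libraries pick up the writable cache locations.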
### Model Configuration

```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2'             # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct'  # Primary model
fallback_model = 'distilgpt2'                    # Backup model
```

## 🚀 Deployment Process

### 1. Initial Build

When you first deploy, the system will:

1. **Download the base image** (Python 3.11)
2. **Install dependencies** from `requirements.txt`
3. **Set up cache directories** in `/tmp`
4. **Download models** (embedding + language models)
5. **Initialize the RAG system** with guard rails
6. **Start the Streamlit server** on port 8501

### 2. Model Download

The system downloads these models:

- **Embedding model**: `all-MiniLM-L6-v2` (~90 MB)
- **Primary LLM**: `Qwen/Qwen2.5-1.5B-Instruct` (~3 GB)
- **Fallback LLM**: `distilgpt2` (~300 MB)

**Note**: The first deployment may take 10-15 minutes due to model downloads.

### 3. System Initialization

The RAG system initializes with:

- **Guard rails enabled** for safety
- **Vector store** in `./vector_store`
- **PDF processing** ready
- **Hybrid search** (FAISS + BM25) configured

## 📊 Resource Management

### Memory Usage

- **Base system**: ~500 MB
- **Embedding model**: ~100 MB
- **Language model**: ~3 GB
- **Total**: ~3.6 GB

### CPU Usage

- **Model loading**: high (initial)
- **Inference**: medium
- **Search**: low

### Storage

- **Models**: ~3.5 GB
- **Cache**: ~1 GB
- **Vector store**: variable (depends on the documents indexed)

## 🔍 Troubleshooting

### Common Issues

#### 1. Permission Denied Errors

**Error**: `[Errno 13] Permission denied: '/.cache'`

**Solution**: The system handles this automatically by using `/tmp` directories.

#### 2. Model Download Failures

**Error**: `Failed to download model`

**Solution**:
- Check internet connectivity
- Verify the model names in the configuration
- Wait for the automatic retry

#### 3. Memory Issues

**Error**: `Out of memory`

**Solution**:
- Use smaller models
- Reduce batch sizes
- Enable cache cleanup
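For the memory issues above, the primary/fallback model pairing described earlier suggests one mitigation: try the large model first and fall back to the smaller one on failure. A minimal sketch of that pattern, with the helper name and error handling chosen for illustration (this is not taken from `rag_system.py`):

```python
def load_with_fallback(loader, candidates):
    """Try each model name in order; return the first one that loads.

    `loader` is any callable that takes a model name and either returns
    a loaded model or raises on failure (e.g. out of memory).
    """
    for name in candidates:
        try:
            return loader(name)
        except (OSError, MemoryError, RuntimeError) as exc:
            print(f"Could not load {name}: {exc}")  # log and try the next candidate
    raise RuntimeError("No generative model could be loaded")

# On HF Spaces this might be wired up as, for example:
#   from transformers import pipeline
#   generator = load_with_fallback(
#       lambda name: pipeline('text-generation', model=name),
#       ['Qwen/Qwen2.5-1.5B-Instruct', 'distilgpt2'],
#   )
```

Keeping the loading logic separate from the model list makes it easy to swap in an even smaller backup model if the Space's memory limit is tight.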
#### 4. Build Failures

**Error**: `Docker build failed`

**Solution**:
- Check the Dockerfile syntax
- Verify that all files are uploaded
- Check the requirements.txt format

### Debug Mode

Enable debug logging by setting:

```python
# In hf_spaces_config.py
logging.basicConfig(level=logging.DEBUG)
```

### Health Checks

The system provides health check endpoints:

- **System status**: `/health`
- **Model status**: `/models`
- **Cache status**: `/cache`

## 🔒 Security Features

### Guard Rails

The system includes comprehensive guard rails:

- **Input validation**: query length limits, content filtering
- **Output safety**: response quality checks, hallucination detection
- **Data privacy**: PII detection and masking
- **System protection**: rate limiting, resource monitoring

### Environment Isolation

- **Containerized**: isolated from the host system
- **Read-only**: file system protection
- **Network**: limited network access
- **User**: non-root user execution

## 📈 Performance Optimization

### Caching Strategy

- **Model caching**: persistent across restarts
- **Vector caching**: FAISS index persistence
- **Response caching**: frequently asked questions

### Resource Optimization

- **Memory**: efficient model loading
- **CPU**: parallel processing
- **Storage**: automatic cleanup

### Monitoring

- **Response times**: real-time metrics
- **Memory usage**: resource monitoring
- **Error rates**: system health tracking

## 🔄 Updates and Maintenance

### Updating Models

1. **Modify the configuration** in `hf_spaces_config.py`
2. **Redeploy** the Space
3. **Models re-download** automatically

### Updating Code

1. **Push changes** to your repository
2. **HF Spaces auto-rebuilds** the container
3. **The system restarts** with the new code
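After the container restarts, the health check endpoints listed above offer a quick way to confirm the update succeeded. A minimal sketch using only the standard library; `BASE_URL` is a placeholder for your Space's hostname, and the paths assume your app serves the routes named in this guide:

```python
import urllib.request

BASE_URL = 'https://your-space.hf.space'  # placeholder: replace with your Space URL

def check_endpoint(base_url, path):
    """Return True if the endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP errors alike
        return False

# Example usage after a redeploy:
# for path in ('/health', '/models', '/cache'):
#     print(path, 'OK' if check_endpoint(BASE_URL, path) else 'FAILED')
```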
### Cache Management

The system automatically:

- **Cleans old cache** files
- **Manages storage** usage
- **Optimizes performance**

## 📞 Support

### Documentation

- **README.md**: general project information
- **GUARD_RAILS_GUIDE.md**: safety system details
- **This guide**: HF Spaces-specific instructions

### Community

- **Hugging Face Forums**: community support
- **GitHub Issues**: bug reports and feature requests
- **Discord**: real-time help

## 🎉 Success Checklist

- [ ] Space created successfully
- [ ] All files uploaded
- [ ] Build completed without errors
- [ ] Models downloaded successfully
- [ ] RAG system initialized
- [ ] Streamlit interface accessible
- [ ] Guard rails enabled
- [ ] Test queries working
- [ ] Performance acceptable

## 🚀 Next Steps

After a successful deployment:

1. **Test the system** with sample queries
2. **Upload documents** for RAG functionality
3. **Monitor performance** and resource usage
4. **Customize the configuration** as needed
5. **Share your Space** with others

---

**Happy Deploying! 🎉**

Your RAG system is now ready to provide intelligent document question-answering on Hugging Face Spaces.