| # Hugging Face Spaces Deployment Guide | |
| This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces. | |
| ## Prerequisites | |
| - Hugging Face account | |
| - Git repository with the RAG system code | |
| - Basic understanding of Docker containers | |
| ## Quick Deployment | |
| ### Step 1: Create a New Space | |
| 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces) | |
| 2. Click **"Create new Space"** | |
| 3. Choose **"Docker"** as the SDK | |
| 4. Set **Space name** (e.g., `my-rag-system`) | |
| 5. Choose **Public** or **Private** visibility | |
| 6. Click **"Create Space"** | |
| ### Step 2: Upload Files | |
| Upload all files from this repository to your Space: | |
| ``` | |
| Your Space Repository | |
| ├── app.py                    # Main Streamlit application | |
| ├── rag_system.py             # Core RAG system | |
| ├── pdf_processor.py          # PDF processing utilities | |
| ├── guard_rails.py            # Safety and security system | |
| ├── hf_spaces_config.py       # HF Spaces configuration | |
| ├── requirements.txt          # Python dependencies | |
| ├── Dockerfile                # Container configuration | |
| ├── README.md                 # Project documentation | |
| ├── GUARD_RAILS_GUIDE.md      # Guard rails documentation | |
| └── HF_SPACES_DEPLOYMENT.md   # This deployment guide | |
| ``` | |
| ### Step 3: Configure Environment | |
| The system automatically detects the HF Spaces environment and configures: | |
| - **Cache directories** in `/tmp` (writable in HF Spaces) | |
| - **Environment variables** for model loading | |
| - **Resource limits** optimized for HF Spaces | |
| - **Permission handling** for containerized environment | |
| ## Configuration Details | |
| ### Automatic Environment Detection | |
| The system detects HF Spaces by checking for the following indicators: | |
| ```python | |
| # Environment indicators | |
| 'SPACE_ID' in os.environ | |
| 'SPACE_HOST' in os.environ | |
| 'HF_HUB_ENDPOINT' in os.environ | |
| os.path.exists('/tmp/huggingface') | |
| ``` | |
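The indicators above could be combined into a single check along the following lines. This is a minimal sketch, not the code in `hf_spaces_config.py`: the function name `is_hf_spaces` is hypothetical, and treating any single indicator as sufficient is an assumption.

```python
import os

# Indicators listed in this guide. Treating any single match as sufficient
# is an assumption of this sketch.
HF_SPACES_ENV_VARS = ("SPACE_ID", "SPACE_HOST", "HF_HUB_ENDPOINT")
HF_SPACES_CACHE_PATH = "/tmp/huggingface"


def is_hf_spaces(environ=None, path_exists=os.path.exists):
    """Return True if any HF Spaces indicator is present."""
    env = os.environ if environ is None else environ
    if any(name in env for name in HF_SPACES_ENV_VARS):
        return True
    return path_exists(HF_SPACES_CACHE_PATH)
```

The `environ` and `path_exists` parameters exist only so the check can be exercised outside a Space, for example `is_hf_spaces({"SPACE_ID": "user/my-rag-system"})`.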
| ### Cache Directory Setup | |
| ```bash | |
| # HF Spaces cache directories | |
| HF_HOME=/tmp/huggingface | |
| TRANSFORMERS_CACHE=/tmp/huggingface/transformers | |
| TORCH_HOME=/tmp/torch | |
| XDG_CACHE_HOME=/tmp | |
| HF_HUB_CACHE=/tmp/huggingface/hub | |
| ``` | |
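If these variables are set from Python rather than in the Dockerfile, something like the sketch below would work. The helper name `configure_cache_dirs` is hypothetical; the paths are the ones listed above.

```python
import os
from pathlib import Path

# Values copied from the cache directory listing in this guide.
CACHE_ENV = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface/transformers",
    "TORCH_HOME": "/tmp/torch",
    "XDG_CACHE_HOME": "/tmp",
    "HF_HUB_CACHE": "/tmp/huggingface/hub",
}


def configure_cache_dirs(cache_env=CACHE_ENV, environ=None):
    """Create each cache directory and export the matching variable."""
    env = os.environ if environ is None else environ
    for name, path in cache_env.items():
        Path(path).mkdir(parents=True, exist_ok=True)
        env[name] = path
    return env
```

A helper like this should run before `transformers` or `huggingface_hub` is imported, because those libraries read their cache locations from the environment at import time.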
| ### Model Configuration | |
| ```python | |
| # Optimized for HF Spaces | |
| embedding_model = 'all-MiniLM-L6-v2' # Fast, lightweight | |
| generative_model = 'Qwen/Qwen2.5-1.5B-Instruct' # Primary model | |
| fallback_model = 'distilgpt2' # Backup model | |
| ``` | |
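One way to use the primary and fallback models together is to try them in order and keep the first that loads. The sketch below assumes that behavior; `load_with_fallback` and its `loader` argument are hypothetical names, not part of `rag_system.py`.

```python
# Model names from the configuration above, in order of preference.
GENERATIVE_MODELS = ["Qwen/Qwen2.5-1.5B-Instruct", "distilgpt2"]


def load_with_fallback(model_names, loader):
    """Try each model in order; return (name, model) for the first that loads."""
    errors = {}
    for name in model_names:
        try:
            return name, loader(name)
        except Exception as exc:  # broad on purpose: loaders raise varied errors
            errors[name] = exc
    raise RuntimeError(f"No model could be loaded: {errors}")
```

With the `transformers` library, the loader could be `lambda name: pipeline("text-generation", model=name)`.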
| ## Deployment Process | |
| ### 1. Initial Build | |
| When you first deploy, the system will: | |
| 1. **Download base image** (Python 3.11) | |
| 2. **Install dependencies** from `requirements.txt` | |
| 3. **Set up cache directories** in `/tmp` | |
| 4. **Download models** (embedding + language models) | |
| 5. **Initialize RAG system** with guard rails | |
| 6. **Start Streamlit server** on port 8501 | |
| ### 2. Model Download | |
| The system downloads these models: | |
| - **Embedding Model**: `all-MiniLM-L6-v2` (~90MB) | |
| - **Primary LLM**: `Qwen/Qwen2.5-1.5B-Instruct` (~3GB) | |
| - **Fallback LLM**: `distilgpt2` (~300MB) | |
| **Note**: First deployment may take 10-15 minutes due to model downloads. | |
| ### 3. System Initialization | |
| The RAG system initializes with: | |
| - **Guard rails enabled** for safety | |
| - **Vector store** in `./vector_store` | |
| - **PDF processing** ready | |
| - **Hybrid search** (FAISS + BM25) configured | |
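This guide does not say how the FAISS and BM25 results are merged. A common choice is reciprocal rank fusion, sketched below purely as an illustration; the function name and the constant `k=60` are assumptions, not the project's confirmed method.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in.
    Ties are broken by document ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Here `rankings` would hold one list of document IDs from the dense (FAISS) search and one from the sparse (BM25) search, each ordered best first.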
| ## Resource Management | |
| ### Memory Usage | |
| - **Base system**: ~500MB | |
| - **Embedding model**: ~100MB | |
| - **Language model**: ~3GB | |
| - **Total**: ~3.6GB | |
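The total is the sum of the three components. The sketch below reproduces that arithmetic and checks it against a hardware limit; the 20% headroom and the limits passed to `fits` are illustrative assumptions, so check the actual memory of your Space hardware.

```python
# Approximate figures from this guide, in MB.
MEMORY_MB = {"base_system": 500, "embedding_model": 100, "language_model": 3000}


def total_memory_gb(components=MEMORY_MB):
    """Sum the component estimates and convert MB to GB."""
    return sum(components.values()) / 1000


def fits(limit_gb, components=MEMORY_MB, headroom=0.2):
    """True if the total plus a safety margin stays under the limit."""
    return total_memory_gb(components) * (1 + headroom) <= limit_gb
```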
| ### CPU Usage | |
| - **Model loading**: High (initial) | |
| - **Inference**: Medium | |
| - **Search**: Low | |
| ### Storage | |
| - **Models**: ~3.5GB | |
| - **Cache**: ~1GB | |
| - **Vector store**: Variable (based on documents) | |
| ## Troubleshooting | |
| ### Common Issues | |
| #### 1. Permission Denied Errors | |
| **Error**: `[Errno 13] Permission denied: '/.cache'` | |
| **Solution**: The system automatically handles this by using `/tmp` directories. | |
| #### 2. Model Download Failures | |
| **Error**: `Failed to download model` | |
| **Solution**: | |
| - Check internet connectivity | |
| - Verify model names in configuration | |
| - Wait for retry (automatic) | |
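The automatic retry is not shown in this guide. A typical pattern is retry with exponential backoff, sketched below; the helper name, attempt count, and delays are assumptions.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**n, then try again.

    The exception from the final attempt is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A model download could then be wrapped as `retry(lambda: download(model_name))`, where `download` stands for whatever function fetches the model.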
| #### 3. Memory Issues | |
| **Error**: `Out of memory` | |
| **Solution**: | |
| - Use smaller models | |
| - Reduce batch sizes | |
| - Enable cache cleanup | |
| #### 4. Build Failures | |
| **Error**: `Docker build failed` | |
| **Solution**: | |
| - Check Dockerfile syntax | |
| - Verify all files are uploaded | |
| - Check the `requirements.txt` format | |
| ### Debug Mode | |
| Enable debug logging by setting: | |
| ```python | |
| # In hf_spaces_config.py | |
| logging.basicConfig(level=logging.DEBUG) | |
| ``` | |
| ### Health Checks | |
| The system provides health check endpoints: | |
| - **System status**: `/health` | |
| - **Model status**: `/models` | |
| - **Cache status**: `/cache` | |
| ## Security Features | |
| ### Guard Rails | |
| The system includes comprehensive guard rails: | |
| - **Input validation**: Query length, content filtering | |
| - **Output safety**: Response quality, hallucination detection | |
| - **Data privacy**: PII detection and masking | |
| - **System protection**: Rate limiting, resource monitoring | |
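To make the input-validation and PII-masking items concrete, here is a minimal sketch. It is not the code in `guard_rails.py`: the 1000-character limit, the function names, and the email-only pattern are assumptions, and real PII detection covers far more than email addresses.

```python
import re

MAX_QUERY_CHARS = 1000  # hypothetical limit, not taken from guard_rails.py
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def validate_query(query, max_chars=MAX_QUERY_CHARS):
    """Return (ok, reason) for a user query."""
    text = query.strip()
    if not text:
        return False, "empty query"
    if len(text) > max_chars:
        return False, "query too long"
    return True, "ok"


def mask_emails(text):
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)
```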
| ### Environment Isolation | |
| - **Containerized**: Isolated from host system | |
| - **Read-only**: File system protection | |
| - **Network**: Limited network access | |
| - **User**: Non-root user execution | |
| ## Performance Optimization | |
| ### Caching Strategy | |
| - **Model caching**: Persistent across restarts | |
| - **Vector caching**: FAISS index persistence | |
| - **Response caching**: Frequently asked questions | |
| ### Resource Optimization | |
| - **Memory**: Efficient model loading | |
| - **CPU**: Parallel processing | |
| - **Storage**: Automatic cleanup | |
| ### Monitoring | |
| - **Response times**: Real-time metrics | |
| - **Memory usage**: Resource monitoring | |
| - **Error rates**: System health tracking | |
| ## Updates and Maintenance | |
| ### Updating Models | |
| 1. **Modify configuration** in `hf_spaces_config.py` | |
| 2. **Redeploy** the Space | |
| 3. **Models will re-download** automatically | |
| ### Updating Code | |
| 1. **Push changes** to your repository | |
| 2. **HF Spaces auto-rebuilds** the container | |
| 3. **System restarts** with new code | |
| ### Cache Management | |
| The system automatically: | |
| - **Cleans old cache** files | |
| - **Manages storage** usage | |
| - **Optimizes performance** | |
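Cleaning old cache files can be as simple as deleting anything not modified within a given window. The sketch below assumes an age-based policy; the function name and the threshold are hypothetical.

```python
import time
from pathlib import Path


def clean_old_files(cache_dir, max_age_seconds, now=None):
    """Delete files under cache_dir older than max_age_seconds; return the count."""
    now = time.time() if now is None else now
    removed = 0
    for path in Path(cache_dir).rglob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed += 1
    return removed
```

For example, `clean_old_files("/tmp/huggingface", 7 * 24 * 3600)` would remove files untouched for a week. Model files removed this way are downloaded again on next use.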
| ## Support | |
| ### Documentation | |
| - **README.md**: General project information | |
| - **GUARD_RAILS_GUIDE.md**: Safety system details | |
| - **This guide**: HF Spaces-specific instructions | |
| ### Community | |
| - **Hugging Face Forums**: Community support | |
| - **GitHub Issues**: Bug reports and feature requests | |
| - **Discord**: Real-time help | |
| ## Success Checklist | |
| - [ ] Space created successfully | |
| - [ ] All files uploaded | |
| - [ ] Build completed without errors | |
| - [ ] Models downloaded successfully | |
| - [ ] RAG system initialized | |
| - [ ] Streamlit interface accessible | |
| - [ ] Guard rails enabled | |
| - [ ] Test queries working | |
| - [ ] Performance acceptable | |
| ## Next Steps | |
| After successful deployment: | |
| 1. **Test the system** with sample queries | |
| 2. **Upload documents** for RAG functionality | |
| 3. **Monitor performance** and resource usage | |
| 4. **Customize configuration** as needed | |
| 5. **Share your Space** with others | |
| --- | |
| **Happy Deploying!** | |
| Your RAG system is now ready to provide intelligent document question-answering capabilities on Hugging Face Spaces. | |