# πŸš€ Hugging Face Spaces Deployment Guide
This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.
## πŸ“‹ Prerequisites
- Hugging Face account
- Git repository with the RAG system code
- Basic understanding of Docker containers
## 🎯 Quick Deployment
### Step 1: Create a New Space
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose **"Docker"** as the SDK
4. Set **Space name** (e.g., `my-rag-system`)
5. Choose **Public** or **Private** visibility
6. Click **"Create Space"**
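Spaces read their runtime settings from a YAML front-matter block at the top of `README.md`. A minimal example for a Docker Space serving Streamlit on port 8501 (the title and emoji are placeholders):

```yaml
---
title: My RAG System
emoji: πŸš€
sdk: docker
app_port: 8501
pinned: false
---
```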
### Step 2: Upload Files
Upload all files from this repository to your Space:
```
πŸ“ Your Space Repository
β”œβ”€β”€ πŸ“„ app.py # Main Streamlit application
β”œβ”€β”€ πŸ“„ rag_system.py # Core RAG system
β”œβ”€β”€ πŸ“„ pdf_processor.py # PDF processing utilities
β”œβ”€β”€ πŸ“„ guard_rails.py # Safety and security system
β”œβ”€β”€ πŸ“„ hf_spaces_config.py # HF Spaces configuration
β”œβ”€β”€ πŸ“„ requirements.txt # Python dependencies
β”œβ”€β”€ πŸ“„ Dockerfile # Container configuration
β”œβ”€β”€ πŸ“„ README.md # Project documentation
β”œβ”€β”€ πŸ“„ GUARD_RAILS_GUIDE.md # Guard rails documentation
└── πŸ“„ HF_SPACES_DEPLOYMENT.md # This deployment guide
```
### Step 3: Configure Environment
The system automatically detects the HF Spaces environment and configures:
- **Cache directories** in `/tmp` (writable in HF Spaces)
- **Environment variables** for model loading
- **Resource limits** optimized for HF Spaces
- **Permission handling** for containerized environment
## πŸ”§ Configuration Details
### Automatic Environment Detection
The system detects the HF Spaces environment by checking for Space-specific environment variables and cache paths:
```python
import os

def is_hf_spaces() -> bool:
    """True when running inside a Hugging Face Space."""
    return (
        'SPACE_ID' in os.environ
        or 'SPACE_HOST' in os.environ
        or 'HF_HUB_ENDPOINT' in os.environ
        or os.path.exists('/tmp/huggingface')
    )
```
### Cache Directory Setup
```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
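The same cache locations can also be set programmatically, before any model libraries are imported. A minimal sketch; `setdefault` keeps any values the platform has already exported:

```python
import os

# Point all model/cache directories at /tmp, which is writable in HF Spaces.
CACHE_ENV = {
    'HF_HOME': '/tmp/huggingface',
    'TRANSFORMERS_CACHE': '/tmp/huggingface/transformers',
    'TORCH_HOME': '/tmp/torch',
    'XDG_CACHE_HOME': '/tmp',
    'HF_HUB_CACHE': '/tmp/huggingface/hub',
}

for key, value in CACHE_ENV.items():
    os.environ.setdefault(key, value)          # don't override platform values
    os.makedirs(os.environ[key], exist_ok=True)  # ensure the directory exists
```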
### Model Configuration
```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2' # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct' # Primary model
fallback_model = 'distilgpt2' # Backup model
```
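The primary/fallback pattern above can be captured in a small helper. This is a hedged sketch, not the repository's actual code: `loader` stands in for whatever model-loading callable the system uses (e.g. a `transformers` pipeline factory):

```python
def load_with_fallback(primary: str, fallback: str, loader):
    """Try the primary model id; on any failure, load the lighter fallback.

    `loader` is any callable that takes a model id and returns a model.
    Returns (model_id_used, model).
    """
    try:
        return primary, loader(primary)
    except Exception as exc:
        print(f"Failed to load {primary} ({exc}); falling back to {fallback}")
        return fallback, loader(fallback)
```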
## πŸš€ Deployment Process
### 1. Initial Build
When you first deploy, the system will:
1. **Download base image** (Python 3.11)
2. **Install dependencies** from `requirements.txt`
3. **Set up cache directories** in `/tmp`
4. **Download models** (embedding + language models)
5. **Initialize RAG system** with guard rails
6. **Start Streamlit server** on port 8501
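A minimal `Dockerfile` consistent with the steps above might look like the following; the actual file in the repository may differ. The non-root user (uid 1000) and `/tmp` cache variables mirror HF Spaces conventions:

```dockerfile
FROM python:3.11-slim

# Run as a non-root user, as HF Spaces expects
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

WORKDIR /app
COPY --chown=user requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY --chown=user . .

# Writable cache locations (see Configuration Details)
ENV HF_HOME=/tmp/huggingface TORCH_HOME=/tmp/torch

EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```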
### 2. Model Download
The system downloads these models:
- **Embedding Model**: `all-MiniLM-L6-v2` (~90MB)
- **Primary LLM**: `Qwen/Qwen2.5-1.5B-Instruct` (~3GB)
- **Fallback LLM**: `distilgpt2` (~300MB)
**Note**: First deployment may take 10-15 minutes due to model downloads.
### 3. System Initialization
The RAG system initializes with:
- **Guard rails enabled** for safety
- **Vector store** in `./vector_store`
- **PDF processing** ready
- **Hybrid search** (FAISS + BM25) configured
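Hybrid FAISS + BM25 retrieval typically normalizes each retriever's scores and combines them with a weighted sum. A minimal sketch of that fusion step, assuming both retrievers score the same candidate documents (`alpha` weights the dense side):

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense_scores, bm25_scores, alpha=0.5):
    """Weighted sum of min-max-normalized dense (FAISS) and sparse (BM25) scores."""
    d, b = min_max(dense_scores), min_max(bm25_scores)
    return [alpha * di + (1 - alpha) * bi for di, bi in zip(d, b)]
```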
## πŸ“Š Resource Management
### Memory Usage
- **Base system**: ~500MB
- **Embedding model**: ~100MB
- **Language model**: ~3GB
- **Total**: ~3.6GB
### CPU Usage
- **Model loading**: High (initial)
- **Inference**: Medium
- **Search**: Low
### Storage
- **Models**: ~3.5GB
- **Cache**: ~1GB
- **Vector store**: Variable (based on documents)
## πŸ” Troubleshooting
### Common Issues
#### 1. Permission Denied Errors
**Error**: `[Errno 13] Permission denied: '/.cache'`
**Solution**: The system automatically handles this by using `/tmp` directories.
#### 2. Model Download Failures
**Error**: `Failed to download model`
**Solution**:
- Check internet connectivity
- Verify model names in configuration
- Wait for retry (automatic)
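The automatic retry mentioned above can be as simple as exponential backoff around the download call. A hedged sketch, where `download` is a placeholder for the real call:

```python
import time

def retry(download, attempts=3, base_delay=1.0):
    """Call `download()` up to `attempts` times, doubling the wait between tries."""
    for i in range(attempts):
        try:
            return download()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** i)
```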
#### 3. Memory Issues
**Error**: `Out of memory`
**Solution**:
- Use smaller models
- Reduce batch sizes
- Enable cache cleanup
#### 4. Build Failures
**Error**: `Docker build failed`
**Solution**:
- Check Dockerfile syntax
- Verify all files are uploaded
- Check requirements.txt format
### Debug Mode
Enable debug logging by setting:
```python
# In hf_spaces_config.py
logging.basicConfig(level=logging.DEBUG)
```
### Health Checks
The system provides health check endpoints:
- **System status**: `/health`
- **Model status**: `/models`
- **Cache status**: `/cache`
## πŸ”’ Security Features
### Guard Rails
The system includes comprehensive guard rails:
- **Input validation**: Query length, content filtering
- **Output safety**: Response quality, hallucination detection
- **Data privacy**: PII detection and masking
- **System protection**: Rate limiting, resource monitoring
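As an illustration of the PII masking idea (not the repository's actual patterns), a regex-based pass over outgoing text might look like:

```python
import re

# Hypothetical patterns; real guard rails would cover many more PII types.
PII_PATTERNS = {
    'EMAIL': re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+'),
    'PHONE': re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'),
}

def mask_pii(text: str) -> str:
    """Replace detected PII spans with type placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f'[{label}]', text)
    return text
```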
### Environment Isolation
- **Containerized**: Isolated from host system
- **Read-only**: File system protection
- **Network**: Limited network access
- **User**: Non-root user execution
## πŸ“ˆ Performance Optimization
### Caching Strategy
- **Model caching**: Persistent across restarts
- **Vector caching**: FAISS index persistence
- **Response caching**: Frequently asked questions
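Response caching for repeated questions can be sketched with a dictionary keyed on the normalized query; a real system would bound the cache size and account for the retrieved context:

```python
def make_cached(answer_fn, cache=None):
    """Wrap an answer function with an exact-match query cache."""
    cache = {} if cache is None else cache

    def answer(query: str) -> str:
        key = ' '.join(query.lower().split())  # normalize case and whitespace
        if key not in cache:
            cache[key] = answer_fn(query)
        return cache[key]

    return answer
```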
### Resource Optimization
- **Memory**: Efficient model loading
- **CPU**: Parallel processing
- **Storage**: Automatic cleanup
### Monitoring
- **Response times**: Real-time metrics
- **Memory usage**: Resource monitoring
- **Error rates**: System health tracking
## πŸ”„ Updates and Maintenance
### Updating Models
1. **Modify configuration** in `hf_spaces_config.py`
2. **Redeploy** the Space
3. **Models will re-download** automatically
### Updating Code
1. **Push changes** to your repository
2. **HF Spaces auto-rebuilds** the container
3. **System restarts** with new code
### Cache Management
The system automatically:
- **Cleans old cache** files
- **Manages storage** usage
- **Optimizes performance**
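An age-based cleanup pass over the `/tmp` caches, as a hedged sketch (the real cleanup policy may differ):

```python
import os
import time

def clean_old_files(root: str, max_age_days: float = 7.0) -> int:
    """Delete files under `root` older than `max_age_days`; return count removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                pass  # file vanished or is not removable
    return removed
```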
## πŸ“ž Support
### Documentation
- **README.md**: General project information
- **GUARD_RAILS_GUIDE.md**: Safety system details
- **This guide**: HF Spaces specific instructions
### Community
- **Hugging Face Forums**: Community support
- **GitHub Issues**: Bug reports and feature requests
- **Discord**: Real-time help
## πŸŽ‰ Success Checklist
- [ ] Space created successfully
- [ ] All files uploaded
- [ ] Build completed without errors
- [ ] Models downloaded successfully
- [ ] RAG system initialized
- [ ] Streamlit interface accessible
- [ ] Guard rails enabled
- [ ] Test queries working
- [ ] Performance acceptable
## πŸš€ Next Steps
After successful deployment:
1. **Test the system** with sample queries
2. **Upload documents** for RAG functionality
3. **Monitor performance** and resource usage
4. **Customize configuration** as needed
5. **Share your Space** with others
---
**Happy Deploying! πŸŽ‰**
Your RAG system is now ready to provide intelligent document question-answering capabilities on Hugging Face Spaces.