
	# 🚀 Hugging Face Spaces Deployment Guide

	This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.

	## 📋 Prerequisites

	- Hugging Face account
	- Git repository with the RAG system code
	- Basic understanding of Docker containers

	## 🎯 Quick Deployment

	### Step 1: Create a New Space

	1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
	2. Click "Create new Space"
	3. Choose "Docker" as the SDK
	4. Set Space name (e.g., `my-rag-system`)
	5. Choose Public or Private visibility
	6. Click "Create Space"

	### Step 2: Upload Files

	Upload all files from this repository to your Space:

```
📁 Your Space Repository
├── 📄 app.py                   # Main Streamlit application
├── 📄 rag_system.py            # Core RAG system
├── 📄 pdf_processor.py         # PDF processing utilities
├── 📄 guard_rails.py           # Safety and security system
├── 📄 hf_spaces_config.py      # HF Spaces configuration
├── 📄 requirements.txt         # Python dependencies
├── 📄 Dockerfile               # Container configuration
├── 📄 README.md                # Project documentation
├── 📄 GUARD_RAILS_GUIDE.md     # Guard rails documentation
└── 📄 HF_SPACES_DEPLOYMENT.md  # This deployment guide
```

	### Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

	- Cache directories in `/tmp` (writable in HF Spaces)
	- Environment variables for model loading
	- Resource limits optimized for HF Spaces
	- Permission handling for containerized environment

	## 🔧 Configuration Details

	### Automatic Environment Detection

	The system automatically detects HF Spaces using:

```python
import os

# Environment indicators: any one of them marks an HF Spaces container
is_hf_spaces = (
    'SPACE_ID' in os.environ
    or 'SPACE_HOST' in os.environ
    or 'HF_HUB_ENDPOINT' in os.environ
    or os.path.exists('/tmp/huggingface')
)
```

	### Cache Directory Setup

```bash
# HF Spaces cache directories
export HF_HOME=/tmp/huggingface
export TRANSFORMERS_CACHE=/tmp/huggingface/transformers
export TORCH_HOME=/tmp/torch
export XDG_CACHE_HOME=/tmp
export HF_HUB_CACHE=/tmp/huggingface/hub
```
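The same variables can also be set from Python. This only works if it runs before `transformers` or `sentence_transformers` is imported, because those libraries read the variables when they load. The sketch below assumes a hypothetical helper, `configure_hf_cache`, and uses the directory layout listed above.

```python
import os
from pathlib import Path


def configure_hf_cache(base: str = "/tmp") -> dict[str, str]:
    """Point Hugging Face, PyTorch and XDG caches at a writable base directory.

    Hypothetical helper: call it before importing transformers or
    sentence_transformers, since they read these variables on import.
    """
    base_path = Path(base)
    paths = {
        "HF_HOME": base_path / "huggingface",
        "TRANSFORMERS_CACHE": base_path / "huggingface" / "transformers",
        "HF_HUB_CACHE": base_path / "huggingface" / "hub",
        "TORCH_HOME": base_path / "torch",
        "XDG_CACHE_HOME": base_path,
    }
    for name, path in paths.items():
        path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
        os.environ[name] = str(path)             # visible to libraries imported later
    return {name: str(path) for name, path in paths.items()}
```

Call `configure_hf_cache()` at the very top of the entry point (for example `app.py`), before any model library is imported.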

	### Model Configuration

	```python
	# Optimized for HF Spaces
	embedding_model = 'all-MiniLM-L6-v2' # Fast, lightweight
	generative_model = 'Qwen/Qwen2.5-1.5B-Instruct' # Primary model
	fallback_model = 'distilgpt2' # Backup model
	```
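A primary model paired with a backup implies trying them in order. Here is a minimal sketch of that pattern. It assumes a hypothetical `load_with_fallback` helper that accepts any loader callable, and the actual logic in `rag_system.py` may differ.

```python
import logging
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def load_with_fallback(model_names: Sequence[str], loader: Callable[[str], T]) -> Tuple[str, T]:
    """Try each model name in order; return (name, model) for the first that loads."""
    errors = []
    for name in model_names:
        try:
            return name, loader(name)
        except Exception as exc:  # download, memory or config failure: try the next one
            logger.warning("Could not load %s: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No model could be loaded; " + "; ".join(errors))


# Example with transformers (downloads weights on first use):
# from transformers import pipeline
# name, generator = load_with_fallback(
#     ["Qwen/Qwen2.5-1.5B-Instruct", "distilgpt2"],
#     lambda n: pipeline("text-generation", model=n),
# )
```

Passing the loader in as a parameter keeps the fallback logic independent of any particular library, and it makes the helper easy to test with a fake loader.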

	## 🚀 Deployment Process

	### 1. Initial Build

	When you first deploy, the system will:

	1. Download base image (Python 3.11)
	2. Install dependencies from `requirements.txt`
	3. Set up cache directories in `/tmp`
	4. Download models (embedding + language models)
	5. Initialize RAG system with guard rails
	6. Start Streamlit server on port 8501
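Docker Spaces send traffic to port 7860 unless the `app_port` field in the README.md YAML metadata says otherwise. A Streamlit server on port 8501 therefore needs `app_port: 8501`, alongside `sdk: docker`, in that header. The launch command might look like the sketch below. `streamlit_command` is a hypothetical helper, while `--server.port`, `--server.address` and `--server.headless` are standard Streamlit options.

```python
import shlex
from typing import List


def streamlit_command(script: str = "app.py", port: int = 8501) -> List[str]:
    """Build the argv list that starts Streamlit in the container (hypothetical helper)."""
    return [
        "streamlit", "run", script,
        f"--server.port={port}",     # must match app_port in the README.md metadata
        "--server.address=0.0.0.0",  # accept connections from outside the container
        "--server.headless=true",    # do not try to open a browser
    ]


# Shell form of the command, e.g. for the Dockerfile CMD line
print(shlex.join(streamlit_command()))
```

The printed command can go into the Dockerfile's `CMD`, either as a shell string or as the exec-form JSON array built from the same list.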

	### 2. Model Download

	The system downloads these models:

	- Embedding Model: `all-MiniLM-L6-v2` (~90MB)
	- Primary LLM: `Qwen/Qwen2.5-1.5B-Instruct` (~3GB)
	- Fallback LLM: `distilgpt2` (~300MB)

	Note: First deployment may take 10-15 minutes due to model downloads.

	### 3. System Initialization

	The RAG system initializes with:

	- Guard rails enabled for safety
	- Vector store in `./vector_store`
	- PDF processing ready
	- Hybrid search (FAISS + BM25) configured
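The guide does not say how FAISS and BM25 results are merged. One common option is reciprocal rank fusion (RRF), sketched here as an illustration rather than the project's confirmed method. `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is the constant usually quoted for RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1), so items ranked high by either retriever rise.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


faiss_hits = ["doc3", "doc1", "doc7"]  # dense (embedding) ranking
bm25_hits = ["doc1", "doc4", "doc3"]   # sparse (keyword) ranking
print(reciprocal_rank_fusion([faiss_hits, bm25_hits]))
```

RRF uses only ranks, so it avoids normalizing FAISS distances against BM25 scores, which sit on unrelated scales.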

	## 📊 Resource Management

	### Memory Usage

	- Base system: ~500MB
	- Embedding model: ~100MB
	- Language model: ~3GB
	- Total: ~3.6GB
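The ~3GB language-model figure matches half-precision weights. The back-of-the-envelope check below assumes about 1.5 billion parameters, as the model name suggests. It counts weights only, so activations and the generation cache add more on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


QWEN_PARAMS = 1.5e9  # approximation taken from the "1.5B" in the model name

for dtype, nbytes in [("float32", 4), ("bfloat16/float16", 2)]:
    print(f"{dtype}: ~{weight_memory_gb(QWEN_PARAMS, nbytes):.1f} GB")
```

In float32 the weights alone need about 6GB, double the estimate above. Loading the model in a 16-bit dtype keeps them near 3GB.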

	### CPU Usage

	- Model loading: High (initial)
	- Inference: Medium
	- Search: Low

	### Storage

	- Models: ~3.5GB
	- Cache: ~1GB
	- Vector store: Variable (based on documents)

	## 🔍 Troubleshooting

	### Common Issues

	#### 1. Permission Denied Errors

	Error: `[Errno 13] Permission denied: '/.cache'`

	Solution: The system automatically handles this by using `/tmp` directories.
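This error typically appears when a library builds its cache path from the home directory, that path resolves to `/.cache`, and the non-root container user cannot write there. The sketch below picks the first writable location. `first_writable_dir` is a hypothetical helper.

```python
import os
import tempfile
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first existing directory the current user can write to."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


cache_root = first_writable_dir([
    os.path.expanduser("~/.cache"),  # resolves to /.cache when HOME is "/"
    tempfile.gettempdir(),           # usually /tmp, writable in the container
])
print("Using cache root:", cache_root)
```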

	#### 2. Model Download Failures

	Error: `Failed to download model`

	Solution:
	- Check internet connectivity
	- Verify model names in configuration
	- Wait for retry (automatic)
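An automatic retry of this kind usually uses exponential backoff. The sketch below assumes a hypothetical `retry_with_backoff` helper, and the attempt count and delays are illustrative rather than the system's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on failure and doubling the wait after each attempt."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))  # base_delay, 2x, 4x, ...
    raise ValueError("attempts must be at least 1")
```

For example, `retry_with_backoff(lambda: snapshot_download("distilgpt2"))` would retry a `huggingface_hub` download. The injectable `sleep` argument lets tests run without real waiting.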

	#### 3. Memory Issues

	Error: `Out of memory`

	Solution:
	- Use smaller models
	- Reduce batch sizes
	- Enable cache cleanup
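A smaller batch size trades speed for a lower memory peak. For embeddings, sentence-transformers' `encode` accepts a `batch_size` argument directly. For other steps, a small generic helper does the same job. The sketch below uses a hypothetical `batched` function, since `itertools.batched` only arrived in Python 3.12.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items so only one batch is in memory at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


chunks = [f"chunk {i}" for i in range(10)]
for batch in batched(chunks, 4):
    print(len(batch))  # embed or process one small batch here
```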

	#### 4. Build Failures

	Error: `Docker build failed`

	Solution:
	- Check Dockerfile syntax
	- Verify all files are uploaded
	- Check requirements.txt format
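A short script can confirm that the files from the Step 2 layout are present before you push. The sketch below uses a hypothetical `missing_files` helper and takes its file list from that layout, leaving out the documentation guides, which the build does not need.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = [
    "app.py", "rag_system.py", "pdf_processor.py", "guard_rails.py",
    "hf_spaces_config.py", "requirements.txt", "Dockerfile", "README.md",
]


def missing_files(repo_dir: str = ".") -> List[str]:
    """Return the required files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_files()
print("Missing:", missing if missing else "none")
```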

	### Debug Mode

	Enable debug logging by setting:

```python
# In hf_spaces_config.py, before any other logging configuration
import logging

logging.basicConfig(level=logging.DEBUG)
```

	### Health Checks

	The system provides health check endpoints:

	- System status: `/health`
	- Model status: `/models`
	- Cache status: `/cache`
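These endpoints can be polled from outside the container. The sketch assumes they answer plain HTTP GET requests at the Space's URL. `probe` and `check_all` are hypothetical helpers, and the base URL in the comment is a placeholder.

```python
import urllib.request
from typing import Dict

ENDPOINTS = ["/health", "/models", "/cache"]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, timeouts and refused connections
        return False


def check_all(base_url: str) -> Dict[str, bool]:
    """Probe every endpoint relative to base_url."""
    base = base_url.rstrip("/")
    return {path: probe(base + path) for path in ENDPOINTS}


# Placeholder Space URL; replace with your own
# print(check_all("https://your-name-my-rag-system.hf.space"))
```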

	## 🔒 Security Features

	### Guard Rails

	The system includes comprehensive guard rails:

	- Input validation: Query length, content filtering
	- Output safety: Response quality, hallucination detection
	- Data privacy: PII detection and masking
	- System protection: Rate limiting, resource monitoring
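The sketch below illustrates input validation and PII masking. It is not the actual logic in `guard_rails.py`, which `GUARD_RAILS_GUIDE.md` documents. The length limit is hypothetical, and the email pattern is deliberately simple rather than fully standards-compliant.

```python
import re
from typing import Tuple

MAX_QUERY_CHARS = 500  # hypothetical limit
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simplified email match


def validate_query(query: str) -> Tuple[bool, str]:
    """Reject empty or overly long queries; return (is_valid, reason)."""
    stripped = query.strip()
    if not stripped:
        return False, "Query is empty"
    if len(stripped) > MAX_QUERY_CHARS:
        return False, f"Query exceeds {MAX_QUERY_CHARS} characters"
    return True, "ok"


def mask_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)
```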

	### Environment Isolation

	- Containerized: Isolated from host system
	- Read-only: File system protection
	- Network: Limited network access
	- User: Non-root user execution

	## 📈 Performance Optimization

	### Caching Strategy

- Model caching: Reused while the container runs; `/tmp` is ephemeral, so models re-download after a restart or rebuild
	- Vector caching: FAISS index persistence
	- Response caching: Frequently asked questions
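The simplest form of FAQ response caching keys answers on a normalized version of the query. The sketch below assumes a hypothetical `ResponseCache` class with a size bound and least-recently-used eviction.

```python
from collections import OrderedDict
from typing import Callable


class ResponseCache:
    """Bounded LRU cache keyed by a normalized query string (hypothetical)."""

    def __init__(self, max_entries: int = 256) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def normalize(query: str) -> str:
        # Case- and whitespace-insensitive key
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, answer_fn: Callable[[str], str]) -> str:
        key = self.normalize(query)
        if key in self._store:
            self._store.move_to_end(key)  # mark as recently used
            return self._store[key]
        answer = answer_fn(query)
        self._store[key] = answer
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return answer
```

Cached answers go stale when new documents are uploaded, so a cache like this should be cleared whenever the vector store changes.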

	### Resource Optimization

	- Memory: Efficient model loading
	- CPU: Parallel processing
	- Storage: Automatic cleanup

	### Monitoring

	- Response times: Real-time metrics
	- Memory usage: Resource monitoring
	- Error rates: System health tracking
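A small timing wrapper around the query handler is enough to collect response-time metrics. The sketch below assumes a hypothetical `LatencyTracker` class built on `time.perf_counter` and the `statistics` module.

```python
import functools
import time
from statistics import mean, median
from typing import Callable, List, TypeVar

T = TypeVar("T")


class LatencyTracker:
    """Record how long each wrapped call takes and summarize the results."""

    def __init__(self) -> None:
        self.samples: List[float] = []

    def timed(self, func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when func raises, so slow failures are counted too
                self.samples.append(time.perf_counter() - start)
        return wrapper

    def summary(self) -> dict:
        if not self.samples:
            return {"count": 0}
        return {
            "count": len(self.samples),
            "mean_s": mean(self.samples),
            "median_s": median(self.samples),
            "max_s": max(self.samples),
        }
```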

	## 🔄 Updates and Maintenance

	### Updating Models

	1. Modify configuration in `hf_spaces_config.py`
	2. Redeploy the Space
	3. Models will re-download automatically

	### Updating Code

1. Push changes to the Space's Git repository (or upload them through the web interface)
	2. HF Spaces auto-rebuilds the container
	3. System restarts with new code

	### Cache Management

	The system automatically:

	- Cleans old cache files
	- Manages storage usage
	- Optimizes performance
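Age-based cache cleanup could look like the sketch below, which deletes plain files not modified within a given number of days. `cleanup_cache` and its seven-day default are hypothetical. Point it only at scratch cache directories: the Hugging Face hub cache links snapshots to shared blobs, so for that cache `huggingface_hub.scan_cache_dir()` or `huggingface-cli delete-cache` is the safer tool.

```python
import time
from pathlib import Path
from typing import List


def cleanup_cache(cache_dir: str, max_age_days: float = 7.0) -> List[str]:
    """Delete regular files older than max_age_days; return the removed paths."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the directory walk
    for path in list(Path(cache_dir).rglob("*")):
        # Skip directories and symlinks; only plain files are removed
        if path.is_file() and not path.is_symlink() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```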

	## 📞 Support

	### Documentation

	- README.md: General project information
	- GUARD_RAILS_GUIDE.md: Safety system details
	- This guide: HF Spaces specific instructions

	### Community

	- Hugging Face Forums: Community support
	- GitHub Issues: Bug reports and feature requests
	- Discord: Real-time help

	## 🎉 Success Checklist

	- [ ] Space created successfully
	- [ ] All files uploaded
	- [ ] Build completed without errors
	- [ ] Models downloaded successfully
	- [ ] RAG system initialized
	- [ ] Streamlit interface accessible
	- [ ] Guard rails enabled
	- [ ] Test queries working
	- [ ] Performance acceptable

	## 🚀 Next Steps

	After successful deployment:

	1. Test the system with sample queries
	2. Upload documents for RAG functionality
	3. Monitor performance and resource usage
	4. Customize configuration as needed
	5. Share your Space with others

	---

	Happy Deploying! 🎉

	Your RAG system is now ready to provide intelligent document question-answering capabilities on Hugging Face Spaces.