# Hugging Face Spaces Deployment Guide
This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.
## Prerequisites
- Hugging Face account
- Git repository with the RAG system code
- Basic understanding of Docker containers
## Quick Deployment

### Step 1: Create a New Space

- Go to Hugging Face Spaces
- Click "Create new Space"
- Choose "Docker" as the SDK
- Set the Space name (e.g., `my-rag-system`)
- Choose Public or Private visibility
- Click "Create Space"
### Step 2: Upload Files

Upload all files from this repository to your Space:

```
Your Space Repository
├── app.py                    # Main Streamlit application
├── rag_system.py             # Core RAG system
├── pdf_processor.py          # PDF processing utilities
├── guard_rails.py            # Safety and security system
├── hf_spaces_config.py       # HF Spaces configuration
├── requirements.txt          # Python dependencies
├── Dockerfile                # Container configuration
├── README.md                 # Project documentation
├── GUARD_RAILS_GUIDE.md      # Guard rails documentation
└── HF_SPACES_DEPLOYMENT.md   # This deployment guide
```
### Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

- Cache directories in `/tmp` (writable in HF Spaces)
- Environment variables for model loading
- Resource limits optimized for HF Spaces
- Permission handling for the containerized environment
## Configuration Details

### Automatic Environment Detection

The system detects HF Spaces using these indicators:

```python
# Environment indicators
'SPACE_ID' in os.environ
'SPACE_HOST' in os.environ
'HF_HUB_ENDPOINT' in os.environ
os.path.exists('/tmp/huggingface')
```
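The checks above can be wrapped in a small helper; this is a sketch, and the function name `running_on_hf_spaces` is illustrative rather than part of the codebase:

```python
import os

def running_on_hf_spaces() -> bool:
    """Return True if any known HF Spaces indicator is present."""
    env_indicators = ('SPACE_ID', 'SPACE_HOST', 'HF_HUB_ENDPOINT')
    return (
        any(key in os.environ for key in env_indicators)
        or os.path.exists('/tmp/huggingface')
    )
```

Checking several indicators makes detection robust: any one of them may be absent depending on the Spaces runtime version.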
### Cache Directory Setup

```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
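These variables must be in place before `transformers` or `torch` is imported, or the libraries will fall back to unwritable default cache paths. One way to apply them from Python (a sketch):

```python
import os

# Redirect all cache writes to /tmp, the only reliably writable
# location in a default HF Spaces container. Must run before
# transformers/torch are imported.
CACHE_ENV = {
    'HF_HOME': '/tmp/huggingface',
    'TRANSFORMERS_CACHE': '/tmp/huggingface/transformers',
    'TORCH_HOME': '/tmp/torch',
    'XDG_CACHE_HOME': '/tmp',
    'HF_HUB_CACHE': '/tmp/huggingface/hub',
}

for key, value in CACHE_ENV.items():
    os.environ[key] = value          # make the path visible to libraries
    os.makedirs(value, exist_ok=True)  # ensure the directory exists
```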
### Model Configuration

```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2'             # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct'  # Primary model
fallback_model = 'distilgpt2'                    # Backup model
```
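The primary/fallback pairing can be realized with a generic wrapper; this is a sketch, not the actual loading code in `rag_system.py`:

```python
def load_with_fallback(load_fn, primary, fallback):
    """Try loading the primary model; fall back on any failure.

    load_fn is whatever loader the app uses, e.g.
    lambda name: pipeline('text-generation', model=name).
    """
    try:
        return load_fn(primary)
    except Exception:
        return load_fn(fallback)
```

Keeping the loader injectable means the fallback logic can be unit-tested without downloading multi-gigabyte weights.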
## Deployment Process

### 1. Initial Build

When you first deploy, the system will:

- Download the base image (Python 3.11)
- Install dependencies from `requirements.txt`
- Set up cache directories in `/tmp`
- Download models (embedding + language models)
- Initialize the RAG system with guard rails
- Start the Streamlit server on port 8501
### 2. Model Download

The system downloads these models:

- Embedding model: `all-MiniLM-L6-v2` (~90 MB)
- Primary LLM: `Qwen/Qwen2.5-1.5B-Instruct` (~3 GB)
- Fallback LLM: `distilgpt2` (~300 MB)

**Note:** The first deployment may take 10-15 minutes due to model downloads.
### 3. System Initialization

The RAG system initializes with:

- Guard rails enabled for safety
- Vector store in `./vector_store`
- PDF processing ready
- Hybrid search (FAISS + BM25) configured
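Hybrid FAISS + BM25 retrieval is typically a weighted fusion of normalized scores from the two rankers. The sketch below is illustrative only; the real logic lives in `rag_system.py`, and the equal weighting (`alpha=0.5`) is an assumption:

```python
def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend dense (FAISS) and sparse (BM25) scores per document id.

    Each input maps doc_id -> raw score; scores are min-max
    normalized so the two scales are comparable before blending.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    ids = set(d) | set(s)
    return {i: alpha * d.get(i, 0.0) + (1 - alpha) * s.get(i, 0.0)
            for i in ids}
```

Normalization matters here: raw FAISS distances and BM25 scores live on incompatible scales, so blending them directly would let one ranker dominate.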
## Resource Management

### Memory Usage

- Base system: ~500 MB
- Embedding model: ~100 MB
- Language model: ~3 GB
- Total: ~3.6 GB

### CPU Usage

- Model loading: high (initial)
- Inference: medium
- Search: low

### Storage

- Models: ~3.5 GB
- Cache: ~1 GB
- Vector store: variable (depends on documents)
## Troubleshooting

### Common Issues

#### 1. Permission Denied Errors

```
Error: [Errno 13] Permission denied: '/.cache'
```

Solution: The system handles this automatically by redirecting caches to `/tmp` directories.

#### 2. Model Download Failures

```
Error: Failed to download model
```

Solution:
- Check internet connectivity
- Verify the model names in the configuration
- Wait for the automatic retry

#### 3. Memory Issues

```
Error: Out of memory
```

Solution:
- Use smaller models
- Reduce batch sizes
- Enable cache cleanup

#### 4. Build Failures

```
Error: Docker build failed
```

Solution:
- Check the Dockerfile syntax
- Verify that all files are uploaded
- Check the requirements.txt format
### Debug Mode

Enable debug logging by setting:

```python
# In hf_spaces_config.py
import logging
logging.basicConfig(level=logging.DEBUG)
```
### Health Checks

The system provides health check endpoints:

- System status: `/health`
- Model status: `/models`
- Cache status: `/cache`
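A handler for `/health` might assemble its payload along these lines; this is a sketch, and the field names are assumptions rather than the system's actual response schema:

```python
import shutil
import time

START_TIME = time.time()

def health_report():
    """Build a JSON-serializable payload for a /health endpoint."""
    usage = shutil.disk_usage('/tmp')
    return {
        'status': 'ok',
        'uptime_seconds': round(time.time() - START_TIME, 1),
        'tmp_free_bytes': usage.free,  # cache headroom on the /tmp mount
    }
```

Reporting free space on `/tmp` is useful on Spaces specifically, since that mount holds both model caches and the vector store.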
## Security Features

### Guard Rails

The system includes comprehensive guard rails:

- Input validation: query length limits, content filtering
- Output safety: response quality checks, hallucination detection
- Data privacy: PII detection and masking
- System protection: rate limiting, resource monitoring
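The input-validation layer can be sketched as a simple gate in front of the pipeline. The limit and the blocked pattern below are illustrative assumptions; the real rules live in `guard_rails.py`:

```python
import re

MAX_QUERY_CHARS = 2000  # assumed limit, not the system's actual value
BLOCKED_PATTERNS = [r'(?i)ignore (all )?previous instructions']  # illustrative

def validate_query(query: str):
    """Return (ok, reason); reason is None when the query passes."""
    if not query.strip():
        return False, 'empty query'
    if len(query) > MAX_QUERY_CHARS:
        return False, 'query too long'
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, query):
            return False, 'blocked content'
    return True, None
```

Returning a reason string (rather than just a boolean) lets the UI explain rejections to the user without leaking the filter internals into logs.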
### Environment Isolation

- Containerized: isolated from the host system
- Read-only: file-system protection
- Network: limited network access
- User: non-root execution
## Performance Optimization

### Caching Strategy

- Model caching: persistent across restarts
- Vector caching: FAISS index persistence
- Response caching: answers to frequently asked questions
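Response caching for repeated questions can be as simple as memoizing the answer function. A minimal sketch, with the pipeline call stubbed out:

```python
from functools import lru_cache

def _run_rag_pipeline(query: str) -> str:
    # Placeholder for the real retrieval + generation call.
    return f'answer to: {query}'

@lru_cache(maxsize=256)
def cached_answer(query: str) -> str:
    """Serve repeated questions from memory instead of re-running
    retrieval and generation."""
    return _run_rag_pipeline(query)
```

An in-process LRU cache fits Spaces well: it needs no extra service, and it is discarded on restart along with the rest of the ephemeral state.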
### Resource Optimization

- Memory: efficient model loading
- CPU: parallel processing
- Storage: automatic cleanup

### Monitoring

- Response times: real-time metrics
- Memory usage: resource monitoring
- Error rates: system health tracking
## Updates and Maintenance

### Updating Models

- Modify the configuration in `hf_spaces_config.py`
- Redeploy the Space
- Models will re-download automatically

### Updating Code

- Push changes to your repository
- HF Spaces automatically rebuilds the container
- The system restarts with the new code

### Cache Management

The system automatically:

- Cleans old cache files
- Manages storage usage
- Optimizes performance
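Age-based cache cleanup can be sketched as a walk over the cache tree; the 24-hour threshold is an assumption, not the system's configured value:

```python
import os
import time

def clean_old_cache(root='/tmp/huggingface', max_age_hours=24):
    """Delete cache files older than max_age_hours; return the count."""
    cutoff = time.time() - max_age_hours * 3600
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                pass  # file vanished or is in use; skip it
    return removed
```

Swallowing `OSError` per file keeps a single locked or already-deleted file from aborting the whole sweep.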
## Support

### Documentation

- `README.md`: general project information
- `GUARD_RAILS_GUIDE.md`: safety system details
- This guide: HF Spaces-specific instructions

### Community

- Hugging Face Forums: community support
- GitHub Issues: bug reports and feature requests
- Discord: real-time help
## Success Checklist

- [ ] Space created successfully
- [ ] All files uploaded
- [ ] Build completed without errors
- [ ] Models downloaded successfully
- [ ] RAG system initialized
- [ ] Streamlit interface accessible
- [ ] Guard rails enabled
- [ ] Test queries working
- [ ] Performance acceptable
## Next Steps

After successful deployment:

- Test the system with sample queries
- Upload documents for RAG functionality
- Monitor performance and resource usage
- Customize the configuration as needed
- Share your Space with others

**Happy deploying!** Your RAG system is now ready to provide intelligent document question-answering on Hugging Face Spaces.