
πŸš€ Hugging Face Spaces Deployment Guide

This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.

πŸ“‹ Prerequisites

  • Hugging Face account
  • Git repository with the RAG system code
  • Basic understanding of Docker containers

🎯 Quick Deployment

Step 1: Create a New Space

  1. Go to https://huggingface.co/spaces
  2. Click "Create new Space"
  3. Choose "Docker" as the SDK
  4. Set Space name (e.g., my-rag-system)
  5. Choose Public or Private visibility
  6. Click "Create Space"
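A Docker Space is configured through YAML front matter at the top of its README.md. A minimal example for this setup (title and emoji values are illustrative; `sdk: docker` and `app_port` are the fields that matter for a Streamlit app on port 8501):

```yaml
---
title: My RAG System
emoji: 🚀
sdk: docker
app_port: 8501
pinned: false
---
```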

Step 2: Upload Files

Upload all files from this repository to your Space:

πŸ“ Your Space Repository
β”œβ”€β”€ πŸ“„ app.py                    # Main Streamlit application
β”œβ”€β”€ πŸ“„ rag_system.py             # Core RAG system
β”œβ”€β”€ πŸ“„ pdf_processor.py          # PDF processing utilities
β”œβ”€β”€ πŸ“„ guard_rails.py            # Safety and security system
β”œβ”€β”€ πŸ“„ hf_spaces_config.py       # HF Spaces configuration
β”œβ”€β”€ πŸ“„ requirements.txt          # Python dependencies
β”œβ”€β”€ πŸ“„ Dockerfile                # Container configuration
β”œβ”€β”€ πŸ“„ README.md                 # Project documentation
β”œβ”€β”€ πŸ“„ GUARD_RAILS_GUIDE.md     # Guard rails documentation
└── πŸ“„ HF_SPACES_DEPLOYMENT.md   # This deployment guide

Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

  • Cache directories in /tmp (writable in HF Spaces)
  • Environment variables for model loading
  • Resource limits optimized for HF Spaces
  • Permission handling for containerized environment

πŸ”§ Configuration Details

Automatic Environment Detection

The system automatically detects HF Spaces using:

```python
# Environment indicators (any one suggests HF Spaces)
'SPACE_ID' in os.environ
'SPACE_HOST' in os.environ
'HF_HUB_ENDPOINT' in os.environ
os.path.exists('/tmp/huggingface')
```
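Put together as a small helper, the detection might look like this (a sketch; the exact set of indicators checked by `hf_spaces_config.py` may differ):

```python
import os

def is_hf_space() -> bool:
    """Heuristic check for the Hugging Face Spaces environment.

    Any one indicator is treated as sufficient.
    """
    return any([
        'SPACE_ID' in os.environ,
        'SPACE_HOST' in os.environ,
        'HF_HUB_ENDPOINT' in os.environ,
        os.path.exists('/tmp/huggingface'),
    ])
```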

Cache Directory Setup

```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
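In Python, the same setup can be done at startup. This hypothetical helper mirrors what `hf_spaces_config.py` is described as doing; note it must run before importing `transformers` or `torch`, which read these variables at import time:

```python
import os
from pathlib import Path

def setup_hf_cache(base: str = '/tmp') -> dict:
    """Point model caches at writable directories and create them."""
    cache_dirs = {
        'HF_HOME': f'{base}/huggingface',
        'TRANSFORMERS_CACHE': f'{base}/huggingface/transformers',
        'TORCH_HOME': f'{base}/torch',
        'XDG_CACHE_HOME': base,
        'HF_HUB_CACHE': f'{base}/huggingface/hub',
    }
    for var, path in cache_dirs.items():
        os.environ[var] = path
        Path(path).mkdir(parents=True, exist_ok=True)
    return cache_dirs
```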

Model Configuration

```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2'             # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct'  # Primary model
fallback_model = 'distilgpt2'                    # Backup model
```
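The primary/fallback pair implies a try-then-fall-back load path. A dependency-free sketch of that logic (the `loader` callable stands in for something like `transformers.pipeline`; the real error handling in `rag_system.py` may be more involved):

```python
def load_with_fallback(primary: str, fallback: str, loader):
    """Try loading the primary model; fall back on any failure.

    `loader` is any callable that takes a model name and returns a
    model object, injected so the sketch stays dependency-free.
    """
    try:
        return primary, loader(primary)
    except Exception:
        return fallback, loader(fallback)
```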

πŸš€ Deployment Process

1. Initial Build

When you first deploy, the system will:

  1. Download base image (Python 3.11)
  2. Install dependencies from requirements.txt
  3. Set up cache directories in /tmp
  4. Download models (embedding + language models)
  5. Initialize RAG system with guard rails
  6. Start Streamlit server on port 8501
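These build steps are driven by the Dockerfile. A minimal sketch of what such a file might look like for a Streamlit app on HF Spaces (the repository's actual Dockerfile may differ; the non-root `user` with UID 1000 follows HF Spaces conventions):

```dockerfile
# Illustrative sketch only -- the repository's actual Dockerfile may differ.
FROM python:3.11-slim

# Non-root user, per HF Spaces conventions
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user PATH=/home/user/.local/bin:$PATH

WORKDIR /app
COPY --chown=user requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
COPY --chown=user . .

EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```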

2. Model Download

The system downloads these models:

  • Embedding Model: all-MiniLM-L6-v2 (~90MB)
  • Primary LLM: Qwen/Qwen2.5-1.5B-Instruct (~3GB)
  • Fallback LLM: distilgpt2 (~300MB)

Note: First deployment may take 10-15 minutes due to model downloads.

3. System Initialization

The RAG system initializes with:

  • Guard rails enabled for safety
  • Vector store in ./vector_store
  • PDF processing ready
  • Hybrid search (FAISS + BM25) configured
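One common way to combine dense (FAISS) and sparse (BM25) results is reciprocal rank fusion. Whether `rag_system.py` uses RRF or weighted score fusion is an assumption; this pure-Python sketch just illustrates the merging step:

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Merge ranked result lists (e.g. FAISS hits and BM25 hits).

    Documents appearing high in multiple rankings accumulate the
    largest fused scores; k dampens the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```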

πŸ“Š Resource Management

Memory Usage

  • Base system: ~500MB
  • Embedding model: ~100MB
  • Language model: ~3GB
  • Total: ~3.6GB

CPU Usage

  • Model loading: High (initial)
  • Inference: Medium
  • Search: Low

Storage

  • Models: ~3.5GB
  • Cache: ~1GB
  • Vector store: Variable (based on documents)

πŸ” Troubleshooting

Common Issues

1. Permission Denied Errors

Error: [Errno 13] Permission denied: '/.cache'

Solution: The system automatically handles this by using /tmp directories.

2. Model Download Failures

Error: Failed to download model

Solution:

  • Check internet connectivity
  • Verify model names in configuration
  • Wait for retry (automatic)

3. Memory Issues

Error: Out of memory

Solution:

  • Use smaller models
  • Reduce batch sizes
  • Enable cache cleanup

4. Build Failures

Error: Docker build failed

Solution:

  • Check Dockerfile syntax
  • Verify all files are uploaded
  • Check requirements.txt format

Debug Mode

Enable debug logging by setting:

```python
# In hf_spaces_config.py
import logging
logging.basicConfig(level=logging.DEBUG)
```

Health Checks

The system provides health check endpoints:

  • System status: /health
  • Model status: /models
  • Cache status: /cache

πŸ”’ Security Features

Guard Rails

The system includes comprehensive guard rails:

  • Input validation: Query length, content filtering
  • Output safety: Response quality, hallucination detection
  • Data privacy: PII detection and masking
  • System protection: Rate limiting, resource monitoring
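PII masking of the kind listed above is typically regex-based. A minimal sketch with two hypothetical patterns (`guard_rails.py` likely detects more PII types; see GUARD_RAILS_GUIDE.md for the real behavior):

```python
import re

# Hypothetical patterns -- the real guard rails may cover more PII types.
PII_PATTERNS = {
    'email': re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
    'phone': re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'),
}

def mask_pii(text: str) -> str:
    """Replace detected PII with type tags before display or logging."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f'[{label.upper()}]', text)
    return text
```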

Environment Isolation

  • Containerized: Isolated from host system
  • Read-only: File system protection
  • Network: Limited network access
  • User: Non-root user execution

πŸ“ˆ Performance Optimization

Caching Strategy

  • Model caching: Persistent across restarts
  • Vector caching: FAISS index persistence
  • Response caching: Frequently asked questions
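For exact-match response caching, Python's standard library is often enough. A sketch using `functools.lru_cache` (`rag_answer` is a placeholder for the real retrieve-then-generate pipeline, not a function from this repository):

```python
from functools import lru_cache

def rag_answer(query: str) -> str:
    """Placeholder for the real retrieve-then-generate pipeline."""
    return f'answer to: {query}'

@lru_cache(maxsize=128)
def cached_answer(query: str) -> str:
    """Return cached responses for repeated (exact) queries."""
    return rag_answer(query)
```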

Resource Optimization

  • Memory: Efficient model loading
  • CPU: Parallel processing
  • Storage: Automatic cleanup

Monitoring

  • Response times: Real-time metrics
  • Memory usage: Resource monitoring
  • Error rates: System health tracking

πŸ”„ Updates and Maintenance

Updating Models

  1. Modify configuration in hf_spaces_config.py
  2. Redeploy the Space
  3. Models will re-download automatically

Updating Code

  1. Push changes to your repository
  2. HF Spaces auto-rebuilds the container
  3. System restarts with new code

Cache Management

The system automatically:

  • Cleans old cache files
  • Manages storage usage
  • Optimizes performance

πŸ“ž Support

Documentation

  • README.md: General project information
  • GUARD_RAILS_GUIDE.md: Safety system details
  • This guide: HF Spaces-specific instructions

Community

  • Hugging Face Forums: Community support
  • GitHub Issues: Bug reports and feature requests
  • Discord: Real-time help

πŸŽ‰ Success Checklist

  - [ ] Space created successfully
  - [ ] All files uploaded
  - [ ] Build completed without errors
  - [ ] Models downloaded successfully
  - [ ] RAG system initialized
  - [ ] Streamlit interface accessible
  - [ ] Guard rails enabled
  - [ ] Test queries working
  - [ ] Performance acceptable

πŸš€ Next Steps

After successful deployment:

  1. Test the system with sample queries
  2. Upload documents for RAG functionality
  3. Monitor performance and resource usage
  4. Customize configuration as needed
  5. Share your Space with others

Happy Deploying! πŸŽ‰

Your RAG system is now ready to provide intelligent document question-answering capabilities on Hugging Face Spaces.