# 🚀 Hugging Face Spaces Deployment Guide

This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.

## 📋 Prerequisites

- Hugging Face account
- Git repository with the RAG system code
- Basic understanding of Docker containers

## 🎯 Quick Deployment

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Choose **"Docker"** as the SDK
4. Set the **Space name** (e.g., `my-rag-system`)
5. Choose **Public** or **Private** visibility
6. Click **"Create Space"**

### Step 2: Upload Files

Upload all files from this repository to your Space:

```
📁 Your Space Repository
├── 📄 app.py                   # Main Streamlit application
├── 📄 rag_system.py            # Core RAG system
├── 📄 pdf_processor.py         # PDF processing utilities
├── 📄 guard_rails.py           # Safety and security system
├── 📄 hf_spaces_config.py      # HF Spaces configuration
├── 📄 requirements.txt         # Python dependencies
├── 📄 Dockerfile               # Container configuration
├── 📄 README.md                # Project documentation
├── 📄 GUARD_RAILS_GUIDE.md     # Guard rails documentation
└── 📄 HF_SPACES_DEPLOYMENT.md  # This deployment guide
```

### Step 3: Configure Environment

The system automatically detects the HF Spaces environment and configures:

- **Cache directories** in `/tmp` (writable in HF Spaces)
- **Environment variables** for model loading
- **Resource limits** optimized for HF Spaces
- **Permission handling** for the containerized environment

## 🔧 Configuration Details

### Automatic Environment Detection

The system automatically detects HF Spaces using these indicators:

```python
# Environment indicators
'SPACE_ID' in os.environ
'SPACE_HOST' in os.environ
'HF_HUB_ENDPOINT' in os.environ
os.path.exists('/tmp/huggingface')
```

### Cache Directory Setup

```bash
# HF Spaces cache directories
HF_HOME=/tmp/huggingface
TRANSFORMERS_CACHE=/tmp/huggingface/transformers
TORCH_HOME=/tmp/torch
XDG_CACHE_HOME=/tmp
HF_HUB_CACHE=/tmp/huggingface/hub
```
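The detection indicators and cache variables above can be combined into a single startup helper. The following is a minimal sketch; the function names are illustrative and not necessarily those used in `hf_spaces_config.py`:

```python
import os

def running_on_hf_spaces() -> bool:
    """Check the environment indicators that suggest an HF Spaces container."""
    return (
        'SPACE_ID' in os.environ
        or 'SPACE_HOST' in os.environ
        or 'HF_HUB_ENDPOINT' in os.environ
        or os.path.exists('/tmp/huggingface')
    )

def setup_hf_spaces_caches() -> None:
    """Point all model/cache paths at writable /tmp directories."""
    cache_dirs = {
        'HF_HOME': '/tmp/huggingface',
        'TRANSFORMERS_CACHE': '/tmp/huggingface/transformers',
        'TORCH_HOME': '/tmp/torch',
        'XDG_CACHE_HOME': '/tmp',
        'HF_HUB_CACHE': '/tmp/huggingface/hub',
    }
    for var, path in cache_dirs.items():
        os.makedirs(path, exist_ok=True)          # ensure the directory exists
        os.environ.setdefault(var, path)          # don't clobber explicit settings

if running_on_hf_spaces():
    setup_hf_spaces_caches()
```

Running this before any `transformers` or `torch` import ensures the libraries pick up the writable cache locations.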
### Model Configuration

```python
# Optimized for HF Spaces
embedding_model = 'all-MiniLM-L6-v2'             # Fast, lightweight
generative_model = 'Qwen/Qwen2.5-1.5B-Instruct'  # Primary model
fallback_model = 'distilgpt2'                    # Backup model
```

## 🚀 Deployment Process

### 1. Initial Build

When you first deploy, the system will:

1. **Download the base image** (Python 3.11)
2. **Install dependencies** from `requirements.txt`
3. **Set up cache directories** in `/tmp`
4. **Download models** (embedding + language models)
5. **Initialize the RAG system** with guard rails
6. **Start the Streamlit server** on port 8501

### 2. Model Download

The system downloads these models:

- **Embedding model**: `all-MiniLM-L6-v2` (~90 MB)
- **Primary LLM**: `Qwen/Qwen2.5-1.5B-Instruct` (~3 GB)
- **Fallback LLM**: `distilgpt2` (~300 MB)

**Note**: The first deployment may take 10-15 minutes due to model downloads.

### 3. System Initialization

The RAG system initializes with:

- **Guard rails enabled** for safety
- **Vector store** in `./vector_store`
- **PDF processing** ready
- **Hybrid search** (FAISS + BM25) configured

## 📊 Resource Management

### Memory Usage

- **Base system**: ~500 MB
- **Embedding model**: ~100 MB
- **Language model**: ~3 GB
- **Total**: ~3.6 GB

### CPU Usage

- **Model loading**: high (initial)
- **Inference**: medium
- **Search**: low

### Storage

- **Models**: ~3.5 GB
- **Cache**: ~1 GB
- **Vector store**: variable (depends on the documents indexed)

## 🔍 Troubleshooting

### Common Issues

#### 1. Permission Denied Errors

**Error**: `[Errno 13] Permission denied: '/.cache'`

**Solution**: The system handles this automatically by using `/tmp` directories.

#### 2. Model Download Failures

**Error**: `Failed to download model`

**Solution**:
- Check internet connectivity
- Verify the model names in the configuration
- Wait for the automatic retry

#### 3. Memory Issues

**Error**: `Out of memory`

**Solution**:
- Use smaller models
- Reduce batch sizes
- Enable cache cleanup
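For the memory issues above, the primary/fallback model pairing described earlier suggests one mitigation: try the large model first and fall back to the smaller one on failure. A minimal sketch of that pattern, with the helper name and error handling chosen for illustration (this is not taken from `rag_system.py`):

```python
def load_with_fallback(loader, candidates):
    """Try each model name in order; return the first one that loads.

    `loader` is any callable that takes a model name and either returns
    a loaded model or raises on failure (e.g. out of memory).
    """
    for name in candidates:
        try:
            return loader(name)
        except (OSError, MemoryError, RuntimeError) as exc:
            print(f"Could not load {name}: {exc}")  # log and try the next candidate
    raise RuntimeError("No generative model could be loaded")

# On HF Spaces this might be wired up as, for example:
#   from transformers import pipeline
#   generator = load_with_fallback(
#       lambda name: pipeline('text-generation', model=name),
#       ['Qwen/Qwen2.5-1.5B-Instruct', 'distilgpt2'],
#   )
```

Keeping the loading logic separate from the model list makes it easy to swap in an even smaller backup model if the Space's memory limit is tight.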
#### 4. Build Failures

**Error**: `Docker build failed`

**Solution**:
- Check the Dockerfile syntax
- Verify that all files are uploaded
- Check the requirements.txt format

### Debug Mode

Enable debug logging by setting:

```python
# In hf_spaces_config.py
logging.basicConfig(level=logging.DEBUG)
```

### Health Checks

The system provides health check endpoints:

- **System status**: `/health`
- **Model status**: `/models`
- **Cache status**: `/cache`

## 🔒 Security Features

### Guard Rails

The system includes comprehensive guard rails:

- **Input validation**: query length limits, content filtering
- **Output safety**: response quality checks, hallucination detection
- **Data privacy**: PII detection and masking
- **System protection**: rate limiting, resource monitoring

### Environment Isolation

- **Containerized**: isolated from the host system
- **Read-only**: file system protection
- **Network**: limited network access
- **User**: non-root user execution

## 📈 Performance Optimization

### Caching Strategy

- **Model caching**: persistent across restarts
- **Vector caching**: FAISS index persistence
- **Response caching**: frequently asked questions

### Resource Optimization

- **Memory**: efficient model loading
- **CPU**: parallel processing
- **Storage**: automatic cleanup

### Monitoring

- **Response times**: real-time metrics
- **Memory usage**: resource monitoring
- **Error rates**: system health tracking

## 🔄 Updates and Maintenance

### Updating Models

1. **Modify the configuration** in `hf_spaces_config.py`
2. **Redeploy** the Space
3. **Models re-download** automatically

### Updating Code

1. **Push changes** to your repository
2. **HF Spaces auto-rebuilds** the container
3. **The system restarts** with the new code
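After the container restarts, the health check endpoints listed above offer a quick way to confirm the update succeeded. A minimal sketch using only the standard library; `BASE_URL` is a placeholder for your Space's hostname, and the paths assume your app serves the routes named in this guide:

```python
import urllib.request

BASE_URL = 'https://your-space.hf.space'  # placeholder: replace with your Space URL

def check_endpoint(base_url, path):
    """Return True if the endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP errors alike
        return False

# Example usage after a redeploy:
# for path in ('/health', '/models', '/cache'):
#     print(path, 'OK' if check_endpoint(BASE_URL, path) else 'FAILED')
```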
### Cache Management

The system automatically:

- **Cleans old cache** files
- **Manages storage** usage
- **Optimizes performance**

## 📞 Support

### Documentation

- **README.md**: general project information
- **GUARD_RAILS_GUIDE.md**: safety system details
- **This guide**: HF Spaces-specific instructions

### Community

- **Hugging Face Forums**: community support
- **GitHub Issues**: bug reports and feature requests
- **Discord**: real-time help

## 🎉 Success Checklist

- [ ] Space created successfully
- [ ] All files uploaded
- [ ] Build completed without errors
- [ ] Models downloaded successfully
- [ ] RAG system initialized
- [ ] Streamlit interface accessible
- [ ] Guard rails enabled
- [ ] Test queries working
- [ ] Performance acceptable

## 🚀 Next Steps

After a successful deployment:

1. **Test the system** with sample queries
2. **Upload documents** for RAG functionality
3. **Monitor performance** and resource usage
4. **Customize the configuration** as needed
5. **Share your Space** with others

---

**Happy Deploying! 🎉**

Your RAG system is now ready to provide intelligent document question-answering on Hugging Face Spaces.