# Hugging Face Download-at-Runtime Strategy

## Overview

This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using `huggingface_hub`. This approach lets you bypass the 1GB storage limit in Hugging Face Spaces by hosting the model in a separate Hugging Face repository and downloading it when the app starts.

## Why Use Download-at-Runtime?

1. **Space Constraint Resolution**: Hugging Face Spaces impose a 1GB storage limit on uploaded files
2. **Model Reusability**: Host your model once and reuse it across multiple applications
3. **Version Control**: Leverage Hugging Face's built-in version control for model updates
4. **Efficient Caching**: Models are cached locally after the first download
5. **Scalability**: Update models without redeploying the entire Space

## Implementation Details

### Key Components

#### 1. Dependencies

The implementation requires `huggingface_hub>=0.16.0` in your requirements:

```txt
huggingface_hub>=0.16.0
```

#### 2. Configuration

Configure your Hugging Face repository details at the top of `app.py`:

```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"              # Local cache directory
```

#### 3. Download Function

The core download logic uses `snapshot_download()` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # Resume interrupted downloads (older huggingface_hub versions)
    local_files_only=False,  # Download over the network if not already cached
)
```

Note: on recent `huggingface_hub` releases, interrupted downloads resume automatically and the `resume_download` flag is deprecated; it is only needed on older versions.

### Key Features

1. **Intelligent Caching**:
   - Models are cached in `HF_MODEL_CACHE_DIR`
   - Subsequent runs use the cached version, so nothing is downloaded twice
2. **Resume Capability**:
   - Interrupted downloads resume where they left off
   - Useful for large models and unstable connections
3. **Error Handling**:
   - Comprehensive error messages for troubleshooting
   - Network connectivity checks
   - Repository access validation
4. **Performance Optimization**:
   - LRU caching prevents model reloading
   - Device-aware inference (CPU/GPU/MPS)

## Step-by-Step Implementation

### Step 1: Upload Your Model to Hugging Face

1. **Create a Hugging Face account** (if you don't have one)
2. **Create a new model repository**:
   - Go to https://huggingface.co/new
   - Name it appropriately (e.g., `your-username/wash-cfm-classifier`)
   - Make it **Public** (required for Spaces without authentication)
3. **Upload your model files**:
   - `model.safetensors`
   - `config.json`
   - `tokenizer.json`
   - `tokenizer_config.json`
   - `special_tokens_map.json`

### Step 2: Update Configuration

Edit the configuration section in `app.py`:

```python
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo
```

### Step 3: Install Dependencies

Add to your `requirements.txt`:

```txt
huggingface_hub>=0.16.0
```

### Step 4: Deploy to Hugging Face Space

1. **Create or update your Hugging Face Space**
2. **Upload your modified files** (`app.py` with the download logic)
3. **The Space will automatically**:
   - Install dependencies from `requirements.txt`
   - Download the model on first run
   - Cache it for subsequent runs

## How It Works

### First Run

```
1. User accesses the Space
2. app.py imports huggingface_hub
3. The load_model() function calls snapshot_download()
4. The model downloads from the Hugging Face Hub (~500MB)
5. The model loads into memory
6. The first prediction takes longer (download + load time)
```

### Subsequent Runs

```
1. User accesses the Space
2. The load_model() function checks the cache
3. The model loads from the local cache (~5-10 seconds)
4. Predictions are fast
```
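The two flows above can be sketched as a single cached loader. This is a sketch under assumptions from this guide: `HF_REPO_ID` and `HF_MODEL_CACHE_DIR` are the configuration values defined earlier, the repository holds a standard `transformers` sequence-classification checkpoint, and `pick_device` is a small helper introduced here (not part of any library).

```python
import functools

HF_REPO_ID = "your-username/wash-cfm-classifier"  # placeholder from this guide
HF_MODEL_CACHE_DIR = "./model_cache"

def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best torch device string: CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

@functools.lru_cache(maxsize=1)
def load_model():
    """Download the snapshot once per process and load it onto the best device."""
    import torch
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # First run: downloads ~500MB from the Hub. Later runs: served from the local cache.
    model_path = snapshot_download(
        repo_id=HF_REPO_ID,
        cache_dir=HF_MODEL_CACHE_DIR,
    )
    device = pick_device(
        torch.cuda.is_available(),
        hasattr(torch.backends, "mps") and torch.backends.mps.is_available(),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()  # inference mode: disables dropout etc.
    return tokenizer, model, device
```

Thanks to `functools.lru_cache(maxsize=1)`, every prediction after the first reuses the in-memory model, which is the "LRU caching prevents model reloading" point from the Key Features list.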
## Benefits vs Local Storage

| Aspect | Local Storage | Download-at-Runtime |
|--------|---------------|---------------------|
| **Initial Load Time** | Instant | 30-60 seconds (first run) |
| **Subsequent Runs** | Instant | Fast (cached) |
| **Space Usage** | Counts toward 1GB limit | Minimal (just cache) |
| **Model Updates** | Manual reupload | Automatic from repo |
| **Scalability** | Limited by Space size | Unlimited |

## Troubleshooting

### Common Issues and Solutions

1. **Repository Not Found**
   ```
   Error: Repository 'username/repo-name' not found
   Solution: Verify the repo ID and ensure the repository is public
   ```
2. **Download Timeout**
   ```
   Error: Download interrupted
   Solution: Downloads resume automatically on retry
   (via resume_download=True on older huggingface_hub versions)
   ```
3. **Authentication Issues**
   ```
   Error: Access denied
   Solution: Ensure the repository is public, or authenticate with an access token
   ```
4. **Disk Space**
   ```
   Error: No space left on device
   Solution: Clean the cache or use external storage
   ```

### Debug Commands

To test your setup locally:

```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")
```

## Advanced Options

### 1. Progressive Loading

For very large models, consider downloading individual files as needed:

```python
from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)
```

### 2. Custom Cache Location

Use persistent storage for Hugging Face Spaces:

```python
# Use /tmp or mounted storage for better persistence
HF_MODEL_CACHE_DIR = "/tmp/model_cache"
```
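When relocating the cache like this, it helps to monitor how much space it occupies (see the "Disk Space" troubleshooting entry). A sketch using `scan_cache_dir()` from `huggingface_hub`; `human_size` is a small helper defined here, not a library function, and the cache directory must already exist in the Hub cache layout that `snapshot_download()` creates.

```python
def human_size(num_bytes: float) -> str:
    """Render a byte count as a readable string for log messages."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"

def report_cache(cache_dir: str = "/tmp/model_cache") -> str:
    """Summarize total size and repo count of a huggingface_hub cache directory."""
    # Deferred import so the pure helper above stays dependency-free
    from huggingface_hub import scan_cache_dir

    info = scan_cache_dir(cache_dir)  # raises if the directory does not exist
    return f"{cache_dir}: {human_size(info.size_on_disk)} across {len(info.repos)} repo(s)"
```

Calling `report_cache()` periodically (or on startup) makes it easy to decide when the "Clean the cache" remedy is actually needed.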
### 3. Model Versioning

Pin a specific model version:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Specific version (tag, branch, or commit hash)
    cache_dir=HF_MODEL_CACHE_DIR
)
```

## Performance Considerations

### First Run Optimization

- **Download time**: 30-60 seconds for a ~500MB model
- **Load time**: 10-15 seconds for model initialization
- **Total**: ~1-2 minutes until the first prediction

### Cached Run Performance

- **Load time**: 5-10 seconds (from cache)
- **Prediction**: <1 second per inference

### Memory Usage

- **Model loading**: ~2-3GB RAM during inference
- **Cached storage**: ~500MB disk space
- **Peak usage**: higher during the initial download

## Best Practices

1. **Repository Setup**:
   - Use clear, descriptive repository names
   - Include a model card (README.md) with usage instructions
   - Tag releases for version control
2. **Error Handling**:
   - Implement graceful fallbacks
   - Provide clear error messages to users
   - Log download progress for debugging
3. **User Experience**:
   - Show download progress indicators
   - Cache models efficiently
   - Handle network failures gracefully
4. **Security**:
   - Use public repositories for Spaces
   - Validate model integrity
   - Implement proper access controls

## Conclusion

The Download-at-Runtime strategy addresses the Hugging Face Spaces 1GB limit by:

- ✅ Eliminating storage constraints
- ✅ Enabling model reuse across applications
- ✅ Providing efficient caching
- ✅ Maintaining good performance after the initial setup
- ✅ Offering built-in version control

This approach is ideal for production applications where the model size exceeds Space limits and network connectivity is reliable.

---

*For questions or issues, refer to the [huggingface_hub documentation](https://huggingface.co/docs/huggingface_hub/index) or create an issue in your repository.*