# Hugging Face Download-at-Runtime Strategy
## Overview
This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using `huggingface_hub`. This approach allows you to bypass the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.
## Why Use Download-at-Runtime?
1. **Space Constraint Resolution**: Hugging Face Spaces have a 1GB storage limit for uploaded files
2. **Model Reusability**: Host your model once and reuse it across multiple applications
3. **Version Control**: Leverage Hugging Face's built-in version control for model updates
4. **Efficient Caching**: Models are cached locally after first download
5. **Scalability**: Easy to update models without redeploying the entire Space
## Implementation Details
### Key Components
#### 1. Dependencies
The implementation requires `huggingface_hub>=0.16.0` added to your requirements:
```txt
huggingface_hub>=0.16.0
```
#### 2. Configuration
Configure your Hugging Face repository details at the top of `app.py`:
```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier" # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache" # Local cache directory
```
#### 3. Download Function
The core download logic uses `snapshot_download()` from `huggingface_hub`:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # Resume interrupted downloads
    local_files_only=False   # Allow downloading when not cached
)
```
### Key Features
1. **Intelligent Caching**:
- Models are cached in `HF_MODEL_CACHE_DIR`
- Subsequent runs use cached versions
- No repeated downloads
2. **Resume Capability**:
- `resume_download=True` resumes interrupted downloads (newer `huggingface_hub` releases resume automatically and have deprecated this flag)
- Useful for large models and unstable connections
3. **Error Handling**:
- Comprehensive error messages for troubleshooting
- Network connectivity checks
- Repository access validation
4. **Performance Optimization**:
- LRU caching prevents model reloading
- Device-aware inference (CPU/GPU/MPS)
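Taken together, these features can be sketched as a single cached loader. This is an illustrative sketch, not the app's actual code: the placeholder repo ID, the `transformers` model classes, and the device order are assumptions about your setup.

```python
from functools import lru_cache

# Placeholder values; replace with your repository and cache path
HF_REPO_ID = "your-username/wash-cfm-classifier"
HF_MODEL_CACHE_DIR = "./model_cache"

@lru_cache(maxsize=1)  # LRU caching: the model is loaded into memory only once
def load_model():
    # Heavy imports live inside the function so the sketch imports cheaply
    import torch
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Download (or reuse the locally cached copy of) the model files
    model_path = snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR)

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)

    # Device-aware inference: prefer CUDA, then Apple MPS, else CPU
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    model.to(device).eval()
    return tokenizer, model, device
```

Because of `lru_cache`, every prediction after the first reuses the in-memory model instead of re-reading it from disk.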
## Step-by-Step Implementation
### Step 1: Upload Your Model to Hugging Face
1. **Create a Hugging Face Account** (if you don't have one)
2. **Create a New Model Repository**:
- Go to https://huggingface.co/new
- Name it appropriately (e.g., `your-username/wash-cfm-classifier`)
- Make it **Public** (simplest for Spaces; a private repo requires passing an access token)
- Upload your model files:
- `model.safetensors`
- `config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
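If you prefer to script the upload instead of using the web UI, `huggingface_hub` provides `HfApi.create_repo` and `HfApi.upload_folder`. The repo ID and local folder below are placeholders; running this requires a prior `huggingface-cli login`:

```python
def push_model(folder_path="./model", repo_id="your-username/wash-cfm-classifier"):
    """Create the repo (if needed) and upload every file in `folder_path`."""
    # Imported lazily; authentication comes from your saved Hub token
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, exist_ok=True)  # no-op if the repo already exists
    api.upload_folder(folder_path=folder_path, repo_id=repo_id)
    return repo_id
```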
### Step 2: Update Configuration
Edit the configuration section in `app.py`:
```python
HF_REPO_ID = "your-username/wash-cfm-classifier" # Replace with your actual repo
```
### Step 3: Install Dependencies
Add to your `requirements.txt`:
```txt
huggingface_hub>=0.16.0
```
### Step 4: Deploy to Hugging Face Space
1. **Create or update your Hugging Face Space**
2. **Upload your modified files** (app.py with download logic)
3. **The Space will automatically**:
- Install dependencies from requirements.txt
- Download the model on first run
- Cache it for subsequent runs
## How It Works
### First Run
```
1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() function calls snapshot_download()
4. Model downloads from Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)
```
### Subsequent Runs
```
1. User accesses the Space
2. load_model() function checks cache
3. Model loads from local cache (~5-10 seconds)
4. Predictions are fast
```
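The cache check in step 2 can be made explicit: ask `snapshot_download` for local files first, and fall back to the network only when the cache is empty. A hedged sketch (the helper name is our own):

```python
def get_model_path(repo_id, cache_dir="./model_cache"):
    """Return the local snapshot path, downloading only when the cache is empty."""
    from huggingface_hub import snapshot_download

    try:
        # Fast path: resolve entirely from the local cache, no network calls
        return snapshot_download(repo_id=repo_id, cache_dir=cache_dir, local_files_only=True)
    except Exception:
        # Cache miss or incomplete snapshot: fetch from the Hub
        return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
```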
## Benefits vs Local Storage
| Aspect | Local Storage | Download-at-Runtime |
|--------|---------------|---------------------|
| **Initial Load Time** | Instant | 30-60 seconds (first run) |
| **Subsequent Runs** | Instant | Fast (cached) |
| **Space Usage** | Counts toward 1GB limit | Minimal (just cache) |
| **Model Updates** | Manual reupload | Automatic from repo |
| **Scalability** | Limited by Space size | Unlimited |
## Troubleshooting
### Common Issues and Solutions
1. **Repository Not Found**
```
Error: Repository 'username/repo-name' not found
Solution: Verify repo ID and ensure repository is public
```
2. **Download Timeout**
```
Error: Download interrupted
Solution: resume_download=True resumes the transfer automatically (recent huggingface_hub versions always resume)
```
3. **Authentication Issues**
```
Error: Access denied
Solution: Ensure repository is public or use access tokens
```
4. **Disk Space**
```
Error: No space left on device
Solution: Clean cache or use external storage
```
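One way to surface these cases to users is a small helper that maps the raw exception text to the advice above. This is a heuristic sketch; the matched substrings are assumptions about typical error messages:

```python
def explain_download_error(exc: Exception) -> str:
    """Translate a download failure into an actionable message (heuristic)."""
    msg = str(exc).lower()
    if "404" in msg or "not found" in msg:
        return "Repository not found: verify the repo ID and make sure it is public."
    if "401" in msg or "403" in msg or "unauthorized" in msg or "access" in msg:
        return "Access denied: make the repository public or provide an access token."
    if "no space left" in msg:
        return "Disk full: clear the model cache or use larger storage."
    return f"Download failed: {exc} (check network connectivity and retry)"
```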
### Debug Commands
To test your setup locally:
```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")
```
## Advanced Options
### 1. Progressive Loading
For very large models, consider loading components separately:
```python
from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)
```
### 2. Custom Cache Location
Note that `/tmp` is wiped whenever a Space restarts, so it only avoids re-downloads within a single session. If persistent storage is enabled for your Space, it is mounted at `/data` and survives restarts:
```python
import os

# Prefer persistent storage (/data, if enabled) and fall back to a local directory
HF_MODEL_CACHE_DIR = "/data/model_cache" if os.path.isdir("/data") else "./model_cache"
```
### 3. Model Versioning
Pin specific model versions:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Pin a specific tag, branch, or commit
    cache_dir=HF_MODEL_CACHE_DIR
)
```
## Performance Considerations
### First Run Optimization
- **Download Time**: 30-60 seconds for ~500MB model
- **Load Time**: 10-15 seconds for model initialization
- **Total**: ~1-2 minutes for first prediction
### Cached Run Performance
- **Load Time**: 5-10 seconds (from cache)
- **Prediction**: <1 second per inference
### Memory Usage
- **Model Loading**: ~2-3GB RAM during inference
- **Cached Storage**: ~500MB disk space
- **Peak Usage**: Higher during initial download
## Best Practices
1. **Repository Setup**:
- Use clear, descriptive repository names
- Include model cards (README.md) with usage instructions
- Tag releases for version control
2. **Error Handling**:
- Implement graceful fallbacks
- Provide clear error messages to users
- Log download progress for debugging
3. **User Experience**:
- Show download progress indicators
- Cache models efficiently
- Handle network failures gracefully
4. **Security**:
- Use public repositories for Spaces
- Validate model integrity
- Implement proper access controls
## Conclusion
The Download-at-Runtime strategy successfully addresses the Hugging Face Spaces 1GB limit by:
βœ… **Eliminating storage constraints**
βœ… **Enabling model reuse across applications**
βœ… **Providing efficient caching mechanisms**
βœ… **Maintaining good performance after initial setup**
βœ… **Offering built-in version control**
This approach is ideal for production applications where model size exceeds Space limits but network connectivity is reliable.
---
*For questions or issues, refer to the [huggingface_hub documentation](https://huggingface.co/docs/huggingface_hub/index) or create an issue in your repository.*