# Hugging Face Download-at-Runtime Strategy

## Overview

This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using `huggingface_hub`. This approach lets you bypass the 1GB storage limit in Hugging Face Spaces by hosting the model in a separate Hugging Face repository and downloading it when the app starts.

## Why Use Download-at-Runtime?

1. **Space Constraint Resolution**: Hugging Face Spaces impose a 1GB storage limit on uploaded files
2. **Model Reusability**: Host your model once and reuse it across multiple applications
3. **Version Control**: Leverage Hugging Face's built-in version control for model updates
4. **Efficient Caching**: Models are cached locally after the first download
5. **Scalability**: Update models without redeploying the entire Space

## Implementation Details

### Key Components

#### 1. Dependencies

The implementation requires `huggingface_hub>=0.16.0` in your requirements:

```txt
huggingface_hub>=0.16.0
```

#### 2. Configuration

Configure your Hugging Face repository details at the top of `app.py`:

```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"              # Local cache directory
```

#### 3. Download Function

The core download logic uses `snapshot_download()` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # Resume interrupted downloads (older huggingface_hub versions)
    local_files_only=False,  # Download over the network if not already cached
)
```

Note: on recent `huggingface_hub` releases, interrupted downloads resume automatically and the `resume_download` flag is deprecated; it is only needed on older versions.

### Key Features

1. **Intelligent Caching**:
   - Models are cached in `HF_MODEL_CACHE_DIR`
   - Subsequent runs use the cached version, so nothing is downloaded twice
2. **Resume Capability**:
   - Interrupted downloads resume where they left off
   - Useful for large models and unstable connections
3. **Error Handling**:
   - Comprehensive error messages for troubleshooting
   - Network connectivity checks
   - Repository access validation
4. **Performance Optimization**:
   - LRU caching prevents model reloading
   - Device-aware inference (CPU/GPU/MPS)

## Step-by-Step Implementation

### Step 1: Upload Your Model to Hugging Face

1. **Create a Hugging Face account** (if you don't have one)
2. **Create a new model repository**:
   - Go to https://huggingface.co/new
   - Name it appropriately (e.g., `your-username/wash-cfm-classifier`)
   - Make it **Public** (required for Spaces without authentication)
3. **Upload your model files**:
   - `model.safetensors`
   - `config.json`
   - `tokenizer.json`
   - `tokenizer_config.json`
   - `special_tokens_map.json`

### Step 2: Update Configuration

Edit the configuration section in `app.py`:

```python
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo
```

### Step 3: Install Dependencies

Add to your `requirements.txt`:

```txt
huggingface_hub>=0.16.0
```

### Step 4: Deploy to Hugging Face Space

1. **Create or update your Hugging Face Space**
2. **Upload your modified files** (`app.py` with the download logic)
3. **The Space will automatically**:
   - Install dependencies from `requirements.txt`
   - Download the model on first run
   - Cache it for subsequent runs

## How It Works

### First Run

```
1. User accesses the Space
2. app.py imports huggingface_hub
3. The load_model() function calls snapshot_download()
4. The model downloads from the Hugging Face Hub (~500MB)
5. The model loads into memory
6. The first prediction takes longer (download + load time)
```

### Subsequent Runs

```
1. User accesses the Space
2. The load_model() function checks the cache
3. The model loads from the local cache (~5-10 seconds)
4. Predictions are fast
```
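The two flows above can be sketched as a single cached loader. This is a sketch under assumptions from this guide: `HF_REPO_ID` and `HF_MODEL_CACHE_DIR` are the configuration values defined earlier, the repository holds a standard `transformers` sequence-classification checkpoint, and `pick_device` is a small helper introduced here (not part of any library).

```python
import functools

HF_REPO_ID = "your-username/wash-cfm-classifier"  # placeholder from this guide
HF_MODEL_CACHE_DIR = "./model_cache"

def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best torch device string: CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

@functools.lru_cache(maxsize=1)
def load_model():
    """Download the snapshot once per process and load it onto the best device."""
    import torch
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # First run: downloads ~500MB from the Hub. Later runs: served from the local cache.
    model_path = snapshot_download(
        repo_id=HF_REPO_ID,
        cache_dir=HF_MODEL_CACHE_DIR,
    )
    device = pick_device(
        torch.cuda.is_available(),
        hasattr(torch.backends, "mps") and torch.backends.mps.is_available(),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()  # inference mode: disables dropout etc.
    return tokenizer, model, device
```

Thanks to `functools.lru_cache(maxsize=1)`, every prediction after the first reuses the in-memory model, which is the "LRU caching prevents model reloading" point from the Key Features list.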
## Benefits vs Local Storage

| Aspect | Local Storage | Download-at-Runtime |
|--------|---------------|---------------------|
| **Initial Load Time** | Instant | 30-60 seconds (first run) |
| **Subsequent Runs** | Instant | Fast (cached) |
| **Space Usage** | Counts toward 1GB limit | Minimal (just cache) |
| **Model Updates** | Manual reupload | Automatic from repo |
| **Scalability** | Limited by Space size | Unlimited |

## Troubleshooting

### Common Issues and Solutions

1. **Repository Not Found**
   ```
   Error: Repository 'username/repo-name' not found
   Solution: Verify the repo ID and ensure the repository is public
   ```
2. **Download Timeout**
   ```
   Error: Download interrupted
   Solution: Downloads resume automatically on retry
   (via resume_download=True on older huggingface_hub versions)
   ```
3. **Authentication Issues**
   ```
   Error: Access denied
   Solution: Ensure the repository is public, or authenticate with an access token
   ```
4. **Disk Space**
   ```
   Error: No space left on device
   Solution: Clean the cache or use external storage
   ```

### Debug Commands

To test your setup locally:

```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")
```

## Advanced Options

### 1. Progressive Loading

For very large models, consider downloading individual files as needed:

```python
from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)
```

### 2. Custom Cache Location

Use persistent storage for Hugging Face Spaces:

```python
# Use /tmp or mounted storage for better persistence
HF_MODEL_CACHE_DIR = "/tmp/model_cache"
```
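When relocating the cache like this, it helps to monitor how much space it occupies (see the "Disk Space" troubleshooting entry). A sketch using `scan_cache_dir()` from `huggingface_hub`; `human_size` is a small helper defined here, not a library function, and the cache directory must already exist in the Hub cache layout that `snapshot_download()` creates.

```python
def human_size(num_bytes: float) -> str:
    """Render a byte count as a readable string for log messages."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"

def report_cache(cache_dir: str = "/tmp/model_cache") -> str:
    """Summarize total size and repo count of a huggingface_hub cache directory."""
    # Deferred import so the pure helper above stays dependency-free
    from huggingface_hub import scan_cache_dir

    info = scan_cache_dir(cache_dir)  # raises if the directory does not exist
    return f"{cache_dir}: {human_size(info.size_on_disk)} across {len(info.repos)} repo(s)"
```

Calling `report_cache()` periodically (or on startup) makes it easy to decide when the "Clean the cache" remedy is actually needed.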
### 3. Model Versioning

Pin a specific model version:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Specific version (tag, branch, or commit hash)
    cache_dir=HF_MODEL_CACHE_DIR
)
```

## Performance Considerations

### First Run Optimization

- **Download time**: 30-60 seconds for a ~500MB model
- **Load time**: 10-15 seconds for model initialization
- **Total**: ~1-2 minutes until the first prediction

### Cached Run Performance

- **Load time**: 5-10 seconds (from cache)
- **Prediction**: <1 second per inference

### Memory Usage

- **Model loading**: ~2-3GB RAM during inference
- **Cached storage**: ~500MB disk space
- **Peak usage**: higher during the initial download

## Best Practices

1. **Repository Setup**:
   - Use clear, descriptive repository names
   - Include a model card (README.md) with usage instructions
   - Tag releases for version control
2. **Error Handling**:
   - Implement graceful fallbacks
   - Provide clear error messages to users
   - Log download progress for debugging
3. **User Experience**:
   - Show download progress indicators
   - Cache models efficiently
   - Handle network failures gracefully
4. **Security**:
   - Use public repositories for Spaces
   - Validate model integrity
   - Implement proper access controls

## Conclusion

The Download-at-Runtime strategy addresses the Hugging Face Spaces 1GB limit by:

- ✅ Eliminating storage constraints
- ✅ Enabling model reuse across applications
- ✅ Providing efficient caching
- ✅ Maintaining good performance after the initial setup
- ✅ Offering built-in version control

This approach is ideal for production applications where the model size exceeds Space limits and network connectivity is reliable.

---

*For questions or issues, refer to the [huggingface_hub documentation](https://huggingface.co/docs/huggingface_hub/index) or create an issue in your repository.*