# Hugging Face Download-at-Runtime Strategy

## Overview

This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using `huggingface_hub`. This approach lets you work around the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.

## Why Use Download-at-Runtime?

1. **Space Constraint Resolution**: Hugging Face Spaces have a 1GB storage limit for uploaded files
2. **Model Reusability**: Host your model once and reuse it across multiple applications
3. **Version Control**: Leverage Hugging Face's built-in version control for model updates
4. **Efficient Caching**: Models are cached locally after the first download
5. **Scalability**: Update models without redeploying the entire Space
## Implementation Details

### Key Components

#### 1. Dependencies

The implementation requires `huggingface_hub>=0.16.0` in your requirements:

```txt
huggingface_hub>=0.16.0
```
#### 2. Configuration

Configure your Hugging Face repository details at the top of `app.py`:

```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"              # Local cache directory
```
#### 3. Download Function

The core download logic uses `snapshot_download()` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # resume interrupted downloads (deprecated no-op in
                             # newer huggingface_hub, which always resumes)
    local_files_only=False,  # default: use the cache if present, download otherwise
)
```
### Key Features

1. **Intelligent Caching**:
   - Models are cached in `HF_MODEL_CACHE_DIR`
   - Subsequent runs use the cached version
   - No repeated downloads
2. **Resume Capability**:
   - Interrupted downloads are resumed rather than restarted
   - Useful for large models and unstable connections
3. **Error Handling**:
   - Clear error messages for troubleshooting
   - Network connectivity checks
   - Repository access validation
4. **Performance Optimization**:
   - LRU caching prevents model reloading within a process (see the sketch below)
   - Device-aware inference (CPU/GPU/MPS)
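A minimal sketch of how the caching and device-aware pieces might fit together in `app.py`. This assumes the classifier is a standard `transformers` sequence-classification model; the function names and config values here are illustrative, not taken from the actual app:

```python
import functools

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

HF_REPO_ID = "your-username/wash-cfm-classifier"
HF_MODEL_CACHE_DIR = "./model_cache"


def _pick_device() -> torch.device:
    # Prefer CUDA, then Apple Silicon (MPS), then fall back to CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


@functools.lru_cache(maxsize=1)
def load_model():
    # lru_cache ensures the download and load happen once per process.
    model_path = snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.to(_pick_device()).eval()
    return tokenizer, model


def predict(text: str) -> dict:
    tokenizer, model = load_model()
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}
```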
## Step-by-Step Implementation

### Step 1: Upload Your Model to Hugging Face

1. **Create a Hugging Face account** (if you don't have one)
2. **Create a new model repository**:
   - Go to https://huggingface.co/new
   - Name it appropriately (e.g., `your-username/wash-cfm-classifier`)
   - Make it **Public** (private repositories also work, but then the Space needs an access token)
   - Upload your model files:
     - `model.safetensors`
     - `config.json`
     - `tokenizer.json`
     - `tokenizer_config.json`
     - `special_tokens_map.json`
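Files can be uploaded through the web UI, or programmatically with `HfApi`. A sketch of the programmatic route, where `./local_model` is a placeholder for wherever your model files live:

```python
from huggingface_hub import HfApi

api = HfApi()  # reads the token from `huggingface-cli login` or the HF_TOKEN env var
api.create_repo("your-username/wash-cfm-classifier", exist_ok=True)
api.upload_folder(
    folder_path="./local_model",  # directory containing the files listed above
    repo_id="your-username/wash-cfm-classifier",
)
```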
### Step 2: Update Configuration

Edit the configuration section in `app.py`:

```python
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo
```

### Step 3: Install Dependencies

Add to your `requirements.txt`:

```txt
huggingface_hub>=0.16.0
```
### Step 4: Deploy to Hugging Face Space

1. **Create or update your Hugging Face Space**
2. **Upload your modified files** (`app.py` with the download logic)
3. **The Space will automatically**:
   - Install dependencies from `requirements.txt`
   - Download the model on the first run
   - Cache it for subsequent runs
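A plausible full `requirements.txt` for such a Space is shown below; the exact packages and pins depend on your app, and only `huggingface_hub` is required by the download logic itself:

```txt
gradio
torch
transformers
huggingface_hub>=0.16.0
```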
## How It Works

### First Run

```
1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() calls snapshot_download()
4. Model downloads from the Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)
```

### Subsequent Runs

```
1. User accesses the Space
2. load_model() checks the cache
3. Model loads from the local cache (~5-10 seconds)
4. Predictions are fast
```
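To move the download cost from the first prediction to Space startup, the model can be loaded eagerly at import time. A sketch assuming a Gradio app, reusing the illustrative `load_model()` and `predict()` names from the earlier sketch:

```python
import gradio as gr

load_model()  # warm the cache at startup, before any user request arrives

demo = gr.Interface(fn=predict, inputs="text", outputs="label")
demo.launch()
```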
## Benefits vs Local Storage

| Aspect | Local Storage | Download-at-Runtime |
|--------|---------------|---------------------|
| **Initial Load Time** | Instant | 30-60 seconds (first run) |
| **Subsequent Runs** | Instant | Fast (cached) |
| **Space Usage** | Counts toward 1GB limit | Minimal (just the cache) |
| **Model Updates** | Manual re-upload | Automatic from the repo |
| **Scalability** | Limited by Space size | Unlimited |
## Troubleshooting

### Common Issues and Solutions

1. **Repository Not Found**
   ```
   Error: Repository 'username/repo-name' not found
   Solution: Verify the repo ID and ensure the repository is public
   ```
2. **Download Timeout**
   ```
   Error: Download interrupted
   Solution: Rerun; interrupted downloads resume automatically
   ```
3. **Authentication Issues**
   ```
   Error: Access denied
   Solution: Make the repository public or pass an access token
   ```
4. **Disk Space**
   ```
   Error: No space left on device
   Solution: Clean the cache or use external storage
   ```
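A sketch of a defensive wrapper covering these cases. The exception classes come from `huggingface_hub.utils`; storing the token in an `HF_TOKEN` Space secret is an assumption about how you manage credentials:

```python
import os

from huggingface_hub import snapshot_download
from huggingface_hub.utils import HfHubHTTPError, RepositoryNotFoundError


def download_model(repo_id: str, cache_dir: str) -> str:
    try:
        return snapshot_download(
            repo_id=repo_id,
            cache_dir=cache_dir,
            token=os.environ.get("HF_TOKEN"),  # only needed for private repos
        )
    except RepositoryNotFoundError:
        raise RuntimeError(
            f"Repository '{repo_id}' not found - check the repo ID and its visibility."
        )
    except HfHubHTTPError as err:
        raise RuntimeError(f"Download from the Hub failed: {err}")
```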
### Debug Commands

To test your setup locally:

```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache",
)
print(f"Model downloaded to: {path}")
```
## Advanced Options

### 1. Progressive Loading

For very large models, consider downloading individual files as needed:

```python
from huggingface_hub import hf_hub_download

# Download a single file
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR,
)
```
### 2. Custom Cache Location

Choose the cache directory deliberately on Hugging Face Spaces:

```python
# /tmp persists while the Space is running but is wiped on restart;
# Spaces with persistent storage enabled can use the /data mount instead.
HF_MODEL_CACHE_DIR = "/tmp/model_cache"
```
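Alternatively, the cache location can be set through the `HF_HOME` environment variable, which `huggingface_hub` respects. A sketch, where the `/data` path assumes a Space with persistent storage enabled:

```python
import os

# Must be set before importing huggingface_hub, which reads it at import time.
os.environ["HF_HOME"] = "/data/hf_home"
```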
### 3. Model Versioning

Pin specific model versions:

```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # a specific tag
    cache_dir=HF_MODEL_CACHE_DIR,
)
```
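`revision` accepts a branch name, tag, or commit hash; pinning a commit hash gives fully reproducible deployments, since tags and branches can be moved.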
## Performance Considerations

### First Run Optimization

- **Download Time**: 30-60 seconds for a ~500MB model
- **Load Time**: 10-15 seconds for model initialization
- **Total**: ~1-2 minutes to the first prediction

### Cached Run Performance

- **Load Time**: 5-10 seconds (from cache)
- **Prediction**: <1 second per inference

### Memory Usage

- **Model Loading**: ~2-3GB RAM during inference
- **Cached Storage**: ~500MB disk space
- **Peak Usage**: Higher during the initial download
## Best Practices

1. **Repository Setup**:
   - Use clear, descriptive repository names
   - Include a model card (README.md) with usage instructions
   - Tag releases for version control
2. **Error Handling**:
   - Implement graceful fallbacks
   - Provide clear error messages to users
   - Log download progress for debugging
3. **User Experience**:
   - Show download progress indicators
   - Cache models efficiently
   - Handle network failures gracefully
4. **Security**:
   - Use public repositories for Spaces, or store access tokens as Space secrets
   - Validate model integrity
   - Implement proper access controls
## Conclusion

The Download-at-Runtime strategy addresses the Hugging Face Spaces 1GB limit by:

✅ **Eliminating storage constraints**
✅ **Enabling model reuse across applications**
✅ **Providing efficient caching mechanisms**
✅ **Maintaining good performance after initial setup**
✅ **Offering built-in version control**

This approach is ideal for production applications where the model size exceeds the Space limit but network connectivity is reliable.

---

*For questions or issues, refer to the [huggingface_hub documentation](https://huggingface.co/docs/huggingface_hub/index) or create an issue in your repository.*