Hugging Face Download-at-Runtime Strategy
Overview
This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using huggingface_hub. This approach allows you to bypass the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.
Why Use Download-at-Runtime?
- Space Constraint Resolution: Hugging Face Spaces have a 1GB storage limit for uploaded files
- Model Reusability: Host your model once and reuse it across multiple applications
- Version Control: Leverage Hugging Face's built-in version control for model updates
- Efficient Caching: Models are cached locally after first download
- Scalability: Easy to update models without redeploying the entire Space
Implementation Details
Key Components
1. Dependencies
The implementation requires huggingface_hub>=0.16.0 added to your requirements:
```
huggingface_hub>=0.16.0
```
2. Configuration
Configure your Hugging Face repository details at the top of app.py:
```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"              # Local cache directory
```
3. Download Function
The core download logic uses snapshot_download() from huggingface_hub:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # Resume interrupted downloads
    local_files_only=False   # Download from the Hub when not cached
)
```
Key Features
Intelligent Caching:
- Models are cached in HF_MODEL_CACHE_DIR
- Subsequent runs use the cached version
- No repeated downloads
Resume Capability:
- resume_download=True handles interrupted downloads
- Useful for large models and unstable connections
Error Handling:
- Comprehensive error messages for troubleshooting
- Network connectivity checks
- Repository access validation
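The network connectivity check can be as simple as a socket probe before attempting a download. This is a minimal sketch; the `hub_reachable` helper is illustrative and not part of `huggingface_hub`:

```python
import socket

def hub_reachable(host="huggingface.co", timeout=3):
    """Cheap connectivity probe before attempting a download."""
    try:
        socket.create_connection((host, 443), timeout=timeout).close()
        return True
    except OSError:
        return False

# If the Hub is unreachable, fall back to the local cache
# (snapshot_download(..., local_files_only=True)) instead of failing outright.
```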
Performance Optimization:
- LRU caching prevents model reloading
- Device-aware inference (CPU/GPU/MPS)
Step-by-Step Implementation
Step 1: Upload Your Model to Hugging Face
- Create a Hugging Face Account (if you don't have one)
- Create a New Model Repository:
- Go to https://huggingface.co/new
- Name it appropriately (e.g., your-username/wash-cfm-classifier)
- Make it Public (required for Spaces)
- Upload your model files:
- model.safetensors
- config.json
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
Step 2: Update Configuration
Edit the configuration section in app.py:
```python
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo
```
Step 3: Install Dependencies
Add to your requirements.txt:
```
huggingface_hub>=0.16.0
```
Step 4: Deploy to Hugging Face Space
- Create or update your Hugging Face Space
- Upload your modified files (app.py with download logic)
- The Space will automatically:
- Install dependencies from requirements.txt
- Download the model on first run
- Cache it for subsequent runs
How It Works
First Run
1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() function calls snapshot_download()
4. Model downloads from Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)
Subsequent Runs
1. User accesses the Space
2. load_model() function checks cache
3. Model loads from local cache (~5-10 seconds)
4. Predictions are fast
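One way to make cached runs robust even when the Hub is unreachable is to check whether the cache already holds model weights and, if so, load with `local_files_only=True`. The `cache_is_warm` helper below is an illustrative heuristic, not a `huggingface_hub` API:

```python
import os

def cache_is_warm(cache_dir):
    """Heuristic: the cache is usable if any weight files are already present."""
    for _root, _dirs, files in os.walk(cache_dir):
        if any(f.endswith((".safetensors", ".bin")) for f in files):
            return True
    return False

# Usage: snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR,
#                          local_files_only=cache_is_warm(HF_MODEL_CACHE_DIR))
```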
Benefits vs Local Storage
| Aspect | Local Storage | Download-at-Runtime |
|---|---|---|
| Initial Load Time | Instant | 30-60 seconds (first run) |
| Subsequent Runs | Instant | Fast (cached) |
| Space Usage | Counts toward 1GB limit | Minimal (just cache) |
| Model Updates | Manual reupload | Automatic from repo |
| Scalability | Limited by Space size | Unlimited |
Troubleshooting
Common Issues and Solutions
Repository Not Found
- Error: Repository 'username/repo-name' not found
- Solution: Verify the repo ID and ensure the repository is public

Download Timeout
- Error: Download interrupted
- Solution: resume_download=True handles this automatically

Authentication Issues
- Error: Access denied
- Solution: Ensure the repository is public or use access tokens

Disk Space
- Error: No space left on device
- Solution: Clean the cache or use external storage
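For the disk-space case, clearing the local cache can be done with a small helper. This is a sketch; `clear_model_cache` is not a `huggingface_hub` function (the library also ships a `huggingface-cli delete-cache` command for its managed cache):

```python
import os
import shutil

def clear_model_cache(cache_dir):
    """Delete the local model cache so the next run re-downloads from the Hub."""
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
```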
Debug Commands
To test your setup locally:
```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")
```
Advanced Options
1. Progressive Loading
For very large models, consider loading components separately:
```python
from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)
```
2. Custom Cache Location
Use persistent storage for Hugging Face Spaces:
```python
# /tmp is writable in Spaces but wiped on restart; prefer mounted
# persistent storage when it is available
HF_MODEL_CACHE_DIR = "/tmp/model_cache"
```
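If the Space has persistent storage enabled (a paid add-on), it is mounted at /data and survives restarts; pointing the cache there avoids re-downloading the model after every restart:

```python
# /data is the persistent-storage mount on Spaces (paid feature)
HF_MODEL_CACHE_DIR = "/data/model_cache"
```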
3. Model Versioning
Pin specific model versions:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Specific version (tag, branch, or commit)
    cache_dir=HF_MODEL_CACHE_DIR
)
```
Performance Considerations
First Run Optimization
- Download Time: 30-60 seconds for ~500MB model
- Load Time: 10-15 seconds for model initialization
- Total: ~1-2 minutes for first prediction
Cached Run Performance
- Load Time: 5-10 seconds (from cache)
- Prediction: <1 second per inference
Memory Usage
- Model Loading: ~2-3GB RAM during inference
- Cached Storage: ~500MB disk space
- Peak Usage: Higher during initial download
Best Practices
Repository Setup:
- Use clear, descriptive repository names
- Include model cards (README.md) with usage instructions
- Tag releases for version control
Error Handling:
- Implement graceful fallbacks
- Provide clear error messages to users
- Log download progress for debugging
User Experience:
- Show download progress indicators
- Cache models efficiently
- Handle network failures gracefully
Security:
- Use public repositories for Spaces
- Validate model integrity
- Implement proper access controls
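The integrity-validation point can be implemented with a plain SHA-256 check against a digest you publish alongside the model. This is a sketch; `sha256_of` is a hypothetical helper, not a library API:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file and return its SHA-256 hex digest for integrity checks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# Usage: compare sha256_of(f"{model_path}/model.safetensors")
# against the digest published in the model card.
```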
Conclusion
The Download-at-Runtime strategy successfully addresses the Hugging Face Spaces 1GB limit by:
- ✅ Eliminating storage constraints
- ✅ Enabling model reuse across applications
- ✅ Providing efficient caching mechanisms
- ✅ Maintaining good performance after initial setup
- ✅ Offering built-in version control
This approach is ideal for production applications where model size exceeds Space limits but network connectivity is reliable.
For questions or issues, refer to the huggingface_hub documentation or create an issue in your repository.