Hugging Face Download-at-Runtime Strategy

Overview

This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using huggingface_hub. This approach allows you to bypass the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.

Why Use Download-at-Runtime?

  1. Space Constraint Resolution: Hugging Face Spaces have a 1GB storage limit for uploaded files
  2. Model Reusability: Host your model once and reuse it across multiple applications
  3. Version Control: Leverage Hugging Face's built-in version control for model updates
  4. Efficient Caching: Models are cached locally after first download
  5. Scalability: Easy to update models without redeploying the entire Space

Implementation Details

Key Components

1. Dependencies

The implementation requires huggingface_hub>=0.16.0 in your requirements file:

huggingface_hub>=0.16.0

2. Configuration

Configure your Hugging Face repository details at the top of app.py:

# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"  # Local cache directory

3. Download Function

The core download logic uses snapshot_download() from huggingface_hub:

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,      # Resume interrupted downloads
    local_files_only=False     # Allow downloading from the Hub when not cached
)

Key Features

  1. Intelligent Caching:

    • Models are cached in HF_MODEL_CACHE_DIR
    • Subsequent runs use cached versions
    • No repeated downloads
  2. Resume Capability:

    • resume_download=True handles interrupted downloads
    • Useful for large models and unstable connections
  3. Error Handling:

    • Comprehensive error messages for troubleshooting
    • Network connectivity checks
    • Repository access validation
  4. Performance Optimization:

    • LRU caching prevents model reloading
    • Device-aware inference (CPU/GPU/MPS); a sketch of such a loader follows this list
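
The caching and device-awareness described above can live in a single cached loader. Below is a minimal sketch, assuming a standard transformers sequence-classification checkpoint and the HF_REPO_ID / HF_MODEL_CACHE_DIR configuration shown earlier; the actual load_model() in app.py may differ in detail.

from functools import lru_cache

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

@lru_cache(maxsize=1)
def load_model():
    # Download once; later calls reuse the local cache and the in-memory result
    model_path = snapshot_download(
        repo_id=HF_REPO_ID,
        cache_dir=HF_MODEL_CACHE_DIR,
    )

    # Pick the best available device (CUDA GPU, Apple MPS, or CPU)
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()
    return tokenizer, model, device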

Step-by-Step Implementation

Step 1: Upload Your Model to Hugging Face

  1. Create a Hugging Face Account (if you don't have one)
  2. Create a New Model Repository:
    • Go to https://huggingface.co/new
    • Name it appropriately (e.g., your-username/wash-cfm-classifier)
    • Make it Public (or plan to authenticate with an access token; see Troubleshooting)
    • Upload your model files, either through the web UI or programmatically (see the sketch after this list):
      • model.safetensors
      • config.json
      • tokenizer.json
      • tokenizer_config.json
      • special_tokens_map.json

Step 2: Update Configuration

Edit the configuration section in app.py:

HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo

Step 3: Install Dependencies

Add to your requirements.txt:

huggingface_hub>=0.16.0
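
Depending on how app.py serves the model, the full requirements.txt will also list the inference stack. A plausible example, assuming a PyTorch/transformers model served with Gradio (adjust to whatever app.py actually imports):

transformers
torch
gradio
huggingface_hub>=0.16.0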

Step 4: Deploy to Hugging Face Space

  1. Create or update your Hugging Face Space
  2. Upload your modified files (app.py with the download logic; a minimal wiring sketch follows this list)
  3. The Space will automatically:
    • Install dependencies from requirements.txt
    • Download the model on first run
    • Cache it for subsequent runs
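
As one way of wiring this together, app.py can expose the cached loader through a small Gradio interface. This is a minimal sketch, assuming Gradio and the load_model() function sketched earlier; your actual app.py, input fields, and label names may differ.

import gradio as gr
import torch

def predict(text):
    tokenizer, model, device = load_model()  # cached after the first call
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Map class indices to human-readable labels from the model config
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=4, label="Feedback text"),
    outputs=gr.Label(num_top_classes=5, label="Predicted topics"),
)
demo.launch()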

How It Works

First Run

1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() function calls snapshot_download()
4. Model downloads from Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)

Subsequent Runs

1. User accesses the Space
2. load_model() function checks cache
3. Model loads from local cache (~5-10 seconds)
4. Predictions are fast

Benefits vs Local Storage

| Aspect            | Local Storage            | Download-at-Runtime        |
|-------------------|--------------------------|----------------------------|
| Initial Load Time | Instant                  | 30-60 seconds (first run)  |
| Subsequent Runs   | Instant                  | Fast (cached)              |
| Space Usage       | Counts toward 1GB limit  | Minimal (just cache)       |
| Model Updates     | Manual reupload          | Automatic from repo        |
| Scalability       | Limited by Space size    | Unlimited                  |

Troubleshooting

Common Issues and Solutions

  1. Repository Not Found

    Error: Repository 'username/repo-name' not found
    Solution: Verify repo ID and ensure repository is public
    
  2. Download Timeout

    Error: Download interrupted
    Solution: resume_download=True resumes interrupted downloads automatically
    
  3. Authentication Issues

    Error: Access denied
    Solution: Ensure the repository is public, or authenticate with an access token (see the example below)
    
  4. Disk Space

    Error: No space left on device
    Solution: Clean cache or use external storage
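
If the model repository must stay private, the download call can authenticate with a token stored as a Space secret instead. A minimal sketch, assuming the secret is exposed to the Space as an environment variable named HF_TOKEN:

import os
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    token=os.environ.get("HF_TOKEN"),  # read-access token stored as a Space secret
)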
    

Debug Commands

To test your setup locally:

from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")

Advanced Options

1. Progressive Loading

For very large models, consider loading components separately:

from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)

2. Custom Cache Location

Use persistent storage for Hugging Face Spaces:

# If the Space has persistent storage enabled, it is mounted at /data;
# /tmp also works but is cleared whenever the Space restarts
HF_MODEL_CACHE_DIR = "/data/model_cache"

3. Model Versioning

Pin specific model versions:

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Specific version
    cache_dir=HF_MODEL_CACHE_DIR
)

Performance Considerations

First Run Optimization

  • Download Time: 30-60 seconds for ~500MB model
  • Load Time: 10-15 seconds for model initialization
  • Total: ~1-2 minutes for first prediction

Cached Run Performance

  • Load Time: 5-10 seconds (from cache)
  • Prediction: <1 second per inference

Memory Usage

  • Model Loading: ~2-3GB RAM during inference
  • Cached Storage: ~500MB disk space
  • Peak Usage: Higher during initial download

Best Practices

  1. Repository Setup:

    • Use clear, descriptive repository names
    • Include model cards (README.md) with usage instructions
    • Tag releases for version control
  2. Error Handling:

    • Implement graceful fallbacks (a sketch follows this list)
    • Provide clear error messages to users
    • Log download progress for debugging
  3. User Experience:

    • Show download progress indicators
    • Cache models efficiently
    • Handle network failures gracefully
  4. Security:

    • Use public repositories for Spaces
    • Validate model integrity
    • Implement proper access controls
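
As one example of a graceful fallback, the download step can fall back to a previously cached copy when the Hub is unreachable. This is a minimal sketch, assuming the configuration names used earlier; the function name is illustrative, so adapt the error handling to your app's logging and UI.

from huggingface_hub import snapshot_download

def download_model_with_fallback():
    try:
        # Normal path: check the Hub and download anything that is missing
        return snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR)
    except Exception as err:
        # Network or Hub problem: try to serve a previously cached copy instead
        try:
            return snapshot_download(
                repo_id=HF_REPO_ID,
                cache_dir=HF_MODEL_CACHE_DIR,
                local_files_only=True,
            )
        except Exception:
            raise RuntimeError(
                f"Could not reach '{HF_REPO_ID}' and no cached copy was found: {err}"
            )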

Conclusion

The Download-at-Runtime strategy successfully addresses the Hugging Face Spaces 1GB limit by:

✅ Eliminating storage constraints
✅ Enabling model reuse across applications
✅ Providing efficient caching mechanisms
✅ Maintaining good performance after initial setup
✅ Offering built-in version control

This approach is ideal for production applications where model size exceeds Space limits but network connectivity is reliable.


For questions or issues, refer to the huggingface_hub documentation or create an issue in your repository.