Hugging Face Download-at-Runtime Strategy

Overview

This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using huggingface_hub. This approach allows you to bypass the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.

Why Use Download-at-Runtime?

  1. Space Constraint Resolution: Hugging Face Spaces have a 1GB storage limit for uploaded files
  2. Model Reusability: Host your model once and reuse it across multiple applications
  3. Version Control: Leverage Hugging Face's built-in version control for model updates
  4. Efficient Caching: Models are cached locally after first download
  5. Scalability: Easy to update models without redeploying the entire Space

Implementation Details

Key Components

1. Dependencies

The implementation requires huggingface_hub>=0.16.0 in your requirements file:

huggingface_hub>=0.16.0

2. Configuration

Configure your Hugging Face repository details at the top of app.py:

# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache"  # Local cache directory

3. Download Function

The core download logic uses snapshot_download() from huggingface_hub:

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,      # Resume interrupted downloads
    local_files_only=False     # Allow downloading from the Hub when not cached
)

Key Features

  1. Intelligent Caching:

    • Models are cached in HF_MODEL_CACHE_DIR
    • Subsequent runs use cached versions
    • No repeated downloads
  2. Resume Capability:

    • resume_download=True handles interrupted downloads
    • Useful for large models and unstable connections
  3. Error Handling:

    • Comprehensive error messages for troubleshooting
    • Network connectivity checks
    • Repository access validation
  4. Performance Optimization:

    • LRU caching prevents model reloading
    • Device-aware inference (CPU/GPU/MPS); a sketch of such a loader follows this list
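
The caching and device-awareness described above can live in a single cached loader. Below is a minimal sketch, assuming a standard transformers sequence-classification checkpoint and the HF_REPO_ID / HF_MODEL_CACHE_DIR configuration shown earlier; the actual load_model() in app.py may differ in detail.

from functools import lru_cache

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForSequenceClassification, AutoTokenizer

@lru_cache(maxsize=1)
def load_model():
    # Download once; later calls reuse the local cache and the in-memory result
    model_path = snapshot_download(
        repo_id=HF_REPO_ID,
        cache_dir=HF_MODEL_CACHE_DIR,
    )

    # Pick the best available device (CUDA GPU, Apple MPS, or CPU)
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()
    return tokenizer, model, device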

Step-by-Step Implementation

Step 1: Upload Your Model to Hugging Face

  1. Create a Hugging Face Account (if you don't have one)
  2. Create a New Model Repository:
    • Go to https://huggingface.co/new
    • Name it appropriately (e.g., your-username/wash-cfm-classifier)
    • Make it Public (or plan to authenticate with an access token; see Troubleshooting)
    • Upload your model files, either through the web UI or programmatically (see the sketch after this list):
      • model.safetensors
      • config.json
      • tokenizer.json
      • tokenizer_config.json
      • special_tokens_map.json

Step 2: Update Configuration

Edit the configuration section in app.py:

HF_REPO_ID = "your-username/wash-cfm-classifier"  # Replace with your actual repo

Step 3: Install Dependencies

Add to your requirements.txt:

huggingface_hub>=0.16.0
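
Depending on how app.py serves the model, the full requirements.txt will also list the inference stack. A plausible example, assuming a PyTorch/transformers model served with Gradio (adjust to whatever app.py actually imports):

transformers
torch
gradio
huggingface_hub>=0.16.0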

Step 4: Deploy to Hugging Face Space

  1. Create or update your Hugging Face Space
  2. Upload your modified files (app.py with the download logic; a minimal wiring sketch follows this list)
  3. The Space will automatically:
    • Install dependencies from requirements.txt
    • Download the model on first run
    • Cache it for subsequent runs
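
As one way of wiring this together, app.py can expose the cached loader through a small Gradio interface. This is a minimal sketch, assuming Gradio and the load_model() function sketched earlier; your actual app.py, input fields, and label names may differ.

import gradio as gr
import torch

def predict(text):
    tokenizer, model, device = load_model()  # cached after the first call
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Map class indices to human-readable labels from the model config
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=4, label="Feedback text"),
    outputs=gr.Label(num_top_classes=5, label="Predicted topics"),
)
demo.launch()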

How It Works

First Run

1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() function calls snapshot_download()
4. Model downloads from Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)

Subsequent Runs

1. User accesses the Space
2. load_model() function checks cache
3. Model loads from local cache (~5-10 seconds)
4. Predictions are fast

Benefits vs Local Storage

| Aspect            | Local Storage            | Download-at-Runtime        |
|-------------------|--------------------------|----------------------------|
| Initial Load Time | Instant                  | 30-60 seconds (first run)  |
| Subsequent Runs   | Instant                  | Fast (cached)              |
| Space Usage       | Counts toward 1GB limit  | Minimal (just cache)       |
| Model Updates     | Manual reupload          | Automatic from repo        |
| Scalability       | Limited by Space size    | Unlimited                  |

Troubleshooting

Common Issues and Solutions

  1. Repository Not Found

    Error: Repository 'username/repo-name' not found
    Solution: Verify repo ID and ensure repository is public
    
  2. Download Timeout

    Error: Download interrupted
    Solution: resume_download=True resumes interrupted downloads automatically
    
  3. Authentication Issues

    Error: Access denied
    Solution: Ensure the repository is public, or authenticate with an access token (see the example below)
    
  4. Disk Space

    Error: No space left on device
    Solution: Clean cache or use external storage
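
If the model repository must stay private, the download call can authenticate with a token stored as a Space secret instead. A minimal sketch, assuming the secret is exposed to the Space as an environment variable named HF_TOKEN:

import os
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    token=os.environ.get("HF_TOKEN"),  # read-access token stored as a Space secret
)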
    

Debug Commands

To test your setup locally:

from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")

Advanced Options

1. Progressive Loading

For very large models, consider loading components separately:

from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)

2. Custom Cache Location

Use persistent storage for Hugging Face Spaces:

# If the Space has persistent storage enabled, it is mounted at /data;
# /tmp also works but is cleared whenever the Space restarts
HF_MODEL_CACHE_DIR = "/data/model_cache"

3. Model Versioning

Pin specific model versions:

from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Specific version
    cache_dir=HF_MODEL_CACHE_DIR
)

Performance Considerations

First Run Optimization

  • Download Time: 30-60 seconds for ~500MB model
  • Load Time: 10-15 seconds for model initialization
  • Total: ~1-2 minutes for first prediction

Cached Run Performance

  • Load Time: 5-10 seconds (from cache)
  • Prediction: <1 second per inference

Memory Usage

  • Model Loading: ~2-3GB RAM during inference
  • Cached Storage: ~500MB disk space
  • Peak Usage: Higher during initial download

Best Practices

  1. Repository Setup:

    • Use clear, descriptive repository names
    • Include model cards (README.md) with usage instructions
    • Tag releases for version control
  2. Error Handling:

    • Implement graceful fallbacks (a sketch follows this list)
    • Provide clear error messages to users
    • Log download progress for debugging
  3. User Experience:

    • Show download progress indicators
    • Cache models efficiently
    • Handle network failures gracefully
  4. Security:

    • Use public repositories for Spaces
    • Validate model integrity
    • Implement proper access controls
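
As one example of a graceful fallback, the download step can fall back to a previously cached copy when the Hub is unreachable. This is a minimal sketch, assuming the configuration names used earlier; the function name is illustrative, so adapt the error handling to your app's logging and UI.

from huggingface_hub import snapshot_download

def download_model_with_fallback():
    try:
        # Normal path: check the Hub and download anything that is missing
        return snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR)
    except Exception as err:
        # Network or Hub problem: try to serve a previously cached copy instead
        try:
            return snapshot_download(
                repo_id=HF_REPO_ID,
                cache_dir=HF_MODEL_CACHE_DIR,
                local_files_only=True,
            )
        except Exception:
            raise RuntimeError(
                f"Could not reach '{HF_REPO_ID}' and no cached copy was found: {err}"
            )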

Conclusion

The Download-at-Runtime strategy successfully addresses the Hugging Face Spaces 1GB limit by:

✅ Eliminating storage constraints
✅ Enabling model reuse across applications
✅ Providing efficient caching mechanisms
✅ Maintaining good performance after initial setup
✅ Offering built-in version control

This approach is ideal for production applications where model size exceeds Space limits but network connectivity is reliable.


For questions or issues, refer to the huggingface_hub documentation or create an issue in your repository.