# Hugging Face Download-at-Runtime Strategy
## Overview
This document explains how to implement a "Download at Runtime" strategy for your WASH CFM Topic Classifier model using `huggingface_hub`. This approach allows you to bypass the 1GB storage limit in Hugging Face Spaces by hosting your model in a separate Hugging Face repository and downloading it at runtime.
## Why Use Download-at-Runtime?
1. **Space Constraint Resolution**: Hugging Face Spaces have a 1GB storage limit for uploaded files
2. **Model Reusability**: Host your model once and reuse it across multiple applications
3. **Version Control**: Leverage Hugging Face's built-in version control for model updates
4. **Efficient Caching**: Models are cached locally after first download
5. **Scalability**: Easy to update models without redeploying the entire Space
## Implementation Details
### Key Components
#### 1. Dependencies
The implementation requires `huggingface_hub>=0.16.0` added to your requirements:
```txt
huggingface_hub>=0.16.0
```
#### 2. Configuration
Configure your Hugging Face repository details at the top of `app.py`:
```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier" # Your model repository
HF_MODEL_CACHE_DIR = "./model_cache" # Local cache directory
```
#### 3. Download Function
The core download logic uses `snapshot_download()` from `huggingface_hub`:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    cache_dir=HF_MODEL_CACHE_DIR,
    resume_download=True,    # Resume interrupted downloads
    local_files_only=False   # Allow downloading when not cached
)
```
### Key Features
1. **Intelligent Caching**:
- Models are cached in `HF_MODEL_CACHE_DIR`
- Subsequent runs use cached versions
- No repeated downloads
2. **Resume Capability**:
- `resume_download=True` resumes interrupted downloads (newer `huggingface_hub` releases resume automatically and have deprecated this flag)
- Useful for large models and unstable connections
3. **Error Handling**:
- Comprehensive error messages for troubleshooting
- Network connectivity checks
- Repository access validation
4. **Performance Optimization**:
- LRU caching prevents model reloading
- Device-aware inference (CPU/GPU/MPS)
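Taken together, these features can be sketched as a single cached loader. This is an illustrative sketch, not the app's actual code: the placeholder repo ID, the `transformers` model classes, and the device order are assumptions about your setup.

```python
from functools import lru_cache

# Placeholder values; replace with your repository and cache path
HF_REPO_ID = "your-username/wash-cfm-classifier"
HF_MODEL_CACHE_DIR = "./model_cache"

@lru_cache(maxsize=1)  # LRU caching: the model is loaded into memory only once
def load_model():
    # Heavy imports live inside the function so the sketch imports cheaply
    import torch
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Download (or reuse the locally cached copy of) the model files
    model_path = snapshot_download(repo_id=HF_REPO_ID, cache_dir=HF_MODEL_CACHE_DIR)

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)

    # Device-aware inference: prefer CUDA, then Apple MPS, else CPU
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    model.to(device).eval()
    return tokenizer, model, device
```

Because of `lru_cache`, every prediction after the first reuses the in-memory model instead of re-reading it from disk.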
## Step-by-Step Implementation
### Step 1: Upload Your Model to Hugging Face
1. **Create a Hugging Face Account** (if you don't have one)
2. **Create a New Model Repository**:
- Go to https://huggingface.co/new
- Name it appropriately (e.g., `your-username/wash-cfm-classifier`)
- Make it **Public** (simplest for Spaces; a private repo requires passing an access token)
- Upload your model files:
- `model.safetensors`
- `config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
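If you prefer to script the upload instead of using the web UI, `huggingface_hub` provides `HfApi.create_repo` and `HfApi.upload_folder`. The repo ID and local folder below are placeholders; running this requires a prior `huggingface-cli login`:

```python
def push_model(folder_path="./model", repo_id="your-username/wash-cfm-classifier"):
    """Create the repo (if needed) and upload every file in `folder_path`."""
    # Imported lazily; authentication comes from your saved Hub token
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, exist_ok=True)  # no-op if the repo already exists
    api.upload_folder(folder_path=folder_path, repo_id=repo_id)
    return repo_id
```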
### Step 2: Update Configuration
Edit the configuration section in `app.py`:
```python
HF_REPO_ID = "your-username/wash-cfm-classifier" # Replace with your actual repo
```
### Step 3: Install Dependencies
Add to your `requirements.txt`:
```txt
huggingface_hub>=0.16.0
```
### Step 4: Deploy to Hugging Face Space
1. **Create or update your Hugging Face Space**
2. **Upload your modified files** (app.py with download logic)
3. **The Space will automatically**:
- Install dependencies from requirements.txt
- Download the model on first run
- Cache it for subsequent runs
## How It Works
### First Run
```
1. User accesses the Space
2. app.py imports huggingface_hub
3. load_model() function calls snapshot_download()
4. Model downloads from Hugging Face Hub (~500MB)
5. Model loads into memory
6. First prediction takes longer (download + load time)
```
### Subsequent Runs
```
1. User accesses the Space
2. load_model() function checks cache
3. Model loads from local cache (~5-10 seconds)
4. Predictions are fast
```
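The cache check in step 2 can be made explicit: ask `snapshot_download` for local files first, and fall back to the network only when the cache is empty. A hedged sketch (the helper name is our own):

```python
def get_model_path(repo_id, cache_dir="./model_cache"):
    """Return the local snapshot path, downloading only when the cache is empty."""
    from huggingface_hub import snapshot_download

    try:
        # Fast path: resolve entirely from the local cache, no network calls
        return snapshot_download(repo_id=repo_id, cache_dir=cache_dir, local_files_only=True)
    except Exception:
        # Cache miss or incomplete snapshot: fetch from the Hub
        return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
```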
## Benefits vs Local Storage
| Aspect | Local Storage | Download-at-Runtime |
|--------|---------------|---------------------|
| **Initial Load Time** | Instant | 30-60 seconds (first run) |
| **Subsequent Runs** | Instant | Fast (cached) |
| **Space Usage** | Counts toward 1GB limit | Minimal (just cache) |
| **Model Updates** | Manual reupload | Automatic from repo |
| **Scalability** | Limited by Space size | Unlimited |
## Troubleshooting
### Common Issues and Solutions
1. **Repository Not Found**
```
Error: Repository 'username/repo-name' not found
Solution: Verify repo ID and ensure repository is public
```
2. **Download Timeout**
```
Error: Download interrupted
Solution: resume_download=True resumes the transfer automatically (recent huggingface_hub versions always resume)
```
3. **Authentication Issues**
```
Error: Access denied
Solution: Ensure repository is public or use access tokens
```
4. **Disk Space**
```
Error: No space left on device
Solution: Clean cache or use external storage
```
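One way to surface these cases to users is a small helper that maps the raw exception text to the advice above. This is a heuristic sketch; the matched substrings are assumptions about typical error messages:

```python
def explain_download_error(exc: Exception) -> str:
    """Translate a download failure into an actionable message (heuristic)."""
    msg = str(exc).lower()
    if "404" in msg or "not found" in msg:
        return "Repository not found: verify the repo ID and make sure it is public."
    if "401" in msg or "403" in msg or "unauthorized" in msg or "access" in msg:
        return "Access denied: make the repository public or provide an access token."
    if "no space left" in msg:
        return "Disk full: clear the model cache or use larger storage."
    return f"Download failed: {exc} (check network connectivity and retry)"
```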
### Debug Commands
To test your setup locally:
```python
from huggingface_hub import snapshot_download

# Test download
path = snapshot_download(
    repo_id="your-username/wash-cfm-classifier",
    cache_dir="./test_cache"
)
print(f"Model downloaded to: {path}")
```
## Advanced Options
### 1. Progressive Loading
For very large models, consider loading components separately:
```python
from huggingface_hub import hf_hub_download

# Download individual files
config_path = hf_hub_download(
    repo_id=HF_REPO_ID,
    filename="config.json",
    cache_dir=HF_MODEL_CACHE_DIR
)
```
### 2. Custom Cache Location
Note that `/tmp` is wiped whenever a Space restarts, so it only avoids re-downloads within a single session. If persistent storage is enabled for your Space, it is mounted at `/data` and survives restarts:
```python
import os

# Prefer persistent storage (/data, if enabled) and fall back to a local directory
HF_MODEL_CACHE_DIR = "/data/model_cache" if os.path.isdir("/data") else "./model_cache"
```
### 3. Model Versioning
Pin specific model versions:
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id=HF_REPO_ID,
    revision="v1.0",  # Pin a specific tag, branch, or commit
    cache_dir=HF_MODEL_CACHE_DIR
)
```
## Performance Considerations
### First Run Optimization
- **Download Time**: 30-60 seconds for ~500MB model
- **Load Time**: 10-15 seconds for model initialization
- **Total**: ~1-2 minutes for first prediction
### Cached Run Performance
- **Load Time**: 5-10 seconds (from cache)
- **Prediction**: <1 second per inference
### Memory Usage
- **Model Loading**: ~2-3GB RAM during inference
- **Cached Storage**: ~500MB disk space
- **Peak Usage**: Higher during initial download
## Best Practices
1. **Repository Setup**:
- Use clear, descriptive repository names
- Include model cards (README.md) with usage instructions
- Tag releases for version control
2. **Error Handling**:
- Implement graceful fallbacks
- Provide clear error messages to users
- Log download progress for debugging
3. **User Experience**:
- Show download progress indicators
- Cache models efficiently
- Handle network failures gracefully
4. **Security**:
- Use public repositories for Spaces
- Validate model integrity
- Implement proper access controls
## Conclusion
The Download-at-Runtime strategy successfully addresses the Hugging Face Spaces 1GB limit by:
βœ… **Eliminating storage constraints**
βœ… **Enabling model reuse across applications**
βœ… **Providing efficient caching mechanisms**
βœ… **Maintaining good performance after initial setup**
βœ… **Offering built-in version control**
This approach is ideal for production applications where model size exceeds Space limits but network connectivity is reliable.
---
*For questions or issues, refer to the [huggingface_hub documentation](https://huggingface.co/docs/huggingface_hub/index) or create an issue in your repository.*