
Hugging Face Spaces Deployment Guide

Quick Start for Download-at-Runtime

This guide walks you through deploying your WASH CFM Topic Classifier to Hugging Face Spaces using the Download-at-Runtime strategy.

Prerequisites

  1. βœ… Your model files are ready (.safetensors, config files, tokenizer files)
  2. βœ… You have a Hugging Face account
  3. βœ… You've created a public model repository on Hugging Face Hub

Step 1: Upload Model to Hugging Face Hub

1.1 Create Model Repository

  1. Go to huggingface.co/new
  2. Select "Model" tab
  3. Repository name: wash-cfm-classifier (or your preferred name)
  4. Make it Public (required for Spaces)
  5. Click "Create a new model"

1.2 Upload Model Files

Upload these files to your repository:

πŸ“ wash-cfm-classifier/
β”œβ”€β”€ model.safetensors          (~400-500MB)
β”œβ”€β”€ config.json                (~1KB)
β”œβ”€β”€ tokenizer.json            (~2-3MB)
β”œβ”€β”€ tokenizer_config.json     (~1KB)
└── special_tokens_map.json   (~1KB)

Methods to upload:

  • Web Interface: Drag and drop files
  • Git LFS: For command-line users
  • Python Script: Use huggingface_hub library
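The Python route can be sketched as follows. `push_model` is a hypothetical helper name; the sketch assumes you have already run `huggingface-cli login` and created the target repository:

```python
def push_model(local_dir: str, repo_id: str) -> None:
    """Upload every model file in local_dir to the Hub repository repo_id."""
    from huggingface_hub import HfApi  # pip install huggingface_hub

    api = HfApi()  # picks up the token saved by `huggingface-cli login`
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="model")

# Example call (requires a valid token and an existing public repo):
# push_model("./wash-cfm-classifier", "your-username/wash-cfm-classifier")
```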

1.3 Add Model Card (Optional but Recommended)

Create a README.md in your repository:

# WASH CFM Topic Classifier

A fine-tuned ModernBERT model for classifying WASH (Water, Sanitation, and Hygiene) feedback into topic categories.

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="your-username/wash-cfm-classifier")
result = classifier("The water pump is broken")
```

## Model Details

  • Base Model: modernbert-large
  • Fine-tuned on: WASH CFM feedback data
  • Task: Multi-label text classification
  • Labels: [Add your actual labels]

Step 2: Update Your Application Code

2.1 Configuration

In `app.py`, update the configuration section:

```python
# CONFIGURATION SECTION
HF_REPO_ID = "your-username/wash-cfm-classifier"  # ← Replace with your repo
HF_MODEL_CACHE_DIR = "./model_cache"  # Cache directory
```

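The download step that uses these settings can be sketched as below. This is a minimal illustration, not the actual `app.py`; `snapshot_download` fetches all repository files into the cache directory and returns the local path:

```python
def load_classifier(repo_id: str, cache_dir: str):
    """Download the model repo at startup, then load tokenizer and model from it."""
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # snapshot_download skips files that are already cached, so restarts are fast.
    local_dir = snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(local_dir)
    model = AutoModelForSequenceClassification.from_pretrained(local_dir)
    return tokenizer, model

# Usage at startup (network access required on the first run):
# tokenizer, model = load_classifier(HF_REPO_ID, HF_MODEL_CACHE_DIR)
```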
2.2 Verify Dependencies

Ensure requirements.txt includes:

```
huggingface_hub>=0.16.0
torch>=2.0.0
transformers>=4.30.0
gradio>=4.0.0
```

Step 3: Create Hugging Face Space

3.1 Create New Space

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Fill in details:
    • Space name: wash-cfm-classifier (or your choice)
    • License: apache-2.0 (or your preference)
    • Hardware: CPU basic (sufficient for this model)
    • Visibility: Public

3.2 Choose SDK

Select "Gradio" as your SDK.

3.3 Upload Files

Upload these files to your Space repository:

πŸ“ wash-cfm-classifier-space/
β”œβ”€β”€ app.py                    # Your main application
β”œβ”€β”€ requirements.txt          # Dependencies
β”œβ”€β”€ README.md                 # Space documentation
└── .gitattributes           # Optional: for large file handling

Step 4: Space Configuration

4.1 Hardware Recommendations

For your model size (~500MB):

  • CPU Basic: βœ… Sufficient (free tier)
  • CPU Upgrade: ⚑ Faster inference
  • GPU: πŸš€ Only if needed for larger models

4.2 Environment Variables (Optional)

In Space Settings β†’ Environment, add:

```
HF_HOME=/tmp/.cache/huggingface
TRANSFORMERS_CACHE=/tmp/.cache/transformers
```

This ensures cache directories have sufficient space.
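The same effect can be achieved in code. If you go this route, set the variables at the very top of `app.py`, before `transformers` or `huggingface_hub` is imported:

```python
import os

# Must run before importing transformers/huggingface_hub,
# otherwise the libraries read their cache paths too early.
os.environ.setdefault("HF_HOME", "/tmp/.cache/huggingface")
os.environ.setdefault("TRANSFORMERS_CACHE", "/tmp/.cache/transformers")
```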

4.3 Build Logs

Monitor the "Logs" tab for:

  • βœ… Successful dependency installation
  • βœ… Model download progress
  • βœ… Application startup

Step 5: Testing and Validation

5.1 First Run

The first run will:

  1. Install dependencies (~2-3 minutes)
  2. Download model (~1-2 minutes, depending on connection)
  3. Start application (~30 seconds)

Expected timeline: about 4-6 minutes for the first successful run.

5.2 Subsequent Runs

After caching:

  • Startup time: ~10-15 seconds
  • Prediction time: <1 second per request

5.3 Verification Checklist

  • Space builds successfully (green βœ… in status)
  • Model downloads without errors
  • Web interface loads
  • Sample predictions work
  • Performance is acceptable
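One way to script the "sample predictions work" check is with the `gradio_client` package. The Space ID and the `api_name` below are assumptions you would adapt to your app:

```python
def smoke_test(space_id: str, text: str):
    """Send one sample input to the deployed Space and return the prediction."""
    from gradio_client import Client  # pip install gradio_client

    client = Client(space_id)  # e.g. "your-username/wash-cfm-classifier"
    # api_name depends on how the Gradio app names its endpoint (assumption).
    return client.predict(text, api_name="/predict")

# Example (network access required):
# print(smoke_test("your-username/wash-cfm-classifier", "The water pump is broken"))
```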

Step 6: Optimization and Monitoring

6.1 Performance Monitoring

Monitor these metrics:

  • Build time: First deployment duration
  • Download time: Model download duration
  • Inference time: Response latency
  • Memory usage: RAM consumption

6.2 Common Issues and Solutions

Issue: "Model download timeout"

```python
# Solution: Use faster hardware tier or optimize cache
HF_MODEL_CACHE_DIR = "/tmp/model_cache"
```

Issue: "Out of memory"

```python
# Solution: Use smaller hardware or optimize model loading
device = torch.device("cpu")  # Force CPU if GPU memory insufficient
```

Issue: "Repository not found"

```python
# Solution: Verify repository ID and visibility
HF_REPO_ID = "exact-username/exact-repo-name"  # Case sensitive
```

6.3 Space Management

Regular maintenance:

  • Monitor disk usage in cache directory
  • Update model versions by changing repository revision
  • Scale hardware based on usage patterns

Version updates:

  • Update model in your Hub repository
  • Space automatically uses latest version (or specify revision)

Step 7: Production Considerations

7.1 Security

  • βœ… Use public repositories for Spaces
  • βœ… Validate model integrity
  • βœ… Implement proper error handling
  • βœ… Monitor for unusual access patterns

7.2 Reliability

  • βœ… Implement retry logic for downloads
  • βœ… Add fallback mechanisms
  • βœ… Monitor network connectivity
  • βœ… Set up alerts for failures
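As a sketch of the retry idea, a small generic wrapper with exponential backoff could guard the download call. Names here are illustrative, not part of the app:

```python
import time

def with_retries(fn, attempts=3, base_delay=2.0):
    """Call fn(); on failure, wait base_delay, 2*base_delay, ... and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): wrap the model download, e.g.
# local_dir = with_retries(lambda: snapshot_download(repo_id=HF_REPO_ID))
```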

7.3 Scalability

  • Multiple Spaces: Same model, different interfaces
  • Load Balancing: Distribute across multiple hardware tiers
  • Caching Strategy: Optimize for your usage patterns

Troubleshooting Guide

Build Failures

| Error | Solution |
|-------|----------|
| pip install failed | Check requirements.txt syntax |
| torch install failed | Verify Python version compatibility |
| Memory limit exceeded | Reduce model size or upgrade hardware |

Runtime Failures

| Error | Solution |
|-------|----------|
| Download interrupted | Network issue; the download auto-resumes |
| Model not found | Verify repository ID and visibility |
| CUDA out of memory | Use CPU fallback or upgrade hardware |

Performance Issues

| Issue | Solution |
|-------|----------|
| Slow first run | Normal; the model must be downloaded once |
| High memory usage | Consider a hardware upgrade |
| Slow predictions | Optimize the model or upgrade hardware |

Success Metrics

Your deployment is successful when:

  • βœ… Space builds without errors
  • βœ… Model downloads and loads successfully
  • βœ… Web interface is responsive
  • βœ… Predictions are accurate and fast
  • βœ… Resource usage is within limits

Next Steps

  1. Monitor Performance: Track usage and optimize as needed
  2. User Feedback: Collect feedback and iterate
  3. Feature Updates: Add new features or model improvements
  4. Scaling: Consider multiple spaces or hardware upgrades

πŸŽ‰ Congratulations! Your WASH CFM Topic Classifier is now deployed to Hugging Face Spaces with Download-at-Runtime functionality, bypassing the 1GB storage limit while maintaining excellent performance.

For additional help, consult: