🚀 Deployment Guide for Hugging Face Spaces

This guide will walk you through deploying your voice recognition model to Hugging Face Spaces.

📋 Prerequisites

  1. Hugging Face Account: Create an account at huggingface.co
  2. Model File: Your trained model voice_recognition_fullmodel.pth
  3. Git LFS: For handling large model files

🗂️ File Structure

Your deployment should have this structure:

your-voice-recognition-space/
├── app.py                           # Main Gradio application
├── requirements.txt                 # Python dependencies
├── README.md                        # Project documentation
├── voice_recognition_fullmodel.pth  # Your trained model
├── Dockerfile                       # Optional: for custom container
├── .gitignore                       # Git ignore file
└── DEPLOYMENT_GUIDE.md              # This file
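
Before uploading, it can help to confirm that the files the Space needs are actually present. The following is a minimal sketch based on the layout above; `missing_files` and `REQUIRED_FILES` are hypothetical names, and treating `.gitignore` and this guide as not required is an assumption.

```python
from pathlib import Path

# Files assumed to be required at runtime; the Dockerfile is optional per the layout
REQUIRED_FILES = [
    "app.py",
    "requirements.txt",
    "README.md",
    "voice_recognition_fullmodel.pth",
]

def missing_files(root, required=REQUIRED_FILES):
    """Return the required file names that do not exist under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_files(".")` from the repository root returns an empty list when everything is in place.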

🔧 Step-by-Step Deployment

Step 1: Create a New Space

  1. Go to huggingface.co/new-space
  2. Choose a name for your space (e.g., voice-recognition-security)
  3. Select Gradio as the SDK
  4. Choose Public or Private visibility
  5. Click Create Space

Step 2: Prepare Your Files

  1. Update app.py: Make sure the user classes in the label encoder match your trained model:

    # In app.py, update this line with your actual user classes
    all_users = ['user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'user7']
    
  2. Model File: Ensure your voice_recognition_fullmodel.pth is in the root directory

  3. Test Locally (Optional):

    pip install -r requirements.txt
    python app.py
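
The class list matters because the model's output index is mapped back to a user name through it. Below is a minimal sketch of a consistency check, assuming the label encoder orders classes alphabetically (as scikit-learn's `LabelEncoder` does); `check_user_classes` and `num_model_outputs` are hypothetical names used only for illustration.

```python
def check_user_classes(all_users, num_model_outputs):
    """Validate the class list and return an index -> user mapping.

    Assumes class indices follow sorted order, as with scikit-learn's LabelEncoder.
    """
    if len(set(all_users)) != len(all_users):
        raise ValueError("all_users contains duplicate names")
    if len(all_users) != num_model_outputs:
        raise ValueError(
            f"app.py lists {len(all_users)} users but the model has "
            f"{num_model_outputs} outputs"
        )
    return dict(enumerate(sorted(all_users)))
```

If the training code assigned indices in a different order, replace `sorted(all_users)` with that order.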
    

Step 3: Upload Files

Option A: Web Interface

  1. Go to your space's page
  2. Click the Files tab
  3. Upload each file individually
  4. For the model file, you might need to use Git LFS (see Option B)

Option B: Git Clone (Recommended for large files)

  1. Clone your space repository:

    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    
  2. Set up Git LFS for large files:

    git lfs install
    git lfs track "*.pth"
    git add .gitattributes
    
  3. Add all your files:

    cp /path/to/your/files/* .
    cp /path/to/your/files/.gitignore .   # the * glob skips dotfiles
    git add .
    git commit -m "Initial deployment of voice recognition system"
    git push
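
As an alternative to the Git workflow, the same files can be pushed from Python with the `huggingface_hub` library. This is a sketch, assuming `huggingface_hub` is installed, the Space already exists, and you are authenticated (for example via `huggingface-cli login`); `space_repo_id` and `upload_space` are hypothetical helper names.

```python
def space_repo_id(username, space_name):
    """Build the 'username/space_name' repo ID used by the Hub."""
    return f"{username}/{space_name}"

def upload_space(folder, repo_id,
                 message="Initial deployment of voice recognition system"):
    """Upload every file in `folder` to an existing Space as one commit."""
    # Imported lazily so space_repo_id works without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi()
    return api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        commit_message=message,
    )
```

A call would look like `upload_space(".", space_repo_id("YOUR_USERNAME", "YOUR_SPACE_NAME"))`.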
    

Step 4: Configure Space Settings

  1. Go to your space's Settings tab
  2. Hardware:
    • For CPU inference: Basic (free)
    • For faster processing: CPU Upgrade (paid; listed here as $0.05/hour, check the Settings page for current pricing)
  3. Timeout: Set to appropriate value (default is usually fine)
  4. Visibility: Adjust as needed

Step 5: Monitor Deployment

  1. Your space will automatically build after pushing files
  2. Check the Logs tab for any errors
  3. Once built, your space will be available at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

🐛 Troubleshooting

Common Issues and Solutions

1. Model Loading Errors

Problem: FileNotFoundError or model loading issues
Solution:

  • Ensure voice_recognition_fullmodel.pth is in the root directory
  • Check file size limits (use Git LFS for files >10MB)
  • Verify model architecture matches training code
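
One model-loading failure worth ruling out: if the file is tracked by Git LFS but the real content was never pulled or pushed, the `.pth` on disk is a small text pointer rather than the model itself. A minimal sketch of a check for this; `is_lfs_pointer` is a hypothetical helper.

```python
# Every Git LFS pointer file begins with this line
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer instead of real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If `is_lfs_pointer("voice_recognition_fullmodel.pth")` returns `True`, running `git lfs pull` in the repository usually fetches the actual file.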

2. Dependency Issues

Problem: Import errors or package conflicts
Solution:

  • Update requirements.txt with exact versions
  • Test locally with a clean virtual environment
  • Check for GPU-specific packages if using CPU deployment
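
To pin exact versions, one option is to read them from the environment where the app already works. A sketch using the standard library's `importlib.metadata`; `pinned_requirements` is a hypothetical helper, and the package names in the usage note are assumptions about what `app.py` imports.

```python
from importlib import metadata

def pinned_requirements(packages):
    """Return 'name==version' lines for each package installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag packages missing from the current environment
            lines.append(f"# {name}: not installed")
    return lines
```

For example, `print("\n".join(pinned_requirements(["torch", "librosa", "soundfile"])))` prints lines that can be pasted into requirements.txt.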

3. Memory Issues

Problem: OutOfMemoryError during model loading
Solution:

  • Use CPU-only inference: map_location='cpu'
  • Consider model quantization for smaller size
  • Upgrade to a higher memory tier

4. Audio Processing Errors

Problem: Librosa or audio processing failures
Solution:

  • Install system audio libraries in Dockerfile
  • Add error handling for unsupported formats
  • Test with various audio file types

Example Error Fixes

Fix 1: Model Architecture Mismatch

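# In app.py, add this fallback loading method
# (device, TransferLearningModel, and all_users are defined elsewhere in app.py)
import torch

try:
    # weights_only=False is needed to unpickle a full model on PyTorch 2.6+,
    # where weights_only defaults to True; only load files you trust
    model = torch.load('voice_recognition_fullmodel.pth', map_location=device, weights_only=False)
    model.eval()
except Exception as e:
    print(f"Loading full model failed: {e}")
    # The file may hold only a state dict: build the architecture, then load the weights
    model = TransferLearningModel(len(all_users))
    state_dict = torch.load('voice_recognition_fullmodel.pth', map_location=device)
    model.load_state_dict(state_dict)
    model.eval()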

Fix 2: Audio Format Support

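# Add more robust audio loading
import librosa
import soundfile as sf

def load_audio_robust(file_path):
    """Load audio with librosa, falling back to soundfile if that fails."""
    try:
        # res_type='kaiser_fast' may require the resampy package on recent librosa versions
        audio, sr = librosa.load(file_path, res_type='kaiser_fast')
        return audio, sr
    except Exception as e1:
        try:
            # Unlike librosa.load, this keeps the file's native sample rate
            audio, sr = sf.read(file_path, dtype='float32')
            if audio.ndim > 1:
                audio = audio[:, 0]  # Take first channel
            return audio, sr
        except Exception as e2:
            raise RuntimeError(f"Could not load audio: {e1}, {e2}") from e2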

🔒 Security Considerations

For Production Deployment:

  1. Environment Variables: Store sensitive config in space secrets
  2. Rate Limiting: Implement request throttling
  3. Input Validation: Validate audio file types and sizes
  4. Logging: Add comprehensive logging for security monitoring
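
Input validation can be as simple as checking the extension and size before the file reaches the model. A minimal sketch; `validate_audio_upload` is a hypothetical helper, the allowed extensions mirror the formats this guide tests with, and the 10 MB cap is an assumed value to tune.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap; adjust for your use case

def validate_audio_upload(path, max_bytes=MAX_BYTES):
    """Raise ValueError if the file is missing, empty, too large, or an unexpected type."""
    p = Path(path)
    if not p.is_file():
        raise ValueError("file does not exist")
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {p.suffix or 'none'}")
    size = p.stat().st_size
    if size == 0:
        raise ValueError("file is empty")
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes; limit is {max_bytes}")
    return p
```

An extension check does not prove the content is valid audio, so decoding errors still need to be handled when the file is loaded.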

📊 Performance Optimization

Tips for Better Performance:

  1. Model Optimization:

    # Quantize model for smaller size and faster inference
    model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    
  2. Caching: Cache model loading and feature extraction

  3. Batch Processing: Process multiple files if needed

  4. Memory Management: Clear unused variables
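
For the caching tip, the standard library's `functools.lru_cache` is often enough to make sure the model is loaded once rather than on every request. A sketch; `make_cached_loader` is a hypothetical helper, and the loader passed to it is whatever function `app.py` already uses to load the model.

```python
from functools import lru_cache

def make_cached_loader(load_fn):
    """Wrap `load_fn(path)` so each distinct path is loaded only once."""
    @lru_cache(maxsize=None)
    def cached(path):
        return load_fn(path)
    return cached
```

For instance, `get_model = make_cached_loader(load_model)` followed by repeated `get_model("voice_recognition_fullmodel.pth")` calls runs the hypothetical `load_model` only on the first call.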

🎯 Testing Your Deployment

Test Cases:

  1. Valid User Audio: Test with authorized user samples
  2. Invalid User Audio: Test with unauthorized samples
  3. Various Formats: Test .wav, .mp3, .flac files
  4. Edge Cases: Empty files, very short/long audio
  5. Noise Tests: Test with background noise
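
Edge-case inputs such as empty or very short clips can be generated rather than recorded. A minimal sketch using the standard library's `wave` module; `write_silent_wav` is a hypothetical helper and the 16 kHz sample rate is an assumption.

```python
import wave

def write_silent_wav(path, duration_s, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file containing only silence."""
    n_frames = int(duration_s * sample_rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_frames)
    return n_frames
```

Passing `duration_s=0` produces a header-only file, which is a useful stand-in for the "empty file" case.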

Validation Script:

def test_deployment():
    # Test cases for your deployed model
    test_cases = [
        ("valid_user1.wav", True),
        ("invalid_user.wav", False),
        ("noisy_audio.mp3", True),  # Should still work
    ]
    
    for audio_file, expected_access in test_cases:
        result = predict_voice(audio_file)
        print(f"File: {audio_file}, Expected: {expected_access}, Got: {result[0]}")
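
The script above prints each result but leaves the comparison to the reader. A variant that collects mismatches could look like the sketch below; `run_validation` is a hypothetical helper, and it assumes the prediction function returns a sequence whose first element is truthy when access is granted.

```python
def run_validation(predict_fn, test_cases):
    """Run each (audio_file, expected_access) case and return the failures."""
    failures = []
    for audio_file, expected_access in test_cases:
        try:
            granted = bool(predict_fn(audio_file)[0])
        except Exception as exc:
            # A crash on a test file counts as a failure too
            failures.append((audio_file, expected_access, repr(exc)))
            continue
        if granted != expected_access:
            failures.append((audio_file, expected_access, granted))
    return failures
```

An empty list from `run_validation(predict_voice, test_cases)` means every case matched its expected outcome.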

📞 Support

If you encounter issues:

  1. Check the Hugging Face Spaces documentation
  2. Review the logs in your space's Logs tab
  3. Join the Hugging Face Discord for community support
  4. Open an issue in this repository

Good luck with your deployment! 🚀