InfiniteTalk - Deployment Guide

Prerequisites

  1. HuggingFace Account: Sign up at https://huggingface.co
  2. Git & Git LFS: Install from https://git-scm.com
  3. HuggingFace CLI (optional but recommended):
    pip install huggingface_hub
    huggingface-cli login
    
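
If you want to verify your credentials programmatically before deploying, here is a minimal sketch. It checks the usual places a token lives: the HF_TOKEN / HUGGING_FACE_HUB_TOKEN environment variables and the ~/.cache/huggingface/token file written by huggingface-cli login (find_hf_token is a hypothetical helper, not part of huggingface_hub):

```python
import os
from pathlib import Path

def find_hf_token(env=os.environ,
                  token_file=Path.home() / ".cache/huggingface/token"):
    """Return an HF token from the environment or the CLI token file, else None."""
    token = env.get("HF_TOKEN") or env.get("HUGGING_FACE_HUB_TOKEN")
    if token:
        return token.strip()
    if Path(token_file).is_file():
        return Path(token_file).read_text().strip()
    return None

if __name__ == "__main__":
    print("token found" if find_hf_token()
          else "no token - run `huggingface-cli login`")
```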

Deployment Steps

Option 1: Web UI (Easiest)

  1. Create New Space

    • Go to https://huggingface.co/new-space
    • Space name: infinitetalk (or your choice)
    • License: apache-2.0
    • SDK: Gradio
    • Hardware: ZeroGPU (free tier available!)
    • Click "Create Space"
  2. Upload Files

    • Click "Files" tab in your new Space
    • Upload all files from this directory:
      • README.md (with YAML metadata)
      • app.py
      • requirements.txt
      • packages.txt
      • .gitignore
      • src/ folder
      • wan/ folder
      • utils/ folder
      • assets/ folder (optional)
      • examples/ folder (optional)
      • LICENSE.txt
  3. Wait for Build

    • Space will automatically build
    • First build takes 5-10 minutes (installing dependencies)
    • Check "Logs" tab for build progress
    • Watch for any error messages
  4. Test Your Space

    • Once built, the Space will show "Running"
    • First generation will download models (~2-3 minutes)
    • Try with example images/audio
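
Before uploading through the web UI, you can sanity-check that the required files listed above are all present. A minimal sketch (missing_files is a hypothetical helper; REQUIRED mirrors the upload checklist, minus the optional items):

```python
import os

# Required files and folders from the upload checklist above
REQUIRED = ["README.md", "app.py", "requirements.txt", "packages.txt",
            "src", "wan", "utils"]

def missing_files(root="."):
    """Return the required files/folders not present under root."""
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(root, name))]

if __name__ == "__main__":
    gaps = missing_files()
    print("ready to upload" if not gaps else f"missing: {', '.join(gaps)}")
```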

Option 2: Git (Advanced)

  1. Clone Your Space

    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    
  2. Copy Files

    # From your local infinitetalk-hf-space directory
    cp -r /path/to/infinitetalk-hf-space/* .
    
  3. Commit and Push

    git add .
    git commit -m "Initial InfiniteTalk Space deployment"
    git push
    
  4. Monitor Build

    • Go to your Space URL
    • Check "Logs" for build progress

Option 3: CLI Upload

# From this directory
huggingface-cli upload YOUR_USERNAME/YOUR_SPACE_NAME . --repo-type=space
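
The same upload can also be scripted with the huggingface_hub Python API via HfApi.upload_folder. A hedged sketch (build_upload_kwargs and upload_space are hypothetical helpers; the import is deferred so the snippet loads even without huggingface_hub installed):

```python
def build_upload_kwargs(repo_id, folder="."):
    """Arguments for HfApi.upload_folder targeting a Space repository."""
    return {"repo_id": repo_id, "folder_path": folder, "repo_type": "space"}

def upload_space(repo_id, folder="."):
    """Push a local directory to the Space (equivalent to the CLI command above)."""
    from huggingface_hub import HfApi  # deferred: needs `pip install huggingface_hub`
    HfApi().upload_folder(**build_upload_kwargs(repo_id, folder))

# Usage: upload_space("YOUR_USERNAME/YOUR_SPACE_NAME")
```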

Troubleshooting

Build Fails with Flash-Attn Error

Symptom: flash-attn compilation fails

Solutions:

  1. Try pinning flash-attn in requirements.txt:

    flash-attn==2.7.4.post1
    
    Note: pip does not honor per-line options such as --no-build-isolation
    inside a requirements file. If the source build still fails, point
    requirements.txt at a pre-built wheel URL matching your Python, CUDA,
    and torch versions instead.
    
  2. Or use the Dockerfile approach (create a Dockerfile):

    FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
    
    RUN apt-get update && apt-get install -y \
        python3.10 python3-pip git ffmpeg build-essential libsndfile1
    
    WORKDIR /app
    
    # Install PyTorch first (flash-attn needs it at build time)
    RUN pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
    
    # Build tools flash-attn expects when build isolation is disabled
    RUN pip install packaging ninja
    
    # Build flash-attn against the already-installed torch
    RUN pip install flash-attn==2.7.4.post1 --no-build-isolation
    
    # Copy and install remaining requirements
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    
    # Copy application
    COPY . .
    
    # HF Spaces route traffic to port 7860; Gradio must listen on all interfaces
    ENV GRADIO_SERVER_NAME=0.0.0.0
    EXPOSE 7860
    
    CMD ["python3", "app.py"]
    

Models Not Downloading

Symptom: "Model download failed" error

Solutions:

  1. Check Hugging Face status: https://status.huggingface.co
  2. Add HF_TOKEN secret in Space settings (for private models)
  3. Check model repository IDs in utils/model_loader.py

Out of Memory (OOM) Errors

Symptom: "CUDA out of memory"

Solutions:

  1. Reduce resolution (use 480p instead of 720p)
  2. Reduce diffusion steps (try 30 instead of 40)
  3. Process shorter videos
  4. Check utils/gpu_manager.py settings
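
As a rough rule of thumb, activation memory scales with pixel count, which is why dropping from 720p to 480p helps so much. An illustrative calculation (the exact frame sizes are assumptions, e.g. 1280x720 vs. 832x480; pixel_ratio is a hypothetical helper):

```python
def pixel_ratio(w1, h1, w2, h2):
    """How many times more pixels (and, roughly, activation memory) size 1 needs vs. size 2."""
    return (w1 * h1) / (w2 * h2)

# Illustrative (assumed) frame sizes: 720p ~ 1280x720, 480p ~ 832x480
ratio = pixel_ratio(1280, 720, 832, 480)
print(f"720p uses roughly {ratio:.1f}x the activation memory of 480p")
```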

Space Stuck in "Building"

Symptom: Build takes >15 minutes

Solutions:

  1. Check "Logs" tab for errors
  2. Flash-attn compilation can take 10+ minutes
  3. If timeout, try Dockerfile approach
  4. Consider pre-built flash-attn wheels

ZeroGPU Quota Exceeded

Symptom: "GPU quota exceeded"

Solutions:

  1. Free Tier: Wait for quota to refill (roughly 1 ZeroGPU second regained per 30 real seconds)
  2. Upgrade to PRO: $9/month for 8× quota
  3. Apply for Grant: Community GPU Grant for innovative projects
  4. Optimize generation time (reduce steps, use 480p)
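
The refill rate above translates directly into wall-clock waiting time. A quick sketch (refill_wait_seconds is a hypothetical helper; 40 s approximates one 480p generation):

```python
def refill_wait_seconds(quota_needed_s, real_seconds_per_quota_second=30):
    """Real time to wait for a given amount of ZeroGPU quota to refill."""
    return quota_needed_s * real_seconds_per_quota_second

# Regaining enough quota for one ~40s generation:
wait = refill_wait_seconds(40)
print(f"{wait} s (~{wait / 60:.0f} min)")
```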

Post-Deployment

Monitor Usage

  • Check "Logs" tab regularly
  • Monitor GPU quota in Space settings
  • Watch for user error reports in Community tab

Update Space

# Make changes locally
git add .
git commit -m "Update: [description]"
git push

Space will automatically rebuild on push.

Add Examples

Upload example images and audio to examples/ folder to help users get started quickly.

Enable Discussions

In Space settings, enable "Discussions" to get user feedback.

Apply for Community GPU Grant

If your Space is popular and useful:

  1. Go to Space Settings
  2. Click "Apply for community GPU grant"
  3. Explain your project's value to the community

Hardware Options

Free ZeroGPU

  • Cost: FREE
  • Limits: 300s per session, 600s max quota
  • Best for: Testing, light usage, demos
  • GPU: H200 with 70GB VRAM

PRO ZeroGPU

  • Cost: $9/month
  • Benefits: 8× quota, priority queue, 10 Spaces
  • Best for: Regular usage, public demos

Dedicated GPU (Paid)

  • T4 (16GB): $0.60/hour - Too small for InfiniteTalk
  • A10G (24GB): $1.05/hour - Minimum viable
  • A100 (40GB): $3.00/hour - Overkill but works
  • Best for: Private, dedicated instances
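
A quick break-even check against the rates above (breakeven_hours is a hypothetical helper): if you would run a dedicated A10G for more than about 8.6 hours a month, the $9 PRO plan is the cheaper option, quota limits aside:

```python
def breakeven_hours(monthly_cost, hourly_rate):
    """Hours of dedicated GPU use per month at which the flat plan becomes cheaper."""
    return monthly_cost / hourly_rate

# PRO at $9/month vs. a dedicated A10G at $1.05/hour (rates quoted above)
print(f"break-even at {breakeven_hours(9, 1.05):.1f} h/month")
```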

Performance Expectations

First Generation

  • Model download: 2-3 minutes
  • Generation (10s video, 480p): 40 seconds
  • Total: ~3-4 minutes

Subsequent Generations

  • Generation (10s video, 480p): 35-40 seconds
  • Generation (10s video, 720p): 60-70 seconds

Free Tier Usage

  • ~3-5 generations per quota period (600s ZeroGPU)
  • Quota refills gradually (1 ZeroGPU second per 30 real seconds)
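
Given that refill rate, an empty quota takes a predictable amount of real time to come back. A small sketch (full_refill_hours is a hypothetical helper using the 600 s quota and 30:1 refill rate quoted above):

```python
def full_refill_hours(max_quota_s=600, real_seconds_per_quota_second=30):
    """Real time, in hours, for the full ZeroGPU quota to refill from empty."""
    return max_quota_s * real_seconds_per_quota_second / 3600

print(f"a fully drained quota refills in ~{full_refill_hours():.0f} h")
```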


Success Checklist

  • Space builds without errors
  • Models download successfully on first run
  • Example image-to-video generation works
  • Example video dubbing works
  • No OOM errors with 480p
  • GPU memory is cleaned up between runs
  • Gradio UI is responsive
  • Examples are loaded and working
  • README displays correctly
  • Space doesn't crash after multiple uses

Good luck with your deployment! 🚀