Team_Sparks / docs /HF_SPACES_GUIDE.md
KeithXD's picture
Upload folder using huggingface_hub
4702dbb verified

Hugging Face Spaces Deployment Guide

What is Hugging Face Spaces?

Hugging Face Spaces is a free hosting platform for machine learning demos and applications. It allows you to:

  • βœ… Deploy web apps for free (with resource limits)
  • βœ… Set environment variables and secrets securely
  • βœ… Use Docker for full customization
  • βœ… Get a public URL accessible worldwide
  • βœ… Integrate with GitHub for continuous deployment

Key Features

  • Free tier: 2 vCPU, 8GB RAM per Space
  • Public/Private: Choose visibility level
  • Auto-builds: Redeploy on GitHub push (with GitHub integration)
  • Secrets management: Store API tokens securely
  • Multiple SDK support: Gradio, Streamlit, Docker, Python

How Does Hugging Face Spaces Work?

1. Creation Phase

You create a new Space and choose an SDK (Gradio, Streamlit, Docker, etc.)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Hugging Face Spaces Dashboard          β”‚
β”‚  β”œβ”€ Create New Space                    β”‚
β”‚  β”œβ”€ Choose SDK: Docker ← [We use this] β”‚
β”‚  β”œβ”€ Set Name: audit-repair-env          β”‚
β”‚  β”œβ”€ Set License: MIT                    β”‚
β”‚  └─ Create                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

2. Build Phase

HF Spaces pulls your code (from GitHub) and builds a Docker image

GitHub Repo              Hugging Face Spaces
    β”‚                           β”‚
    β”œβ”€ Dockerfile     ────→    Build Server
    β”œβ”€ requirements.txt        β”‚
    β”œβ”€ inference.py      Builds Docker Image
    β”œβ”€ server.py         Creates Container
    └─ demo.py           Allocates Resources
                         β”‚
                      Pushes to Registry

3. Runtime Phase

The container runs on HF's infrastructure with:

  • Assigned vCPU/RAM
  • Public HTTP endpoint
  • Environment variables & secrets
Public URL
    β”‚
    β”œβ”€ https://huggingface.co/spaces/username/audit-repair-env
    β”‚
    β”œβ”€ Routes to Container
    β”‚     β”œβ”€ :7860 (Gradio Demo)
    β”‚     └─ :8000 (FastAPI Server - optional)
    β”‚
    └─ Processes Requests
        β”œβ”€ Receives HTTP request
        β”œβ”€ Runs inference.py / demo.py
        └─ Returns response

4. Lifecycle

  • Sleeping: Space goes to sleep after 48 hours of inactivity
  • Paused: You can manually pause spaces
  • Running: Active and processing requests
  • Error: Logs visible in Space page

Step-by-Step Deployment

Step 1: Prepare Your GitHub Repository

Requirement: Public GitHub repo with your code

git init
git add .
git commit -m "Initial commit"
git remote add origin https://github.com/YOUR_USERNAME/audit-repair-env.git
git branch -M main
git push -u origin main

File checklist:

  • βœ… inference.py (root directory)
  • βœ… server.py
  • βœ… tasks.py
  • βœ… requirements.txt
  • βœ… demo.py
  • βœ… Dockerfile
  • βœ… README.md

Step 2: Create Hugging Face Spaces

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Fill in:
    • Owner: Your HF username
    • Space name: audit-repair-env (or your choice)
    • License: MIT
    • SDK: Docker ← IMPORTANT
  4. Click "Create Space"

Step 3: Connect to GitHub (Auto-Deployment)

In your Space Settings:

  1. Go to Space β†’ Settings (gear icon)
  2. Scroll to "Linked Repository"
  3. Click "Link a repository"
  4. Select your GitHub repo: username/audit-repair-env
  5. Choose "Simple" or "Sync" mode
    • Simple: Manual redeploy via button
    • Sync: Auto-redeploy on GitHub push (recommended)

Step 4: Set Environment Variables & Secrets

In Space Settings:

  1. Scroll to "Repository secrets"

  2. Click "Add secret"

  3. Add:

    Name: HF_TOKEN
    Value: hf_your_actual_token_here
    
  4. Add:

    Name: API_BASE_URL
    Value: https://router.huggingface.co/v1
    
  5. Add:

    Name: MODEL_NAME
    Value: Qwen/Qwen2.5-72B-Instruct
    

⚠️ NOTE: These secrets are only passed to Docker at build-time. If they need to be runtime-only, use the .dockerfile method.

Step 5: Check Logs & Verify Deployment

  1. Go to your Space URL: https://huggingface.co/spaces/username/audit-repair-env
  2. Click "Logs" tab to see build output
  3. Wait for status: "Running"
  4. Click the "App" link to access your demo

Dockerfile Setup for Spaces

Your Dockerfile should be:

FROM python:3.10-slim

WORKDIR /app

# Copy everything
COPY . .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Expose port for Gradio (or FastAPI)
EXPOSE 7860

# Run Gradio demo by default
CMD ["python", "demo.py"]

Alternative (run both server + demo):

FROM python:3.10-slim

WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 7860 8000

# Create startup script
RUN echo '#!/bin/bash\npython server.py &\npython demo.py' > /app/start.sh
RUN chmod +x /app/start.sh

CMD ["/app/start.sh"]

Troubleshooting Common Issues

Issue: "Build Failed"

❌ Docker build failed

Fixes:

  1. Check Logs tab for error messages
  2. Verify requirements.txt syntax
  3. Ensure Dockerfile references correct files
  4. Check for permission issues

Issue: "Application Error" on Load

❌ Application Error: Connection refused

Fixes:

  1. Verify app runs on 0.0.0.0:7860
  2. Check environment variables are set
  3. Look at Space Logs for exceptions
  4. Ensure HF_TOKEN is valid

Issue: "HF_TOKEN not valid"

❌ Error initializing client: Invalid token

Fixes:

  1. Generate new token at huggingface.co/settings/tokens
  2. Make sure it has API access
  3. Update secret in Space Settings
  4. Rebuild Space

Issue: "Model not found"

❌ Error: MODEL_NAME 'Qwen/Qwen2.5-72B-Instruct' not found

Fixes:

  1. Verify model exists on Hugging Face Hub
  2. Check if you have access (private models need approval)
  3. Use inference API endpoint instead:
    API_BASE_URL=https://api-inference.huggingface.co/v1
    
  4. Ensure HF_TOKEN is set

Issue: "Out of Memory"

❌ Killed due to resource limit

Fixes:

  • Free tier is 2 vCPU / 8GB RAM
  • Reduce model size
  • Use a smaller LLM (e.g., mistral-7b)
  • Consider upgrading to upgrade (usually not needed)
  • Optimize inference batch size

Issue: Space Falls Asleep

⚠️ This space has been sleeping for 48 hours

Explanation: HF Spaces sleep after inactivity to save resources

Solutions:

  1. Upgrade to paid tier (stays warm)
  2. Add uptime monitoring (pings Space regularly)
  3. Use HF Pro subscription

Performance Optimization

For Spaces with Free Tier (2 vCPU, 8GB RAM)

1. Use Quantized Models

# Instead of full precision 72B
MODEL_NAME = "Qwen/Qwen2.5-32B-Instruct-GGUF"  # Smaller, quantized

2. Cache Client

@cache
def get_openai_client():
    return OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

3. Limit Request Size

MAX_TOKENS = 150  # Reduce from 300
TEMPERATURE = 0.1  # Lower temp = faster convergence

4. Async Requests (if multiple concurrent users)

import asyncio
# Use async/await for non-blocking I/O

Real-World Example: Workflow

1. Developer makes changes locally
   β”œβ”€ git commit -am "Fix HF_TOKEN validation"
   └─ git push origin main

2. GitHub notifies HF Spaces
   β”œβ”€ HF detects push to linked repo
   └─ Triggers automatic build

3. HF Spaces builds Docker image
   β”œβ”€ Pulls latest code from main branch
   β”œβ”€ Runs: pip install -r requirements.txt
   β”œβ”€ Loads secrets (HF_TOKEN, API_BASE_URL, etc.)
   └─ Runs: python demo.py

4. Container starts running
   β”œβ”€ Gradio interface initializes on :7860
   β”œβ”€ FastAPI server (optional) on :8000
   └─ Public URL becomes active

5. User accesses Space URL
   β”œβ”€ Browser loads Gradio interface
   β”œβ”€ User selects task (easy/medium/hard)
   β”œβ”€ Clicks "Run Inference"
   └─ inference.py executes with LLM calls

6. LLM calls routed via:
   API_BASE_URL (huggingface.co/v1)
       ↓
   HF Token used for authentication
       ↓
   Model (Qwen/Qwen2.5-72B-Instruct) queried
       ↓
   Response returned to inference.py
       ↓
   Results shown in Gradio UI

Security Best Practices

βœ… DO

  • Set HF_TOKEN as a secret in Space settings
  • Use .gitignore to prevent token from being committed:
    .env
    .env.local
    *.key
    secrets/
    
  • Validate all user inputs
  • Use HTTPS (handled by HF automatically)

❌ DON'T

  • Commit API keys to GitHub
  • Expose secrets in logs
  • Store sensitive data in code
  • Leave Space public if handling private data

Next Steps

  1. Verify locally first:

    export HF_TOKEN="your_token"
    export API_BASE_URL="https://router.huggingface.co/v1"
    python inference.py  # Run submission tests
    python demo.py       # Test Gradio UI
    
  2. Push to GitHub:

    git add -A
    git commit -m "Ready for HF Spaces deployment"
    git push origin main
    
  3. Create & Link Space:

    • Create Space on HF
    • Link GitHub repo
    • Set secrets in Settings
    • Wait for build
  4. Test on Spaces:

    • Access public URL
    • Run test inference
    • Share link with community

Additional Resources


Good luck with your submission! πŸš€