Hugging Face Spaces Deployment Guide
What is Hugging Face Spaces?
Hugging Face Spaces is a free hosting platform for machine learning demos and applications. It allows you to:
- ✅ Deploy web apps for free (with resource limits)
- ✅ Set environment variables and secrets securely
- ✅ Use Docker for full customization
- ✅ Get a public URL accessible worldwide
- ✅ Integrate with GitHub for continuous deployment
Key Features
- Free tier: 2 vCPU, 8GB RAM per Space
- Public/Private: Choose visibility level
- Auto-builds: Redeploy on GitHub push (with GitHub integration)
- Secrets management: Store API tokens securely
- Multiple SDK support: Gradio, Streamlit, Docker, Python
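At runtime, the variables and secrets you configure for a Space surface inside the container as ordinary environment variables. A minimal sketch of reading that configuration with local fallbacks (the names HF_TOKEN, API_BASE_URL, and MODEL_NAME are the ones used later in this guide):

```python
import os

def load_config(env=None):
    """Read the Space's runtime configuration from environment variables,
    with safe fallbacks so the code also runs locally."""
    env = os.environ if env is None else env
    return {
        "hf_token": env.get("HF_TOKEN", ""),
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
    }
```

Passing the environment in as a plain dict keeps the function easy to test without touching the real process environment.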
How Does Hugging Face Spaces Work?
1. Creation Phase
You create a new Space and choose an SDK (Gradio, Streamlit, Docker, etc.)
┌───────────────────────────────────────────┐
│ Hugging Face Spaces Dashboard             │
│  ├─ Create New Space                      │
│  ├─ Choose SDK: Docker  ← [We use this]   │
│  ├─ Set Name: audit-repair-env            │
│  ├─ Set License: MIT                      │
│  └─ Create                                │
└───────────────────────────────────────────┘
2. Build Phase
HF Spaces pulls your code (from GitHub) and builds a Docker image
GitHub Repo                         Hugging Face Spaces
    │                                      │
    ├─ Dockerfile ─────────────────► Build Server
    ├─ requirements.txt                    │
    ├─ inference.py                 Builds Docker Image
    ├─ server.py                    Creates Container
    └─ demo.py                      Allocates Resources
                                           │
                                    Pushes to Registry
3. Runtime Phase
The container runs on HF's infrastructure with:
- Assigned vCPU/RAM
- Public HTTP endpoint
- Environment variables & secrets
Public URL
│
└─ https://huggingface.co/spaces/username/audit-repair-env
   │
   ├─ Routes to Container
   │   ├─ :7860 (Gradio Demo)
   │   └─ :8000 (FastAPI Server - optional)
   │
   └─ Processes Requests
      ├─ Receives HTTP request
      ├─ Runs inference.py / demo.py
      └─ Returns response
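Docker Spaces are also served directly on an hf.space subdomain, which is convenient for API calls that skip the huggingface.co page wrapper. A small helper to build that direct URL (assuming, as in this guide, that the owner and Space names use only lowercase letters, digits, and dashes; other characters get normalized by Hugging Face and would need extra handling):

```python
def space_app_url(owner: str, space: str) -> str:
    """Direct URL of the running app container.

    Assumes owner/space already consist of lowercase letters,
    digits, and dashes only."""
    return f"https://{owner}-{space}.hf.space"
```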
4. Lifecycle
- Sleeping: Space goes to sleep after 48 hours of inactivity
- Paused: You can manually pause spaces
- Running: Active and processing requests
- Error: Logs visible in Space page
Step-by-Step Deployment
Step 1: Prepare Your GitHub Repository
Requirement: Public GitHub repo with your code
git init
git add .
git commit -m "Initial commit"
git remote add origin https://github.com/YOUR_USERNAME/audit-repair-env.git
git branch -M main
git push -u origin main
File checklist:
- ✅ inference.py (root directory)
- ✅ server.py
- ✅ tasks.py
- ✅ requirements.txt
- ✅ demo.py
- ✅ Dockerfile
- ✅ README.md
Step 2: Create Hugging Face Spaces
- Go to huggingface.co/spaces
- Click "Create new Space"
- Fill in:
  - Owner: Your HF username
  - Space name: audit-repair-env (or your choice)
  - License: MIT
  - SDK: Docker ← IMPORTANT
- Click "Create Space"
Step 3: Connect to GitHub (Auto-Deployment)
In your Space Settings:
- Go to Space β Settings (gear icon)
- Scroll to "Linked Repository"
- Click "Link a repository"
- Select your GitHub repo: username/audit-repair-env
- Choose "Simple" or "Sync" mode
  - Simple: Manual redeploy via button
  - Sync: Auto-redeploy on GitHub push (recommended)
Step 4: Set Environment Variables & Secrets
In Space Settings:
Scroll to "Repository secrets"
Click "Add secret"
Add the following (Name / Value):
- HF_TOKEN: hf_your_actual_token_here
- API_BASE_URL: https://router.huggingface.co/v1
- MODEL_NAME: Qwen/Qwen2.5-72B-Instruct
⚠️ NOTE: Secrets are injected into the running container as environment variables. If you also need a secret during the Docker build itself, mount it with BuildKit in the Dockerfile, e.g. RUN --mount=type=secret,id=HF_TOKEN,mode=0444,required=true ...
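However the secrets reach the container, it helps to fail fast at startup instead of debugging a cryptic "Application Error" later. A small, hypothetical startup check over the variable names used in this guide:

```python
import os

# The variables this guide configures in Space settings.
REQUIRED_VARS = ("HF_TOKEN", "API_BASE_URL", "MODEL_NAME")

def missing_secrets(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it at the top of demo.py or server.py and raise (or print a clear message) if the returned list is non-empty.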
Step 5: Check Logs & Verify Deployment
- Go to your Space URL: https://huggingface.co/spaces/username/audit-repair-env
- Click the "Logs" tab to see build output
- Wait for status: "Running"
- Click the "App" link to access your demo
Dockerfile Setup for Spaces
Your Dockerfile should be:
FROM python:3.10-slim
WORKDIR /app
# Copy everything
COPY . .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Expose port for Gradio (or FastAPI)
EXPOSE 7860
# Run Gradio demo by default
CMD ["python", "demo.py"]
Alternative (run both server + demo):
FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 7860 8000
# Create startup script (printf, not plain echo, so the \n escapes are interpreted)
RUN printf '#!/bin/bash\npython server.py &\nexec python demo.py\n' > /app/start.sh
RUN chmod +x /app/start.sh
CMD ["/app/start.sh"]
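If you would rather avoid the quoting pitfalls of generating a shell script from a Dockerfile, a small Python launcher can play the same role. server.py and demo.py are this project's files; the helper itself is a sketch:

```python
import subprocess
import sys

def launch(background_cmd, foreground_cmd):
    """Start one process in the background, run another in the foreground,
    and stop the background process when the foreground one exits."""
    server = subprocess.Popen(background_cmd)
    try:
        # Blocks until the foreground process exits; returns its exit code.
        return subprocess.call(foreground_cmd)
    finally:
        server.terminate()
        server.wait()

if __name__ == "__main__":
    sys.exit(launch([sys.executable, "server.py"],
                    [sys.executable, "demo.py"]))
```

Save it as, say, start.py and use CMD ["python", "start.py"] in the Dockerfile instead of the shell script.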
Troubleshooting Common Issues
Issue: "Build Failed"
❌ Docker build failed
Fixes:
- Check Logs tab for error messages
- Verify requirements.txt syntax
- Ensure the Dockerfile references the correct files
- Check for permission issues
Issue: "Application Error" on Load
❌ Application Error: Connection refused
Fixes:
- Verify the app runs on 0.0.0.0:7860
- Check environment variables are set
- Look at Space Logs for exceptions
- Ensure HF_TOKEN is valid
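"Connection refused" usually means nothing is bound to the expected port. A quick standard-library check you can run inside the container (or locally) to confirm something is actually listening:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the Gradio demo described here you would check is_listening("0.0.0.0", 7860) after startup; remember that binding to 127.0.0.1 makes the app unreachable from outside the container.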
Issue: "HF_TOKEN not valid"
❌ Error initializing client: Invalid token
Fixes:
- Generate new token at huggingface.co/settings/tokens
- Make sure it has API access
- Update secret in Space Settings
- Rebuild Space
Issue: "Model not found"
❌ Error: MODEL_NAME 'Qwen/Qwen2.5-72B-Instruct' not found
Fixes:
- Verify model exists on Hugging Face Hub
- Check if you have access (private models need approval)
- Use the Inference API endpoint instead: API_BASE_URL=https://api-inference.huggingface.co/v1
- Ensure HF_TOKEN is set
Issue: "Out of Memory"
❌ Killed due to resource limit
Fixes:
- Free tier is 2 vCPU / 8GB RAM
- Reduce model size
- Use a smaller LLM (e.g., mistral-7b)
- Consider upgrading hardware (usually not needed)
- Optimize inference batch size
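For the batch-size fix, bounding how many items are processed at once is the simplest memory control. A minimal sketch of a chunking helper (the batch size itself would need tuning against your actual workload):

```python
def batched(items, batch_size):
    """Yield fixed-size chunks of `items` so peak memory stays bounded."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```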
Issue: Space Falls Asleep
⚠️ This space has been sleeping for 48 hours
Explanation: HF Spaces sleep after inactivity to save resources
Solutions:
- Upgrade to paid tier (stays warm)
- Add uptime monitoring (pings Space regularly)
- Use HF Pro subscription
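The uptime-monitoring option only needs some HTTP request inside the 48-hour window. A rough sketch of such a pinger; the 24-hour interval is an arbitrary safe margin, and the URL would be your Space's public URL:

```python
from datetime import datetime, timedelta
import urllib.request

# Ping well inside the 48-hour sleep window.
PING_INTERVAL = timedelta(hours=24)

def next_ping_times(start: datetime, count: int):
    """Compute the next `count` ping timestamps after `start`."""
    return [start + PING_INTERVAL * i for i in range(1, count + 1)]

def ping(url: str) -> int:
    """Hit the Space once so it registers activity; returns the HTTP status."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.status
```

In practice you would run this from a scheduler (cron, a monitoring service, or a GitHub Actions workflow) rather than from the Space itself.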
Performance Optimization
For Spaces with Free Tier (2 vCPU, 8GB RAM)
1. Use Quantized Models
# Instead of full precision 72B
MODEL_NAME = "Qwen/Qwen2.5-32B-Instruct-GGUF" # Smaller, quantized
2. Cache Client
from functools import cache
from openai import OpenAI

@cache
def get_openai_client():
    # Reuse one client across requests instead of rebuilding it each call
    return OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
3. Limit Request Size
MAX_TOKENS = 150 # Reduce from 300
TEMPERATURE = 0.1 # Lower temp = more deterministic output
4. Async Requests (if multiple concurrent users)
import asyncio
# Use async/await for non-blocking I/O
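A concrete sketch of that pattern: the LLM call below is simulated with asyncio.sleep, but a real async client call would slot into run_inference the same way, letting the I/O waits of concurrent users overlap instead of queuing:

```python
import asyncio

async def run_inference(prompt: str) -> str:
    """Placeholder for a real async LLM call; simulates I/O with a sleep."""
    await asyncio.sleep(0.01)
    return f"result for {prompt!r}"

async def handle_batch(prompts):
    # gather() runs all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(run_inference(p) for p in prompts))

results = asyncio.run(handle_batch(["easy", "medium", "hard"]))
```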
Real-World Example: Workflow
1. Developer makes changes locally
├─ git commit -am "Fix HF_TOKEN validation"
└─ git push origin main
2. GitHub notifies HF Spaces
├─ HF detects push to linked repo
└─ Triggers automatic build
3. HF Spaces builds Docker image
├─ Pulls latest code from main branch
├─ Runs: pip install -r requirements.txt
├─ Loads secrets (HF_TOKEN, API_BASE_URL, etc.)
└─ Runs: python demo.py
4. Container starts running
├─ Gradio interface initializes on :7860
├─ FastAPI server (optional) on :8000
└─ Public URL becomes active
5. User accesses Space URL
├─ Browser loads Gradio interface
├─ User selects task (easy/medium/hard)
├─ Clicks "Run Inference"
└─ inference.py executes with LLM calls
6. LLM calls routed via:
API_BASE_URL (router.huggingface.co/v1)
↓
HF Token used for authentication
↓
Model (Qwen/Qwen2.5-72B-Instruct) queried
↓
Response returned to inference.py
↓
Results shown in Gradio UI
Security Best Practices
✅ DO
- Set HF_TOKEN as a secret in Space settings
- Use .gitignore to prevent tokens from being committed:
.env
.env.local
*.key
secrets/
- Validate all user inputs
- Use HTTPS (handled by HF automatically)
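On the input-validation point: even a demo with a fixed task menu (easy/medium/hard, as in the workflow above) benefits from an allow-list check before user-supplied text touches anything else. ALLOWED_TASKS here is an assumption matching this guide's tasks:

```python
ALLOWED_TASKS = {"easy", "medium", "hard"}

def validate_task(raw: str) -> str:
    """Normalize and allow-list a user-supplied task name."""
    task = raw.strip().lower()
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task: {raw!r}")
    return task
```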
❌ DON'T
- Commit API keys to GitHub
- Expose secrets in logs
- Store sensitive data in code
- Leave Space public if handling private data
Next Steps
Verify locally first:
export HF_TOKEN="your_token"
export API_BASE_URL="https://router.huggingface.co/v1"
python inference.py  # Run submission tests
python demo.py       # Test Gradio UI
Push to GitHub:
git add -A
git commit -m "Ready for HF Spaces deployment"
git push origin main
Create & Link Space:
- Create Space on HF
- Link GitHub repo
- Set secrets in Settings
- Wait for build
Test on Spaces:
- Access public URL
- Run test inference
- Share link with community
Additional Resources
- Hugging Face Spaces Docs
- Docker Spaces Guide
- Gradio Documentation
- OpenAI Python Client
- HF Inference API Docs
Good luck with your submission! 🚀