# Hugging Face Spaces Deployment Guide

## What is Hugging Face Spaces?

**Hugging Face Spaces** is a free hosting platform for machine learning demos and applications. It allows you to:

- ✅ Deploy web apps for free (with resource limits)
- ✅ Set environment variables and secrets securely
- ✅ Use Docker for full customization
- ✅ Get a public URL accessible worldwide
- ✅ Integrate with GitHub for continuous deployment

### Key Features

- **Free tier**: 2 vCPU, 16GB RAM per Space
- **Public/Private**: Choose visibility level
- **Auto-builds**: Rebuilds whenever new commits reach the Space (for example on a GitHub push, with GitHub integration)
- **Secrets management**: Store API tokens securely
- **Multiple SDK support**: Gradio, Streamlit, Docker, static HTML

---

## How Does Hugging Face Spaces Work?

### 1. **Creation Phase**

You create a new Space and choose an SDK (Gradio, Streamlit, Docker, etc.)

```
┌─────────────────────────────────────────┐
│  Hugging Face Spaces Dashboard          │
│  ├─ Create New Space                    │
│  ├─ Choose SDK: Docker ← [We use this]  │
│  ├─ Set Name: audit-repair-env          │
│  ├─ Set License: MIT                    │
│  └─ Create                              │
└─────────────────────────────────────────┘
```

### 2. **Build Phase**

HF Spaces pulls your code (from GitHub) and builds a Docker image

```
GitHub Repo                 Hugging Face Spaces
    │                              │
    ├─ Dockerfile ────→       Build Server
    ├─ requirements.txt            │
    ├─ inference.py         Builds Docker Image
    ├─ server.py            Creates Container
    └─ demo.py              Allocates Resources
                                   │
                            Pushes to Registry
```

### 3. **Runtime Phase**

The container runs on HF's infrastructure with:

- Assigned vCPU/RAM
- Public HTTP endpoint
- Environment variables & secrets

```
Public URL
    │
    ├─ https://huggingface.co/spaces/username/audit-repair-env
    │
    ├─ Routes to Container
    │   ├─ :7860 (Gradio Demo)
    │   └─ :8000 (FastAPI Server - optional)
    │
    └─ Processes Requests
        ├─ Receives HTTP request
        ├─ Runs inference.py / demo.py
        └─ Returns response
```

### 4. **Lifecycle**

- **Sleeping**: A free-tier Space goes to sleep after 48 hours of inactivity
- **Paused**: You can manually pause a Space
- **Running**: Active and processing requests
- **Error**: The build or the app failed; logs are visible on the Space page

---

## Step-by-Step Deployment

### Step 1: Prepare Your GitHub Repository

**Requirement**: Public GitHub repo with your code

```bash
git init
git add .
git commit -m "Initial commit"
git remote add origin https://github.com/YOUR_USERNAME/audit-repair-env.git
git branch -M main
git push -u origin main
```

**File checklist**:

- ✅ `inference.py` (root directory)
- ✅ `server.py`
- ✅ `tasks.py`
- ✅ `requirements.txt`
- ✅ `demo.py`
- ✅ `Dockerfile`
- ✅ `README.md`

### Step 2: Create a Hugging Face Space

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Fill in:
   - **Owner**: Your HF username
   - **Space name**: `audit-repair-env` (or your choice)
   - **License**: MIT
   - **SDK**: Docker ← **IMPORTANT**
4. Click **"Create Space"**

### Step 3: Connect to GitHub (Auto-Deployment)

In your **Space Settings**:

1. Go to **Space** → **Settings** (gear icon)
2. Scroll to **"Linked Repository"**
3. Click **"Link a repository"**
4. Select your GitHub repo: `username/audit-repair-env`
5. Choose **"Simple"** or **"Sync"** mode
   - **Simple**: Manual redeploy via button
   - **Sync**: Auto-redeploy on GitHub push (recommended)

If this option does not appear in your Space settings, Hugging Face documents two alternatives: push directly to the Space's own git remote (`https://huggingface.co/spaces/username/audit-repair-env`), or mirror the GitHub repo to the Space with a GitHub Actions workflow.

### Step 4: Set Environment Variables & Secrets

In **Space Settings**:

1. Scroll to **"Repository secrets"**
2. Click **"Add secret"**
3. Add:
   ```
   Name: HF_TOKEN
   Value: hf_your_actual_token_here
   ```
4. Add:
   ```
   Name: API_BASE_URL
   Value: https://router.huggingface.co/v1
   ```
5. Add:
   ```
   Name: MODEL_NAME
   Value: Qwen/Qwen2.5-72B-Instruct
   ```

**⚠️ NOTE**: In a Docker Space, secrets are injected into the running container as environment variables; they are not available during the image build. If a build step needs one, mount it explicitly in the `Dockerfile` with `RUN --mount=type=secret,id=HF_TOKEN ...` and read it from `/run/secrets/HF_TOKEN`. `API_BASE_URL` and `MODEL_NAME` are not sensitive, so they can also be stored as plain variables instead of secrets.

### Step 5: Check Logs & Verify Deployment

1. Go to your Space URL: `https://huggingface.co/spaces/username/audit-repair-env`
2. Click the **"Logs"** tab to see build output
3. Wait for status: **"Running"**
4. Click the **"App"** link to access your demo

---

## Dockerfile Setup for Spaces

Your `Dockerfile` should be:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy everything
COPY . .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Expose port for Gradio (or FastAPI)
EXPOSE 7860

# Run Gradio demo by default
CMD ["python", "demo.py"]
```

**Alternative** (run both server + demo):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY . .

RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 7860 8000

# Create startup script: server in the background, demo in the foreground
RUN printf '#!/bin/bash\npython server.py &\nexec python demo.py\n' > /app/start.sh
RUN chmod +x /app/start.sh

CMD ["/app/start.sh"]
```

Spaces routes public traffic to a single port (7860 by default, configurable with `app_port` in the `README.md` metadata). The FastAPI server on port 8000 is therefore reachable only from inside the container unless you place a reverse proxy in front of both services.

---

## Troubleshooting Common Issues

### Issue: "Build Failed"

```
❌ Docker build failed
```

**Fixes**:

1. Check the Logs tab for error messages
2. Verify `requirements.txt` syntax
3. Ensure the `Dockerfile` references the correct files
4. Check for permission issues

### Issue: "Application Error" on Load

```
❌ Application Error: Connection refused
```

**Fixes**:

1. Verify the app listens on `0.0.0.0:7860` (not `127.0.0.1`)
2. Check that environment variables are set
3. Look at the Space Logs for exceptions
4. Ensure `HF_TOKEN` is valid

### Issue: "HF_TOKEN not valid"

```
❌ Error initializing client: Invalid token
```

**Fixes**:

1. Generate a new token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Make sure it has API access
3. Update the secret in Space Settings
4. Rebuild the Space

### Issue: "Model not found"

```
❌ Error: MODEL_NAME 'Qwen/Qwen2.5-72B-Instruct' not found
```

**Fixes**:

1. Verify the model exists on the Hugging Face Hub
2. Check whether you have access (private models need approval)
3. Use the inference API endpoint instead:
   ```
   API_BASE_URL=https://api-inference.huggingface.co/v1
   ```
4. Ensure `HF_TOKEN` is set

### Issue: "Out of Memory"

```
❌ Killed due to resource limit
```

**Fixes**:

- Free tier is 2 vCPU / 16GB RAM
- Reduce model size
- Use a smaller LLM (e.g., `mistral-7b`)
- Consider upgrading to paid hardware (usually not needed)
- Optimize inference batch size

### Issue: Space Falls Asleep

```
⚠️ This space has been sleeping for 48 hours
```

**Explanation**: HF Spaces sleep after inactivity to save resources

**Solutions**:

1. Upgrade to paid tier (stays warm)
2. Add uptime monitoring (pings the Space regularly)
3. Use HF Pro subscription

---

## Performance Optimization

### For Spaces with Free Tier (2 vCPU, 16GB RAM)

**1. Use Quantized Models**

```python
# Instead of full precision 72B
MODEL_NAME = "Qwen/Qwen2.5-32B-Instruct-GGUF"  # Smaller, quantized
```

**2. Cache Client**

```python
import os
from functools import cache
from openai import OpenAI

@cache  # Build the client once and reuse it for every request
def get_openai_client():
    return OpenAI(base_url=os.environ["API_BASE_URL"], api_key=os.environ["HF_TOKEN"])
```

**3. Limit Request Size**

```python
MAX_TOKENS = 150   # Reduce from 300: fewer generated tokens = faster responses
TEMPERATURE = 0.1  # Lower temp = more deterministic output (does not change speed)
```

**4. Async Requests** (if multiple concurrent users)

```python
import asyncio
# Use async/await for non-blocking I/O, so one slow LLM call does not block other users
```

---

## Real-World Example: Workflow

```
1. Developer makes changes locally
   ├─ git commit -am "Fix HF_TOKEN validation"
   └─ git push origin main

2. GitHub notifies HF Spaces
   ├─ HF detects push to linked repo
   └─ Triggers automatic build

3. HF Spaces builds Docker image
   ├─ Pulls latest code from main branch
   └─ Runs: pip install -r requirements.txt

4. Container starts running
   ├─ Loads secrets (HF_TOKEN, API_BASE_URL, etc.) as environment variables
   ├─ Runs: python demo.py
   ├─ Gradio interface initializes on :7860
   ├─ FastAPI server (optional) on :8000
   └─ Public URL becomes active

5. User accesses Space URL
   ├─ Browser loads Gradio interface
   ├─ User selects task (easy/medium/hard)
   ├─ Clicks "Run Inference"
   └─ inference.py executes with LLM calls

6. LLM calls routed via:
   API_BASE_URL (router.huggingface.co/v1)
        ↓
   HF Token used for authentication
        ↓
   Model (Qwen/Qwen2.5-72B-Instruct) queried
        ↓
   Response returned to inference.py
        ↓
   Results shown in Gradio UI
```

---

## Security Best Practices

### ✅ DO

- Set `HF_TOKEN` as a **secret** in Space settings
- Use `.gitignore` to prevent tokens from being committed:
  ```
  .env
  .env.local
  *.key
  secrets/
  ```
- Validate all user inputs
- Use HTTPS (handled by HF automatically)

### ❌ DON'T

- Commit API keys to GitHub
- Expose secrets in logs
- Store sensitive data in code
- Leave the Space public if it handles private data

---

## Next Steps

1. **Verify locally first**:
   ```bash
   export HF_TOKEN="your_token"
   export API_BASE_URL="https://router.huggingface.co/v1"
   python inference.py  # Run submission tests
   python demo.py       # Test Gradio UI
   ```
2. **Push to GitHub**:
   ```bash
   git add -A
   git commit -m "Ready for HF Spaces deployment"
   git push origin main
   ```
3. **Create & Link Space**:
   - Create Space on HF
   - Link GitHub repo
   - Set secrets in Settings
   - Wait for build
4. **Test on Spaces**:
   - Access public URL
   - Run test inference
   - Share link with community

---

## Additional Resources

- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [Docker Spaces Guide](https://huggingface.co/docs/hub/spaces-config-reference#docker)
- [Gradio Documentation](https://www.gradio.app/)
- [OpenAI Python Client](https://github.com/openai/openai-python)
- [HF Inference API Docs](https://huggingface.co/docs/api-inference)

---

**Good luck with your submission! 🚀**
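
### Appendix: Pre-flight Settings Check (sketch)

Several troubleshooting fixes above come down to "check that environment variables are set", and the Next Steps section starts with local verification. A minimal sketch of such a check is shown here, assuming the three settings from Step 4 are all required. The names `REQUIRED_VARS`, `missing_settings`, and `check_settings` are hypothetical helpers for illustration, not part of the project's existing code.

```python
import os

# Assumption: these are the settings from Step 4 that the app cannot run without.
REQUIRED_VARS = ("HF_TOKEN", "API_BASE_URL", "MODEL_NAME")


def missing_settings(env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


def check_settings(env=None):
    """Raise a descriptive error listing every missing setting at once."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
```

One possible use is to call `check_settings()` at the top of `demo.py`, so that a missing secret shows up as one clear line in the Space logs instead of a failed LLM call later. The error message lists only the names of missing settings, never their values, which keeps secrets out of the logs.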