# Hugging Face Spaces Deployment Guide
## What is Hugging Face Spaces?
**Hugging Face Spaces** is a free hosting platform for machine learning demos and applications. It allows you to:
- ✅ Deploy web apps for free (with resource limits)
- ✅ Set environment variables and secrets securely
- ✅ Use Docker for full customization
- ✅ Get a public URL accessible worldwide
- ✅ Integrate with GitHub for continuous deployment
### Key Features
- **Free tier**: 2 vCPU, 8GB RAM per Space
- **Public/Private**: Choose visibility level
- **Auto-builds**: Redeploy on GitHub push (with GitHub integration)
- **Secrets management**: Store API tokens securely
- **Multiple SDK support**: Gradio, Streamlit, Docker, Python
---
## How Does Hugging Face Spaces Work?
### 1. **Creation Phase**
You create a new Space and choose an SDK (Gradio, Streamlit, Docker, etc.)
```
┌──────────────────────────────────────────┐
│ Hugging Face Spaces Dashboard            │
│   ├─ Create New Space                    │
│   ├─ Choose SDK: Docker  ← [We use this] │
│   ├─ Set Name: audit-repair-env          │
│   ├─ Set License: MIT                    │
│   └─ Create                              │
└──────────────────────────────────────────┘
```
### 2. **Build Phase**
HF Spaces pulls your code (from GitHub) and builds a Docker image
```
GitHub Repo                    Hugging Face Spaces
     │                                │
     ├─ Dockerfile        ──────►  Build Server
     ├─ requirements.txt              │
     ├─ inference.py            Builds Docker Image
     ├─ server.py               Creates Container
     └─ demo.py                 Allocates Resources
                                      │
                               Pushes to Registry
```
### 3. **Runtime Phase**
The container runs on HF's infrastructure with:
- Assigned vCPU/RAM
- Public HTTP endpoint
- Environment variables & secrets
```
Public URL
     │
     ├─ https://huggingface.co/spaces/username/audit-repair-env
     │       │
     │       └─ Routes to Container
     │              ├─ :7860 (Gradio Demo)
     │              └─ :8000 (FastAPI Server - optional)
     │
     └─ Processes Requests
            ├─ Receives HTTP request
            ├─ Runs inference.py / demo.py
            └─ Returns response
```
### 4. **Lifecycle**
- **Sleeping**: Space goes to sleep after 48 hours of inactivity
- **Paused**: You can manually pause spaces
- **Running**: Active and processing requests
- **Error**: Logs visible in Space page
---
## Step-by-Step Deployment
### Step 1: Prepare Your GitHub Repository
**Requirement**: Public GitHub repo with your code
```bash
git init
git add .
git commit -m "Initial commit"
git remote add origin https://github.com/YOUR_USERNAME/audit-repair-env.git
git branch -M main
git push -u origin main
```
**File checklist**:
- ✅ `inference.py` (root directory)
- ✅ `server.py`
- ✅ `tasks.py`
- ✅ `requirements.txt`
- ✅ `demo.py`
- ✅ `Dockerfile`
- ✅ `README.md`
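For the Docker SDK, Spaces reads its deployment settings from YAML front matter at the top of `README.md`. A minimal sketch (the `app_port` value assumes the Gradio demo listens on 7860, as in the Dockerfile later in this guide):

```yaml
---
title: Audit Repair Env
sdk: docker
app_port: 7860
license: mit
---
```

Without `sdk: docker`, the Space defaults to another SDK and will not use your `Dockerfile`.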
### Step 2: Create Hugging Face Spaces
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click **"Create new Space"**
3. Fill in:
- **Owner**: Your HF username
- **Space name**: `audit-repair-env` (or your choice)
- **License**: MIT
- **SDK**: Docker ← **IMPORTANT**
4. Click **"Create Space"**
### Step 3: Connect to GitHub (Auto-Deployment)
In your **Space Settings**:
1. Go to **Space** → **Settings** (gear icon)
2. Scroll to **"Linked Repository"**
3. Click **"Link a repository"**
4. Select your GitHub repo: `username/audit-repair-env`
5. Choose **"Simple"** or **"Sync"** mode
- **Simple**: Manual redeploy via button
- **Sync**: Auto-redeploy on GitHub push (recommended)
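If the linking UI is unavailable for your account, the same auto-deployment can be set up with a small GitHub Actions workflow that force-pushes `main` to the Space's git remote on every push. A sketch, assuming you replace `USER` with your usernames and store your `HF_TOKEN` as a GitHub repository secret:

```yaml
# .github/workflows/sync-to-hf.yml
name: Sync to Hugging Face Space
on:
  push:
    branches: [main]
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, needed for a clean push
      - name: Push to Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git push --force https://USER:$HF_TOKEN@huggingface.co/spaces/USER/audit-repair-env main
```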
### Step 4: Set Environment Variables & Secrets
In **Space Settings**:
1. Scroll to **"Repository secrets"**
2. Click **"Add secret"**
3. Add:
```
Name: HF_TOKEN
Value: hf_your_actual_token_here
```
4. Add:
```
Name: API_BASE_URL
Value: https://router.huggingface.co/v1
```
5. Add:
```
Name: MODEL_NAME
Value: Qwen/Qwen2.5-72B-Instruct
```
**⚠️ NOTE**: Secrets are exposed to the running container as environment variables at runtime. If you also need a secret during the Docker build, expose it with BuildKit's `RUN --mount=type=secret` syntax rather than baking it into an image layer.
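Inside the running container, the secrets appear as ordinary environment variables. A sketch of how `inference.py` might read them, failing fast if the token is missing (the function name and defaults are illustrative; the variable names match the secrets above):

```python
import os

def load_config(env=os.environ):
    """Read Space secrets; Spaces injects them as environment variables."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it under Repository secrets")
    return {
        "token": token,
        "base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
    }
```

Failing fast at startup turns a cryptic mid-request auth error into a one-line message in the Space Logs tab.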
### Step 5: Check Logs & Verify Deployment
1. Go to your Space URL: `https://huggingface.co/spaces/username/audit-repair-env`
2. Click **"Logs"** tab to see build output
3. Wait for status: **"Running"**
4. Click the **"App"** link to access your demo
---
## Dockerfile Setup for Spaces
Your `Dockerfile` should be:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Copy everything
COPY . .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Expose port for Gradio (or FastAPI)
EXPOSE 7860
# Run Gradio demo by default
CMD ["python", "demo.py"]
```
**Alternative** (run both server + demo):
```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 7860 8000
# Create startup script
RUN printf '#!/bin/bash\npython server.py &\npython demo.py\n' > /app/start.sh
RUN chmod +x /app/start.sh
CMD ["/app/start.sh"]
```
---
## Troubleshooting Common Issues
### Issue: "Build Failed"
```
❌ Docker build failed
```
**Fixes**:
1. Check Logs tab for error messages
2. Verify `requirements.txt` syntax
3. Ensure `Dockerfile` references correct files
4. Check for permission issues
### Issue: "Application Error" on Load
```
❌ Application Error: Connection refused
```
**Fixes**:
1. Verify the app binds to `0.0.0.0:7860` (for Gradio: `demo.launch(server_name="0.0.0.0", server_port=7860)`)
2. Check environment variables are set
3. Look at Space Logs for exceptions
4. Ensure HF_TOKEN is valid
### Issue: "HF_TOKEN not valid"
```
❌ Error initializing client: Invalid token
```
**Fixes**:
1. Generate new token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Make sure it has API access
3. Update secret in Space Settings
4. Rebuild Space
### Issue: "Model not found"
```
❌ Error: MODEL_NAME 'Qwen/Qwen2.5-72B-Instruct' not found
```
**Fixes**:
1. Verify model exists on Hugging Face Hub
2. Check if you have access (private models need approval)
3. Use inference API endpoint instead:
```
API_BASE_URL=https://api-inference.huggingface.co/v1
```
4. Ensure HF_TOKEN is set
### Issue: "Out of Memory"
```
❌ Killed due to resource limit
```
**Fixes**:
- Free tier is 2 vCPU / 8GB RAM
- Reduce model size
- Use a smaller LLM (e.g., `mistral-7b`)
- Consider upgrading to paid hardware (usually not needed)
- Optimize inference batch size
### Issue: Space Falls Asleep
```
⚠️ This Space has been sleeping for 48 hours
```
**Explanation**: HF Spaces sleep after inactivity to save resources
**Solutions**:
1. Upgrade to paid tier (stays warm)
2. Add uptime monitoring (pings Space regularly)
3. Use HF Pro subscription
---
## Performance Optimization
### For Spaces with Free Tier (2 vCPU, 8GB RAM)
**1. Use Quantized Models**
```python
# Instead of full precision 72B
MODEL_NAME = "Qwen/Qwen2.5-32B-Instruct-GGUF" # Smaller, quantized
```
**2. Cache Client**
```python
import os
from functools import cache
from openai import OpenAI

@cache  # one client instance, reused across requests
def get_openai_client():
    return OpenAI(base_url=os.environ["API_BASE_URL"], api_key=os.environ["HF_TOKEN"])
```
**3. Limit Request Size**
```python
MAX_TOKENS = 150 # Reduce from 300
TEMPERATURE = 0.1 # Lower temp = faster convergence
```
**4. Async Requests** (if multiple concurrent users)
```python
import asyncio
# Use async/await for non-blocking I/O
```
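A minimal sketch of the pattern, with a stand-in coroutine where a real async client call (e.g. via `openai.AsyncOpenAI`) would go:

```python
import asyncio

async def run_inference(prompt: str) -> str:
    # Placeholder for a real async LLM call; the sleep stands in for network I/O
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"

async def handle_many(prompts):
    # gather() issues the calls concurrently instead of serially,
    # so one slow request doesn't block the others
    return await asyncio.gather(*(run_inference(p) for p in prompts))

results = asyncio.run(handle_many(["easy", "medium", "hard"]))
```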
---
## Real-World Example: Workflow
```
1. Developer makes changes locally
   ├─ git commit -am "Fix HF_TOKEN validation"
   └─ git push origin main

2. GitHub notifies HF Spaces
   ├─ HF detects push to linked repo
   └─ Triggers automatic build

3. HF Spaces builds Docker image
   ├─ Pulls latest code from main branch
   ├─ Runs: pip install -r requirements.txt
   ├─ Loads secrets (HF_TOKEN, API_BASE_URL, etc.)
   └─ Runs: python demo.py

4. Container starts running
   ├─ Gradio interface initializes on :7860
   ├─ FastAPI server (optional) on :8000
   └─ Public URL becomes active

5. User accesses Space URL
   ├─ Browser loads Gradio interface
   ├─ User selects task (easy/medium/hard)
   ├─ Clicks "Run Inference"
   └─ inference.py executes with LLM calls

6. LLM calls routed via:
   API_BASE_URL (router.huggingface.co/v1)
        ↓
   HF Token used for authentication
        ↓
   Model (Qwen/Qwen2.5-72B-Instruct) queried
        ↓
   Response returned to inference.py
        ↓
   Results shown in Gradio UI
```
---
## Security Best Practices
### ✅ DO
- Set HF_TOKEN as a **secret** in Space settings
- Use `.gitignore` to prevent token from being committed:
```
.env
.env.local
*.key
secrets/
```
- Validate all user inputs
- Use HTTPS (handled by HF automatically)
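"Validate all user inputs" can be as simple as whitelisting the task name and capping prompt length before anything reaches the LLM. A sketch (the names and limits are illustrative, not taken from the repo):

```python
ALLOWED_TASKS = {"easy", "medium", "hard"}
MAX_PROMPT_CHARS = 4000

def validate_request(task: str, prompt: str) -> str:
    """Reject malformed input before it is sent to the model."""
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if not prompt or not prompt.strip():
        raise ValueError("prompt is empty")
    return prompt[:MAX_PROMPT_CHARS]  # hard cap on request size
```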
### ❌ DON'T
- Commit API keys to GitHub
- Expose secrets in logs
- Store sensitive data in code
- Leave Space public if handling private data
---
## Next Steps
1. **Verify locally first**:
```bash
export HF_TOKEN="your_token"
export API_BASE_URL="https://router.huggingface.co/v1"
python inference.py # Run submission tests
python demo.py # Test Gradio UI
```
2. **Push to GitHub**:
```bash
git add -A
git commit -m "Ready for HF Spaces deployment"
git push origin main
```
3. **Create & Link Space**:
- Create Space on HF
- Link GitHub repo
- Set secrets in Settings
- Wait for build
4. **Test on Spaces**:
- Access public URL
- Run test inference
- Share link with community
---
## Additional Resources
- [Hugging Face Spaces Docs](https://huggingface.co/docs/hub/spaces)
- [Docker Spaces Guide](https://huggingface.co/docs/hub/spaces-config-reference#docker)
- [Gradio Documentation](https://www.gradio.app/)
- [OpenAI Python Client](https://github.com/openai/openai-python)
- [HF Inference API Docs](https://huggingface.co/docs/api-inference)
---
**Good luck with your submission! 🚀**