# Deploying ADI to Hugging Face Spaces

**ESCP MIM2 — Applied Data Science | April 2026**

---

## Overview

Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app reads from committed CSV/pickle files, so the deployed version always reflects whatever data and models you last pushed. When you want to refresh the data, you run the pipeline locally and push again.

**Total size:** ~14 MB — well within the free tier limit (50 GB).

---

## One-time Setup

### Step 1 — Install the Hugging Face CLI

```bash
pip install huggingface_hub
```

### Step 2 — Log in with your ESCP HF account

```bash
huggingface-cli login
```

This opens a prompt asking for a token. Go to **https://huggingface.co/settings/tokens** → New token → Role: **Write** → copy and paste it.

### Step 3 — Create your Space on the HF website

1. Go to **https://huggingface.co/new-space**
2. Fill in:
   - **Space name:** `aviation-disruption-intelligence` (or any name you like)
   - **License:** MIT
   - **SDK:** Streamlit ← important
   - **Visibility:** Public (or Private if you prefer)
3. Click **Create Space** — HF will create an empty Git repo for you.

### Step 4 — Add HF as a second git remote

In your local project folder (where the code lives), run:

```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

You now have two remotes:

- `origin` → your GitHub repo
- `space` → Hugging Face Space

---

## Pushing the App (First Deploy)

Run these commands from the project root:

```bash
# 1. Make sure you are on the main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4.
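# (Optional sanity check before pushing: confirm the model .pkl files were
#  actually staged; the path below assumes the repo layout used in this guide)
git ls-files src/models/saved/ | grep -c '\.pkl$'   # a count of 0 means the gitignore still excludes them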
# Push to GitHub (your normal remote)
git push origin main

# 5. Push to HF Space
git push space main
```

HF will automatically detect `sdk: streamlit` in the README frontmatter and `app_file: src/app/app.py`, install `requirements.txt`, and start the app. The build takes ~2–4 minutes; you can watch the build logs live on your Space page.

---

## Verifying the Deployment

Once the build is green, visit:

```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

The app should load showing all 11 tabs. If you see an error, click **"View Logs"** on the Space page — the most common issues are:

| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | `requirements.txt` already includes it; wait for the rebuild to finish |
| `FileNotFoundError: best_classifier.pkl` | Re-run the `git add` / `git commit` / `git push` steps above — the .pkl files were not committed |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent` — this resolves correctly as long as `app_file: src/app/app.py` is set in the README |

---

## Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The workflow is: **run locally → commit → push to HF.**

```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main
```

HF automatically rebuilds and restarts the app when it detects a new push. The rebuild usually takes under 60 seconds for a data-only update.

---

## Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work from within the Space itself, add your keys as HF Secrets instead of a `.env` file:

1. Go to your Space → **Settings** → **Repository secrets**
2. Add:
   - `SERPAPI_KEY` — your SerpAPI key
   - Any other keys from your `.env`

In `app.py` / pipeline scripts, these are already loaded via `python-dotenv` and `os.environ`, so no code changes are needed: HF injects repository secrets as environment variables, which `os.environ` picks up even without a `.env` file.

---

## Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your laptop, you can use a free GitHub Actions workflow that runs the pipeline and pushes to both GitHub and HF.

Create `.github/workflows/daily_pipeline.yml`:

```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:       # also allows manual trigger from the GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0   # full history: pushing a shallow clone to the HF remote can fail

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py

      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main

      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```

Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` under **GitHub repo Settings → Secrets and variables → Actions**.
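The `PROJECT_ROOT` trick mentioned in the error table earlier can be sketched as a small loader. The path `src/models/saved/best_classifier.pkl` and the `src/app/app.py` location come from this guide; the `load_model` helper itself is a hypothetical illustration of how the app might resolve paths the same way locally and on HF Spaces:

```python
from pathlib import Path
import pickle

# app.py lives at src/app/app.py, so three .parent hops reach the repo root.
# This resolves correctly on HF Spaces because `app_file: src/app/app.py`
# runs the app from the same repo layout that exists locally.
PROJECT_ROOT = Path(__file__).parent.parent.parent

def load_model(name: str = "best_classifier.pkl"):
    """Load a committed model artifact relative to the repo root (hypothetical helper)."""
    model_path = PROJECT_ROOT / "src" / "models" / "saved" / name
    if not model_path.exists():
        # Mirrors the FileNotFoundError case in the table above:
        # the .pkl files were never committed and pushed.
        raise FileNotFoundError(f"{model_path} not found. Did you commit the model files?")
    with open(model_path, "rb") as f:
        return pickle.load(f)
```

Because every path is anchored to `PROJECT_ROOT` rather than the working directory, the same code works no matter where HF (or you) launch Streamlit from.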
---

## Summary of What Gets Deployed

| What | Where in repo | Notes |
|------|---------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (gitignore updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarm plots, etc. |
| Config | `config/settings.py` | Paths resolve relative to file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |

**Not deployed (not needed for the app):**

- `data/raw/` — ingestion snapshots, gitignored
- `.env` — use HF Secrets instead
- `.docx` / `.pptx` — excluded via gitignore

---

*Group 3 — ESCP MIM2 Applied Data Science — April 2026*