# Deploying ADI to Hugging Face Spaces

**ESCP MIM2 — Applied Data Science | April 2026**

---

## Overview

Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app
reads from committed CSV/pickle files, so the deployed version always reflects
whatever data and models you last pushed. When you want to refresh the data,
you run the pipeline locally and push again.
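Since the app only reads committed artifacts, the loading side stays simple. As an illustration, a minimal sketch of loading a committed model file (the `load_model` helper is hypothetical; the filename and path match the ones used elsewhere in this guide):

```python
import pickle
from pathlib import Path

# Path to a committed model artifact; the filename matches the one in the
# troubleshooting table of this guide. load_model is an illustrative helper,
# not the app's actual code.
MODEL_PATH = Path("src/models/saved/best_classifier.pkl")

def load_model(path: Path = MODEL_PATH):
    """Deserialize a model that was committed alongside the app code."""
    with path.open("rb") as fh:
        return pickle.load(fh)
```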

**Total size:** ~14 MB — well within the free tier limit (50 GB).

---

## One-time Setup

### Step 1 — Install the Hugging Face CLI

```bash
pip install huggingface_hub
```

### Step 2 — Log in with your ESCP HF account

```bash
huggingface-cli login
```

This opens a prompt asking for a token. Go to:
**https://huggingface.co/settings/tokens** → New token → Role: **Write** → copy and paste it.

### Step 3 — Create your Space on the HF website

1. Go to **https://huggingface.co/new-space**
2. Fill in:
   - **Space name:** `aviation-disruption-intelligence` (or any name you like)
   - **License:** MIT
   - **SDK:** Streamlit ← important
   - **Visibility:** Public (or Private if you prefer)
3. Click **Create Space** — HF will create an empty Git repo for you.

### Step 4 — Add HF as a second git remote

In your local project folder (where the code lives), run:

```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

You now have two remotes:
- `origin` → your GitHub repo
- `space`  → Hugging Face Space

---

## Pushing the App (First Deploy)

Run these commands from the project root:

```bash
# 1. Make sure you are on main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4. Push to GitHub (your normal remote)
git push origin main

# 5. Push to HF Space
git push space main
```

HF will automatically detect `sdk: streamlit` in the README frontmatter and
`app_file: src/app/app.py`, install `requirements.txt`, and start the app.
Build takes ~2–4 minutes. You can watch the build logs live on your Space page.
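For reference, the frontmatter block at the top of `README.md` looks roughly like this — the `title`, `emoji`, and `pinned` values are illustrative, while `sdk` and `app_file` are the two fields this guide relies on:

```yaml
---
title: Aviation Disruption Intelligence
emoji: ✈️
sdk: streamlit
app_file: src/app/app.py
pinned: false
---
```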

---

## Verifying the Deployment

Once the build is green, visit:
```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

The app should load showing all 11 tabs. If you see an error, click
**"View Logs"** on the Space page — the most common issues are:

| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | requirements.txt already includes it; wait for rebuild |
| `FileNotFoundError: best_classifier.pkl` | Re-run the `git add src/models/saved/` step from the First Deploy commands — the pkl files were not committed |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent` — this resolves correctly as long as `app_file: src/app/app.py` is set in README |
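The path bootstrap in the last row can be sketched as follows (the literal path is a stand-in for `__file__` on the Space, and the `sys.path` line is an assumption about how `import src...` is made to work):

```python
import sys
from pathlib import Path

# app.py lives at <repo>/src/app/app.py, so three .parent hops
# climb src/app -> src -> repo root. The literal path below is a
# stand-in for __file__ as it would resolve on the Space.
app_file = Path("/home/user/app/src/app/app.py")
PROJECT_ROOT = app_file.parent.parent.parent

# Assumption: putting the repo root on sys.path is what lets
# `import src.models...` succeed regardless of the working directory.
sys.path.insert(0, str(PROJECT_ROOT))

print(PROJECT_ROOT)  # /home/user/app
```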

---

## Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The
workflow is:

**Run locally → commit → push to HF.**

```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main
```

HF automatically rebuilds and restarts the app when it detects a new push.
The rebuild usually takes under 60 seconds for a data-only update.

---

## Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work
from within the Space itself, add your keys as HF Secrets instead of a `.env`
file:

1. Go to your Space → **Settings** → **Repository secrets**
2. Add:
   - `SERPAPI_KEY` — your SerpAPI key
   - Any other keys from your `.env`

In `app.py` / pipeline scripts, these are already loaded via `python-dotenv`
and `os.environ`, so no code changes are needed.
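A sketch of the lookup pattern involved (the `SERPAPI_KEY` name comes from this guide; the fallback flag is illustrative):

```python
import os

# HF Spaces exposes repository secrets as environment variables, so the
# same lookup works locally (where python-dotenv populates os.environ
# from .env) and on the deployed Space.
serpapi_key = os.environ.get("SERPAPI_KEY")  # None when the secret is absent

if serpapi_key is None:
    # Illustrative fallback: skip live ingestion and rely on committed CSVs.
    live_ingestion_enabled = False
else:
    live_ingestion_enabled = True
```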

---

## Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your
laptop, you can use a free GitHub Actions workflow that runs the pipeline and
pushes to both GitHub and HF.

Create `.github/workflows/daily_pipeline.yml`:

```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:        # also allows manual trigger from GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py

      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main

      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```

Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` to **GitHub repo
Settings → Secrets and variables → Actions**.

---

## Summary of What Gets Deployed

| What | Where in repo | Notes |
|------|--------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (gitignore updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarms etc |
| Config | `config/settings.py` | Paths resolve relative to file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |

**Not deployed (not needed for the app):**
- `data/raw/` — ingestion snapshots, gitignored
- `.env` — use HF Secrets instead
- `.docx` / `.pptx` — excluded via gitignore

---

*Group 3 — ESCP MIM2 Applied Data Science — April 2026*