# Deploying ADI to Hugging Face Spaces
**ESCP MIM2 — Applied Data Science | April 2026**
---
## Overview
Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app
reads from committed CSV/pickle files, so the deployed version always reflects
whatever data and models you last pushed. When you want to refresh the data,
you run the pipeline locally and push again.
**Total size:** ~14 MB — well within the free tier limit (50 GB).
---
## One-time Setup
### Step 1 — Install the Hugging Face CLI
```bash
pip install huggingface_hub
```
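A quick sanity check that the CLI landed on your PATH (a sketch; the echoed strings are illustrative):

```shell
# Confirm the huggingface-cli entry point installed by huggingface_hub is reachable
if command -v huggingface-cli >/dev/null 2>&1; then
  echo "huggingface-cli available"
else
  echo "huggingface-cli not found -- check your pip environment"
fi
```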
### Step 2 — Log in with your ESCP HF account
```bash
huggingface-cli login
```
This opens a prompt asking for a token. Go to:
**https://huggingface.co/settings/tokens** → New token → Role: **Write** → copy and paste it.
### Step 3 — Create your Space on the HF website
1. Go to **https://huggingface.co/new-space**
2. Fill in:
- **Space name:** `aviation-disruption-intelligence` (or any name you like)
- **License:** MIT
- **SDK:** Streamlit ← important
- **Visibility:** Public (or Private if you prefer)
3. Click **Create Space** — HF creates the Git repo for you (often pre-populated with a stub README).
### Step 4 — Add HF as a second git remote
In your local project folder (where the code lives), run:
```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```
You now have two remotes:
- `origin` → your GitHub repo
- `space` → Hugging Face Space
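You can confirm the wiring with `git remote -v`. A self-contained sketch in a scratch repo (the URLs and `YOUR_GH_USERNAME` / `YOUR_HF_USERNAME` are placeholders — in your real project you only need the final command):

```shell
# Demo in a throwaway repo: add both remotes, then list them.
demo=$(mktemp -d) && cd "$demo" && git init -q .
git remote add origin https://github.com/YOUR_GH_USERNAME/aviation-disruption-intelligence.git
git remote add space  https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
git remote -v   # lists each remote twice: once for fetch, once for push
```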
---
## Pushing the App (First Deploy)
Run these commands from the project root:
```bash
# 1. Make sure you are on main branch
git checkout main
# 2. Stage everything, including the model .pkl files
# (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/
# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"
# 4. Push to GitHub (your normal remote)
git push origin main
# 5. Push to HF Space
git push space main
```

If HF pre-populated the Space with a stub README, this first push may be rejected as non-fast-forward. In that case, pull the Space's initial commit first (`git pull space main --allow-unrelated-histories`), resolve any README conflict, and push again.
HF will automatically detect `sdk: streamlit` in the README frontmatter and
`app_file: src/app/app.py`, install `requirements.txt`, and start the app.
Build takes ~2–4 minutes. You can watch the build logs live on your Space page.
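For reference, a minimal frontmatter sketch at the top of `README.md` (the `title` and `emoji` values here are illustrative; `sdk` and `app_file` must match what HF expects for this app):

```yaml
---
title: Aviation Disruption Intelligence
emoji: ✈️
sdk: streamlit
app_file: src/app/app.py
license: mit
---
```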
---
## Verifying the Deployment
Once the build is green, visit:
```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```
The app should load showing all 11 tabs. If you see an error, click
**"View Logs"** on the Space page — the most common issues are:
| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | confirm `shap` is listed in `requirements.txt`, then let the Space rebuild |
| `FileNotFoundError: best_classifier.pkl` | the `.pkl` files were not committed — re-run the `git add src/models/saved/` step above and push again |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent` — this resolves correctly as long as `app_file: src/app/app.py` is set in README |
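A quick way to confirm the model files actually made it into git (a sketch; the path comes from this repo's layout):

```shell
# List tracked .pkl files under the models directory; warn if none are tracked
git ls-files src/models/saved/ | grep '\.pkl$' || echo "no .pkl files tracked -- re-run the git add step"
```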
---
## Updating Data (Keeping It Fresh)
HF Spaces does not run your pipeline automatically on the free tier. The
workflow is:
**Run locally → commit → push to HF.**
```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py
# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/
# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"
# 4. Push to both remotes
git push origin main
git push space main
```
HF automatically rebuilds and restarts the app when it detects a new push.
The rebuild usually takes under 60 seconds for a data-only update.
---
## Setting API Keys / Secrets (Optional)
If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work
from within the Space itself, add your keys as HF Secrets instead of a `.env`
file:
1. Go to your Space → **Settings** → **Repository secrets**
2. Add:
- `SERPAPI_KEY` — your SerpAPI key
- Any other keys from your `.env`
In `app.py` / pipeline scripts, these are already loaded via `python-dotenv`
and `os.environ`, so no code changes are needed.
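To sanity-check that a secret actually reached the running environment, a one-liner like this (a hypothetical check, not part of the app) can be run in the Space's container or dropped temporarily into startup:

```shell
# Prints True/False depending on whether the secret is visible as an env var
python3 -c 'import os; print("SERPAPI_KEY set:", bool(os.environ.get("SERPAPI_KEY")))'
```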
---
## Automating Pipeline Runs with GitHub Actions (Optional)
If you want the data to refresh automatically every day without touching your
laptop, you can use a free GitHub Actions workflow that runs the pipeline and
pushes to both GitHub and HF.
Create `.github/workflows/daily_pipeline.yml`:
```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:       # also allows manual trigger from the GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py
      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main
      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```
Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` to **GitHub repo
Settings → Secrets and variables → Actions**.
---
## Summary of What Gets Deployed
| What | Where in repo | Notes |
|------|--------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (gitignore updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarms etc |
| Config | `config/settings.py` | Paths resolve relative to file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |
**Not deployed (not needed for the app):**
- `data/raw/` — ingestion snapshots, gitignored
- `.env` — use HF Secrets instead
- `.docx` / `.pptx` — excluded via gitignore
---
*Group 3 — ESCP MIM2 Applied Data Science — April 2026*