# Deploying ADI to Hugging Face Spaces
**ESCP MIM2 — Applied Data Science | April 2026**
---
## Overview
Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app
reads from committed CSV/pickle files, so the deployed version always reflects
whatever data and models you last pushed. When you want to refresh the data,
you run the pipeline locally and push again.
**Total size:** ~14 MB — well within the free tier limit (50 GB).
---
## One-time Setup
### Step 1 — Install the Hugging Face CLI
```bash
pip install huggingface_hub
```
### Step 2 — Log in with your ESCP HF account
```bash
huggingface-cli login
```
This opens a prompt asking for a token. Go to:
**https://huggingface.co/settings/tokens** → New token → Role: **Write** → copy and paste it.
### Step 3 — Create your Space on the HF website
1. Go to **https://huggingface.co/new-space**
2. Fill in:
- **Space name:** `aviation-disruption-intelligence` (or any name you like)
- **License:** MIT
- **SDK:** Streamlit ← important
- **Visibility:** Public (or Private if you prefer)
3. Click **Create Space** — HF will create an empty Git repo for you.
### Step 4 — Add HF as a second git remote
In your local project folder (where the code lives), run:
```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```
You now have two remotes:
- `origin` → your GitHub repo
- `space` → Hugging Face Space
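To confirm the two-remote setup, you can replay it in a throwaway repo (the URLs below are placeholders, not your real remotes):

```shell
# Simulate the two-remote layout in a scratch repo, starting clean
rm -rf /tmp/adi-remote-demo && mkdir -p /tmp/adi-remote-demo && cd /tmp/adi-remote-demo
git init -q .
git remote add origin https://github.com/YOUR_GITHUB_USERNAME/aviation-disruption-intelligence.git
git remote add space  https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
git remote -v   # lists fetch/push URLs for both remotes
```

In the real project, `git remote -v` should show the same two names: `origin` pointing at GitHub and `space` pointing at HF.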
---
## Pushing the App (First Deploy)
Run these commands from the project root:
```bash
# 1. Make sure you are on main branch
git checkout main
# 2. Stage everything, including the model .pkl files
# (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/
# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"
# 4. Push to GitHub (your normal remote)
git push origin main
# 5. Push to HF Space
git push space main
```
HF reads the README frontmatter (`sdk: streamlit` and `app_file: src/app/app.py`), installs `requirements.txt`, and starts the app.
Build takes ~2–4 minutes. You can watch the build logs live on your Space page.
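For reference, a minimal frontmatter block HF can parse looks like this (the `title` and `emoji` values are illustrative; `sdk` and `app_file` are the fields that matter here):

```yaml
---
title: Aviation Disruption Intelligence
emoji: ✈️
sdk: streamlit
app_file: src/app/app.py
---
```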
---
## Verifying the Deployment
Once the build is green, visit:
```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```
The app should load showing all 11 tabs. If you see an error, click
**"View Logs"** on the Space page — the most common issues are:
| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | requirements.txt already includes it; wait for rebuild |
| `FileNotFoundError: best_classifier.pkl` | Re-run the staging and commit steps in "Pushing the App (First Deploy)" — the `.pkl` files were not committed |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent` — this resolves correctly as long as `app_file: src/app/app.py` is set in README |
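The three `.parent` hops in that last fix can be sanity-checked in isolation (a sketch that mirrors only the path arithmetic, not the app itself):

```shell
# Each .parent strips one level: src/app/app.py -> src/app -> src -> project root
python3 -c '
from pathlib import Path
p = Path("src/app/app.py")
print(p.parent.parent.parent)'   # prints "." (the repo root, relative to itself)
```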
---
## Updating Data (Keeping It Fresh)
HF Spaces does not run your pipeline automatically on the free tier. The
workflow is:
**Run locally → commit → push to HF.**
```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py
# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/
# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"
# 4. Push to both remotes
git push origin main
git push space main
```
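The `$(date …)` substitution in the commit step expands at commit time, so each refresh gets a unique, sortable message. The pattern can be previewed without committing anything:

```shell
# Preview the timestamped commit message used in step 3
echo "data: pipeline run $(date '+%Y-%m-%d %H:%M')"
```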
HF automatically rebuilds and restarts the app when it detects a new push.
The rebuild usually takes under 60 seconds for a data-only update.
---
## Setting API Keys / Secrets (Optional)
If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work
from within the Space itself, add your keys as HF Secrets instead of a `.env`
file:
1. Go to your Space → **Settings** → **Repository secrets**
2. Add:
- `SERPAPI_KEY` — your SerpAPI key
- Any other keys from your `.env`
In `app.py` / pipeline scripts, these are already loaded via `python-dotenv`
and `os.environ`, so no code changes are needed.
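Because the Space injects repository secrets as plain environment variables, `os.environ` sees them exactly as it would see values loaded from a local `.env`. A minimal sketch of the lookup (the key value here is a dummy):

```shell
# Simulate a secret being present in the process environment
SERPAPI_KEY="demo-value" python3 -c 'import os; print("SERPAPI_KEY" in os.environ)'   # prints True
```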
---
## Automating Pipeline Runs with GitHub Actions (Optional)
If you want the data to refresh automatically every day without touching your
laptop, you can use a free GitHub Actions workflow that runs the pipeline and
pushes to both GitHub and HF.
Create `.github/workflows/daily_pipeline.yml`:
```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:       # also allows manual trigger from the GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py
      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main
      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```
Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` to **GitHub repo
Settings → Secrets and variables → Actions**.
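One detail worth noting in the "Commit updated data" step: `git diff --staged --quiet || git commit …` commits only when something is actually staged, so the job does not fail on days when the pipeline produces no changes. A self-contained demo (throwaway repo, placeholder file name):

```shell
# Start clean, stage one file, then run the guard twice
rm -rf /tmp/adi-guard-demo && mkdir -p /tmp/adi-guard-demo && cd /tmp/adi-guard-demo
git init -q .
git config user.email "demo@example.com" && git config user.name "Demo"
echo "row" > data.csv && git add data.csv
git diff --staged --quiet || git commit -q -m "first run"    # commits: data.csv is staged
git diff --staged --quiet || git commit -q -m "second run"   # skipped: nothing staged
git log --oneline | wc -l                                    # one commit, not two
```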
---
## Summary of What Gets Deployed
| What | Where in repo | Notes |
|------|--------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (gitignore updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarms, etc. |
| Config | `config/settings.py` | Paths resolve relative to file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |
**Not deployed (not needed for the app):**
- `data/raw/` — ingestion snapshots, gitignored
- `.env` — use HF Secrets instead
- `.docx` / `.pptx` — excluded via gitignore
---
*Group 3 — ESCP MIM2 Applied Data Science — April 2026*