# Deploying ADI to Hugging Face Spaces

ESCP MIM2 — Applied Data Science | April 2026
## Overview

Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app reads from committed CSV/pickle files, so the deployed version always reflects whatever data and models you last pushed. When you want to refresh the data, you run the pipeline locally and push again.

Total size: ~14 MB — well within the free-tier limit of 50 GB.
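This read-from-committed-files pattern can be sketched as below. The temporary directory stands in for the repo, and the stub model and `flights.csv` are purely illustrative (only the `best_classifier.pkl` file name appears elsewhere in this guide):

```python
import csv
import pickle
import tempfile
from pathlib import Path

# A stand-in for the repo root; on the Space this is the cloned repo itself.
root = Path(tempfile.mkdtemp())
(root / "src/models/saved").mkdir(parents=True)
(root / "data/base").mkdir(parents=True)

# What a local pipeline run would have committed:
with (root / "src/models/saved/best_classifier.pkl").open("wb") as fh:
    pickle.dump({"name": "stub-model"}, fh)
(root / "data/base/flights.csv").write_text("route,delay\nCDG-JFK,12\n")

# What the deployed app does at startup: plain file reads, no network calls,
# so it always reflects whatever the last push contained.
with (root / "src/models/saved/best_classifier.pkl").open("rb") as fh:
    model = pickle.load(fh)
with (root / "data/base/flights.csv").open() as fh:
    rows = list(csv.DictReader(fh))
```

Because startup is just file I/O, nothing on the Space can drift out of sync with the repo.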
## One-time Setup

### Step 1 — Install the Hugging Face CLI

```bash
pip install huggingface_hub
```
### Step 2 — Log in with your ESCP HF account

```bash
huggingface-cli login
```

This opens a prompt asking for a token. Go to https://huggingface.co/settings/tokens → New token → Role: Write, then copy and paste it.
### Step 3 — Create your Space on the HF website

- Go to https://huggingface.co/new-space
- Fill in:
  - Space name: `aviation-disruption-intelligence` (or any name you like)
  - License: MIT
  - SDK: Streamlit ← important
  - Visibility: Public (or Private if you prefer)
- Click Create Space — HF will create an empty Git repo for you.
### Step 4 — Add HF as a second git remote

In your local project folder (where the code lives), run:

```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```
You now have two remotes:

- `origin` → your GitHub repo
- `space` → your Hugging Face Space
## Pushing the App (First Deploy)

Run these commands from the project root:

```bash
# 1. Make sure you are on the main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4. Push to GitHub (your normal remote)
git push origin main

# 5. Push to the HF Space
git push space main
```
HF will automatically detect `sdk: streamlit` and `app_file: src/app/app.py` in the README frontmatter, install `requirements.txt`, and start the app.

The build takes ~2–4 minutes. You can watch the build logs live on your Space page.
## Verifying the Deployment

Once the build is green, visit:

https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
The app should load showing all 11 tabs. If you see an error, click "View Logs" on the Space page — the most common issues are:
| Error | Fix |
|---|---|
| `ModuleNotFoundError: shap` | `requirements.txt` already includes it; wait for the rebuild |
| `FileNotFoundError: best_classifier.pkl` | Repeat step 2 of the push commands above — the `.pkl` files were not committed |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent`, which resolves correctly as long as `app_file: src/app/app.py` is set in the README |
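The path arithmetic behind the `No module named src` fix can be checked in isolation. This is a minimal sketch, not the app's actual code; the relative path `src/app/app.py` stands in for `__file__`:

```python
from pathlib import Path

# src/app/app.py sits three levels below the repo root, so three .parent
# hops climb src/app -> src -> root. (resolve() works even though the
# file does not exist here.)
app_file = Path("src/app/app.py").resolve()
project_root = app_file.parent.parent.parent

# When the current directory is the repo root, the two coincide:
print(project_root == Path.cwd().resolve())  # True
```

If HF launched the app with a different `app_file` depth, the number of `.parent` hops would no longer match and imports relative to the project root would break, which is why the README setting matters.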
## Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The workflow is: run locally → commit → push to HF.
```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main
```
HF automatically rebuilds and restarts the app when it detects a new push. The rebuild usually takes under 60 seconds for a data-only update.
## Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work from within the Space itself, add your keys as HF Secrets instead of a `.env` file:
- Go to your Space → Settings → Repository secrets
- Add:
  - `SERPAPI_KEY`: your SerpAPI key
  - any other keys from your `.env`
In `app.py` and the pipeline scripts, these are already loaded via python-dotenv and `os.environ`, so no code changes are needed.
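That loading pattern looks roughly like the sketch below. The `try/except` fallback is an illustrative addition for environments without python-dotenv, not necessarily the project's exact code:

```python
import os

# On the Space, Repository secrets arrive as environment variables, so the
# same os.environ lookup covers both cases. Locally, python-dotenv copies
# .env entries into the environment first.
try:
    from dotenv import load_dotenv  # optional: absent on a bare install
    load_dotenv()                   # no-op when there is no .env file
except ImportError:
    pass

serpapi_key = os.environ.get("SERPAPI_KEY")  # None when the secret is unset
```

Because both paths end in a plain `os.environ.get`, moving from a local `.env` to HF Secrets requires no code change at all.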
## Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your laptop, you can use a free GitHub Actions workflow that runs the pipeline and pushes to both GitHub and HF.

Create `.github/workflows/daily_pipeline.yml`:
```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"  # 6 AM UTC every day
  workflow_dispatch:     # also allows a manual trigger from the GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py
      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main
      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```
Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` under the GitHub repo's Settings → Secrets and variables → Actions.
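A small fail-fast guard at the top of `pipeline/run_pipeline.py` would make a forgotten secret obvious in the Actions log instead of failing midway through ingestion. `check_secrets` is a hypothetical helper, not part of the existing pipeline:

```python
import os

REQUIRED_SECRETS = ("SERPAPI_KEY",)  # extend with any other keys you use


def check_secrets(required=REQUIRED_SECRETS):
    """Raise early if a secret was not configured for this run."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing secrets: " + ", ".join(missing)
            + " (set them under Settings → Secrets and variables → Actions)"
        )
    return True
```

Calling `check_secrets()` as the pipeline's first statement turns a cryptic downstream API error into a one-line message naming exactly which secret to add.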
## Summary of What Gets Deployed

| What | Where in repo | Notes |
|---|---|---|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (`.gitignore` updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarms, etc. |
| Config | `config/settings.py` | Paths resolve relative to the file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |
Not deployed (not needed for the app):

- `data/raw/`: ingestion snapshots, gitignored
- `.env`: use HF Secrets instead
- `.docx`/`.pptx`: excluded via `.gitignore`
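A `.gitignore` consistent with the lists above might look like the fragment below. The negation line is an assumption about how an earlier blanket `*.pkl` ignore was relaxed, so adapt it to your actual file:

```gitignore
# Not deployed: raw ingestion snapshots, local secrets, office documents
data/raw/
.env
*.docx
*.pptx

# Model pickles were ignored before; the Space needs them committed
*.pkl
!src/models/saved/*.pkl
```

Note that the `!` negation only works because no parent directory of `src/models/saved/` is itself ignored.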
Group 3 — ESCP MIM2 Applied Data Science — April 2026