
Deploying ADI to Hugging Face Spaces

ESCP MIM2 — Applied Data Science | April 2026


Overview

Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app reads from committed CSV/pickle files, so the deployed version always reflects whatever data and models you last pushed. When you want to refresh the data, you run the pipeline locally and push again.

Total size: ~14 MB — well within the free tier limit (50 GB).


One-time Setup

Step 1 — Install the Hugging Face CLI

pip install huggingface_hub

Step 2 — Log in with your ESCP HF account

huggingface-cli login

This opens a prompt asking for a token. Go to: https://huggingface.co/settings/tokens → New token → Role: Write → copy and paste it.
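If you prefer a non-interactive login (for scripts or CI), huggingface_hub exposes the same flow in Python. A minimal sketch, assuming the write token is exported as an HF_TOKEN environment variable (the variable name is a convention here, not something this guide requires):

```python
# Sketch: non-interactive login, assuming the write token from
# https://huggingface.co/settings/tokens is exported as HF_TOKEN.
import os

def read_token() -> str:
    """Read the HF write token from the environment."""
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise SystemExit("Set HF_TOKEN before running this script")
    return token

if __name__ == "__main__":
    from huggingface_hub import login  # pip install huggingface_hub

    login(token=read_token())  # validates and caches the token locally
```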

Step 3 — Create your Space on the HF website

  1. Go to https://huggingface.co/new-space
  2. Fill in:
    • Space name: aviation-disruption-intelligence (or any name you like)
    • License: MIT
    • SDK: Streamlit ← important
    • Visibility: Public (or Private if you prefer)
  3. Click Create Space — HF will create an empty Git repo for you.
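The same Space can also be created programmatically. A sketch using huggingface_hub's create_repo, with the placeholder username and Space name from this guide:

```python
# Hypothetical alternative to the web form: create the Space
# programmatically with huggingface_hub. "YOUR_HF_USERNAME" and the
# Space name are the placeholders used throughout this guide.
def space_repo_id(username: str, name: str = "aviation-disruption-intelligence") -> str:
    """Build the '<user>/<name>' identifier that HF expects."""
    return f"{username}/{name}"

if __name__ == "__main__":
    # Requires huggingface_hub and a valid write token (see Step 2).
    from huggingface_hub import create_repo

    create_repo(
        repo_id=space_repo_id("YOUR_HF_USERNAME"),
        repo_type="space",
        space_sdk="streamlit",
        private=False,  # or True for a private Space
    )
```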

Step 4 — Add HF as a second git remote

In your local project folder (where the code lives), run:

# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence

You now have two remotes:

  • origin → your GitHub repo
  • space → Hugging Face Space

Pushing the App (First Deploy)

Run these commands from the project root:

# 1. Make sure you are on main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4. Push to GitHub (your normal remote)
git push origin main

# 5. Push to HF Space
git push space main

HF automatically detects sdk: streamlit and app_file: src/app/app.py in the README frontmatter, installs the packages in requirements.txt, and starts the app. The build takes ~2–4 minutes; you can watch the build logs live on your Space page.
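For reference, the frontmatter HF looks for sits at the top of README.md between `---` markers. A minimal illustrative example with the fields this guide relies on (your actual frontmatter may carry more fields, such as title colors or sdk_version):

```yaml
---
title: Aviation Disruption Intelligence
sdk: streamlit
app_file: src/app/app.py
license: mit
---
```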


Verifying the Deployment

Once the build is green, visit:

https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence

The app should load showing all 11 tabs. If you see an error, click "View Logs" on the Space page — the most common issues are:

| Error | Fix |
| --- | --- |
| ModuleNotFoundError: shap | requirements.txt already includes it; wait for the rebuild to finish |
| FileNotFoundError: best_classifier.pkl | The .pkl files were not committed; re-run step 2 of the push commands above (git add src/models/saved/) |
| No module named src | The app uses PROJECT_ROOT = Path(__file__).parent.parent.parent, which resolves correctly as long as app_file: src/app/app.py is set in the README frontmatter |
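The PROJECT_ROOT pattern from the last row can be sketched in isolation to show why it is robust to whatever working directory HF launches the app from:

```python
# Sketch of the path logic the guide describes: app.py lives at
# src/app/app.py, so three .parent hops from the file itself reach the
# repo root, independent of the current working directory.
from pathlib import Path

def project_root(app_file: str) -> Path:
    """Resolve the repo root from the app file's own location."""
    return Path(app_file).resolve().parent.parent.parent

# e.g. project_root("/repo/src/app/app.py") -> Path("/repo")
```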

Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The workflow is:

Run locally → commit → push to HF.

# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main

HF automatically rebuilds and restarts the app when it detects a new push. The rebuild usually takes under 60 seconds for a data-only update.


Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work from within the Space itself, add your keys as HF Secrets instead of a .env file:

  1. Go to your Space → Settings → Repository secrets
  2. Add:
    • SERPAPI_KEY — your SerpAPI key
    • Any other keys from your .env

In app.py / pipeline scripts, these are already loaded via python-dotenv and os.environ, so no code changes are needed.


Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your laptop, you can use a free GitHub Actions workflow that runs the pipeline and pushes to both GitHub and HF.

Create .github/workflows/daily_pipeline.yml:

name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:        # also allows manual trigger from GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py

      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main

      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main

Add HF_TOKEN (your HF write token) and SERPAPI_KEY to GitHub repo Settings → Secrets and variables → Actions.


Summary of What Gets Deployed

| What | Where in repo | Notes |
| --- | --- | --- |
| App code | src/app/app.py | Entry point HF runs |
| Trained models | src/models/saved/*.pkl | Committed (.gitignore updated) |
| Base data | data/base/*.csv | All CSVs committed |
| Pre-generated figures | outputs/figures/*.png | SHAP beeswarm plots, etc. |
| Config | config/settings.py | Paths resolve relative to file location |
| Pipeline scripts | pipeline/ | Not run on HF, but available |

Not deployed (not needed for the app):

  • data/raw/ — ingestion snapshots, gitignored
  • .env — use HF Secrets instead
  • .docx / .pptx — excluded via gitignore
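An illustrative .gitignore fragment matching the lists above (the project's actual file may differ):

```gitignore
# kept out of the repo: not needed by the deployed app
data/raw/
.env
*.docx
*.pptx

# re-allow the model pickles if a broader '*.pkl' rule exists elsewhere
!src/models/saved/*.pkl
```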

Group 3 — ESCP MIM2 Applied Data Science — April 2026