# Deploying ADI to Hugging Face Spaces

**ESCP MIM2: Applied Data Science | April 2026**

---

## Overview

Hugging Face Spaces hosts Streamlit apps for free on its basic CPU hardware. A free Space has no overall time limit, but it goes to sleep after a period of inactivity (48 hours at the time of writing) and wakes up on the next visit. The app reads from committed CSV and pickle files, so the deployed version always reflects whatever data and models you last pushed. To refresh the data, run the pipeline locally and push again.

**Total size:** ~14 MB, well within the free tier's 50 GB limit.
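
To check that figure yourself, and to catch any single file large enough to need Git LFS (at the time of writing the Hub expects files over 10 MB to go through LFS rather than plain git), the sketch below sums the size of every file in git's index and lists the ones above a threshold. The function names are hypothetical; run them from the project root after staging.

```bash
#!/usr/bin/env bash
# Sketch: measure what git will actually push (tracked and staged files only).

# Total size in bytes of every file in the git index.
tracked_total_bytes() {
  local total=0 f
  while IFS= read -r -d '' f; do
    if [ -f "$f" ]; then
      total=$(( total + $(wc -c < "$f") ))
    fi
  done < <(git ls-files -z)
  echo "$total"
}

# Tracked files larger than a limit in bytes (default 10 MB), biggest first,
# printed as "<bytes><TAB><path>".
tracked_files_over() {
  local limit="${1:-10000000}" f size
  git ls-files -z | while IFS= read -r -d '' f; do
    [ -f "$f" ] || continue
    size=$(( $(wc -c < "$f") ))   # arithmetic strips any padding from wc
    if [ "$size" -gt "$limit" ]; then
      printf '%s\t%s\n' "$size" "$f"
    fi
  done | sort -rn
}
```

For example, `echo "$(( $(tracked_total_bytes) / 1000000 )) MB tracked"` gives the total, and an empty result from `tracked_files_over` means no file needs LFS.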
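
---

## One-time Setup

### Step 1: Install the Hugging Face CLI

```bash
pip install huggingface_hub
```

This installs the `huggingface-cli` command. Recent releases of `huggingface_hub` also provide the same commands under a shorter `hf` entry point (for example `hf auth login`).

### Step 2: Log in with your ESCP HF account

```bash
huggingface-cli login
```

This prompts for an access token. Create one at **https://huggingface.co/settings/tokens** → **New token** → Role: **Write**, then copy and paste it. When asked whether to add the token as a git credential, answer yes; otherwise `git push space` will ask for your HF username and password, and the token goes in the password field. Run `huggingface-cli whoami` to confirm which account you are logged in as.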

### Step 3: Create your Space on the HF website

1. Go to **https://huggingface.co/new-space**
2. Fill in:
   - **Space name:** `aviation-disruption-intelligence` (or any name you like; if you change it, use the same name in the remote URL in Step 4)
   - **License:** MIT
   - **SDK:** Streamlit ← important
   - **Visibility:** Public (or Private if you prefer)
3. Click **Create Space**. HF creates a new Git repository for the Space, containing a starter `README.md` and `.gitattributes`.

### Step 4: Add HF as a second git remote

In your local project folder (where the code lives), run:

```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

You now have two remotes:
- `origin` → your GitHub repo
- `space` → your Hugging Face Space

---

## Pushing the App (First Deploy)

Run these commands from the project root:

```bash
# 1. Make sure you are on the main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored; the updated .gitignore now allows them)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4. Push to GitHub (your normal remote)
git push origin main

# 5. Push to the HF Space. The first push needs --force: the Space already
#    holds its own starter commit (README.md, .gitattributes) that your local
#    history does not contain, so a plain push is rejected as non-fast-forward.
git push --force space main
```
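
A frequent cause of the `FileNotFoundError` listed under verification is a `.pkl` file that exists locally but never reached git, for example because an old `.gitignore` rule still matches it (`git add` on a directory silently skips ignored files). A minimal pre-push check, assuming the models live in `src/models/saved/` as above; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical pre-push check: every .pkl in the models folder must be in git's index.
# Prints each file that is missing and returns non-zero if there are any.
check_models_tracked() {
  local dir="${1:-src/models/saved}" f status=0
  for f in "$dir"/*.pkl; do
    [ -e "$f" ] || continue   # glob matched nothing: no .pkl files in the folder
    if ! git ls-files --error-unmatch -- "$f" >/dev/null 2>&1; then
      echo "NOT TRACKED: $f"
      status=1
    fi
  done
  return "$status"
}
```

Run it before the push to the Space in step 5. If it reports a file, `git check-ignore -v <file>` shows which ignore rule is responsible.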
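
HF reads the YAML front matter at the top of `README.md`: `sdk: streamlit` tells it to build a Streamlit app, and `app_file: src/app/app.py` sets the entry point. It then installs `requirements.txt` and starts the app. Your repository's own `README.md` must therefore contain this front matter. The build takes about 2–4 minutes, and you can watch the build logs live on your Space page.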
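
Since the build depends entirely on that front matter, a quick local check can save a failed build. The sketch below assumes the front matter opens on the very first line of `README.md` with `---`; the helper names and the example `title` value are illustrative, and the check only looks for the two keys this guide relies on.

```bash
#!/usr/bin/env bash
# Sketch: check the README front matter that the Space build reads.
# Expected shape (the title value is illustrative):
#   ---
#   title: Aviation Disruption Intelligence
#   sdk: streamlit
#   app_file: src/app/app.py
#   ---

# Print the lines between an opening '---' on line 1 and the next '---'.
frontmatter() {
  awk 'NR == 1 { if ($0 != "---") exit; next } $0 == "---" { exit } { print }' "${1:-README.md}"
}

# Succeed only if the front matter declares the Streamlit SDK and the app entry point.
check_frontmatter() {
  local fm
  fm="$(frontmatter "${1:-README.md}")"
  if ! grep -Eq '^sdk:[[:space:]]*streamlit[[:space:]]*$' <<<"$fm"; then
    echo "missing: sdk: streamlit"
    return 1
  fi
  if ! grep -Eq '^app_file:[[:space:]]*src/app/app\.py[[:space:]]*$' <<<"$fm"; then
    echo "missing: app_file: src/app/app.py"
    return 1
  fi
  echo "front matter OK"
}
```

Running `check_frontmatter` from the project root before the first push should print `front matter OK`.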

---

## Verifying the Deployment

Once the build is green, visit:
```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

The app should load with all 11 tabs. If you see an error, click **"View Logs"** on the Space page. The most common issues are:

| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | `shap` is listed in `requirements.txt`; make sure the updated file was committed and pushed, then let the rebuild finish |
| `FileNotFoundError: best_classifier.pkl` | The `.pkl` files were not committed: re-run the staging, commit and push commands in **Pushing the App (First Deploy)** |
| `No module named src` | Streamlit puts the script's own folder (`src/app/`) on the import path, not the project root. `PROJECT_ROOT = Path(__file__).parent.parent.parent` resolves to the repo root as long as `app_file: src/app/app.py` is set in the README; make sure `app.py` adds it to `sys.path` before any `from src ...` import |
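
To follow the build from the terminal instead of refreshing the page, you can poll the Hub for the Space's runtime status. This is a sketch under assumptions: it relies on `https://huggingface.co/api/spaces/<user>/<space>/runtime` returning JSON with a top-level `stage` field (the endpoint behind `huggingface_hub`'s `get_space_runtime` at the time of writing), on stage names such as `RUNNING`, `BUILD_ERROR` and `RUNTIME_ERROR`, and on `curl` and `python3` being installed. For a private Space, export `HF_TOKEN` first. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: wait until a Space finishes building. Endpoint and stage names are assumptions.

# Read the top-level "stage" field from runtime JSON on stdin ("UNKNOWN" if absent or unreadable).
extract_stage() {
  python3 -c '
import json, sys
try:
    print(json.load(sys.stdin).get("stage", "UNKNOWN"))
except ValueError:
    print("UNKNOWN")
'
}

# Fetch the current stage, e.g. space_stage YOUR_HF_USERNAME/aviation-disruption-intelligence
space_stage() {
  local auth=()
  if [ -n "${HF_TOKEN:-}" ]; then
    auth=(-H "Authorization: Bearer ${HF_TOKEN}")   # only needed for private Spaces
  fi
  curl -fsS "${auth[@]}" "https://huggingface.co/api/spaces/$1/runtime" | extract_stage
}

# Poll every 15 s (about 10 minutes at most) until the Space is running or has failed.
wait_for_space() {
  local repo="$1" stage i
  for i in $(seq 1 40); do
    stage="$(space_stage "$repo")"
    echo "[$i] $stage"
    case "$stage" in
      RUNNING) return 0 ;;
      *ERROR*) return 1 ;;   # e.g. BUILD_ERROR, RUNTIME_ERROR, CONFIG_ERROR
    esac
    sleep 15
  done
  return 1
}
```

Usage: `wait_for_space YOUR_HF_USERNAME/aviation-disruption-intelligence && echo "Space is up"`. On a failure, the **View Logs** button is still the place to find the cause.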

---

## Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The workflow is:

**Run locally → commit → push to HF.**

```bash
# 1. Run the full pipeline locally to refresh data and retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main
```

HF automatically rebuilds and restarts the app when it detects a new push. The rebuild usually takes under 60 seconds for a data-only update.
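
The four steps can be wrapped in one script. In this sketch the file name `refresh.sh` and the function names are hypothetical. It stops at the first failing command and, like the GitHub Actions workflow later in this guide, skips the commit when the pipeline produced no changes instead of failing on `git commit`.

```bash
#!/usr/bin/env bash
# Sketch of the refresh workflow as one script (hypothetical file: refresh.sh).

# Paths the pipeline writes to, taken from the steps above.
DATA_PATHS=(data/base/ src/models/saved/ outputs/)

# Commit staged changes only if there are any; succeed either way.
commit_if_changed() {
  local msg="$1"
  if git diff --staged --quiet; then
    echo "No changes to commit."
  else
    git commit -q -m "$msg"
    echo "Committed: $msg"
  fi
}

# Run the pipeline, stage its outputs, commit if needed, and push to both remotes.
# The body runs in a subshell so `set -euo pipefail` does not leak into your shell.
refresh_and_push() (
  set -euo pipefail
  python pipeline/run_pipeline.py
  git add "${DATA_PATHS[@]}"
  commit_if_changed "data: pipeline run $(date '+%Y-%m-%d %H:%M')"
  git push origin main
  git push space main
)
```

From the project root: `source refresh.sh && refresh_and_push`.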

---

## Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work from within the Space itself, add the keys they need as HF secrets instead of a `.env` file. yfinance and GDELT need no key; SerpAPI does.

1. Go to your Space → **Settings** → **Variables and secrets** → **New secret**
2. Add:
   - `SERPAPI_KEY`: your SerpAPI key
   - Any other keys from your `.env`

HF passes Space secrets to the running app as environment variables. `app.py` and the pipeline scripts already load keys via `python-dotenv` and `os.environ`: on the Space, `load_dotenv()` simply finds no `.env` file and `os.environ` returns the secret, so no code changes are needed.
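
A missing key otherwise only shows up when an ingestion call fails, so a fail-fast check can help before running the pipeline, locally or as a step in a CI job. A minimal sketch with a hypothetical helper name; which variables to require is your choice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the named environment variables are unset or empty.
# Prints "All set" and returns 0, or prints the missing names and returns 1.
missing_env() {
  local name missing=()
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then   # indirect expansion: value of the variable called $name
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} )); then
    echo "Missing: ${missing[*]}"
    return 1
  fi
  echo "All set"
}
```

Usage: `missing_env SERPAPI_KEY && python pipeline/run_pipeline.py`.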

---

## Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your laptop, you can use a GitHub Actions workflow (free for public repositories, and within the monthly free minutes for private ones) that runs the pipeline and pushes to both GitHub and HF.

Create `.github/workflows/daily_pipeline.yml`:

```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:      # also allows manual trigger from GitHub UI

permissions:
  contents: write         # lets the job push commits back to origin

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history; pushing a shallow clone to the Space can be rejected

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py

      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main

      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```

Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` under **GitHub repo Settings → Secrets and variables → Actions**.
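
If you use the GitHub CLI (`gh`), the same two secrets can be set from the terminal, and the `workflow_dispatch` trigger lets you start a first run without waiting for 6 AM UTC. A sketch, assuming `gh` is installed, authenticated (`gh auth login`) and run inside the repository; the helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: store the workflow's two secrets, then trigger one manual run.
setup_daily_pipeline() {
  local name
  for name in HF_TOKEN SERPAPI_KEY; do
    gh secret set "$name"               # prompts you to paste the value
  done
  gh workflow run daily_pipeline.yml    # possible because the workflow declares workflow_dispatch
  gh run list --workflow=daily_pipeline.yml --limit 1
}
```

A freshly triggered run can take a few seconds to appear in `gh run list`; re-run that last command if it still shows the previous run.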

---

## Summary of What Gets Deployed

| What | Where in repo | Notes |
|------|---------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (`.gitignore` updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarm plots, etc. |
| Config | `config/settings.py` | Paths resolve relative to the file's location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |

**Not deployed (not needed for the app):**
- `data/raw/`: ingestion snapshots, gitignored
- `.env`: use HF secrets instead
- `.docx` / `.pptx` files: excluded via `.gitignore`
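
To confirm that `.gitignore` matches this table (models allowed, `.env` and Office files excluded), `git check-ignore` reports whether a path is ignored. A small wrapper with a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: say whether each path is excluded by the repo's ignore rules.
ignore_status() {
  local p
  for p in "$@"; do
    if git check-ignore -q "$p"; then   # exit status 0 means "ignored"
      echo "ignored: $p"
    else
      echo "not ignored: $p"
    fi
  done
}
```

For example, `ignore_status .env data/raw src/models/saved/best_classifier.pkl` should report the first two as ignored and the model as not ignored; `git check-ignore -v <path>` shows the matching rule.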

---

*Group 3 · ESCP MIM2 Applied Data Science · April 2026*