# Deploying ADI to Hugging Face Spaces

**ESCP MIM2 — Applied Data Science | April 2026**

---

## Overview

Hugging Face Spaces hosts Streamlit apps for free with no time limits. Your app reads from committed CSV/pickle files, so the deployed version always reflects whatever data and models you last pushed. When you want to refresh the data, you run the pipeline locally and push again.

**Total size:** ~14 MB — well within the free tier limit (50 GB).

---

## One-time Setup

### Step 1 — Install the Hugging Face CLI

```bash
pip install huggingface_hub
```

### Step 2 — Log in with your ESCP HF account

```bash
huggingface-cli login
```

This opens a prompt asking for a token. Go to **https://huggingface.co/settings/tokens** → New token → Role: **Write** → copy and paste it.

### Step 3 — Create your Space on the HF website

1. Go to **https://huggingface.co/new-space**
2. Fill in:
   - **Space name:** `aviation-disruption-intelligence` (or any name you like)
   - **License:** MIT
   - **SDK:** Streamlit ← important
   - **Visibility:** Public (or Private if you prefer)
3. Click **Create Space** — HF will create an empty Git repo for you.

### Step 4 — Add HF as a second git remote

In your local project folder (where the code lives), run:

```bash
# Replace YOUR_HF_USERNAME with your actual HF username
git remote add space https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

You now have two remotes:

- `origin` → your GitHub repo
- `space` → Hugging Face Space

---

## Pushing the App (First Deploy)

Run these commands from the project root:

```bash
# 1. Make sure you are on the main branch
git checkout main

# 2. Stage everything, including the model .pkl files
#    (they were previously gitignored — now they are allowed)
git add src/models/saved/
git add data/base/
git add outputs/figures/
git add README.md requirements.txt .gitignore
git add src/ config/ pipeline/

# 3. Commit
git commit -m "chore: prepare repo for Hugging Face Spaces deployment"

# 4.
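# (Optional sanity check before pushing: confirm the model .pkl files were
#  actually staged; the path below assumes the repo layout used in this guide)
git ls-files src/models/saved/ | grep -c '\.pkl$'   # a count of 0 means the gitignore still excludes them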
# Push to GitHub (your normal remote)
git push origin main

# 5. Push to HF Space
git push space main
```

HF will automatically detect `sdk: streamlit` in the README frontmatter and `app_file: src/app/app.py`, install `requirements.txt`, and start the app. The build takes ~2–4 minutes; you can watch the build logs live on your Space page.

---

## Verifying the Deployment

Once the build is green, visit:

```
https://huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
```

The app should load showing all 11 tabs. If you see an error, click **"View Logs"** on the Space page — the most common issues are:

| Error | Fix |
|-------|-----|
| `ModuleNotFoundError: shap` | `requirements.txt` already includes it; wait for the rebuild to finish |
| `FileNotFoundError: best_classifier.pkl` | Re-run the `git add` / `git commit` / `git push` steps above — the .pkl files were not committed |
| `No module named src` | The app uses `PROJECT_ROOT = Path(__file__).parent.parent.parent` — this resolves correctly as long as `app_file: src/app/app.py` is set in the README |

---

## Updating Data (Keeping It Fresh)

HF Spaces does not run your pipeline automatically on the free tier. The workflow is: **run locally → commit → push to HF.**

```bash
# 1. Run the full pipeline locally to refresh data + retrain models
cd your-project-folder
python pipeline/run_pipeline.py

# 2. Stage the updated files
git add data/base/
git add src/models/saved/
git add outputs/

# 3. Commit with a timestamp
git commit -m "data: pipeline run $(date '+%Y-%m-%d %H:%M')"

# 4. Push to both remotes
git push origin main
git push space main
```

HF automatically rebuilds and restarts the app when it detects a new push. The rebuild usually takes under 60 seconds for a data-only update.

---

## Setting API Keys / Secrets (Optional)

If you want the live ingestion features (yfinance, GDELT, SerpAPI) to work from within the Space itself, add your keys as HF Secrets instead of a `.env` file:

1. Go to your Space → **Settings** → **Repository secrets**
2. Add:
   - `SERPAPI_KEY` — your SerpAPI key
   - Any other keys from your `.env`

In `app.py` / pipeline scripts, these are already loaded via `python-dotenv` and `os.environ`, so no code changes are needed: HF injects repository secrets as environment variables, which `os.environ` picks up even without a `.env` file.

---

## Automating Pipeline Runs with GitHub Actions (Optional)

If you want the data to refresh automatically every day without touching your laptop, you can use a free GitHub Actions workflow that runs the pipeline and pushes to both GitHub and HF.

Create `.github/workflows/daily_pipeline.yml`:

```yaml
name: Daily Pipeline

on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC every day
  workflow_dispatch:       # also allows manual trigger from the GitHub UI

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0   # full history: pushing a shallow clone to the HF remote can fail

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run pipeline
        env:
          SERPAPI_KEY: ${{ secrets.SERPAPI_KEY }}
        run: python pipeline/run_pipeline.py

      - name: Commit updated data
        run: |
          git config user.email "action@github.com"
          git config user.name "GitHub Action"
          git add data/base/ src/models/saved/ outputs/
          git diff --staged --quiet || git commit -m "data: automated pipeline run $(date '+%Y-%m-%d')"
          git push origin main

      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git remote add space https://YOUR_HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/YOUR_HF_USERNAME/aviation-disruption-intelligence
          git push space main
```

Add `HF_TOKEN` (your HF write token) and `SERPAPI_KEY` under **GitHub repo Settings → Secrets and variables → Actions**.
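The `PROJECT_ROOT` trick mentioned in the error table earlier can be sketched as a small loader. The path `src/models/saved/best_classifier.pkl` and the `src/app/app.py` location come from this guide; the `load_model` helper itself is a hypothetical illustration of how the app might resolve paths the same way locally and on HF Spaces:

```python
from pathlib import Path
import pickle

# app.py lives at src/app/app.py, so three .parent hops reach the repo root.
# This resolves correctly on HF Spaces because `app_file: src/app/app.py`
# runs the app from the same repo layout that exists locally.
PROJECT_ROOT = Path(__file__).parent.parent.parent

def load_model(name: str = "best_classifier.pkl"):
    """Load a committed model artifact relative to the repo root (hypothetical helper)."""
    model_path = PROJECT_ROOT / "src" / "models" / "saved" / name
    if not model_path.exists():
        # Mirrors the FileNotFoundError case in the table above:
        # the .pkl files were never committed and pushed.
        raise FileNotFoundError(f"{model_path} not found. Did you commit the model files?")
    with open(model_path, "rb") as f:
        return pickle.load(f)
```

Because every path is anchored to `PROJECT_ROOT` rather than the working directory, the same code works no matter where HF (or you) launch Streamlit from.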
---

## Summary of What Gets Deployed

| What | Where in repo | Notes |
|------|---------------|-------|
| App code | `src/app/app.py` | Entry point HF runs |
| Trained models | `src/models/saved/*.pkl` | Committed (gitignore updated) |
| Base data | `data/base/*.csv` | All CSVs committed |
| Pre-generated figures | `outputs/figures/*.png` | SHAP beeswarm plots, etc. |
| Config | `config/settings.py` | Paths resolve relative to file location |
| Pipeline scripts | `pipeline/` | Not run on HF, but available |

**Not deployed (not needed for the app):**

- `data/raw/` — ingestion snapshots, gitignored
- `.env` — use HF Secrets instead
- `.docx` / `.pptx` — excluded via gitignore

---

*Group 3 — ESCP MIM2 Applied Data Science — April 2026*