# Local Setup: GapGuide Backend

Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at `backend/venv/`), and Node 20+ for the frontend.

## 1. Install the pgvector extension on PostgreSQL 17

Module 8 Layer 4 uses `pgvector` for SBERT cosine similarity. `CREATE EXTENSION vector` fails until the extension files are installed into the PostgreSQL server directory. There are two options:

### Option A: Precompiled binary (recommended, ~2 minutes)

1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
2. Download the release tagged `0.8.2_17.6`. It is built against PostgreSQL 17.6 and works on 17.3+. The older 17.0–17.2 releases hit a linker bug.
3. Extract it. You should see `vector.control`, `vector--*.sql`, and `vector.dll`.
4. Copy the files (administrator rights required, because Program Files is protected):
   - `vector.control` and `vector--*.sql` → `C:\Program Files\PostgreSQL\17\share\extension\`
   - `vector.dll` → `C:\Program Files\PostgreSQL\17\lib\`
5. Verify in psql. The `gapguide` database is only created in step 3, so on a fresh machine run `createdb -U postgres gapguide` first (or point `-d` at a database that already exists):
   ```
   psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
   ```
   The output should include `[1,2,3]`.

### Option B: Build from source (~30 minutes, when no trusted binary is available)

1. Install Visual Studio with the "Desktop development with C++" workload.
2. Open "x64 Native Tools Command Prompt for VS" as **administrator**.
3. Run the following (Git must be on `PATH`):
   ```
   set "PGROOT=C:\Program Files\PostgreSQL\17"
   cd %TEMP%
   git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
   cd pgvector
   nmake /F Makefile.win
   nmake /F Makefile.win install
   ```
4. Verify as in Option A, step 5.

## 2. Python dependencies

```powershell
cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

The first install takes about 5 minutes; the CPU-only torch wheel is the largest download at ~200 MB. The NER layers use `en_core_web_sm` (12 MB) by default.
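To confirm which English pipeline the active venv can actually see, here is a minimal sketch. It relies on `python -m spacy download` installing each model as an ordinary Python package, so `importlib.util.find_spec` can locate it without importing spaCy. The helper name `installed_spacy_model` and its preference order (`en_core_web_lg` before `en_core_web_sm`) are illustrative assumptions, not the GapGuide parser's own detection code.

```python
import importlib.util
from collections.abc import Callable, Iterable
from typing import Optional

# Assumed preference order for this sketch: larger pipeline first, then the default.
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def _is_importable(package: str) -> bool:
    # spaCy model packages are regular Python packages, so find_spec
    # returns None when the model was never downloaded into this venv.
    return importlib.util.find_spec(package) is not None


def installed_spacy_model(
    candidates: Iterable[str] = PREFERRED_MODELS,
    is_installed: Callable[[str], bool] = _is_importable,
) -> Optional[str]:
    """Return the first installed candidate model package, or None if none is found."""
    for name in candidates:
        if is_installed(name):
            return name
    return None


if __name__ == "__main__":
    found = installed_spacy_model()
    if found is None:
        print("No English model found; run: python -m spacy download en_core_web_sm")
    else:
        print(f"spaCy model available: {found} (load with spacy.load({found!r}))")
```

Save it under any name (for example a hypothetical `check_spacy_model.py`) and run it with the venv activated. If it reports no model even though the download succeeded, the download most likely ran against a different Python interpreter.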
If you want slightly better noun-phrase quality on paraphrased CVs, upgrade to `en_core_web_lg` (~560 MB). The layers auto-detect it and prefer `_lg` when it is installed:

```powershell
python -m spacy download en_core_web_lg
```

HuggingFace models download lazily on the first parse into `%USERPROFILE%\.cache\huggingface` (~1 GB in total across Nucha BERT, JobBERT, and SBERT). Later parses load them from this cache, so only the first parse pays the download cost.

## 3. Database setup

Run these from `backend` with the venv activated:

```powershell
# create the DB (skip if it already exists)
createdb -U postgres gapguide

# apply migrations (requires pgvector from step 1)
python manage.py migrate

# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users

# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py
```

Demo user login: `demo.partial@gapguide.test` / `DemoPass123!`.

## 4. Run locally

```powershell
# terminal 1: backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver

# terminal 2: frontend
cd frontend
npm install
npm run dev
```

Open http://localhost:8080.

## 5. Running tests

Run these with the backend venv activated:

```powershell
cd backend

# CI baseline (no ML, runs in seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q

# ML smoke test (downloads models on the first run, ~15 min the first time)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
```

## Troubleshooting

- **`extension "vector" is not available`**: pgvector is not installed on your Postgres server. Redo step 1; the `vector.control` file must exist in PostgreSQL's `share\extension\` directory.
- **`ModuleNotFoundError: pgvector`**: `pip install -r requirements.txt` hasn't run (or ran in a different venv). Activate the venv first.
- **`spaCy model not found`**: run `python -m spacy download en_core_web_sm` in the activated venv (or `en_core_web_lg` if you opted into the larger model).
- **Slow first resume parse**: normal. The NER chain downloads ~1 GB of models on the first call; later parses use the cache.
- **`GAPGUIDE_PARSE_LAYERS=lexical`**: runs the backend without loading any ML dependencies. Useful during development when you want to skip the first-call model downloads.
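Several of these checks can be scripted. Below is a minimal sketch, assuming the default install path `C:\Program Files\PostgreSQL\17` and the `huggingface_hub` cache-location rules (`HF_HOME`, else `XDG_CACHE_HOME\huggingface`, else `~/.cache/huggingface`, which is `%USERPROFILE%\.cache\huggingface` on Windows). The function names are hypothetical and not part of the GapGuide repo. Treating `GAPGUIDE_PARSE_LAYERS` as lexical-only exactly when it equals `lexical` is also an assumption.

```python
"""Hypothetical local-setup checker; not part of the GapGuide repository.

Each check mirrors one Troubleshooting entry. DEFAULT_PGROOT assumes the
standard PostgreSQL 17 install location used throughout this guide.
"""
import importlib.util
import os
from collections.abc import Mapping
from pathlib import Path

DEFAULT_PGROOT = Path(r"C:\Program Files\PostgreSQL\17")


def pgvector_server_files(pgroot: Path = DEFAULT_PGROOT) -> dict[str, bool]:
    """Step 1: are the extension files where the Postgres server looks for them?"""
    ext_dir = pgroot / "share" / "extension"
    return {
        "vector.control": (ext_dir / "vector.control").is_file(),
        "vector--*.sql": ext_dir.is_dir() and any(ext_dir.glob("vector--*.sql")),
        "vector.dll": (pgroot / "lib" / "vector.dll").is_file(),
    }


def python_package_installed(name: str) -> bool:
    """ModuleNotFoundError check: is the package importable by this interpreter?"""
    return importlib.util.find_spec(name) is not None


def hf_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """HuggingFace cache root: HF_HOME, else XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface."""
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    cache_root = Path(env["XDG_CACHE_HOME"]) if env.get("XDG_CACHE_HOME") else Path.home() / ".cache"
    return cache_root / "huggingface"


def dir_size_bytes(root: Path) -> int:
    """Total size of regular files under root; 0 if the directory does not exist yet."""
    if not root.is_dir():
        return 0
    # Skip symlinks: the HF cache can link snapshot files to shared blobs,
    # and following them would count the same bytes twice.
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file() and not p.is_symlink())


def lexical_only(env: Mapping[str, str] = os.environ) -> bool:
    """Assumption: lexical-only mode means GAPGUIDE_PARSE_LAYERS == 'lexical' exactly."""
    return env.get("GAPGUIDE_PARSE_LAYERS", "").strip() == "lexical"


def report(pgroot: Path = DEFAULT_PGROOT, env: Mapping[str, str] = os.environ) -> list[str]:
    """One status line per check, in Troubleshooting order."""
    missing = [name for name, ok in pgvector_server_files(pgroot).items() if not ok]
    cache = hf_cache_dir(env)
    return [
        "pgvector server files: " + ("OK" if not missing else "missing " + ", ".join(missing)),
        "pgvector Python package: "
        + ("OK" if python_package_installed("pgvector") else "not importable (activate the venv)"),
        f"HuggingFace cache {cache}: {dir_size_bytes(cache) / 1e9:.2f} GB",
        "parse layers: " + ("lexical only (no ML)" if lexical_only(env) else "full chain (assumed default)"),
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it from the activated venv so the `pgvector` package check reflects the interpreter Django uses. The cache size only shows whether the ~1 GB download has happened; it does not validate the model files themselves.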