# Local Setup — GapGuide Backend

Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at `backend/venv/`), Node 20+ for the frontend.

## 1. Install pgvector extension on PostgreSQL 17

Module 8 Layer 4 uses `pgvector` for SBERT cosine similarity. `CREATE EXTENSION vector` fails until the extension files are installed into the PostgreSQL server directory. Two options:

### Option A — Precompiled binary (recommended, ~2 minutes)

1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
2. Download the release tagged `0.8.2_17.6` (built against PostgreSQL 17.6; works on 17.3+ — older 17.0–17.2 hit a linker bug).
3. Extract. You should see `vector.control`, `vector--*.sql`, and `vector.dll`.
4. Copy files:
   - `vector.control` and `vector--*.sql` → `C:\Program Files\PostgreSQL\17\share\extension\`
   - `vector.dll` → `C:\Program Files\PostgreSQL\17\lib\`
   - Administrator rights required (Program Files is protected).
5. Verify in psql:
   ```
   psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
   ```
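   Output should include `[1,2,3]`. If the `gapguide` database does not exist yet, create it first with `createdb -U postgres gapguide` (the same command appears in step 3).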
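If `CREATE EXTENSION` still fails after copying, it can help to confirm that the three kinds of files from step 4 actually landed where PostgreSQL looks for them. The following is a minimal sketch; the function name `missing_pgvector_files` is hypothetical and not part of the project, and the directory layout is the one given in step 4.

```python
from pathlib import Path


def missing_pgvector_files(pg_root: Path) -> list[str]:
    """Return the expected pgvector paths that are absent under pg_root."""
    ext_dir = pg_root / "share" / "extension"
    lib_dir = pg_root / "lib"
    missing = []

    # The control file tells PostgreSQL that the extension exists.
    if not (ext_dir / "vector.control").is_file():
        missing.append(str(ext_dir / "vector.control"))

    # At least one versioned SQL script (vector--*.sql) must be present.
    if not (ext_dir.is_dir() and any(ext_dir.glob("vector--*.sql"))):
        missing.append(str(ext_dir / "vector--*.sql"))

    # The compiled library goes into lib\, not share\extension\.
    if not (lib_dir / "vector.dll").is_file():
        missing.append(str(lib_dir / "vector.dll"))

    return missing
```

Calling `missing_pgvector_files(Path(r"C:\Program Files\PostgreSQL\17"))` should return an empty list once the copy in step 4 has succeeded.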

### Option B — Build from source (~30 min, no trusted binary)

1. Install Visual Studio with "Desktop development with C++".
2. Open "x64 Native Tools Command Prompt for VS" as **administrator**.
3. Run:
   ```
   set "PGROOT=C:\Program Files\PostgreSQL\17"
   cd %TEMP%
   git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
   cd pgvector
   nmake /F Makefile.win
   nmake /F Makefile.win install
   ```
4. Verify as in Option A step 5.
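For context on what the extension is used for: pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity. The sketch below reproduces that arithmetic in plain Python so the numbers coming back from a query are easier to interpret. It is an illustration only; the project's actual similarity query is not shown in this guide, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance (1 - similarity), the quantity pgvector's <=> returns."""
    return 1.0 - cosine_similarity(a, b)
```

Identical directions give a distance of 0, orthogonal vectors give 1, and opposite directions give 2, so smaller values mean closer matches.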

## 2. Python dependencies

```powershell
cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

The first install takes about 5 minutes; the CPU-only torch wheel is the largest download at ~200 MB.

The NER layers use `en_core_web_sm` (12 MB) by default. For slightly better
noun-phrase quality on paraphrased CVs, also install `en_core_web_lg`
(~560 MB); the layers auto-detect it and prefer it over `_sm` when both are present:

```powershell
python -m spacy download en_core_web_lg
```
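The "prefer `_lg`, fall back to `_sm`" behaviour could be implemented along the following lines. This is a sketch under assumptions, not the project's actual detection code: `pick_spacy_model` is a hypothetical name, and it relies only on the fact that models installed via `spacy download` are importable Python packages.

```python
import importlib.util


def pick_spacy_model(candidates=("en_core_web_lg", "en_core_web_sm"),
                     is_installed=None):
    """Return the first installed model name from candidates, or None."""
    if is_installed is None:
        # spaCy models are regular packages, so an import lookup is enough.
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    for name in candidates:
        if is_installed(name):
            return name
    return None
```

The `is_installed` parameter exists only so the preference order can be exercised without installing either model.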

HuggingFace models download lazily on the first parse into `%USERPROFILE%\.cache\huggingface` (~1 GB total across Nucha BERT, JobBERT, and SBERT). Subsequent parses reuse the cache and skip the download.
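To see whether the models are already cached (and how much disk they take), a small script can total the files under the cache directory. This is a simplified sketch: it honours the `HF_HOME` override and otherwise assumes the default location named above, and both function names are hypothetical.

```python
import os
from pathlib import Path


def hf_cache_dir(env=None) -> Path:
    """Return HF_HOME if set, else the default ~/.cache/huggingface."""
    env = os.environ if env is None else env
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"])
    return Path.home() / ".cache" / "huggingface"


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root (0 if root is absent)."""
    if not root.is_dir():
        return 0
    # Symlinks are skipped so linked snapshot files are not counted twice.
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
```

For example, `dir_size_bytes(hf_cache_dir()) / 1e9` gives the cache size in gigabytes; a value near zero means the first parse will still need to download.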

## 3. Database setup

```powershell
# create the DB (if fresh)
createdb -U postgres gapguide

# migrate (requires pgvector from step 1)
python manage.py migrate

# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users

# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py
```

Demo user login: `demo.partial@gapguide.test` / `DemoPass123!`.
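Because the steps depend on each other (migrations before seeding, seeding before embeddings), it can be convenient to run them from one script that stops at the first failure. The sketch below assumes it is run from `backend/` with the venv active; `SETUP_STEPS` and `run_setup` are hypothetical names, and `createdb` is left out because it is a PostgreSQL tool rather than a Python command.

```python
import subprocess
import sys

# Same commands and order as the PowerShell block above.
SETUP_STEPS = [
    ["manage.py", "migrate"],
    ["manage.py", "seed_initial_skills"],
    ["manage.py", "seed_initial_roles"],
    ["manage.py", "seed_initial_resources"],
    ["manage.py", "seed_demo_users"],
    ["scripts/build_skill_embeddings.py"],
]


def run_setup(steps=SETUP_STEPS, runner=subprocess.run):
    """Run each step with the current interpreter; stop on the first failure."""
    for args in steps:
        cmd = [sys.executable, *args]
        print("->", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run against a half-prepared database.
        runner(cmd, check=True)
```

Using `sys.executable` keeps every step inside the same virtual environment that launched the script.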

## 4. Run locally

```powershell
# terminal 1 — backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver

# terminal 2 — frontend
cd frontend
npm install
npm run dev
```

Open http://localhost:8080.
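If the page does not load, a quick check is whether both dev servers are actually listening. The following sketch uses only the standard library; port 8080 comes from the URL above, while port 8000 is Django's `runserver` default and is an assumption about this project's configuration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `is_listening("localhost", 8000)` checks the backend and `is_listening("localhost", 8080)` checks the frontend.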

## 5. Running tests

```powershell
cd backend

# CI baseline (no ML, seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q

# ML smoke test (downloads models on first run, ~15 min)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
```
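The two environment variables act as switches: one restricts which parse layers load, the other opts in to the slow ML test. How the backend reads them is not shown in this guide, so the following is only a sketch of one plausible approach. It assumes `GAPGUIDE_PARSE_LAYERS` holds a comma-separated list and that an unset variable means "no restriction"; both function names are hypothetical.

```python
import os


def parse_layers(env=None):
    """Return the requested layer names, or None if no restriction is set."""
    env = os.environ if env is None else env
    raw = env.get("GAPGUIDE_PARSE_LAYERS", "").strip()
    if not raw:
        return None
    # Assumption: multiple layers would be separated by commas.
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def ml_smoke_enabled(env=None):
    """Return True only when GAPGUIDE_ML_SMOKE is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("GAPGUIDE_ML_SMOKE") == "1"
```

With this shape, the CI baseline above corresponds to `parse_layers()` returning `{"lexical"}`, and the smoke test to `parse_layers()` returning `None` with `ml_smoke_enabled()` returning `True`.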

## Troubleshooting

- **`extension "vector" is not available`**: pgvector is not installed on your Postgres. Redo step 1. The `vector.control` file must exist in PostgreSQL's `share\extension\` directory.
- **`ModuleNotFoundError: No module named 'pgvector'`**: `pip install -r requirements.txt` hasn't run, or ran in a different venv. Activate `backend\venv` first, then reinstall.
- **`spaCy model not found`**: run `python -m spacy download en_core_web_sm` (the default model from step 2). `en_core_web_lg` is optional.
- **Slow first resume parse**: normal. The NER chain downloads ~1 GB of models on the first call. Subsequent parses use the cache.
- **`GAPGUIDE_PARSE_LAYERS=lexical`**: runs the backend without loading any ML dependencies. Useful during development when you want to avoid the first-call model downloads.
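
Several of the errors above come down to running commands with the wrong interpreter. A short diagnostic like the one below can confirm that a virtual environment is active and that the key packages are importable from it. It is a sketch with hypothetical function names; the module list in the usage note is an assumption based on the packages this guide mentions.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(sys.executable, in_virtualenv(), missing_modules(["django", "pgvector", "spacy", "torch"]))` from the same terminal that raised the error shows which interpreter is in use and which packages it cannot see.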