# Local Setup: GapGuide Backend
Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at `backend/venv/`), Node 20+ for the frontend.
## 1. Install pgvector extension on PostgreSQL 17
Module 8 Layer 4 uses `pgvector` for SBERT cosine similarity. `CREATE EXTENSION vector` fails until the extension files are installed into the PostgreSQL server directory. Two options:
### Option A: Precompiled binary (recommended, ~2 minutes)
1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
2. Download the release tagged `0.8.2_17.6` (built against PostgreSQL 17.6; works on 17.3+, whereas the older 17.0–17.2 releases hit a linker bug).
3. Extract. You should see `vector.control`, `vector--*.sql`, and `vector.dll`.
4. Copy files:
- `vector.control` and `vector--*.sql` → `C:\Program Files\PostgreSQL\17\share\extension\`
- `vector.dll` → `C:\Program Files\PostgreSQL\17\lib\`
- Administrator rights required (Program Files is protected).
5. Verify in psql:
```
psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
```
Output should include `[1,2,3]`.
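If the `CREATE EXTENSION` check fails, it can help to confirm that the copied files actually landed where PostgreSQL looks for them. The following is a minimal sketch, not part of the GapGuide repository: the helper name `missing_pgvector_files` is made up for illustration, and the default root assumes the standard install path used in step 4.

```python
from pathlib import Path

# Default install root from the copy step above; adjust if PostgreSQL lives elsewhere.
DEFAULT_PG_ROOT = Path(r"C:\Program Files\PostgreSQL\17")


def missing_pgvector_files(pg_root: Path = DEFAULT_PG_ROOT) -> list:
    """Return the pgvector files (as path strings) that are not in place."""
    ext_dir = pg_root / "share" / "extension"
    lib_dir = pg_root / "lib"
    missing = []
    if not (ext_dir / "vector.control").is_file():
        missing.append(str(ext_dir / "vector.control"))
    # At least one versioned install script (vector--<version>.sql) must exist.
    if not any(ext_dir.glob("vector--*.sql")):
        missing.append(str(ext_dir / "vector--*.sql"))
    if not (lib_dir / "vector.dll").is_file():
        missing.append(str(lib_dir / "vector.dll"))
    return missing


if __name__ == "__main__":
    problems = missing_pgvector_files()
    if problems:
        print("Missing pgvector files:")
        for path in problems:
            print("  " + path)
    else:
        print("All pgvector files are in place.")
```

An empty result only shows that the files exist; the psql check in step 5 is still the real test that the server can load the extension.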
### Option B: Build from source (~30 min, avoids trusting a third-party binary)
1. Install Visual Studio with "Desktop development with C++".
2. Open "x64 Native Tools Command Prompt for VS" as **administrator**.
3. Run:
```
set "PGROOT=C:\Program Files\PostgreSQL\17"
cd %TEMP%
git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
cd pgvector
nmake /F Makefile.win
nmake /F Makefile.win install
```
4. Verify as in Option A step 5.
## 2. Python dependencies
```powershell
cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
First install is ~5 minutes (torch CPU wheel is the heaviest at ~200 MB).
The NER layers will use `en_core_web_sm` (12 MB) by default. If you want
slightly better noun-phrase quality on paraphrased CVs, upgrade to
`en_core_web_lg` (~560 MB); the layers auto-detect and prefer `_lg`:
```powershell
python -m spacy download en_core_web_lg
```
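The "prefer `_lg`, fall back to `_sm`" behaviour can be pictured with a short sketch. This is an illustration under assumptions, not the project's actual detection code: `pick_spacy_model` and `PREFERRED_MODELS` are hypothetical names. It relies only on the fact that `spacy download` installs each model as a regular Python package.

```python
import importlib.util

# Preference order described above: the large model first, then the small default.
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def _is_installed(package_name: str) -> bool:
    # Models installed via `spacy download` are importable Python packages.
    return importlib.util.find_spec(package_name) is not None


def pick_spacy_model(candidates=PREFERRED_MODELS, is_installed=_is_installed) -> str:
    """Return the first installed model name, or raise if none is available."""
    for name in candidates:
        if is_installed(name):
            return name
    raise RuntimeError(
        "No spaCy model found; run: python -m spacy download " + candidates[-1]
    )
```

The returned name would then be passed to `spacy.load(...)`. The `is_installed` parameter exists only so the selection logic can be exercised without installing either model.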
Hugging Face models download lazily on first parse into `%USERPROFILE%\.cache\huggingface` (~1 GB total across Nucha BERT, JobBERT, SBERT). Subsequent parses reuse the cached files and skip the download.
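To see whether the models are already cached and how much disk they use, a small script along these lines may help. It is a sketch with hypothetical helper names (`hf_cache_dir`, `dir_size_bytes`). It assumes the default cache location given above, with the standard `HF_HOME` environment variable taking precedence when set.

```python
import os
from pathlib import Path


def hf_cache_dir(env=None) -> Path:
    """Resolve the cache root: HF_HOME if set, else ~/.cache/huggingface."""
    env = os.environ if env is None else env
    custom = env.get("HF_HOME")
    return Path(custom) if custom else Path.home() / ".cache" / "huggingface"


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root (0 if root does not exist)."""
    if not root.is_dir():
        return 0
    # Skip symlinks so files linked from snapshot folders are not counted twice.
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


if __name__ == "__main__":
    cache = hf_cache_dir()
    size_mb = dir_size_bytes(cache) / (1024 * 1024)
    print(f"{cache}: {size_mb:.0f} MB")
```

A size near zero suggests the first parse has not run yet and the downloads are still to come.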
## 3. Database setup
```powershell
# create the DB (if fresh)
createdb -U postgres gapguide
# migrate (requires pgvector from step 1)
python manage.py migrate
# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users
# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py
```
Demo user login: `demo.partial@gapguide.test` / `DemoPass123!`.
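For context on what the embedding step prepares: Layer 4 compares SBERT vectors by cosine similarity, and pgvector exposes this as a cosine *distance* operator (`<=>`), which equals one minus the similarity. The plain-Python sketch below shows only the arithmetic. It is not how the backend computes it, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b) -> float:
    """The quantity pgvector's <=> operator returns: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Vectors pointing the same way give a distance of 0, and orthogonal vectors give 1. Ordering by `<=>` ascending therefore returns the most similar skills first.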
## 4. Run locally
```powershell
# terminal 1: backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver
# terminal 2: frontend
cd frontend
npm install
npm run dev
```
Open http://localhost:8080.
## 5. Running tests
```powershell
cd backend
# CI baseline (no ML, seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q
# ML smoke test (downloads models on first run, ~15 min the first time)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
```
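The two environment variables used above act as switches. How the backend reads them is not shown in this guide, so the following is only a sketch under stated assumptions. Only `lexical` is a layer name taken from this document; `ner` and `sbert` are placeholders. The comma-separated format and the helper names are hypothetical too.

```python
import os

# "lexical" comes from this guide; the other layer names are placeholders
# invented for this sketch, not the project's real identifiers.
ALL_LAYERS = ("lexical", "ner", "sbert")


def enabled_layers(env=None) -> tuple:
    """Layers selected by GAPGUIDE_PARSE_LAYERS; all layers when it is unset."""
    env = os.environ if env is None else env
    raw = env.get("GAPGUIDE_PARSE_LAYERS", "").strip()
    if not raw:
        return ALL_LAYERS
    # Assumption: the variable holds a comma-separated list of layer names.
    wanted = [name.strip().lower() for name in raw.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in ALL_LAYERS]
    if unknown:
        raise ValueError("unknown parse layers: " + ", ".join(unknown))
    return tuple(wanted)


def ml_smoke_enabled(env=None) -> bool:
    """True only when GAPGUIDE_ML_SMOKE is exactly "1", as set above."""
    env = os.environ if env is None else env
    return env.get("GAPGUIDE_ML_SMOKE") == "1"
```

Under these assumptions, the CI baseline corresponds to `enabled_layers()` returning `("lexical",)`, so no model is ever loaded. The smoke test runs only when `ml_smoke_enabled()` is true.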
## Troubleshooting
- **`extension "vector" is not available`**: pgvector is not installed on your Postgres. Redo step 1. The `vector.control` file must exist in PostgreSQL's `share\extension\` directory.
- **`ModuleNotFoundError: pgvector`**: `pip install -r requirements.txt` hasn't run (or ran in a different venv). Activate the venv first.
- **`spaCy model not found`**: run `python -m spacy download en_core_web_sm` (the default model), or `python -m spacy download en_core_web_lg` if you want the larger one.
- **Slow first resume parse**: normal. The NER chain downloads ~1 GB of models on first call. Subsequent parses use the cache.
- **`GAPGUIDE_PARSE_LAYERS=lexical`**: lets you run the backend without any ML deps loaded. Useful during dev when you don't want the first-call model downloads.
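
For the `ModuleNotFoundError: pgvector` case, a quick check of which interpreter is active often settles whether the venv was activated. This is a minimal sketch using only the standard library; the helper names are illustrative.

```python
import importlib.util
import sys


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def module_available(name: str) -> bool:
    """True when a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
    print("pgvector importable:", module_available("pgvector"))
```

If the interpreter path does not point into `backend\venv\`, activate the venv and rerun `pip install -r requirements.txt`.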