Local Setup — GapGuide Backend

Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at backend/venv/), Node 20+ for the frontend.

1. Install pgvector extension on PostgreSQL 17

Module 8 Layer 4 uses pgvector for SBERT cosine similarity. CREATE EXTENSION vector fails until the extension files are installed into the PostgreSQL server directory. Two options:

Option A — Precompiled binary (recommended, ~2 minutes)

1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
2. Download the release tagged 0.8.2_17.6 (built against PostgreSQL 17.6; works on 17.3+, while older 17.0–17.2 hit a linker bug).
3. Extract. You should see vector.control, vector--*.sql, and vector.dll.
4. Copy files (administrator rights required, because Program Files is protected):
   - vector.control and vector--*.sql → C:\Program Files\PostgreSQL\17\share\extension\
   - vector.dll → C:\Program Files\PostgreSQL\17\lib\
5. Verify in psql:

psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"

Output should include [1,2,3].
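If the psql check fails, it can help to confirm that the copy step put every file where PostgreSQL expects it. The following is a minimal sketch, not part of the GapGuide codebase: the function name missing_pgvector_files is hypothetical, and the install root is assumed to be the default path used in this guide.

```python
from pathlib import Path


def missing_pgvector_files(pg_root: Path) -> list[str]:
    """Return the expected pgvector files that are absent under pg_root."""
    ext_dir = pg_root / "share" / "extension"
    lib_dir = pg_root / "lib"
    missing = []
    if not (ext_dir / "vector.control").is_file():
        missing.append(str(ext_dir / "vector.control"))
    # At least one versioned SQL script (vector--*.sql) must be present.
    if not any(ext_dir.glob("vector--*.sql")):
        missing.append(str(ext_dir / "vector--*.sql"))
    if not (lib_dir / "vector.dll").is_file():
        missing.append(str(lib_dir / "vector.dll"))
    return missing
```

Calling missing_pgvector_files(Path(r"C:\Program Files\PostgreSQL\17")) returns an empty list when all three kinds of file are in place, and otherwise lists the paths that still need to be copied.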

Option B — Build from source (~30 min, no trusted binary)

1. Install Visual Studio with the "Desktop development with C++" workload.
2. Open "x64 Native Tools Command Prompt for VS" as administrator.
3. Run:

set "PGROOT=C:\Program Files\PostgreSQL\17"
cd %TEMP%
git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
cd pgvector
nmake /F Makefile.win
nmake /F Makefile.win install

Verify as in Option A step 5.

2. Python dependencies

cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm

First install is ~5 minutes (torch CPU wheel is the heaviest at ~200 MB).

The NER layers will use en_core_web_sm (12 MB) by default. If you want slightly better noun-phrase quality on paraphrased CVs, upgrade to en_core_web_lg (~560 MB) — the layers auto-detect and prefer _lg:

python -m spacy download en_core_web_lg
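The "prefer _lg, fall back to _sm" behaviour described above can be sketched as follows. This is an illustration under assumptions, not the project's actual detection code: pick_spacy_model and is_installed are hypothetical names, and the sketch relies only on the fact that spaCy models install as ordinary Python packages.

```python
from importlib.util import find_spec
from typing import Callable, Optional, Sequence

# Preference order described above: the large model wins when it is installed.
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def is_installed(package: str) -> bool:
    """spaCy models are regular Python packages, so an import lookup suffices."""
    return find_spec(package) is not None


def pick_spacy_model(
    candidates: Sequence[str] = PREFERRED_MODELS,
    installed: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Return the first installed model name, or None if none is available."""
    for name in candidates:
        if installed(name):
            return name
    return None
```

A caller would pass the returned name to spacy.load, and treat None as the "spaCy model not found" case.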

HuggingFace models download lazily on first parse into %USERPROFILE%\.cache\huggingface (~1 GB total across Nucha BERT, JobBERT, SBERT). Subsequent parses reuse the cached files and skip the download.
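To see whether the models are already cached, and how much disk they occupy, a small script along these lines may help. It is a sketch with hypothetical helper names (hf_cache_dir, dir_size_mb). It assumes the default cache location given above, which the HF_HOME environment variable can override.

```python
import os
from pathlib import Path


def hf_cache_dir() -> Path:
    """Default Hugging Face cache root; HF_HOME overrides it when set."""
    override = os.environ.get("HF_HOME")
    return Path(override) if override else Path.home() / ".cache" / "huggingface"


def dir_size_mb(root: Path) -> float:
    """Total size of regular files under root in megabytes (0.0 if missing)."""
    if not root.is_dir():
        return 0.0
    # Skip symlinks so files linked from snapshot folders are not counted twice.
    total = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1_000_000
```

Running print(dir_size_mb(hf_cache_dir())) before and after the first parse shows whether the download has happened.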

3. Database setup

# create the DB (if fresh)
createdb -U postgres gapguide

# migrate (requires pgvector from step 1)
python manage.py migrate

# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users

# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py

Demo user login: demo.partial@gapguide.test / DemoPass123!.
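For context on what the embedding build in this step feeds: Module 8 Layer 4 compares SBERT vectors by cosine similarity, and pgvector's <=> operator returns the corresponding cosine distance (1 minus the similarity). The plain-Python sketch below shows the arithmetic only. It is not the project's code, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance form of the same measure: 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Vectors pointing in the same direction have a distance near 0, and orthogonal vectors have a distance of 1.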

4. Run locally

# terminal 1 — backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver

# terminal 2 — frontend
cd frontend
npm install
npm run dev

Open http://localhost:8080.

5. Running tests

cd backend

# CI baseline (no ML, seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q

# ML smoke test (downloads models on first run — ~15 min first time)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
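The two environment variables above act as switches. How the backend reads them is not shown in this guide, so the following is only a sketch under assumptions: the helper names are hypothetical, and the comparison rules (exact "1", case-insensitive "lexical") are guesses.

```python
from typing import Mapping


def ml_smoke_enabled(env: Mapping[str, str]) -> bool:
    """Assumed rule: the ML smoke test runs only when the flag is exactly "1"."""
    return env.get("GAPGUIDE_ML_SMOKE") == "1"


def lexical_only(env: Mapping[str, str]) -> bool:
    """Assumed rule: "lexical" restricts parsing to the non-ML layer."""
    return env.get("GAPGUIDE_PARSE_LAYERS", "").strip().lower() == "lexical"
```

In a test module these would be called with os.environ, for example to skip the integration test unless ml_smoke_enabled(os.environ) is true.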

Troubleshooting

extension "vector" is not available: pgvector is not installed on your Postgres. Redo step 1. The vector.control file must exist in PostgreSQL's share\extension\ directory.
ModuleNotFoundError for pgvector: pip install -r requirements.txt has not run, or ran in a different venv. Activate the venv first, then reinstall.
spaCy model not found: run python -m spacy download en_core_web_sm (the default from step 2). en_core_web_lg also works if you prefer the larger model.
Slow first resume parse: normal. The NER chain downloads ~1 GB of models on first call. Subsequent parses use the cache.
GAPGUIDE_PARSE_LAYERS=lexical: lets you run the backend without any ML dependencies loaded. Useful during development when you want to avoid the first-call model downloads.
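For the pgvector import error, a quick way to tell whether the venv is actually active is to ask the interpreter itself. This is a minimal sketch using only the standard library. The function names are illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv: sys.prefix differs from the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_interpreter() -> str:
    """One-line summary to paste into a bug report or print before pip install."""
    return f"{sys.executable} (venv active: {in_virtualenv()})"
```

If describe_interpreter() reports a path outside backend\venv, packages installed by pip are going to a different environment than the one running the backend.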