Local Setup: GapGuide Backend

Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at backend/venv/), Node 20+ for the frontend.

1. Install pgvector extension on PostgreSQL 17

Module 8 Layer 4 uses pgvector for SBERT cosine similarity. CREATE EXTENSION vector fails until the extension files are installed into the PostgreSQL server directory. Two options:

Option A: Precompiled binary (recommended, ~2 minutes)

  1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
  2. Download the release tagged 0.8.2_17.6 (built against PostgreSQL 17.6; works on 17.3+, while older 17.0–17.2 hit a linker bug).
  3. Extract. You should see vector.control, vector--*.sql, and vector.dll.
  4. Copy files:
    • vector.control and vector--*.sql → C:\Program Files\PostgreSQL\17\share\extension\
    • vector.dll → C:\Program Files\PostgreSQL\17\lib\
    • Administrator rights required (Program Files is protected).
  5. Verify in psql:
    psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
    
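    Output should include [1,2,3]. If the gapguide database does not exist yet, create it first with createdb -U postgres gapguide (the same command listed in step 3).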
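
If CREATE EXTENSION still fails after copying, a quick file check can narrow down which piece is missing. The following is a minimal sketch, assuming the default install root used in the steps above; the function name missing_pgvector_files is illustrative and not part of GapGuide.

```python
from pathlib import Path

# Default install root from the steps above; adjust if PostgreSQL lives elsewhere.
DEFAULT_PG_ROOT = Path(r"C:\Program Files\PostgreSQL\17")


def missing_pgvector_files(pg_root: Path = DEFAULT_PG_ROOT) -> list[str]:
    """Return the expected pgvector files that are absent under pg_root."""
    ext_dir = pg_root / "share" / "extension"
    lib_dir = pg_root / "lib"
    missing = []
    if not (ext_dir / "vector.control").is_file():
        missing.append(str(ext_dir / "vector.control"))
    # At least one versioned SQL script (vector--*.sql) should be present.
    if not (ext_dir.is_dir() and any(ext_dir.glob("vector--*.sql"))):
        missing.append(str(ext_dir / "vector--*.sql"))
    if not (lib_dir / "vector.dll").is_file():
        missing.append(str(lib_dir / "vector.dll"))
    return missing
```

Calling missing_pgvector_files() with no arguments checks the default location; an empty list means all three kinds of file are in place. It only confirms the files exist, not that they match your server version.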

Option B: Build from source (~30 min, no third-party binary to trust)

  1. Install Visual Studio with "Desktop development with C++".
  2. Open "x64 Native Tools Command Prompt for VS" as administrator.
  3. Run:
    set "PGROOT=C:\Program Files\PostgreSQL\17"
    cd %TEMP%
    git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
    cd pgvector
    nmake /F Makefile.win
    nmake /F Makefile.win install
    
  4. Verify as in Option A step 5.
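
For context on what the extension is used for: pgvector's <=> operator returns cosine distance, which is 1 minus cosine similarity. The sketch below shows that arithmetic in plain Python for illustration only. It is not the Module 8 Layer 4 code, and the function names are made up for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """The quantity pgvector's <=> operator computes: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

Identical directions give a distance of 0, orthogonal vectors give 1, and opposite directions give 2, so smaller values mean closer matches.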

2. Python dependencies

cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm

First install is ~5 minutes (torch CPU wheel is the heaviest at ~200 MB).

The NER layers use en_core_web_sm (12 MB) by default. For slightly better noun-phrase quality on paraphrased CVs, upgrade to en_core_web_lg (~560 MB); the layers auto-detect and prefer _lg:

python -m spacy download en_core_web_lg
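
The "prefer _lg if present" behaviour can be pictured roughly as follows. This is a sketch of one plausible approach, not the actual detection code in the NER layers, which is not shown here. It relies on the fact that models installed by spacy download are ordinary Python packages; pick_spacy_model and PREFERRED_MODELS are hypothetical names.

```python
import importlib.util

# Preference order: the larger model first, then the default.
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def is_installed(package: str) -> bool:
    """Models installed via `spacy download` are importable Python packages."""
    return importlib.util.find_spec(package) is not None


def pick_spacy_model(candidates=PREFERRED_MODELS, installed=is_installed):
    """Return the first installed model name, or None if none is available."""
    for name in candidates:
        if installed(name):
            return name
    return None
```

Running print(pick_spacy_model()) inside the activated venv shows which model would be chosen under this scheme. A result of None means neither model is installed.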

HuggingFace models download lazily on first parse into %USERPROFILE%\.cache\huggingface (~1 GB total across Nucha BERT, JobBERT, SBERT). Subsequent parses are instant.
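
To see whether the models have already been downloaded, you can measure the cache directory. The sketch below assumes the default location described above and honours only the HF_HOME override; other Hugging Face cache variables are ignored for simplicity. Both function names are illustrative.

```python
import os
from pathlib import Path


def huggingface_cache_dir(env=os.environ) -> Path:
    """HF_HOME, when set, overrides the default ~/.cache/huggingface."""
    override = env.get("HF_HOME")
    return Path(override) if override else Path.home() / ".cache" / "huggingface"


def dir_size_mb(path: Path) -> float:
    """Total size of regular files under path in MB (0.0 if path is missing)."""
    if not path.is_dir():
        return 0.0
    # Skip symlinks so files linked from snapshot folders are not counted twice.
    total = sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1_000_000
```

A result near zero from dir_size_mb(huggingface_cache_dir()) suggests the first parse will still trigger the downloads.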

3. Database setup

# create the DB (if fresh)
createdb -U postgres gapguide

# migrate (requires pgvector from step 1)
python manage.py migrate

# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users

# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py

Demo user login: demo.partial@gapguide.test / DemoPass123!.
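
If you rebuild the database often, the commands above can be chained so that the sequence stops at the first failure. This is a minimal sketch, assuming it runs from the backend directory with the venv active; run_setup and SETUP_STEPS are hypothetical names, not scripts shipped with the project.

```python
import subprocess
import sys

# The same commands as above, in the same order.
SETUP_STEPS = [
    ["createdb", "-U", "postgres", "gapguide"],
    [sys.executable, "manage.py", "migrate"],
    [sys.executable, "manage.py", "seed_initial_skills"],
    [sys.executable, "manage.py", "seed_initial_roles"],
    [sys.executable, "manage.py", "seed_initial_resources"],
    [sys.executable, "manage.py", "seed_demo_users"],
    [sys.executable, "scripts/build_skill_embeddings.py"],
]


def run_setup(steps=SETUP_STEPS, run=subprocess.run) -> int:
    """Run each step in order; return the index of the first failure, or -1."""
    for index, argv in enumerate(steps):
        result = run(argv)
        if result.returncode != 0:
            return index
    return -1
```

On an existing database, createdb exits with an error, so pass SETUP_STEPS[1:] to skip it. A return value of -1 means every step succeeded.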

4. Run locally

# terminal 1: backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver

# terminal 2: frontend
cd frontend
npm install
npm run dev

Open http://localhost:8080.
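
If the page does not load, it helps to check which of the two servers is actually listening. The sketch below tests a TCP port. It assumes the frontend is on 8080 as stated above and that the backend uses Django's default runserver port, 8000; adjust if your configuration differs. port_open is an illustrative helper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_open("localhost", 8000) checks the backend and port_open("localhost", 8080) checks the frontend. A False result only tells you nothing accepted the connection, not why.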

5. Running tests

cd backend

# CI baseline (no ML, seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q

# ML smoke test (downloads models on first run, ~15 min the first time)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
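
The two environment variables act as switches. How the backend reads them is not shown in this guide, so the following is only a sketch of one way such gating could work. The comma-separated format and the meaning of an unset variable are assumptions, and both function names are hypothetical.

```python
import os


def active_layers(env=os.environ):
    """Return the requested layer names, or None when the variable is unset."""
    raw = env.get("GAPGUIDE_PARSE_LAYERS")
    if raw is None or not raw.strip():
        return None  # assumed to mean "all layers enabled"
    # Assumption: several layers would be comma-separated; this guide only
    # ever sets the single value "lexical".
    return [name.strip() for name in raw.split(",") if name.strip()]


def ml_smoke_enabled(env=os.environ) -> bool:
    """True only when GAPGUIDE_ML_SMOKE is exactly "1", as set above."""
    return env.get("GAPGUIDE_ML_SMOKE") == "1"
```

Under these assumptions, the CI baseline corresponds to active_layers() returning ["lexical"], and the smoke test corresponds to an unset layer variable with ml_smoke_enabled() returning True.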

Troubleshooting

  • extension "vector" is not available: pgvector is not installed on your PostgreSQL server. Redo step 1. The vector.control file must exist in PostgreSQL's share\extension\ directory.
  • ModuleNotFoundError: pgvector: pip install -r requirements.txt hasn't run (or ran in a different venv). Activate the venv first, then reinstall.
  • spaCy model not found: run python -m spacy download en_core_web_sm (the default model from step 2), or en_core_web_lg if you want the larger one.
  • Slow first resume parse: normal. The NER chain downloads ~1 GB of models on first call. Subsequent parses use the cache.
  • GAPGUIDE_PARSE_LAYERS=lexical: lets you run the backend without any ML deps loaded. Useful during dev when you want to avoid the first-call model downloads.
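
Several of the problems above come down to "wrong venv" or "package not installed", which can be checked from Python before starting the server. The sketch below is a hypothetical preflight helper, not something shipped with the project. It only tests importability and does not verify the PostgreSQL side.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def preflight(find_spec=importlib.util.find_spec, venv_active=None) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    active = in_virtualenv() if venv_active is None else venv_active
    if not active:
        problems.append("venv not active: run .\\venv\\Scripts\\Activate.ps1")
    if find_spec("pgvector") is None:
        problems.append("pgvector missing: run pip install -r requirements.txt")
    if find_spec("en_core_web_sm") is None and find_spec("en_core_web_lg") is None:
        problems.append("no spaCy model: run python -m spacy download en_core_web_sm")
    return problems
```

Run it from the backend directory with for line in preflight(): print(line). No output means the Python-side prerequisites look fine.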