# Local Setup: GapGuide Backend

Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at `backend/venv/`), Node 20+ for the frontend.

## 1. Install the pgvector extension on PostgreSQL 17

Module 8 Layer 4 uses `pgvector` for SBERT cosine similarity. `CREATE EXTENSION vector` fails until the extension files are installed into the PostgreSQL server directory. Two options:

### Option A: Precompiled binary (recommended, ~2 minutes)

1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
2. Download the release tagged `0.8.2_17.6` (built against PostgreSQL 17.6; works on 17.3+, while the older 17.0–17.2 hit a linker bug).
3. Extract. You should see `vector.control`, `vector--*.sql`, and `vector.dll`.
4. Copy files:
   - `vector.control` and `vector--*.sql` → `C:\Program Files\PostgreSQL\17\share\extension\`
   - `vector.dll` → `C:\Program Files\PostgreSQL\17\lib\`
   - Administrator rights required (Program Files is protected).
5. Verify in psql:
   ```
   psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
   ```
   Output should include `[1,2,3]`. If the `gapguide` database does not exist yet, create it first with `createdb -U postgres gapguide` (see step 3).
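
If `CREATE EXTENSION` still fails after copying, a quick file check can narrow down which piece is missing. The sketch below is only an illustration: `missing_pgvector_files` is a hypothetical helper, and the install root is assumed to be the default one used in step 4.

```python
from pathlib import Path


def missing_pgvector_files(pg_root):
    """Return the pgvector files that appear to be missing under pg_root.

    Hypothetical helper: it only checks the locations named in step 4.
    """
    root = Path(pg_root)
    ext_dir = root / "share" / "extension"
    missing = []
    if not (ext_dir / "vector.control").is_file():
        missing.append(r"share\extension\vector.control")
    # At least one versioned install script (vector--<version>.sql) should exist.
    if not ext_dir.is_dir() or not any(ext_dir.glob("vector--*.sql")):
        missing.append(r"share\extension\vector--*.sql")
    if not (root / "lib" / "vector.dll").is_file():
        missing.append(r"lib\vector.dll")
    return missing


if __name__ == "__main__":
    # Assumed default install location; adjust if PostgreSQL lives elsewhere.
    problems = missing_pgvector_files(r"C:\Program Files\PostgreSQL\17")
    print("missing:", problems if problems else "nothing")
```

An empty result only means the files are in place; the psql command in step 5 remains the real test that the server can load the extension.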

### Option B: Build from source (~30 min, no third-party binary to trust)

1. Install Visual Studio with the "Desktop development with C++" workload.
2. Open "x64 Native Tools Command Prompt for VS" as **administrator**.
3. Run:
   ```
   set "PGROOT=C:\Program Files\PostgreSQL\17"
   cd %TEMP%
   git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
   cd pgvector
   nmake /F Makefile.win
   nmake /F Makefile.win install
   ```
4. Verify as in Option A step 5.

## 2. Python dependencies

```powershell
cd backend
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

The first install takes ~5 minutes (the torch CPU wheel is the heaviest download at ~200 MB).

The NER layers use `en_core_web_sm` (12 MB) by default. If you want slightly better noun-phrase quality on paraphrased CVs, upgrade to `en_core_web_lg` (~560 MB); the layers auto-detect and prefer `_lg`:

```powershell
python -m spacy download en_core_web_lg
```
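
The preference order described above could look roughly like the following sketch. This is not the project's actual detection code: `pick_spacy_model` and `is_installed` are hypothetical names, and the check relies on the fact that models fetched with `spacy download` are installed as importable Python packages.

```python
import importlib.util


def is_installed(package_name):
    """True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def pick_spacy_model(candidates=("en_core_web_lg", "en_core_web_sm"),
                     installed=is_installed):
    """Return the first installed model name in preference order, or None."""
    for name in candidates:
        if installed(name):
            return name
    return None


if __name__ == "__main__":
    model = pick_spacy_model()
    print(model or "no spaCy model installed; run one of the download commands")
```

Running it inside the activated venv shows which model a preference-ordered lookup of this kind would select.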

Hugging Face models download lazily on first parse into `%USERPROFILE%\.cache\huggingface` (~1 GB total across Nucha BERT, JobBERT, SBERT). Subsequent parses load the models from this cache instead of downloading them again.

## 3. Database setup

```powershell
# run from backend\ with the venv active (see step 2)

# create the DB (if fresh)
createdb -U postgres gapguide

# migrate (requires pgvector from step 1)
python manage.py migrate

# seed the catalog + 5 demo users
python manage.py seed_initial_skills
python manage.py seed_initial_roles
python manage.py seed_initial_resources
python manage.py seed_demo_users

# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
python scripts/build_skill_embeddings.py
```

Demo user login: `demo.partial@gapguide.test` / `DemoPass123!`.
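
For orientation, the similarity that Layer 4 relies on can be written out in a few lines. The sketch below is plain Python with made-up low-dimensional vectors (real SBERT embeddings typically have hundreds of dimensions), and it is not the project's query code. pgvector's `<=>` operator returns cosine distance, which is one minus this similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    skill = [1.0, 2.0, 3.0]
    print(cosine_similarity(skill, [2.0, 4.0, 6.0]))  # same direction, close to 1
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))    # orthogonal vectors
```

A smaller distance means a closer match, so ordering rows by `<=>` ascending puts the most similar skills first.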

## 4. Run locally

```powershell
# terminal 1: backend
cd backend
.\venv\Scripts\Activate.ps1
python manage.py runserver

# terminal 2: frontend
cd frontend
npm install
npm run dev
```

Open http://localhost:8080.

## 5. Running tests

```powershell
cd backend

# CI baseline (no ML, seconds)
$env:GAPGUIDE_PARSE_LAYERS = "lexical"
pytest -q

# ML smoke test (downloads models on first run, ~15 min the first time)
Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
$env:GAPGUIDE_ML_SMOKE = "1"
pytest apps/accounts/tests/test_resume_parser_integration.py -q
```
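
The two environment variables above act as switches for the test run. How the backend reads them is not shown in this guide, so the following is only a guess at the shape of that logic: the function names, the comma-separated format, and the meaning of an unset variable are all assumptions.

```python
import os


def parse_layers(environ=os.environ):
    """Return the requested parse layers, or None if the variable is unset.

    Assumption: the value is a comma-separated list such as "lexical".
    None is taken to mean "use the default (full) layer chain".
    """
    raw = environ.get("GAPGUIDE_PARSE_LAYERS")
    if raw is None or not raw.strip():
        return None
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def ml_smoke_enabled(environ=os.environ):
    """True when GAPGUIDE_ML_SMOKE is set to "1", as in the commands above."""
    return environ.get("GAPGUIDE_ML_SMOKE") == "1"


if __name__ == "__main__":
    print("layers:", parse_layers())
    print("ML smoke test enabled:", ml_smoke_enabled())
```

Printing both values before a test run is a cheap way to confirm that a variable left over from an earlier PowerShell session is not silently changing which tests execute.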

## Troubleshooting

- **`extension "vector" is not available`**: pgvector is not installed on your Postgres. Redo step 1. The `vector.control` file must exist in PostgreSQL's `share\extension\` directory.
- **`ModuleNotFoundError: pgvector`**: `pip install -r requirements.txt` hasn't run (or ran in a different venv). Activate the venv first, then rerun it.
- **`spaCy model not found`**: run `python -m spacy download en_core_web_sm` (the default model), or `en_core_web_lg` if you prefer the larger one.
- **Slow first resume parse**: normal. The NER chain downloads ~1 GB of models on first call. Subsequent parses use the cache.
- **`GAPGUIDE_PARSE_LAYERS=lexical`**: lets you run the backend without any ML deps loaded. Useful during dev when you don't want the first-call model downloads.
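
Several of the errors above come down to a package missing from the active venv. A small preflight check, sketched here with a hypothetical `missing_packages` helper, can report them in one go. The package list is an assumption based on names mentioned in this guide (`django` is inferred from `manage.py`); `requirements.txt` is the authoritative list.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Assumed package names; extend to match requirements.txt.
    required = ["django", "pgvector", "spacy", "torch"]
    print("interpreter:", sys.executable)  # should point inside backend\venv
    missing = missing_packages(required)
    print("missing:", ", ".join(missing) if missing else "none")
```

If the interpreter path printed is not inside `backend\venv`, the venv is not active, which would explain a `ModuleNotFoundError` even after a successful `pip install`.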