	# Local Setup — GapGuide Backend

	Prerequisites: Windows 10/11, PostgreSQL 17 installed, Python 3.13 (already set up at `backend/venv/`), Node 20+ for the frontend.

	## 1. Install pgvector extension on PostgreSQL 17

	Module 8 Layer 4 uses `pgvector` for SBERT cosine similarity. `CREATE EXTENSION vector` fails until the extension files are installed into the PostgreSQL server directory. Two options:

	### Option A — Precompiled binary (recommended, ~2 minutes)

	1. Go to https://github.com/andreiramani/pgvector_pgsql_windows/releases
	2. Download the release tagged `0.8.2_17.6`. It is built against PostgreSQL 17.6 and works on 17.3 and later; 17.0 through 17.2 hit a linker bug.
	3. Extract. You should see `vector.control`, `vector--*.sql`, and `vector.dll`.
	4. Copy the files (administrator rights are required because Program Files is protected):
	   - `vector.control` and `vector--*.sql` → `C:\Program Files\PostgreSQL\17\share\extension\`
	   - `vector.dll` → `C:\Program Files\PostgreSQL\17\lib\`
	5. Verify in psql:
	```powershell
	psql -U postgres -d gapguide -c "CREATE EXTENSION IF NOT EXISTS vector; SELECT '[1,2,3]'::vector;"
	```
	Output should include `[1,2,3]`.
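
If the verification fails, the usual cause is a file copied to the wrong folder. The following is a minimal sketch that checks the three locations from step 4. The function name `missing_pgvector_files` is hypothetical, and the default root assumes the standard installer path used in this guide.

```python
from pathlib import Path

# Default install root used in the steps above; adjust if PostgreSQL lives elsewhere.
DEFAULT_PG_ROOT = Path(r"C:\Program Files\PostgreSQL\17")


def missing_pgvector_files(pg_root: Path = DEFAULT_PG_ROOT) -> list[str]:
    """Return the expected pgvector paths that are absent under pg_root."""
    ext_dir = pg_root / "share" / "extension"
    lib_dir = pg_root / "lib"
    missing = []
    if not (ext_dir / "vector.control").is_file():
        missing.append(str(ext_dir / "vector.control"))
    # At least one versioned SQL script (vector--*.sql) must be present.
    if not (ext_dir.is_dir() and any(ext_dir.glob("vector--*.sql"))):
        missing.append(str(ext_dir / "vector--*.sql"))
    if not (lib_dir / "vector.dll").is_file():
        missing.append(str(lib_dir / "vector.dll"))
    return missing


if __name__ == "__main__":
    problems = missing_pgvector_files()
    if problems:
        print("Missing pgvector files:")
        for path in problems:
            print("  " + path)
    else:
        print("All pgvector files are in place.")
```

An empty result only confirms that the files are in place. `CREATE EXTENSION` can still fail for other reasons, such as a binary built for a different PostgreSQL version.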

	### Option B — Build from source (~30 min, no third-party binary to trust)

	1. Install Visual Studio with "Desktop development with C++".
	2. Open "x64 Native Tools Command Prompt for VS" as administrator.
	3. Run:
	```cmd
	set "PGROOT=C:\Program Files\PostgreSQL\17"
	cd %TEMP%
	git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
	cd pgvector
	nmake /F Makefile.win
	nmake /F Makefile.win install
	```
	4. Verify as in Option A step 5.
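
Option A states that the precompiled release works on PostgreSQL 17.3 and later, so it can help to check the server version before choosing between the two options. A small sketch, assuming the version text comes from `psql --version` or `SELECT version()`; the helper name `supports_prebuilt_pgvector` is hypothetical.

```python
import re


def supports_prebuilt_pgvector(version_text: str) -> bool:
    """True if version_text names PostgreSQL 17.3 or a later 17.x release."""
    # Matches both "psql (PostgreSQL) 17.6" and "PostgreSQL 17.6 on x86_64-...".
    match = re.search(r"PostgreSQL\D*(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no PostgreSQL version found in {version_text!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 17 and minor >= 3


if __name__ == "__main__":
    print(supports_prebuilt_pgvector("psql (PostgreSQL) 17.6"))  # True
    print(supports_prebuilt_pgvector("psql (PostgreSQL) 17.2"))  # False
```

When the check returns `False` for a 17.0 to 17.2 server, either update PostgreSQL to a later 17.x minor release or use Option B.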

	## 2. Python dependencies

	```powershell
	cd backend
	.\venv\Scripts\Activate.ps1
	pip install -r requirements.txt
	python -m spacy download en_core_web_sm
	```

	The first install takes about 5 minutes; the CPU-only torch wheel is the largest download at ~200 MB.

	The NER layers use `en_core_web_sm` (12 MB) by default. For slightly
	better noun-phrase quality on paraphrased CVs, install `en_core_web_lg`
	(~560 MB); the layers detect it automatically and prefer it over `_sm`:

	```powershell
	python -m spacy download en_core_web_lg
	```
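
The preference for `_lg` can be pictured with the sketch below. It is not the project's actual detection code, which this guide does not show; `pick_spacy_model` and `is_installed` are hypothetical names. It relies on the fact that `spacy download` installs each model as an ordinary Python package.

```python
from importlib.util import find_spec
from typing import Callable, Optional

# Preference order described above: the large model wins when it is present.
PREFERRED_MODELS = ("en_core_web_lg", "en_core_web_sm")


def is_installed(package: str) -> bool:
    """True if a top-level package with this name can be imported."""
    return find_spec(package) is not None


def pick_spacy_model(installed: Callable[[str], bool] = is_installed) -> Optional[str]:
    """Return the first preferred model that is installed, or None if neither is."""
    for name in PREFERRED_MODELS:
        if installed(name):
            return name
    return None


if __name__ == "__main__":
    print(pick_spacy_model())
```

A `None` result corresponds to the "spaCy model not found" case covered in Troubleshooting.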

	Hugging Face models download lazily on the first parse into `%USERPROFILE%\.cache\huggingface` (~1 GB total across Nucha BERT, JobBERT, and SBERT). Later parses load from this cache and skip the download.
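
To see whether the models are already cached, and how much disk they use, a sketch like the following can help. It assumes the default cache location unless the `HF_HOME` environment variable is set. Other overrides, such as `XDG_CACHE_HOME`, are not handled here. Both function names are hypothetical.

```python
import os
from pathlib import Path


def hf_cache_dir() -> Path:
    """Resolve the cache root: HF_HOME if set, else ~/.cache/huggingface."""
    override = os.environ.get("HF_HOME")
    return Path(override) if override else Path.home() / ".cache" / "huggingface"


def dir_size_mb(root: Path) -> float:
    """Total size in MB of regular files under root (0.0 if root is absent)."""
    if not root.is_dir():
        return 0.0
    # Skip symlinks so files that the cache links to are not counted twice.
    total = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / (1024 * 1024)


if __name__ == "__main__":
    cache = hf_cache_dir()
    print(f"{cache}: {dir_size_mb(cache):.0f} MB")
```

A result near 0 MB means the first parse will still trigger the downloads.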

	## 3. Database setup

	```powershell
	# create the DB (if fresh)
	createdb -U postgres gapguide

	# migrate (requires pgvector from step 1)
	python manage.py migrate

	# seed the catalog + 5 demo users
	python manage.py seed_initial_skills
	python manage.py seed_initial_roles
	python manage.py seed_initial_resources
	python manage.py seed_demo_users

	# build SBERT embeddings for the Skill catalog (Module 8 Layer 4)
	python scripts/build_skill_embeddings.py
	```
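
The embeddings built in the last step are compared by cosine similarity. The sketch below shows that computation on toy vectors standing in for SBERT embeddings. It illustrates the metric only and is not the project's query code. pgvector's `<=>` operator returns cosine distance, which is 1 minus the similarity.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """The quantity pgvector's <=> operator returns: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    # Orthogonal toy vectors: similarity 0.0, distance 1.0.
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Because the two values mirror each other, ordering by ascending `<=>` distance is the same as ordering by descending similarity.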

	Demo user login: `demo.partial@gapguide.test` / `DemoPass123!`.

	## 4. Run locally

	```powershell
	# terminal 1 — backend
	cd backend
	.\venv\Scripts\Activate.ps1
	python manage.py runserver

	# terminal 2 — frontend
	cd frontend
	npm install
	npm run dev
	```

	Open http://localhost:8080.
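
If the page does not load, a quick port probe shows which of the two processes is not up. This is a minimal sketch. Port 8080 comes from the URL above, while 8000 is Django's default `runserver` port and is an assumption here, since this guide does not state the backend port. `is_listening` is a hypothetical helper.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    # Backend port 8000 is assumed (Django default); frontend 8080 is from the guide.
    for name, port in (("frontend", 8080), ("backend", 8000)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```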

	## 5. Running tests

	```powershell
	cd backend

	# CI baseline (no ML, seconds)
	$env:GAPGUIDE_PARSE_LAYERS = "lexical"
	pytest -q

	# ML smoke test (downloads models on the first run, ~15 min)
	Remove-Item Env:\GAPGUIDE_PARSE_LAYERS
	$env:GAPGUIDE_ML_SMOKE = "1"
	pytest apps/accounts/tests/test_resume_parser_integration.py -q
	```
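
The two environment variables act as switches. How the backend reads them is not shown in this guide, so the following is only a sketch under stated assumptions: `GAPGUIDE_PARSE_LAYERS` is treated as a comma-separated list, an unset value means "no restriction", and `GAPGUIDE_ML_SMOKE` is on only when it equals `"1"`. Both function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def enabled_layers(env: Mapping[str, str] = os.environ) -> Optional[list[str]]:
    """Layers named in GAPGUIDE_PARSE_LAYERS, or None when unset or empty."""
    raw = env.get("GAPGUIDE_PARSE_LAYERS", "")
    # Assumption: several layers would be separated by commas.
    names = [part.strip().lower() for part in raw.split(",") if part.strip()]
    return names or None


def ml_smoke_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when GAPGUIDE_ML_SMOKE is set to exactly "1"."""
    return env.get("GAPGUIDE_ML_SMOKE") == "1"


if __name__ == "__main__":
    print(enabled_layers({"GAPGUIDE_PARSE_LAYERS": "lexical"}))  # ['lexical']
    print(ml_smoke_enabled({}))  # False
```

Passing the environment in as a mapping keeps the helpers easy to test without changing the real process environment.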

	## Troubleshooting

	- `extension "vector" is not available`: pgvector is not installed in your PostgreSQL server. Redo step 1 and confirm that `vector.control` exists in PostgreSQL's `share\extension\` directory.
	- `ModuleNotFoundError: pgvector`: `pip install -r requirements.txt` has not run, or it ran in a different venv. Activate the venv first, then install again.
	- `spaCy model not found`: run `python -m spacy download en_core_web_sm` (the default model from step 2), or download `en_core_web_lg` if you prefer the larger one.
	- Slow first resume parse: this is normal. The NER chain downloads ~1 GB of models on the first call; later parses use the cache.
	- `GAPGUIDE_PARSE_LAYERS=lexical`: runs the backend without loading any ML dependencies. Useful during development when you want to skip the first-call model downloads.