	# GeoQuery Setup Guide

	Complete guide for setting up the GeoQuery development environment.

	---

	## Prerequisites

	### Required Software

	| Requirement | Minimum Version | Purpose |
	|-------------|-----------------|---------|
	| Python | 3.11+ | Backend runtime |
	| Node.js | 18+ | Frontend runtime |
	| npm | 9+ | Package management |
	| Git | 2.0+ | Version control |
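
A quick way to compare the installed tools against the table above is a small script. The following is a minimal sketch, not part of GeoQuery: the function names are made up for illustration, and it assumes each tool prints its version in response to `--version`.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional

# Minimum versions copied from the prerequisites table.
MINIMUMS = {"node": (18,), "npm": (9,), "git": (2, 0)}


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number from a tool's --version output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def tool_version(tool: str) -> Optional[tuple]:
    """Return the installed version of a command-line tool, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
        return parse_version(result.stdout or result.stderr)
    except (OSError, subprocess.SubprocessError, ValueError):
        return None


def check_prerequisites() -> dict:
    """Map each requirement to True when the installed version is new enough."""
    results = {"python": sys.version_info[:3] >= (3, 11)}
    for tool, minimum in MINIMUMS.items():
        version = tool_version(tool)
        results[tool] = version is not None and version >= minimum
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Run it with the same `python3` you plan to use for the virtual environment, since the Python check applies to the interpreter that executes the script.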

	### API Keys

	- Google AI API Key (Gemini): Required for LLM functionality
	  - Get one free at: https://aistudio.google.com/app/apikey
	  - Free tier: 15 requests/minute, 1,500 requests/day

	### System Requirements

	- RAM: 4GB minimum, 8GB recommended (for DuckDB in-memory database)
	- Disk: 2GB for datasets
	- OS: macOS, Linux, or Windows (WSL recommended)

	---

	## Installation

	### 1. Clone Repository

	```bash
	git clone https://github.com/GerardCB/GeoQuery.git
	cd GeoQuery
	```

	### 2. Backend Setup

	#### Create Virtual Environment

	```bash
	cd backend
	python3 -m venv venv
	```

	#### Activate Virtual Environment

	macOS/Linux:
	```bash
	source venv/bin/activate
	```

	Windows (PowerShell):
	```powershell
	venv\Scripts\Activate.ps1
	```

	Windows (CMD):
	```cmd
	venv\Scripts\activate.bat
	```

	#### Install Dependencies

	```bash
	pip install --upgrade pip
	pip install -e .
	```

	This installs the package in editable mode, including all dependencies from `setup.py`.

	Key Dependencies:
	- `fastapi` - Web framework
	- `uvicorn` - ASGI server
	- `duckdb` - Embedded database
	- `geopandas` - Geospatial data processing
	- `sentence-transformers` - Embeddings
	- `google-generativeai` - Gemini SDK

	#### Configure Environment Variables

	Create a `.env` file in the `backend/` directory:

	```bash
	# Required
	GEMINI_API_KEY=your-api-key-here

	# Optional (defaults shown)
	PORT=8000
	HOST=0.0.0.0
	LOG_LEVEL=INFO
	```

	Alternatively, export the key directly in your terminal (macOS/Linux):

	```bash
	export GEMINI_API_KEY="your-api-key-here"
	```

	Windows (PowerShell):
	```powershell
	$env:GEMINI_API_KEY="your-api-key-here"
	```
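
This guide does not say how the backend reads `.env`, so the following is only a stdlib sketch of the usual convention: values from the file fill in gaps, and variables already exported in the shell win. The helper names `parse_env_file` and `load_env` are hypothetical, and inline comments after a value are not handled.

```python
import os
from pathlib import Path


def parse_env_file(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: KEY="value" and KEY=value are equivalent.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env(path: Path) -> None:
    """Copy values from a .env file into os.environ without overriding existing ones."""
    if path.is_file():
        content = path.read_text(encoding="utf-8")
        for key, value in parse_env_file(content).items():
            os.environ.setdefault(key, value)


if __name__ == "__main__":
    load_env(Path("backend/.env"))  # path assumes you run this from the repository root
    print("GEMINI_API_KEY set:", bool(os.environ.get("GEMINI_API_KEY")))
```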

	#### Verify Backend Installation

	```bash
	python -c "import backend; print('Backend installed successfully')"
	```

	### 3. Frontend Setup

	```bash
	cd ../frontend # From backend directory
	npm install
	```

	Key Dependencies:
	- `next` - React framework
	- `react` - UI library
	- `leaflet` - Map library
	- `react-leaflet` - React bindings for Leaflet
	- `@dnd-kit/core` - Drag and drop

	#### Configure Frontend (Optional)

	Create or edit `frontend/.env.local` if the backend is not running on the default port:

	```bash
	NEXT_PUBLIC_API_URL=http://localhost:8000
	```

	---

	## Running Locally

	### Start Backend

	From the `backend/` directory, with the virtual environment activated:

	```bash
	uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
	```

	Flags:
	- `--reload`: Auto-restart on code changes
	- `--host 0.0.0.0`: Listen on all network interfaces so other machines can connect (use `127.0.0.1` to accept local connections only)
	- `--port 8000`: Port number

	Expected Output:
	```
	INFO: Uvicorn running on http://0.0.0.0:8000
	INFO: Application startup complete.
	```

	Verify:
	- Open http://localhost:8000/docs → Should show FastAPI Swagger UI
	- Check http://localhost:8000/api/catalog → Should return GeoJSON catalog
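
The two checks above can also be scripted, for example as a smoke test after starting the server. This is a minimal sketch using only the standard library; `endpoint_ok` is a hypothetical helper, and it treats anything other than an HTTP 200 response as a failure.

```python
import http.client
import urllib.error
import urllib.request


def endpoint_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url is answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    base_url = "http://localhost:8000"
    for path in ("/docs", "/api/catalog"):
        status = "ok" if endpoint_ok(base_url + path) else "unreachable"
        print(f"{base_url}{path}: {status}")
```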

	### Start Frontend

	From `frontend/` directory:

	```bash
	npm run dev
	```

	Expected Output:
	```
	▲ Next.js 15.1.3
	- Local: http://localhost:3000
	- Ready in 2.1s
	```

	Verify:
	- Open http://localhost:3000 → Should show GeoQuery chat interface

	---

	## Database Setup

	### DuckDB Initialization

	Automatic: The database is created in memory on the first query.

	Manual Test:

	```python
	from backend.core.geo_engine import get_geo_engine

	engine = get_geo_engine()
	print(f"Loaded tables: {list(engine.loaded_tables.keys())}")
	```

	### Load Initial Datasets

	Datasets are loaded lazily (on-demand). To pre-load common datasets:

	```python
	from backend.core.geo_engine import get_geo_engine

	engine = get_geo_engine()
	engine.ensure_table_loaded("pan_admin1") # Provinces
	engine.ensure_table_loaded("panama_healthsites_geojson") # Hospitals
	```
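
To make "loaded lazily" concrete, here is a generic sketch of the pattern. It is not GeoQuery's implementation: the `LazyTables` class and the `loader` callback are hypothetical, and only the names `ensure_table_loaded` and `loaded_tables` mirror the snippets above.

```python
from typing import Callable


class LazyTables:
    """Load each table at most once, the first time it is requested."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader    # hypothetical function that reads one dataset
        self.loaded_tables = {}  # table name -> loaded table

    def ensure_table_loaded(self, name: str) -> object:
        """Return the table, invoking the loader only on the first request."""
        if name not in self.loaded_tables:
            self.loaded_tables[name] = self._loader(name)
        return self.loaded_tables[name]


if __name__ == "__main__":
    calls = []

    def fake_loader(name: str) -> str:
        calls.append(name)
        return f"<table {name}>"

    tables = LazyTables(fake_loader)
    tables.ensure_table_loaded("pan_admin1")
    tables.ensure_table_loaded("pan_admin1")  # served from the cache
    print(calls)  # ['pan_admin1']
```

Pre-loading, as in the example above, simply moves the one-time loading cost from the first user query to startup.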

	### Generate Embeddings

	Required for semantic search:

	```bash
	cd backend
	python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
	```

	This generates `backend/data/embeddings.npy` (cached for future use).

	---

	## Directory Structure After Setup

	```
	GeoQuery/
	├── backend/
	│   ├── venv/               # Virtual environment (created)
	│   ├── .env                # Environment variables (created)
	│   ├── data/
	│   │   ├── embeddings.npy  # Generated embeddings (created)
	│   │   ├── catalog.json    # Dataset registry (existing)
	│   │   └── osm/            # GeoJSON datasets (existing)
	│   └── <source files>
	├── frontend/
	│   ├── node_modules/       # npm packages (created)
	│   ├── .next/              # Build output (created)
	│   └── <source files>
	└── <other files>
	```

	---

	## Common Issues & Troubleshooting

	### Backend Issues

	#### Issue: "ModuleNotFoundError: No module named 'backend'"

	Cause: Virtual environment not activated or package not installed.

	Solution:
	```bash
	source venv/bin/activate # Activate venv
	pip install -e . # Install package
	```

	#### Issue: "duckdb.IOException: No files found that match the pattern"

	Cause: GeoJSON file missing or incorrect path in catalog.json.

	Solution:
	1. Check file exists: `ls backend/data/osm/hospitals.geojson`
	2. Verify path in `catalog.json`
	3. Download missing data: `python backend/scripts/download_geofabrik.py`
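
Steps 1 and 2 can be automated for the whole catalog. The sketch below assumes that `catalog.json` is a JSON object mapping dataset names to entries, and that each entry's `path` is relative to `backend/data/`, as in the example entry under "Adding New Datasets". Both are assumptions about the file layout, and `missing_dataset_files` is a hypothetical helper.

```python
import json
from pathlib import Path


def missing_dataset_files(catalog_path: Path, data_dir: Path) -> dict:
    """Map each dataset name to its expected file when that file does not exist."""
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    missing = {}
    for name, entry in catalog.items():
        file_path = data_dir / entry["path"]  # assumed: path is relative to data_dir
        if not file_path.is_file():
            missing[name] = file_path
    return missing


if __name__ == "__main__":
    data_dir = Path("backend/data")  # run from the repository root
    catalog_path = data_dir / "catalog.json"
    if catalog_path.is_file():
        for name, path in missing_dataset_files(catalog_path, data_dir).items():
            print(f"{name}: missing {path}")
    else:
        print(f"{catalog_path} not found")
```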

	#### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid"

	Cause: Invalid or missing GEMINI_API_KEY.

	Solution:
	```bash
	export GEMINI_API_KEY="your-actual-api-key"
	# Restart backend
	```

	#### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'"

	Cause: Corrupted installation.

	Solution:
	```bash
	pip uninstall sentence-transformers
	pip install sentence-transformers --no-cache-dir
	```

	### Frontend Issues

	#### Issue: "Error: Cannot find module 'next'"

	Cause: npm packages not installed.

	Solution:
	```bash
	cd frontend
	rm -rf node_modules package-lock.json
	npm install
	```

	#### Issue: "Failed to fetch from localhost:8000"

	Cause: Backend not running or CORS issue.

	Solution:
	1. Verify backend is running: `curl http://localhost:8000/api/catalog`
	2. Check CORS settings in `backend/main.py`
	3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local`

	#### Issue: "Map tiles not loading"

	Cause: Network issue or ad blocker.

	Solution:
	1. Check internet connection
	2. Disable ad blocker for localhost
	3. Alternative tile server in `MapViewer.tsx`:
	```typescript
	url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
	```
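
When debugging tile loading, it can help to open a single tile URL directly in the browser. The sketch below expands the `{s}/{z}/{x}/{y}` template for a given coordinate using the standard Web Mercator tile-numbering formula. The function name and the choice of subdomain `a` are illustrative, not something taken from `MapViewer.tsx`.

```python
import math

OSM_TEMPLATE = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"


def tile_url(lat: float, lon: float, zoom: int,
             template: str = OSM_TEMPLATE, subdomain: str = "a") -> str:
    """Build the URL of the tile that contains (lat, lon) at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon = 180) stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return template.format(s=subdomain, z=zoom, x=x, y=y)


if __name__ == "__main__":
    # Approximate coordinates of Panama City; paste the printed URL into a browser.
    print(tile_url(8.98, -79.52, 10))
```

If that URL loads in the browser but the map stays blank, the problem is more likely in the frontend (or a blocker extension) than in the network.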

	### General Issues

	#### Issue: Port 8000 already in use

	Solution (macOS/Linux; `lsof` is not available on Windows):
	```bash
	# Find process using port
	lsof -ti:8000

	# Kill process
	kill -9 $(lsof -ti:8000)

	# Or use different port
	uvicorn backend.main:app --port 8001
	```
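
As a cross-platform alternative to `lsof`, the following stdlib sketch checks whether a port is taken and looks for the next free one. The function names are hypothetical, and the check only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port between {start} and {start + attempts - 1}")


if __name__ == "__main__":
    print("Port 8000 in use:", port_in_use(8000))
    print("Suggested port:", first_free_port(8000))
```

If you start the backend on a different port, remember to update `NEXT_PUBLIC_API_URL` in `frontend/.env.local` to match.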

	#### Issue: Out of memory errors

	Cause: Loading too many large datasets.

	Solution:
	1. Reduce dataset size (filter before loading)
	2. Increase system RAM
	3. Use query limits: `LIMIT 10000`

	---

	## Development Workflow

	### Code Changes

	Backend:
	- With the `--reload` flag, uvicorn restarts the server whenever a Python file changes
	- Changes in `core/`, `services/`, `api/` take effect as soon as the restart completes

	Frontend:
	- Hot Module Replacement (HMR) enabled
	- Changes in `components/`, `app/` reload automatically

	### Adding New Datasets

	1. Add GeoJSON file to appropriate directory (e.g., `backend/data/osm/`)

	2. Update catalog.json:
	```json
	"my_new_dataset": {
	  "path": "osm/my_new_dataset.geojson",
	  "description": "Description for display",
	  "semantic_description": "Detailed description for AI",
	  "categories": ["infrastructure"],
	  "tags": ["roads", "transport"]
	}
	```

	3. Regenerate embeddings:
	```bash
	rm backend/data/embeddings.npy
	python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
	```

	4. Test: Query for the new dataset

	See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts.
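
Before regenerating embeddings, a new entry can be sanity-checked with a few lines of Python. This sketch treats the five fields from the example entry above as required. Whether GeoQuery actually requires all of them is an assumption, and `validate_entry` is a hypothetical helper.

```python
# Field names and types taken from the example catalog entry; treating all of
# them as required is an assumption.
EXPECTED_FIELDS = {
    "path": str,
    "description": str,
    "semantic_description": str,
    "categories": list,
    "tags": list,
}


def validate_entry(name: str, entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in entry:
            problems.append(f"{name}: missing '{field}'")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{name}: '{field}' should be a {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    entry = {
        "path": "osm/my_new_dataset.geojson",
        "description": "Description for display",
        "semantic_description": "Detailed description for AI",
        "categories": ["infrastructure"],
        "tags": ["roads", "transport"],
    }
    print(validate_entry("my_new_dataset", entry))  # []
```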

	### Testing API Endpoints

	Using curl:
	```bash
	# Get catalog
	curl http://localhost:8000/api/catalog

	# Query chat endpoint
	curl -X POST http://localhost:8000/api/chat \
	  -H "Content-Type: application/json" \
	  -d '{"message": "Show me provinces", "history": []}'
	```

	Using Swagger UI:
	- Open http://localhost:8000/docs
	- Try endpoints interactively
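
The chat request from the curl example can also be sent from Python, which is convenient for scripted checks. This is a minimal stdlib sketch. The helper names are hypothetical, and because the response format of `/api/chat` is not described in this guide, the body is returned as raw text.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(message: str, history: Optional[list] = None,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, history: Optional[list] = None,
              base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, history, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, `print(send_chat("Show me provinces"))` performs the same call as the curl command.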

	---

	## Environment Variables Reference

	| Variable | Required | Default | Description |
	|----------|----------|---------|-------------|
	| `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key |
	| `PORT` | ❌ No | 8000 | Backend server port |
	| `HOST` | ❌ No | 0.0.0.0 | Backend host |
	| `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
	| `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) |
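
The table translates naturally into a typed settings object. The sketch below is one way to read these variables with the documented defaults and fail early when the API key is missing. It is not GeoQuery's actual configuration code, and `Settings` and `settings_from_env` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Backend configuration with the defaults from the table above."""
    gemini_api_key: str
    port: int = 8000
    host: str = "0.0.0.0"
    log_level: str = "INFO"
    database_path: str = ":memory:"


def settings_from_env(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required but not set")
    return Settings(
        gemini_api_key=api_key,
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        database_path=env.get("DATABASE_PATH", ":memory:"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.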

	---

	## IDE Setup

	### VS Code

	Recommended Extensions:
	- Python (`ms-python.python`)
	- Pylance (`ms-python.vscode-pylance`)
	- ESLint (`dbaeumer.vscode-eslint`)
	- Prettier (`esbenp.prettier-vscode`)

	Settings (`.vscode/settings.json`):
	```json
	{
	  "python.defaultInterpreterPath": "./backend/venv/bin/python",
	  "python.linting.enabled": true,
	  "python.formatting.provider": "black",
	  "editor.formatOnSave": true,
	  "[typescript]": {
	    "editor.defaultFormatter": "esbenp.prettier-vscode"
	  }
	}
	```

	### PyCharm

	1. Set Python Interpreter: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python`
	2. Enable FastAPI: Settings → Languages & Frameworks → FastAPI
	3. Configure Run: Run → Edit Configurations → Add → Python → Script path: `backend/main.py`

	---

	## Next Steps

	- ✅ Verify installation by running a test query
	- 📖 Read [ARCHITECTURE.md](../ARCHITECTURE.md) to understand the system
	- 🔧 Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md) for component details
	- 📊 Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md) for available data