# GeoQuery Setup Guide
Complete guide for setting up the GeoQuery development environment.
---
## Prerequisites
### Required Software
| Requirement | Minimum Version | Purpose |
|------------|----------------|---------|
| **Python** | 3.11+ | Backend runtime |
| **Node.js** | 18+ | Frontend runtime |
| **npm** | 9+ | Package management |
| **Git** | 2.0+ | Version control |
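One way to confirm these minimums before installing anything is a small script. The sketch below uses only the standard library. The names `parse_version`, `tool_version`, and `check_prerequisites` are illustrative and not part of GeoQuery, and the script assumes each tool prints its version in response to `--version`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the table above (Python is checked separately).
MINIMUMS = {
    "node": (18, 0),
    "npm": (9, 0),
    "git": (2, 0),
}


def parse_version(text):
    """Extract the first major.minor pair from a tool's --version output."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def tool_version(name):
    """Return (major, minor) for a command-line tool, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_version(result.stdout or result.stderr)


def check_prerequisites():
    """Map each requirement to True (ok) or False (missing or too old)."""
    results = {"python": sys.version_info[:2] >= (3, 11)}
    for name, minimum in MINIMUMS.items():
        found = tool_version(name)
        results[name] = found is not None and found >= minimum
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The check compares only major and minor numbers, which is enough for the minimums listed here.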
### API Keys
- **Google AI API Key (Gemini)**: Required for LLM functionality
  - Get one free at: https://aistudio.google.com/app/apikey
  - Free tier: 15 requests/minute, 1,500 requests/day
### System Requirements
- **RAM**: 4GB minimum, 8GB recommended (for DuckDB in-memory database)
- **Disk**: 2GB for datasets
- **OS**: macOS, Linux, or Windows (WSL recommended)
---
## Installation
### 1. Clone Repository
```bash
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
```
### 2. Backend Setup
#### Create Virtual Environment
```bash
cd backend
python3 -m venv venv
```
#### Activate Virtual Environment
**macOS/Linux**:
```bash
source venv/bin/activate
```
**Windows** (PowerShell):
```powershell
venv\Scripts\Activate.ps1
```
**Windows** (CMD):
```cmd
venv\Scripts\activate.bat
```
#### Install Dependencies
```bash
pip install --upgrade pip
pip install -e .
```
This installs the package in editable mode, including all dependencies from `setup.py`.
**Key Dependencies**:
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `duckdb` - Embedded database
- `geopandas` - Geospatial data processing
- `sentence-transformers` - Embeddings
- `google-generativeai` - Gemini SDK
#### Configure Environment Variables
Create a `.env` file in the `backend/` directory:
```bash
# Required
GEMINI_API_KEY=your-api-key-here
# Optional (defaults shown)
PORT=8000
HOST=0.0.0.0
LOG_LEVEL=INFO
```
**Alternative**: Export directly in the terminal (macOS/Linux):
```bash
export GEMINI_API_KEY="your-api-key-here"
```
**Windows** (PowerShell):
```powershell
$env:GEMINI_API_KEY="your-api-key-here"
```
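Before starting the backend, it can help to confirm that the `.env` file holds a real key rather than the placeholder. The following is a minimal standard-library sketch. It does not reflect how GeoQuery itself reads the file, and `parse_env_text` and `missing_required` are hypothetical helpers introduced only for this check.

```python
from pathlib import Path

REQUIRED = ("GEMINI_API_KEY",)
PLACEHOLDER = "your-api-key-here"  # The placeholder used in the example above


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc are equivalent.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [
        key for key in REQUIRED
        if not values.get(key) or values[key] == PLACEHOLDER
    ]


if __name__ == "__main__":
    env_file = Path("backend/.env")  # Assumes the script runs from the repo root
    text = env_file.read_text(encoding="utf-8") if env_file.is_file() else ""
    problems = missing_required(parse_env_text(text))
    print("OK" if not problems else f"Set these in {env_file}: {problems}")
```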
#### Verify Backend Installation
```bash
python -c "import backend; print('Backend installed successfully')"
```
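The one-liner above only confirms that the `backend` package imports. To also check the key dependencies without importing each heavy library, a sketch along these lines may help. The module names are the usual import names for the listed packages (an assumption for this project's pinned versions), and `find_missing` is an illustrative helper.

```python
import importlib.util

# Import names for the key dependencies listed earlier. The import name can
# differ from the pip package name (for example, sentence-transformers).
KEY_MODULES = [
    "fastapi",
    "uvicorn",
    "duckdb",
    "geopandas",
    "sentence_transformers",
    "google.generativeai",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(KEY_MODULES)
    print("All key dependencies found" if not missing else f"Missing: {missing}")
```

Run it with the virtual environment activated; otherwise it inspects the system interpreter instead.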
### 3. Frontend Setup
```bash
cd ../frontend # From backend directory
npm install
```
**Key Dependencies**:
- `next` - React framework
- `react` - UI library
- `leaflet` - Map library
- `react-leaflet` - React bindings for Leaflet
- `@dnd-kit/core` - Drag and drop
#### Configure Frontend (Optional)
Create or edit `frontend/.env.local` if the backend is not running on the default port:
```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```
---
## Running Locally
### Start Backend
From `backend/` directory with venv activated:
```bash
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```
**Flags**:
- `--reload`: Auto-restart on code changes
- `--host 0.0.0.0`: Listen on all network interfaces so other machines can connect (use `127.0.0.1` to restrict access to this machine)
- `--port 8000`: Port number
**Expected Output**:
```
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Application startup complete.
```
**Verify**:
- Open http://localhost:8000/docs → Should show FastAPI Swagger UI
- Check http://localhost:8000/api/catalog → Should return GeoJSON catalog
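The two checks above can also be scripted, which is convenient in CI or after a restart. This is a sketch using only `urllib`. The helper names `http_status` and `backend_is_up` are illustrative, and it assumes both URLs answer with HTTP 200 when the backend is healthy.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # The server answered, but with an error status.
    except (urllib.error.URLError, OSError):
        return None  # Connection refused, timeout, DNS failure, and so on.


def backend_is_up(base_url="http://localhost:8000"):
    """Check the two verification URLs from this guide."""
    paths = ("/docs", "/api/catalog")
    return all(http_status(base_url + path) == 200 for path in paths)
```

With the server running, `backend_is_up()` is expected to return `True`. A `None` from `http_status` usually means the server is not listening at all.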
### Start Frontend
From `frontend/` directory:
```bash
npm run dev
```
**Expected Output**:
```
▲ Next.js 15.1.3
- Local: http://localhost:3000
- Ready in 2.1s
```
**Verify**:
- Open http://localhost:3000 → Should show GeoQuery chat interface
---
## Database Setup
### DuckDB Initialization
**Automatic**: Database is created in-memory on first query.
**Manual Test**:
```python
from backend.core.geo_engine import get_geo_engine
engine = get_geo_engine()
print(f"Loaded tables: {list(engine.loaded_tables.keys())}")
```
### Load Initial Datasets
Datasets are loaded lazily (on-demand). To pre-load common datasets:
```python
from backend.core.geo_engine import get_geo_engine
engine = get_geo_engine()
engine.ensure_table_loaded("pan_admin1") # Provinces
engine.ensure_table_loaded("panama_healthsites_geojson") # Hospitals
```
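For readers unfamiliar with lazy loading, the pattern can be illustrated in a few lines. The class below is a hypothetical stand-in and not GeoQuery's actual engine: it only borrows the names `loaded_tables` and `ensure_table_loaded` from the snippets above to show that a table is read once and then reused.

```python
class LazyTableRegistry:
    """Hypothetical stand-in for the engine: loads each table at most once."""

    def __init__(self, loader):
        self._loader = loader      # Callable: table name -> loaded data
        self.loaded_tables = {}    # name -> data, filled on demand

    def ensure_table_loaded(self, name):
        """Load the table on first request; later calls reuse the cached copy."""
        if name not in self.loaded_tables:
            self.loaded_tables[name] = self._loader(name)
        return self.loaded_tables[name]


load_calls = []


def fake_loader(name):
    """Pretend to read a dataset; records each call so reuse is visible."""
    load_calls.append(name)
    return {"table": name, "rows": []}


registry = LazyTableRegistry(fake_loader)
registry.ensure_table_loaded("pan_admin1")
registry.ensure_table_loaded("pan_admin1")  # Served from the cache
print(load_calls)  # ['pan_admin1']
```

Pre-loading, as shown earlier, simply moves the first (slow) call to startup instead of the first user query.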
### Generate Embeddings
Required for semantic search:
```bash
cd backend
python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
```
This generates `backend/data/embeddings.npy` (cached for future use).
---
## Directory Structure After Setup
```
GeoQuery/
├── backend/
│   ├── venv/                 # Virtual environment (created)
│   ├── .env                  # Environment variables (created)
│   ├── data/
│   │   ├── embeddings.npy    # Generated embeddings (created)
│   │   ├── catalog.json      # Dataset registry (existing)
│   │   └── osm/              # GeoJSON datasets (existing)
│   └── <source files>
├── frontend/
│   ├── node_modules/         # npm packages (created)
│   ├── .next/                # Build output (created)
│   └── <source files>
└── <other files>
```
---
## Common Issues & Troubleshooting
### Backend Issues
#### Issue: "ModuleNotFoundError: No module named 'backend'"
**Cause**: Virtual environment not activated or package not installed.
**Solution**:
```bash
source venv/bin/activate # Activate venv
pip install -e . # Install package
```
#### Issue: "duckdb.IOException: No files found that match the pattern"
**Cause**: GeoJSON file missing or incorrect path in catalog.json.
**Solution**:
1. Check file exists: `ls backend/data/osm/hospitals.geojson`
2. Verify path in `catalog.json`
3. Download missing data: `python backend/scripts/download_geofabrik.py`
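Steps 1 and 2 can be checked for every dataset at once. The sketch below assumes that `catalog.json` is a JSON object keyed by dataset name and that each entry's `path` is relative to `backend/data/`, as the catalog example in this guide suggests. `find_missing_datasets` is an illustrative helper, not a project script.

```python
import json
from pathlib import Path


def find_missing_datasets(catalog_path, data_dir):
    """Return {dataset_name: expected_path} for entries whose file is absent."""
    catalog = json.loads(Path(catalog_path).read_text(encoding="utf-8"))
    missing = {}
    for name, entry in catalog.items():
        # Assumption: each entry is an object with a "path" relative to data_dir.
        relative = entry.get("path") if isinstance(entry, dict) else None
        if relative is None:
            continue
        target = Path(data_dir) / relative
        if not target.is_file():
            missing[name] = str(target)
    return missing


if __name__ == "__main__":
    catalog_file = Path("backend/data/catalog.json")  # Run from the repo root
    if catalog_file.is_file():
        for name, path in find_missing_datasets(catalog_file, "backend/data").items():
            print(f"{name}: missing {path}")
    else:
        print(f"Catalog not found: {catalog_file}")
```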
#### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid"
**Cause**: Invalid or missing GEMINI_API_KEY.
**Solution**:
```bash
export GEMINI_API_KEY="your-actual-api-key"
# Restart backend
```
#### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'"
**Cause**: Corrupted installation.
**Solution**:
```bash
pip uninstall sentence-transformers
pip install sentence-transformers --no-cache-dir
```
### Frontend Issues
#### Issue: "Error: Cannot find module 'next'"
**Cause**: npm packages not installed.
**Solution**:
```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
```
#### Issue: "Failed to fetch from localhost:8000"
**Cause**: Backend not running or CORS issue.
**Solution**:
1. Verify backend is running: `curl http://localhost:8000/api/catalog`
2. Check CORS settings in `backend/main.py`
3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local`
#### Issue: "Map tiles not loading"
**Cause**: Network issue or ad blocker.
**Solution**:
1. Check internet connection
2. Disable ad blocker for localhost
3. Alternative tile server in `MapViewer.tsx`:
```typescript
url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
```
### General Issues
#### Issue: Port 8000 already in use
**Solution**:
```bash
# Find process using port
lsof -ti:8000
# Kill process
kill -9 $(lsof -ti:8000)
# Or use different port
uvicorn backend.main:app --port 8001
```
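On systems without `lsof` (for example, Windows outside WSL), a port can be probed from Python instead. This is a small sketch with illustrative helper names. It only detects listeners on the given host and does not identify or stop the owning process.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds.
        return probe.connect_ex((host, port)) == 0


def first_free_port(start=8000, attempts=20):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    return None
```

The result of `first_free_port()` can be passed to `uvicorn` via `--port`. If the backend moves off port 8000, remember to update `NEXT_PUBLIC_API_URL` for the frontend.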
#### Issue: Out of memory errors
**Cause**: Loading too many large datasets.
**Solution**:
1. Reduce dataset size (filter before loading)
2. Increase system RAM
3. Use query limits: `LIMIT 10000`
---
## Development Workflow
### Code Changes
**Backend**:
- Python files auto-reload with `--reload` flag
- Changes in `core/`, `services/`, `api/` take effect as soon as the server restarts
**Frontend**:
- Hot Module Replacement (HMR) enabled
- Changes in `components/`, `app/` reload automatically
### Adding New Datasets
1. **Add GeoJSON file** to appropriate directory (e.g., `backend/data/osm/`)
2. **Update catalog.json**:
```json
"my_new_dataset": {
  "path": "osm/my_new_dataset.geojson",
  "description": "Description for display",
  "semantic_description": "Detailed description for AI",
  "categories": ["infrastructure"],
  "tags": ["roads", "transport"]
}
```
3. **Regenerate embeddings**:
```bash
rm backend/data/embeddings.npy
python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
```
4. **Test**: Query for the new dataset
See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts.
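Editing `catalog.json` by hand is easy to get wrong, so step 2 can be scripted. The sketch below assumes the catalog is a single JSON object keyed by dataset name. It treats the five fields from the example entry as required purely for illustration, and `register_dataset` is a hypothetical helper rather than an existing project script.

```python
import json
from pathlib import Path

# Fields taken from the example entry above; treated as required here
# only for illustration.
EXPECTED_FIELDS = {"path", "description", "semantic_description", "categories", "tags"}


def register_dataset(catalog_path, name, entry):
    """Add one entry to catalog.json, refusing to overwrite an existing name."""
    path = Path(catalog_path)
    catalog = json.loads(path.read_text(encoding="utf-8"))
    if name in catalog:
        raise KeyError(f"Dataset {name!r} is already registered")
    missing = EXPECTED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"Entry is missing fields: {sorted(missing)}")
    catalog[name] = entry
    path.write_text(
        json.dumps(catalog, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return catalog
```

After registering an entry this way, the embeddings still need to be regenerated as described in step 3.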
### Testing API Endpoints
**Using curl**:
```bash
# Get catalog
curl http://localhost:8000/api/catalog
# Query chat endpoint
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Show me provinces", "history": []}'
```
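The same chat request can be issued from Python, which is handy for scripted smoke tests. This sketch mirrors the curl call above using only the standard library. The response format is not documented in this guide, so `send` simply returns the raw body as text. Both function names are illustrative.

```python
import json
import urllib.request


def build_chat_request(message, history=None, base_url="http://localhost:8000"):
    """Build the same POST request as the curl example, without sending it."""
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, `send(build_chat_request("Show me provinces"))` performs the same call as the curl command.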
**Using Swagger UI**:
- Open http://localhost:8000/docs
- Try endpoints interactively
---
## Environment Variables Reference
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key |
| `PORT` | ❌ No | 8000 | Backend server port |
| `HOST` | ❌ No | 0.0.0.0 | Backend host |
| `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) |
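The table above can be expressed as a typed settings object, which makes the defaults and the one required variable explicit. This is a sketch under the assumption that values are read from the process environment. It is not GeoQuery's actual configuration code, and `Settings` and `settings_from_env` are illustrative names.

```python
import os
from dataclasses import dataclass

VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    port: int = 8000
    host: str = "0.0.0.0"
    log_level: str = "INFO"
    database_path: str = ":memory:"


def settings_from_env(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LOG_LEVELS}")
    return Settings(
        gemini_api_key=api_key,
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
        log_level=log_level,
        database_path=env.get("DATABASE_PATH", ":memory:"),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test.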
---
## IDE Setup
### VS Code
**Recommended Extensions**:
- Python (`ms-python.python`)
- Pylance (`ms-python.vscode-pylance`)
- ESLint (`dbaeumer.vscode-eslint`)
- Prettier (`esbenp.prettier-vscode`)
**Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./backend/venv/bin/python",
  "python.linting.enabled": true,
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,
  "[typescript]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}
```
### PyCharm
1. **Set Python Interpreter**: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python`
2. **Enable FastAPI**: Settings → Languages & Frameworks → FastAPI
3. **Configure Run**: Run → Edit Configurations → Add → Python → Script path: `backend/main.py`
---
## Next Steps
- ✅ **Verify installation** by running a test query
- 📖 **Read [ARCHITECTURE.md](../ARCHITECTURE.md)** to understand the system
- 🔧 **Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** for component details
- 📊 **Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md)** for available data