# GeoQuery Setup Guide

Complete guide for setting up the GeoQuery development environment.

---

## Prerequisites

### Required Software

| Requirement | Minimum Version | Purpose |
|------------|----------------|---------|
| **Python** | 3.11+ | Backend runtime |
| **Node.js** | 18+ | Frontend runtime |
| **npm** | 9+ | Package management |
| **Git** | 2.0+ | Version control |
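
The minimum versions in the table can be checked in one pass. The script below is a minimal sketch, not part of GeoQuery: the helper names `parse_version`, `tool_version`, and `report` are hypothetical, and it assumes each tool prints its version when called with `--version`.

```python
import re
import shutil
import subprocess
import sys

# Minimum (major, minor) versions taken from the table above.
MINIMUMS = {"node": (18, 0), "npm": (9, 0), "git": (2, 0)}


def parse_version(text):
    """Extract (major, minor) from output such as 'v18.19.0' or 'git version 2.43.0'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def tool_version(command):
    """Return the (major, minor) version of a CLI tool, or None if it cannot be determined."""
    executable = shutil.which(command)
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, timeout=10
        )
        return parse_version(result.stdout)
    except (OSError, ValueError, subprocess.TimeoutExpired):
        return None


def report():
    """Print one status line per prerequisite."""
    python_ok = sys.version_info >= (3, 11)
    print(f"python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'ok' if python_ok else 'too old (need 3.11+)'}")
    for command, minimum in MINIMUMS.items():
        found = tool_version(command)
        if found is None:
            print(f"{command}: not found")
        else:
            status = "ok" if found >= minimum else f"too old (need {minimum[0]}+)"
            print(f"{command} {found[0]}.{found[1]}: {status}")


if __name__ == "__main__":
    report()
```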

### API Keys

- **Google AI API Key (Gemini)**: Required for LLM functionality
  - Get one free at: https://aistudio.google.com/app/apikey
  - Free tier: 15 requests/minute, 1,500 requests/day (limits vary by model and may change)

### System Requirements

- **RAM**: 4 GB minimum, 8 GB recommended (for the DuckDB in-memory database)
- **Disk**: 2 GB for datasets
- **OS**: macOS, Linux, or Windows (WSL recommended)

---

## Installation

### 1. Clone Repository

```bash
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
```

### 2. Backend Setup

#### Create Virtual Environment

```bash
cd backend
python3 -m venv venv
```

#### Activate Virtual Environment

**macOS/Linux**:
```bash
source venv/bin/activate
```

**Windows** (PowerShell):
```powershell
venv\Scripts\Activate.ps1
```

**Windows** (CMD):
```cmd
venv\Scripts\activate.bat
```

#### Install Dependencies

```bash
pip install --upgrade pip
pip install -e .
```

This installs the `backend` package in editable mode, along with the dependencies declared in `setup.py`.

**Key Dependencies**:
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `duckdb` - Embedded database
- `geopandas` - Geospatial data processing
- `sentence-transformers` - Embeddings
- `google-generativeai` - Gemini SDK

#### Configure Environment Variables

Create a `.env` file in the `backend/` directory:

```bash
# Required
GEMINI_API_KEY=your-api-key-here

# Optional (defaults shown)
PORT=8000
HOST=0.0.0.0
LOG_LEVEL=INFO
```

**Alternative**: Export the variable directly in your terminal (macOS/Linux):

```bash
export GEMINI_API_KEY="your-api-key-here"
```

**Windows** (PowerShell):
```powershell
$env:GEMINI_API_KEY="your-api-key-here"
```

#### Verify Backend Installation

```bash
python -c "import backend; print('Backend installed successfully')"
```
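
The import check confirms only that the `backend` package is visible. To also confirm that the key dependencies listed earlier were installed, their versions can be read from package metadata. This is a sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as listed under "Key Dependencies" above.
KEY_DEPENDENCIES = [
    "fastapi",
    "uvicorn",
    "duckdb",
    "geopandas",
    "sentence-transformers",
    "google-generativeai",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(KEY_DEPENDENCIES).items():
        print(f"{name:24} {version or 'MISSING'}")
```

Run it with the virtual environment activated; any line marked `MISSING` suggests `pip install -e .` did not complete.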

### 3. Frontend Setup

```bash
cd ../frontend  # From backend directory
npm install
```

**Key Dependencies**:
- `next` - React framework
- `react` - UI library
- `leaflet` - Map library
- `react-leaflet` - React bindings for Leaflet
- `@dnd-kit/core` - Drag and drop

#### Configure Frontend (Optional)

Create or edit `frontend/.env.local` if the backend is not running on the default port:

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

---

## Running Locally

### Start Backend

From the `backend/` directory, with the virtual environment activated:

```bash
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

**Flags**:
- `--reload`: Auto-restart on code changes
- `--host 0.0.0.0`: Listen on all network interfaces so other machines can connect (use `127.0.0.1` for local-only access)
- `--port 8000`: Port number

**Expected Output**:
```
INFO:     Uvicorn running on http://0.0.0.0:8000
INFO:     Application startup complete.
```

**Verify**:
- Open http://localhost:8000/docs → Should show the FastAPI Swagger UI
- Check http://localhost:8000/api/catalog → Should return GeoJSON catalog
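
Both checks can be scripted. The sketch below uses only the standard library and treats any 2xx response as success; the helper name `endpoint_ok` is hypothetical, and the base URL assumes the default host and port from this guide.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # default host and port used in this guide


def endpoint_ok(url, timeout=5.0):
    """Return True if a GET request to url is answered with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False


if __name__ == "__main__":
    for path in ("/docs", "/api/catalog"):
        status = "ok" if endpoint_ok(BASE_URL + path) else "unreachable"
        print(f"{path}: {status}")
```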

### Start Frontend

From the `frontend/` directory:

```bash
npm run dev
```

**Expected Output**:
```
▲ Next.js 15.1.3
- Local:        http://localhost:3000
- Ready in 2.1s
```

**Verify**:
- Open http://localhost:3000 → Should show GeoQuery chat interface

---

## Database Setup

### DuckDB Initialization

**Automatic**: The database is created in memory on the first query.

**Manual Test**:

```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
print(f"Loaded tables: {list(engine.loaded_tables.keys())}")
```

### Load Initial Datasets

Datasets are loaded lazily (on-demand). To pre-load common datasets:

```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
engine.ensure_table_loaded("pan_admin1")  # Provinces
engine.ensure_table_loaded("panama_healthsites_geojson")  # Hospitals
```
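
Lazy loading means a table is read the first time it is requested and reused afterwards. The sketch below illustrates that pattern in plain Python. It is not GeoQuery's implementation: `LazyTableRegistry` and `fake_loader` are hypothetical, and the names `loaded_tables` and `ensure_table_loaded` are borrowed from the snippets above only for readability.

```python
class LazyTableRegistry:
    """Load each table at most once, the first time it is requested."""

    def __init__(self, loader):
        self._loader = loader    # callable: table name -> loaded table
        self.loaded_tables = {}  # cache of tables loaded so far

    def ensure_table_loaded(self, name):
        """Load the table on first use; later calls reuse the cached result."""
        if name not in self.loaded_tables:
            self.loaded_tables[name] = self._loader(name)
        return self.loaded_tables[name]


# Stand-in loader that records its calls; a real one would read a GeoJSON file.
calls = []


def fake_loader(name):
    calls.append(name)
    return f"<table {name}>"


registry = LazyTableRegistry(fake_loader)
registry.ensure_table_loaded("pan_admin1")
registry.ensure_table_loaded("pan_admin1")  # second call is served from the cache
print(calls)                         # ['pan_admin1']
print(list(registry.loaded_tables))  # ['pan_admin1']
```

Pre-loading, as in the snippet above, simply moves the first (slow) call to startup instead of the first user query.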

### Generate Embeddings

Required for semantic search:

```bash
cd backend
python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
```

This generates `backend/data/embeddings.npy` (cached for future use).
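
Because the embeddings are cached on disk, they can fall out of date when `catalog.json` changes. One simple heuristic is to compare file modification times. The sketch below is an assumption about how one might check this; whether GeoQuery performs a similar check itself is not stated here, and `embeddings_stale` is a hypothetical helper.

```python
from pathlib import Path


def embeddings_stale(catalog_path, embeddings_path):
    """Return True if the embeddings file is missing or older than the catalog."""
    catalog = Path(catalog_path)
    embeddings = Path(embeddings_path)
    if not embeddings.exists():
        return True
    return embeddings.stat().st_mtime < catalog.stat().st_mtime


if __name__ == "__main__":
    # Paths follow the directory layout in this guide; run from the repository root.
    catalog = Path("backend/data/catalog.json")
    if catalog.exists() and embeddings_stale(catalog, "backend/data/embeddings.npy"):
        print("embeddings.npy is missing or older than catalog.json; regenerate it")
```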

---

## Directory Structure After Setup

```
GeoQuery/
├── backend/
│   ├── venv/                   # Virtual environment (created)
│   ├── .env                    # Environment variables (created)
│   ├── data/
│   │   ├── embeddings.npy      # Generated embeddings (created)
│   │   ├── catalog.json        # Dataset registry (existing)
│   │   └── osm/                # GeoJSON datasets (existing)
│   └── <source files>
├── frontend/
│   ├── node_modules/           # npm packages (created)
│   ├── .next/                  # Build output (created)
│   └── <source files>
└── <other files>
```

---

## Common Issues & Troubleshooting

### Backend Issues

#### Issue: "ModuleNotFoundError: No module named 'backend'"

**Cause**: Virtual environment not activated or package not installed.

**Solution**:
```bash
source venv/bin/activate  # Activate venv (run from backend/)
pip install -e .          # Install package
```

#### Issue: "duckdb.IOException: No files found that match the pattern"

**Cause**: GeoJSON file missing or incorrect path in catalog.json.

**Solution**:
1. Check that the file exists: `ls backend/data/osm/hospitals.geojson`
2. Verify path in `catalog.json`
3. Download missing data: `python backend/scripts/download_geofabrik.py`
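
Steps 1 and 2 can be automated for every dataset at once. The sketch below assumes that `catalog.json` is a JSON object mapping dataset names to entries with a `path` field relative to `backend/data/`, as the example entry under "Adding New Datasets" suggests. The helper `missing_dataset_files` is hypothetical.

```python
import json
from pathlib import Path


def missing_dataset_files(catalog_path, data_dir):
    """Return {dataset name: expected path} for catalog entries whose file does not exist."""
    catalog = json.loads(Path(catalog_path).read_text(encoding="utf-8"))
    missing = {}
    for name, entry in catalog.items():
        file_path = Path(data_dir) / entry["path"]
        if not file_path.is_file():
            missing[name] = str(file_path)
    return missing


if __name__ == "__main__":
    # Assumed layout from this guide; run from the repository root.
    catalog_file = Path("backend/data/catalog.json")
    if catalog_file.exists():
        for name, path in missing_dataset_files(catalog_file, "backend/data").items():
            print(f"{name}: missing {path}")
```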

#### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid"

**Cause**: Invalid or missing GEMINI_API_KEY.

**Solution**:
```bash
export GEMINI_API_KEY="your-actual-api-key"
# Restart backend
```

#### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'"

**Cause**: Corrupted installation.

**Solution**:
```bash
pip uninstall sentence-transformers
pip install sentence-transformers --no-cache-dir
```

### Frontend Issues

#### Issue: "Error: Cannot find module 'next'"

**Cause**: npm packages not installed.

**Solution**:
```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
```

#### Issue: "Failed to fetch from localhost:8000"

**Cause**: Backend not running or CORS issue.

**Solution**:
1. Verify backend is running: `curl http://localhost:8000/api/catalog`
2. Check CORS settings in `backend/main.py`
3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local`

#### Issue: "Map tiles not loading"

**Cause**: Network issue or ad blocker.

**Solution**:
1. Check internet connection
2. Disable ad blocker for localhost
3. Switch to an alternative tile server in `MapViewer.tsx`:
   ```typescript
   url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
   ```

### General Issues

#### Issue: Port 8000 already in use

**Solution**:
```bash
# Find the process using the port (macOS/Linux)
lsof -ti:8000

# Kill process
kill -9 $(lsof -ti:8000)

# Or use different port
uvicorn backend.main:app --port 8001
```
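
On systems without `lsof` (for example Windows outside WSL), the same check can be done from Python. This is a minimal sketch using the standard `socket` module; `port_in_use` and `first_free_port` are hypothetical helper names.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


def first_free_port(start=8000, attempts=20):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port between {start} and {start + attempts - 1}")


if __name__ == "__main__":
    print(f"First free port from 8000: {first_free_port()}")
```

Pass the reported port to `uvicorn` via `--port`, and update `NEXT_PUBLIC_API_URL` in the frontend to match.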

#### Issue: Out of memory errors

**Cause**: Loading too many large datasets.

**Solution**:
1. Reduce dataset size (filter before loading)
2. Increase system RAM
3. Use query limits: `LIMIT 10000`

---

## Development Workflow

### Code Changes

**Backend**:
- Python files auto-reload with `--reload` flag
- Changes in `core/`, `services/`, `api/` take effect immediately

**Frontend**:
- Hot Module Replacement (HMR) enabled
- Changes in `components/`, `app/` reload automatically

### Adding New Datasets

1. **Add GeoJSON file** to appropriate directory (e.g., `backend/data/osm/`)

2. **Update catalog.json**:
   ```json
   "my_new_dataset": {
     "path": "osm/my_new_dataset.geojson",
     "description": "Description for display",
     "semantic_description": "Detailed description for AI",
     "categories": ["infrastructure"],
     "tags": ["roads", "transport"]
   }
   ```

3. **Regenerate embeddings**:
   ```bash
   rm backend/data/embeddings.npy
   python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
   ```

4. **Test**: Query for the new dataset

See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts.
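
Before regenerating embeddings, it can help to check that a new catalog entry has the same shape as the example in step 2. The sketch below treats all five fields from that example as required, which is an assumption; the guide does not say which fields are optional. `REQUIRED_FIELDS` and `entry_problems` are hypothetical names.

```python
# Fields and types taken from the example entry in step 2 (assumed to be required).
REQUIRED_FIELDS = {
    "path": str,
    "description": str,
    "semantic_description": str,
    "categories": list,
    "tags": list,
}


def entry_problems(entry):
    """Return a list of human-readable problems found in one catalog entry."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field} should be a {expected_type.__name__}")
    return problems


entry = {
    "path": "osm/my_new_dataset.geojson",
    "description": "Description for display",
    "semantic_description": "Detailed description for AI",
    "categories": ["infrastructure"],
    "tags": ["roads", "transport"],
}
print(entry_problems(entry))                    # []
print(entry_problems({"path": "osm/x.geojson", "tags": "roads"}))
```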

### Testing API Endpoints

**Using curl**:
```bash
# Get catalog
curl http://localhost:8000/api/catalog

# Query chat endpoint
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Show me provinces", "history": []}'
```

**Using Swagger UI**:
- Open http://localhost:8000/docs
- Try endpoints interactively
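
The chat request from the curl example can also be sent from Python. The sketch below uses only the standard library and returns the raw response body, since the response format of `/api/chat` is not described in this guide. `build_chat_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default backend address from this guide


def build_chat_request(message, history=None, base_url=BASE_URL):
    """Build the POST request for /api/chat with the JSON body used in the curl example."""
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_chat_request("Show me provinces")
print(request.get_method(), request.full_url)
# With the backend running, send it with: print(send(request))
```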

---

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key |
| `PORT` | ❌ No | 8000 | Backend server port |
| `HOST` | ❌ No | 0.0.0.0 | Backend host |
| `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) |
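
The defaults in this table can be expressed as a small settings loader. This is a sketch under the assumption that the values are read from the process environment; it is not GeoQuery's actual configuration code, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    port: int = 8000
    host: str = "0.0.0.0"
    log_level: str = "INFO"
    database_path: str = ":memory:"


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required but not set")
    return Settings(
        gemini_api_key=api_key,
        port=int(env.get("PORT", 8000)),
        host=env.get("HOST", "0.0.0.0"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        database_path=env.get("DATABASE_PATH", ":memory:"),
    )


settings = load_settings({"GEMINI_API_KEY": "your-api-key-here", "PORT": "8001"})
print(settings.port, settings.host, settings.database_path)  # 8001 0.0.0.0 :memory:
```

Failing fast on a missing `GEMINI_API_KEY` surfaces the problem at startup rather than on the first LLM call.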

---

## IDE Setup

### VS Code

**Recommended Extensions**:
- Python (`ms-python.python`)
- Pylance (`ms-python.vscode-pylance`)
- Black Formatter (`ms-python.black-formatter`)
- ESLint (`dbaeumer.vscode-eslint`)
- Prettier (`esbenp.prettier-vscode`)

**Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./backend/venv/bin/python",
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter"
  },
  "editor.formatOnSave": true,
  "[typescript]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}
```

### PyCharm

1. **Set Python Interpreter**: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python`
2. **Enable FastAPI**: Settings → Languages & Frameworks → FastAPI
3. **Configure Run**: Run → Edit Configurations → Add → Python → Script path: `backend/main.py`

---

## Next Steps

- ✅ **Verify installation** by running a test query
- 📖 **Read [ARCHITECTURE.md](../ARCHITECTURE.md)** to understand the system
- 🔧 **Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** for component details
- 📊 **Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md)** for available data