# GeoQuery Setup Guide

Complete guide for setting up the GeoQuery development environment.

---

## Prerequisites

### Required Software

| Requirement | Minimum Version | Purpose |
|------------|----------------|---------|
| **Python** | 3.11+ | Backend runtime |
| **Node.js** | 18+ | Frontend runtime |
| **npm** | 9+ | Package management |
| **Git** | 2.0+ | Version control |
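
The minimum versions in the table can be checked with a short script before installing anything. This is a minimal sketch, not part of the GeoQuery repository: the helper names (`parse_version`, `meets_minimum`, `check_tool`) are hypothetical, and it assumes each tool prints its version in response to `--version`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the table above.
MINIMUMS = {
    "node": (18,),
    "npm": (9,),
    "git": (2, 0),
}


def parse_version(text):
    """Extract the first dotted version number from a tool's version output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples, padding the shorter one with zeros."""
    if found is None:
        return False
    width = max(len(found), len(minimum))
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


def check_tool(name, minimum):
    """Return (ok, detail) for a command-line tool looked up on PATH."""
    path = shutil.which(name)
    if path is None:
        return False, "not found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return meets_minimum(parse_version(output), minimum), output


def main():
    python_ok = sys.version_info >= (3, 11)
    print(f"python: {'OK' if python_ok else 'TOO OLD'} ({sys.version.split()[0]})")
    for name, minimum in MINIMUMS.items():
        ok, detail = check_tool(name, minimum)
        print(f"{name}: {'OK' if ok else 'PROBLEM'} ({detail})")
```

Calling `main()` prints one line per tool; any `PROBLEM` or `TOO OLD` line points at a prerequisite to install or upgrade.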

### API Keys

- **Google AI API Key (Gemini)**: Required for LLM functionality
  - Get one free at: https://aistudio.google.com/app/apikey
  - Free tier: 15 requests/minute, 1500/day

### System Requirements

- **RAM**: 4GB minimum, 8GB recommended (for DuckDB in-memory database)
- **Disk**: 2GB for datasets
- **OS**: macOS, Linux, or Windows (WSL recommended)

---

## Installation

### 1. Clone Repository

```bash
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
```

### 2. Backend Setup

#### Create Virtual Environment

```bash
cd backend
python3 -m venv venv
```

#### Activate Virtual Environment

**macOS/Linux**:
```bash
source venv/bin/activate
```

**Windows** (PowerShell):
```powershell
venv\Scripts\Activate.ps1
```

**Windows** (CMD):
```cmd
venv\Scripts\activate.bat
```

#### Install Dependencies

```bash
pip install --upgrade pip
pip install -e .
```

This installs the package in editable mode, including all dependencies from `setup.py`.

**Key Dependencies**:
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `duckdb` - Embedded database
- `geopandas` - Geospatial data processing
- `sentence-transformers` - Embeddings
- `google-generativeai` - Gemini SDK

#### Configure Environment Variables

Create a `.env` file in the `backend/` directory:

```bash
# Required
GEMINI_API_KEY=your-api-key-here

# Optional (defaults shown)
PORT=8000
HOST=0.0.0.0
LOG_LEVEL=INFO
```

**Alternative**: Export directly in terminal:

```bash
export GEMINI_API_KEY="your-api-key-here"
```

**Windows**:
```powershell
$env:GEMINI_API_KEY="your-api-key-here"
```
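
To show how these variables fit together, here is a minimal sketch of reading them with the documented defaults. It is an illustration only: the `Settings` class and `load_settings` function are hypothetical and are not taken from the GeoQuery backend. The sketch reads the process environment and does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass

# Levels listed in the Environment Variables Reference of this guide.
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the backend's environment configuration."""

    gemini_api_key: str
    port: int = 8000
    host: str = "0.0.0.0"
    log_level: str = "INFO"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ), failing fast on errors."""
    env = os.environ if env is None else env

    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")

    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"Unsupported LOG_LEVEL: {log_level}")

    return Settings(
        gemini_api_key=api_key,
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
        log_level=log_level,
    )
```

Failing at startup when the key is missing gives a clearer error than a rejected API call later on.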

#### Verify Backend Installation

```bash
python -c "import backend; print('Backend installed successfully')"
```

### 3. Frontend Setup

```bash
cd ../frontend  # From backend directory
npm install
```

**Key Dependencies**:
- `next` - React framework
- `react` - UI library
- `leaflet` - Map library
- `react-leaflet` - React bindings for Leaflet
- `@dnd-kit/core` - Drag and drop

#### Configure Frontend (Optional)

Edit `frontend/.env.local` if the backend is not running on the default port:

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

---

## Running Locally

### Start Backend

From `backend/` directory with venv activated:

```bash
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

**Flags**:
- `--reload`: Auto-restart on code changes
- `--host 0.0.0.0`: Listen on all network interfaces (allows connections from other machines)
- `--port 8000`: Port number

**Expected Output**:
```
INFO:     Uvicorn running on http://0.0.0.0:8000
INFO:     Application startup complete.
```

**Verify**:
- Open http://localhost:8000/docs → Should show FastAPI Swagger UI
- Check http://localhost:8000/api/catalog → Should return GeoJSON catalog
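
The two checks above can also be scripted. The sketch below uses only the standard library; `check_backend` and `fetch` are hypothetical helper names, and the sketch assumes that `/api/catalog` responds with a JSON body (the exact shape of that body is not assumed).

```python
import json
import urllib.request


def fetch(url, timeout=5.0):
    """Return (status_code, body_bytes) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read()


def check_backend(base_url="http://localhost:8000", fetcher=fetch):
    """Return {path: bool} telling whether each verification URL looks healthy."""
    results = {}
    for path in ("/docs", "/api/catalog"):
        try:
            status, body = fetcher(base_url + path)
            ok = status == 200
            if ok and path == "/api/catalog":
                json.loads(body)  # Raises ValueError if the body is not JSON.
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts, and HTTP errors.
            ok = False
        results[path] = ok
    return results
```

With the backend running, `check_backend()` should report `True` for both paths. The `fetcher` parameter exists so the function can be exercised without a live server.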

### Start Frontend

From `frontend/` directory:

```bash
npm run dev
```

**Expected Output**:
```
▲ Next.js 15.1.3
- Local:        http://localhost:3000
- Ready in 2.1s
```

**Verify**:
- Open http://localhost:3000 → Should show GeoQuery chat interface

---

## Database Setup

### DuckDB Initialization

**Automatic**: Database is created in-memory on first query.

**Manual Test**:

```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
print(f"Loaded tables: {list(engine.loaded_tables.keys())}")
```

### Load Initial Datasets

Datasets are loaded lazily (on-demand). To pre-load common datasets:

```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
engine.ensure_table_loaded("pan_admin1")  # Provinces
engine.ensure_table_loaded("panama_healthsites_geojson")  # Hospitals
```

### Generate Embeddings

Required for semantic search:

```bash
cd backend
python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
```

This generates `backend/data/embeddings.npy` (cached for future use).
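
Because the embeddings file is cached, it can fall out of date when `catalog.json` changes. The sketch below is one possible way to detect that by comparing modification times. The function name `embeddings_are_stale` is hypothetical, and the sketch assumes both files live in the same data directory, as in the layout shown in this guide.

```python
from pathlib import Path


def embeddings_are_stale(data_dir):
    """Return True if embeddings.npy is missing or older than catalog.json."""
    data_dir = Path(data_dir)
    catalog = data_dir / "catalog.json"
    embeddings = data_dir / "embeddings.npy"

    if not embeddings.exists():
        return True  # Nothing cached yet.
    if not catalog.exists():
        return False  # No catalog to compare against.
    return catalog.stat().st_mtime > embeddings.stat().st_mtime
```

For example, `embeddings_are_stale("backend/data")` returning `True` suggests regenerating the embeddings with the command above.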

---

## Directory Structure After Setup

```
GeoQuery/
├── backend/
│   ├── venv/                   # Virtual environment (created)
│   ├── .env                    # Environment variables (created)
│   ├── data/
│   │   ├── embeddings.npy      # Generated embeddings (created)
│   │   ├── catalog.json        # Dataset registry (existing)
│   │   └── osm/                # GeoJSON datasets (existing)
│   └── <source files>
├── frontend/
│   ├── node_modules/           # npm packages (created)
│   ├── .next/                  # Build output (created)
│   └── <source files>
└── <other files>
```

---

## Common Issues & Troubleshooting

### Backend Issues

#### Issue: "ModuleNotFoundError: No module named 'backend'"

**Cause**: Virtual environment not activated or package not installed.

**Solution**:
```bash
source venv/bin/activate  # Activate venv
pip install -e .          # Install package
```

#### Issue: "duckdb.IOException: No files found that match the pattern"

**Cause**: GeoJSON file missing or incorrect path in catalog.json.

**Solution**:
1. Check file exists: `ls backend/data/osm/hospitals.geojson`
2. Verify path in `catalog.json`
3. Download missing data: `python backend/scripts/download_geofabrik.py`
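
Steps 1 and 2 can be automated for the whole catalog. The following is a sketch under stated assumptions: `catalog.json` is taken to be a JSON object mapping dataset names to entries, and each entry's `path` is taken to be relative to the data directory, which matches the example entry under "Adding New Datasets". The function name `find_missing_datasets` is hypothetical.

```python
import json
from pathlib import Path


def find_missing_datasets(data_dir):
    """Return (name, path) pairs for catalog entries whose file does not exist."""
    data_dir = Path(data_dir)
    catalog = json.loads((data_dir / "catalog.json").read_text(encoding="utf-8"))

    missing = []
    for name, entry in catalog.items():
        # Assumption: each entry is an object with a "path" relative to data_dir.
        path = entry.get("path") if isinstance(entry, dict) else None
        if not path or not (data_dir / path).is_file():
            missing.append((name, path))
    return missing
```

Running `find_missing_datasets("backend/data")` from the repository root lists every dataset that would trigger this error when queried.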

#### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid"

**Cause**: Invalid or missing GEMINI_API_KEY.

**Solution**:
```bash
export GEMINI_API_KEY="your-actual-api-key"
# Restart backend
```

#### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'"

**Cause**: Corrupted installation.

**Solution**:
```bash
pip uninstall sentence-transformers
pip install sentence-transformers --no-cache-dir
```

### Frontend Issues

#### Issue: "Error: Cannot find module 'next'"

**Cause**: npm packages not installed.

**Solution**:
```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
```

#### Issue: "Failed to fetch from localhost:8000"

**Cause**: Backend not running or CORS issue.

**Solution**:
1. Verify backend is running: `curl http://localhost:8000/api/catalog`
2. Check CORS settings in `backend/main.py`
3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local`

#### Issue: "Map tiles not loading"

**Cause**: Network issue or ad blocker.

**Solution**:
1. Check internet connection
2. Disable ad blocker for localhost
3. Try an alternative tile server in `MapViewer.tsx`:
   ```typescript
   url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
   ```

### General Issues

#### Issue: Port 8000 already in use

**Solution**:
```bash
# Find process using port
lsof -ti:8000

# Kill process
kill -9 $(lsof -ti:8000)

# Or use different port
uvicorn backend.main:app --port 8001
```
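
On systems without `lsof` (for example, Windows outside WSL), the same check can be done from Python. This is a minimal sketch using the standard `socket` module; the helper names `is_port_in_use` and `first_free_port` are hypothetical.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8000, attempts=20):
    """Return the first port at or above `start` with no listener."""
    for port in range(start, start + attempts):
        if not is_port_in_use(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")
```

The result of `first_free_port()` can be passed to `uvicorn` through `--port`. If the backend moves off port 8000, `NEXT_PUBLIC_API_URL` in the frontend needs to match.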

#### Issue: Out of memory errors

**Cause**: Loading too many large datasets.

**Solution**:
1. Reduce dataset size (filter before loading)
2. Increase system RAM
3. Use query limits: `LIMIT 10000`

---

## Development Workflow

### Code Changes

**Backend**:
- Python files auto-reload with `--reload` flag
- Changes in `core/`, `services/`, `api/` take effect immediately

**Frontend**:
- Hot Module Replacement (HMR) enabled
- Changes in `components/`, `app/` reload automatically

### Adding New Datasets

1. **Add GeoJSON file** to appropriate directory (e.g., `backend/data/osm/`)

2. **Update catalog.json**:
   ```json
   "my_new_dataset": {
     "path": "osm/my_new_dataset.geojson",
     "description": "Description for display",
     "semantic_description": "Detailed description for AI",
     "categories": ["infrastructure"],
     "tags": ["roads", "transport"]
   }
   ```

3. **Regenerate embeddings**:
   ```bash
   rm backend/data/embeddings.npy
   python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
   ```

4. **Test**: Query for the new dataset

See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts.
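
Step 2 can also be done programmatically, which avoids hand-editing mistakes such as a missing comma. The sketch below is illustrative and not part of the repository's scripts. The names `validate_entry` and `add_dataset` are hypothetical, and treating all five fields from the example entry as required is an assumption.

```python
import json
from pathlib import Path

# Assumption: every field shown in the example entry is required.
REQUIRED_FIELDS = ("path", "description", "semantic_description", "categories", "tags")


def validate_entry(entry):
    """Return a list of human-readable problems; empty means the entry looks fine."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in entry]
    for field in ("categories", "tags"):
        value = entry.get(field)
        is_string_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if value is not None and not is_string_list:
            problems.append(f"{field} must be a list of strings")
    return problems


def add_dataset(catalog_path, name, entry):
    """Add a validated entry to catalog.json, refusing to overwrite existing names."""
    problems = validate_entry(entry)
    if problems:
        raise ValueError("; ".join(problems))

    catalog_path = Path(catalog_path)
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    if name in catalog:
        raise KeyError(f"dataset already registered: {name}")

    catalog[name] = entry
    catalog_path.write_text(json.dumps(catalog, indent=2) + "\n", encoding="utf-8")
```

After adding an entry this way, the embeddings still need to be regenerated as described in step 3.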

### Testing API Endpoints

**Using curl**:
```bash
# Get catalog
curl http://localhost:8000/api/catalog

# Query chat endpoint
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Show me provinces", "history": []}'
```

**Using Swagger UI**:
- Open http://localhost:8000/docs
- Try endpoints interactively
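
**Using Python**:

The chat request from the curl example can be sent with the standard library as well. This is a minimal sketch: `build_chat_request` and `send_chat` are hypothetical helpers, and because the response format of `/api/chat` is not described here, the body is returned as raw text.

```python
import json
import urllib.request


def build_chat_request(message, history=None, base_url="http://localhost:8000"):
    """Build the same POST request as the curl example above."""
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        base_url + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, history=None, base_url="http://localhost:8000", timeout=60.0):
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, history, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, `print(send_chat("Show me provinces"))` mirrors the curl command.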

---

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key |
| `PORT` | ❌ No | 8000 | Backend server port |
| `HOST` | ❌ No | 0.0.0.0 | Backend host |
| `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) |

---

## IDE Setup

### VS Code

**Recommended Extensions**:
- Python (`ms-python.python`)
- Pylance (`ms-python.vscode-pylance`)
- ESLint (`dbaeumer.vscode-eslint`)
- Prettier (`esbenp.prettier-vscode`)

**Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./backend/venv/bin/python",
  "python.linting.enabled": true,
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,
  "[typescript]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}
```

### PyCharm

1. **Set Python Interpreter**: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python`
2. **Enable FastAPI**: Settings → Languages & Frameworks → FastAPI
3. **Configure Run**: Run → Edit Configurations → Add → Python → Script path: `backend/main.py`

---

## Next Steps

- ✅ **Verify installation** by running a test query
- 📖 **Read [ARCHITECTURE.md](../ARCHITECTURE.md)** to understand the system
- 🔧 **Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** for component details
- 📊 **Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md)** for available data