| # GeoQuery Setup Guide | |
| Complete guide for setting up the GeoQuery development environment. | |
| --- | |
| ## Prerequisites | |
| ### Required Software | |
| | Requirement | Minimum Version | Purpose | | |
| |------------|----------------|---------| | |
| | **Python** | 3.11+ | Backend runtime | | |
| | **Node.js** | 18+ | Frontend runtime | | |
| | **npm** | 9+ | Package management | | |
| | **Git** | 2.0+ | Version control | | |
| ### API Keys | |
| - **Google AI API Key (Gemini)**: Required for LLM functionality | |
| - Get one free at: https://aistudio.google.com/app/apikey | |
| - Free tier: 15 requests/minute, 1500/day | |
| ### System Requirements | |
| - **RAM**: 4GB minimum, 8GB recommended (for DuckDB in-memory database) | |
| - **Disk**: 2GB for datasets | |
| - **OS**: macOS, Linux, or Windows (WSL recommended) | |
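The software and disk requirements above can be checked from a script before starting the installation. The following is a minimal sketch, not part of GeoQuery: the function names are hypothetical, the minimum versions are copied from the table above, and it assumes each tool prints its version in response to `--version`. It does not check RAM, since the standard library has no portable way to do so.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the Required Software table.
MINIMUMS = {"node": (18,), "npm": (9,), "git": (2,)}


def parse_version(text):
    """Extract the first dotted version number from a tool's output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def dotted(version):
    """Format a version tuple such as (18, 17, 0) as '18.17.0'."""
    return ".".join(str(part) for part in version)


def tool_version(name):
    """Return the version tuple reported by `<name> --version`, or None."""
    executable = shutil.which(name)  # full path, or None if not on PATH
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_version(result.stdout or result.stderr)


def check_prerequisites(disk_path=".", min_free_gb=2):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    for name, minimum in MINIMUMS.items():
        found = tool_version(name)
        if found is None:
            problems.append(f"{name} not found on PATH")
        elif found < minimum:
            problems.append(f"{name} {dotted(minimum)}+ required, found {dotted(found)}")
    free_gb = shutil.disk_usage(disk_path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"Only {free_gb:.1f} GB free, {min_free_gb} GB needed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result means the checked requirements are met. Any warnings point at the tool to install or upgrade first.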
| --- | |
| ## Installation | |
| ### 1. Clone Repository | |
| ```bash | |
| git clone https://github.com/GerardCB/GeoQuery.git | |
| cd GeoQuery | |
| ``` | |
| ### 2. Backend Setup | |
| #### Create Virtual Environment | |
| ```bash | |
| cd backend | |
| python3 -m venv venv | |
| ``` | |
| #### Activate Virtual Environment | |
| **macOS/Linux**: | |
| ```bash | |
| source venv/bin/activate | |
| ``` | |
| **Windows** (PowerShell): | |
| ```powershell | |
| venv\Scripts\Activate.ps1 | |
| ``` | |
| **Windows** (CMD): | |
| ```cmd | |
| venv\Scripts\activate.bat | |
| ``` | |
| #### Install Dependencies | |
| ```bash | |
| pip install --upgrade pip | |
| pip install -e . | |
| ``` | |
| This installs the package in editable mode, including all dependencies from `setup.py`. | |
| **Key Dependencies**: | |
| - `fastapi` - Web framework | |
| - `uvicorn` - ASGI server | |
| - `duckdb` - Embedded database | |
| - `geopandas` - Geospatial data processing | |
| - `sentence-transformers` - Embeddings | |
| - `google-generativeai` - Gemini SDK | |
| #### Configure Environment Variables | |
| Create a `.env` file in the `backend/` directory: | |
| ```bash | |
| # Required | |
| GEMINI_API_KEY=your-api-key-here | |
| # Optional (defaults shown) | |
| PORT=8000 | |
| HOST=0.0.0.0 | |
| LOG_LEVEL=INFO | |
| ``` | |
| **Alternative**: Export directly in terminal: | |
| ```bash | |
| export GEMINI_API_KEY="your-api-key-here" | |
| ``` | |
| **Windows** (PowerShell): | |
| ```powershell | |
| $env:GEMINI_API_KEY="your-api-key-here" | |
| ``` | |
| #### Verify Backend Installation | |
| ```bash | |
| python -c "import backend; print('Backend installed successfully')" | |
| ``` | |
| ### 3. Frontend Setup | |
| ```bash | |
| cd ../frontend # From backend directory | |
| npm install | |
| ``` | |
| **Key Dependencies**: | |
| - `next` - React framework | |
| - `react` - UI library | |
| - `leaflet` - Map library | |
| - `react-leaflet` - React bindings for Leaflet | |
| - `@dnd-kit/core` - Drag and drop | |
| #### Configure Frontend (Optional) | |
| Create or edit `frontend/.env.local` if the backend is not running on the default port (8000): | |
| ```bash | |
| NEXT_PUBLIC_API_URL=http://localhost:8000 | |
| ``` | |
| --- | |
| ## Running Locally | |
| ### Start Backend | |
| From `backend/` directory with venv activated: | |
| ```bash | |
| uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000 | |
| ``` | |
| **Flags**: | |
| - `--reload`: Auto-restart on code changes | |
| - `--host 0.0.0.0`: Bind to all network interfaces so other machines can connect (use `127.0.0.1` for local-only access) | |
| - `--port 8000`: Port number | |
| **Expected Output**: | |
| ``` | |
| INFO: Uvicorn running on http://0.0.0.0:8000 | |
| INFO: Application startup complete. | |
| ``` | |
| **Verify**: | |
| - Open http://localhost:8000/docs → Should show FastAPI Swagger UI | |
| - Check http://localhost:8000/api/catalog → Should return GeoJSON catalog | |
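The two verification steps above can also be scripted. This is a minimal sketch using only the standard library. The helper names `endpoint_ok` and `check_backend` are hypothetical, and it assumes both URLs answer a plain GET with HTTP 200 when the backend is healthy.

```python
import urllib.error
import urllib.request


def endpoint_ok(url, timeout=5.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def check_backend(base_url="http://localhost:8000"):
    """Check the two URLs listed in the verification steps."""
    return {path: endpoint_ok(base_url + path) for path in ("/docs", "/api/catalog")}


if __name__ == "__main__":
    for path, ok in check_backend().items():
        print(f"{path}: {'OK' if ok else 'FAILED'}")
```

A `FAILED` line for both paths usually means the server is not running, while a failure on `/api/catalog` alone points at the application rather than at uvicorn.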
| ### Start Frontend | |
| From `frontend/` directory: | |
| ```bash | |
| npm run dev | |
| ``` | |
| **Expected Output**: | |
| ``` | |
| ▲ Next.js 15.1.3 | |
| - Local: http://localhost:3000 | |
| - Ready in 2.1s | |
| ``` | |
| **Verify**: | |
| - Open http://localhost:3000 → Should show GeoQuery chat interface | |
| --- | |
| ## Database Setup | |
| ### DuckDB Initialization | |
| **Automatic**: Database is created in-memory on first query. | |
| **Manual Test**: | |
| ```python | |
| from backend.core.geo_engine import get_geo_engine | |
| engine = get_geo_engine() | |
| print(f"Loaded tables: {list(engine.loaded_tables.keys())}") | |
| ``` | |
| ### Load Initial Datasets | |
| Datasets are loaded lazily (on-demand). To pre-load common datasets: | |
| ```python | |
| from backend.core.geo_engine import get_geo_engine | |
| engine = get_geo_engine() | |
| engine.ensure_table_loaded("pan_admin1") # Provinces | |
| engine.ensure_table_loaded("panama_healthsites_geojson") # Hospitals | |
| ``` | |
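Lazy loading means a table is read only when something first asks for it, and every later request reuses the copy already in memory. The sketch below illustrates that pattern in isolation. `LazyTableRegistry` and its loader callback are hypothetical and are not GeoQuery's engine; only the names `ensure_table_loaded` and `loaded_tables` are borrowed from the snippets above.

```python
class LazyTableRegistry:
    """Hypothetical stand-in for an engine that loads tables on demand."""

    def __init__(self, loader):
        self._loader = loader    # callable: table name -> table data
        self.loaded_tables = {}  # cache of the tables loaded so far

    def ensure_table_loaded(self, name):
        """Load `name` on first use; later calls reuse the cached table."""
        if name not in self.loaded_tables:
            self.loaded_tables[name] = self._loader(name)
        return self.loaded_tables[name]


if __name__ == "__main__":
    calls = []

    def fake_loader(name):
        calls.append(name)  # record every real load
        return {"name": name, "rows": []}

    registry = LazyTableRegistry(fake_loader)
    registry.ensure_table_loaded("pan_admin1")
    registry.ensure_table_loaded("pan_admin1")  # served from the cache
    print(calls)  # ['pan_admin1']
```

Pre-loading, as in the example above, simply moves the cost of the first load from the first query to startup.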
| ### Generate Embeddings | |
| Required for semantic search: | |
| ```bash | |
| cd backend | |
| python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()" | |
| ``` | |
| This generates `backend/data/embeddings.npy` (cached for future use). | |
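Because the embeddings are cached on disk, they can fall out of date when `catalog.json` changes. The following sketch shows one way to spot that, assuming file modification times are a good enough signal and that both files live under `backend/data/`. `embeddings_stale` is a hypothetical helper, and whether GeoQuery performs any such check itself is not stated here.

```python
from pathlib import Path


def embeddings_stale(catalog_path, embeddings_path):
    """Return True if the embeddings cache is missing or older than the catalog."""
    catalog = Path(catalog_path)
    embeddings = Path(embeddings_path)
    if not embeddings.exists():
        return True
    return catalog.exists() and catalog.stat().st_mtime > embeddings.stat().st_mtime


if __name__ == "__main__":
    # Paths are relative to the repository root.
    stale = embeddings_stale("backend/data/catalog.json", "backend/data/embeddings.npy")
    print("regenerate embeddings" if stale else "embeddings cache is up to date")
```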
| --- | |
| ## Directory Structure After Setup | |
| ``` | |
| GeoQuery/ | |
| ├── backend/ | |
| │   ├── venv/ # Virtual environment (created) | |
| │   ├── .env # Environment variables (created) | |
| │   ├── data/ | |
| │   │   ├── embeddings.npy # Generated embeddings (created) | |
| │   │   ├── catalog.json # Dataset registry (existing) | |
| │   │   └── osm/ # GeoJSON datasets (existing) | |
| │   └── <source files> | |
| ├── frontend/ | |
| │   ├── node_modules/ # npm packages (created) | |
| │   ├── .next/ # Build output (created) | |
| │   └── <source files> | |
| └── <other files> | |
| ``` | |
| --- | |
| ## Common Issues & Troubleshooting | |
| ### Backend Issues | |
| #### Issue: "ModuleNotFoundError: No module named 'backend'" | |
| **Cause**: Virtual environment not activated or package not installed. | |
| **Solution**: | |
| ```bash | |
| source venv/bin/activate # Activate venv | |
| pip install -e . # Install package | |
| ``` | |
| #### Issue: "duckdb.IOException: No files found that match the pattern" | |
| **Cause**: GeoJSON file missing or incorrect path in catalog.json. | |
| **Solution**: | |
| 1. Check file exists: `ls backend/data/osm/hospitals.geojson` | |
| 2. Verify path in `catalog.json` | |
| 3. Download missing data: `python backend/scripts/download_geofabrik.py` | |
| #### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid" | |
| **Cause**: Invalid or missing GEMINI_API_KEY. | |
| **Solution**: | |
| ```bash | |
| export GEMINI_API_KEY="your-actual-api-key" | |
| # Restart backend | |
| ``` | |
| #### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'" | |
| **Cause**: Corrupted installation. | |
| **Solution**: | |
| ```bash | |
| pip uninstall sentence-transformers | |
| pip install sentence-transformers --no-cache-dir | |
| ``` | |
| ### Frontend Issues | |
| #### Issue: "Error: Cannot find module 'next'" | |
| **Cause**: npm packages not installed. | |
| **Solution**: | |
| ```bash | |
| cd frontend | |
| rm -rf node_modules package-lock.json | |
| npm install | |
| ``` | |
| #### Issue: "Failed to fetch from localhost:8000" | |
| **Cause**: Backend not running or CORS issue. | |
| **Solution**: | |
| 1. Verify backend is running: `curl http://localhost:8000/api/catalog` | |
| 2. Check CORS settings in `backend/main.py` | |
| 3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local` | |
| #### Issue: "Map tiles not loading" | |
| **Cause**: Network issue or ad blocker. | |
| **Solution**: | |
| 1. Check internet connection | |
| 2. Disable ad blocker for localhost | |
| 3. Alternative tile server in `MapViewer.tsx`: | |
| ```typescript | |
| url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png" | |
| ``` | |
| ### General Issues | |
| #### Issue: Port 8000 already in use | |
| **Solution**: | |
| ```bash | |
| # Find process using port | |
| lsof -ti:8000 | |
| # Kill process | |
| kill -9 $(lsof -ti:8000) | |
| # Or use different port | |
| uvicorn backend.main:app --port 8001 | |
| ``` | |
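On systems without `lsof`, the same check can be done from Python. This is a minimal sketch using the standard `socket` module. The helper names are hypothetical, and it only detects listeners reachable on the given host (`127.0.0.1` by default).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8000, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"No free port found from {start} to {start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    print(f"uvicorn backend.main:app --port {port}")
```

If the backend ends up on a port other than 8000, `NEXT_PUBLIC_API_URL` in `frontend/.env.local` needs to match it.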
| #### Issue: Out of memory errors | |
| **Cause**: Loading too many large datasets. | |
| **Solution**: | |
| 1. Reduce dataset size (filter before loading) | |
| 2. Increase system RAM | |
| 3. Use query limits: `LIMIT 10000` | |
| --- | |
| ## Development Workflow | |
| ### Code Changes | |
| **Backend**: | |
| - With the `--reload` flag, uvicorn restarts automatically when Python files change | |
| - Changes in `core/`, `services/`, `api/` take effect as soon as the server has restarted | |
| **Frontend**: | |
| - Hot Module Replacement (HMR) enabled | |
| - Changes in `components/`, `app/` reload automatically | |
| ### Adding New Datasets | |
| 1. **Add GeoJSON file** to appropriate directory (e.g., `backend/data/osm/`) | |
| 2. **Update catalog.json**: | |
| ```json | |
| "my_new_dataset": { | |
| "path": "osm/my_new_dataset.geojson", | |
| "description": "Description for display", | |
| "semantic_description": "Detailed description for AI", | |
| "categories": ["infrastructure"], | |
| "tags": ["roads", "transport"] | |
| } | |
| ``` | |
| 3. **Regenerate embeddings**: | |
| ```bash | |
| rm backend/data/embeddings.npy | |
| python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()" | |
| ``` | |
| 4. **Test**: Query for the new dataset | |
| See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts. | |
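Before regenerating embeddings, it can help to check that a new catalog entry has the same fields as the example above. The sketch below is a hedged illustration: the field list and types are inferred from that example rather than from a published schema, the helper names are hypothetical, and it assumes `catalog.json` is a single JSON object mapping dataset names to entries.

```python
import json

# Field names and types inferred from the example entry, not from an official schema.
EXPECTED_FIELDS = {
    "path": str,
    "description": str,
    "semantic_description": str,
    "categories": list,
    "tags": list,
}


def validate_entry(name, entry):
    """Return a list of problems found in one catalog entry."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in entry:
            problems.append(f"{name}: missing '{field}'")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{name}: '{field}' should be {expected_type.__name__}")
    return problems


def validate_catalog(catalog_text):
    """Parse catalog JSON text and validate every entry in it."""
    catalog = json.loads(catalog_text)
    return [p for name, entry in catalog.items() for p in validate_entry(name, entry)]


if __name__ == "__main__":
    sample = '{"my_new_dataset": {"path": "osm/my_new_dataset.geojson", "tags": "roads"}}'
    for problem in validate_catalog(sample):
        print(problem)
```

Malformed JSON raises `json.JSONDecodeError` from `validate_catalog`, which is usually the first thing to rule out after hand-editing the file.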
| ### Testing API Endpoints | |
| **Using curl**: | |
| ```bash | |
| # Get catalog | |
| curl http://localhost:8000/api/catalog | |
| # Query chat endpoint | |
| curl -X POST http://localhost:8000/api/chat \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"message": "Show me provinces", "history": []}' | |
| ``` | |
| **Using Swagger UI**: | |
| - Open http://localhost:8000/docs | |
| - Try endpoints interactively | |
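The chat request from the curl example can also be sent from Python. This is a minimal sketch with the standard library only. The helper names are hypothetical, the payload shape is copied from the curl example, and because the response format is not documented here the body is returned as raw text.

```python
import json
import urllib.request


def build_chat_request(message, history=None, base_url="http://localhost:8000"):
    """Build the same POST request as the curl example, without sending it."""
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, history=None, timeout=60.0):
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, history)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the request; call send_chat() with the backend running.
    request = build_chat_request("Show me provinces")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```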
| --- | |
| ## Environment Variables Reference | |
| | Variable | Required | Default | Description | | |
| |----------|----------|---------|-------------| | |
| | `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key | | |
| | `PORT` | ❌ No | 8000 | Backend server port | | |
| | `HOST` | ❌ No | 0.0.0.0 | Backend host | | |
| | `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) | | |
| | `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) | | |
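The table above can be read as a small typed configuration. The sketch below shows one way to load it with the defaults applied, assuming the variables are read straight from the process environment. `Settings` and `load_settings` are hypothetical names, and GeoQuery's own configuration code may differ.

```python
import os
from dataclasses import dataclass

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    port: int = 8000
    host: str = "0.0.0.0"
    log_level: str = "INFO"
    database_path: str = ":memory:"


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"Unsupported LOG_LEVEL: {log_level}")
    return Settings(
        gemini_api_key=api_key,
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
        log_level=log_level,
        database_path=env.get("DATABASE_PATH", ":memory:"),
    )


if __name__ == "__main__":
    print(load_settings({"GEMINI_API_KEY": "your-api-key-here", "PORT": "8001"}))
```

Failing fast on a missing `GEMINI_API_KEY` surfaces the problem at startup instead of on the first LLM call.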
| --- | |
| ## IDE Setup | |
| ### VS Code | |
| **Recommended Extensions**: | |
| - Python (`ms-python.python`) | |
| - Pylance (`ms-python.vscode-pylance`) | |
| - ESLint (`dbaeumer.vscode-eslint`) | |
| - Prettier (`esbenp.prettier-vscode`) | |
| **Settings** (`.vscode/settings.json`): | |
| ```json | |
| { | |
| "python.defaultInterpreterPath": "./backend/venv/bin/python", | |
| "python.linting.enabled": true, | |
| "python.formatting.provider": "black", | |
| "editor.formatOnSave": true, | |
| "[typescript]": { | |
| "editor.defaultFormatter": "esbenp.prettier-vscode" | |
| } | |
| } | |
| ``` | |
| ### PyCharm | |
| 1. **Set Python Interpreter**: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python` | |
| 2. **Enable FastAPI**: Settings → Languages & Frameworks → FastAPI | |
| 3. **Configure Run**: Run → Edit Configurations → Add → Python → Script path: `backend/main.py` | |
| --- | |
| ## Next Steps | |
| - **Verify installation** by running a test query | |
| - **Read [ARCHITECTURE.md](../ARCHITECTURE.md)** to understand the system | |
| - **Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** for component details | |
| - **Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md)** for available data | |