# GeoQuery Setup Guide

Complete guide for setting up the GeoQuery development environment.

---

## Prerequisites

### Required Software

| Requirement | Minimum Version | Purpose |
|------------|----------------|---------|
| **Python** | 3.11+ | Backend runtime |
| **Node.js** | 18+ | Frontend runtime |
| **npm** | 9+ | Package management |
| **Git** | 2.0+ | Version control |

### API Keys

- **Google AI API Key (Gemini)**: Required for LLM functionality
  - Get one free at: https://aistudio.google.com/app/apikey
  - Free tier: 15 requests/minute, 1500/day

### System Requirements

- **RAM**: 4GB minimum, 8GB recommended (for DuckDB in-memory database)
- **Disk**: 2GB for datasets
- **OS**: macOS, Linux, or Windows (WSL recommended)

---

## Installation

### 1. Clone Repository

```bash
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
```

### 2. Backend Setup

#### Create Virtual Environment

```bash
cd backend
python3 -m venv venv
```

#### Activate Virtual Environment

**macOS/Linux**:
```bash
source venv/bin/activate
```

**Windows** (PowerShell):
```powershell
venv\Scripts\Activate.ps1
```

**Windows** (CMD):
```cmd
venv\Scripts\activate.bat
```

#### Install Dependencies

```bash
pip install --upgrade pip
pip install -e .
```

This installs the package in editable mode, including all dependencies from `setup.py`.

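To confirm that the editable install actually pulled in its dependencies, a small check along these lines may help. It is only a sketch: the helper `missing_modules` is hypothetical (not part of GeoQuery), and the module names are the usual import names for the `fastapi`, `uvicorn`, `duckdb`, `geopandas`, `sentence-transformers`, and `google-generativeai` packages, which is an assumption about what `setup.py` installs.

```python
import importlib.util

# Import names assumed for the packages this guide relies on.
ASSUMED_MODULES = [
    "fastapi",
    "uvicorn",
    "duckdb",
    "geopandas",
    "sentence_transformers",
    "google.generativeai",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent or malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(ASSUMED_MODULES)
    print("All key dependencies found" if not absent else f"Missing: {absent}")
```

Run it with the virtual environment activated; any name reported as missing suggests rerunning `pip install -e .`.
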
**Key Dependencies**:
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `duckdb` - Embedded database
- `geopandas` - Geospatial data processing
- `sentence-transformers` - Embeddings
- `google-generativeai` - Gemini SDK

#### Configure Environment Variables

Create `.env` file in `backend/` directory:

```bash
# Required
GEMINI_API_KEY=your-api-key-here

# Optional (defaults shown)
PORT=8000
HOST=0.0.0.0
LOG_LEVEL=INFO
```

**Alternative**: Export directly in terminal:
```bash
export GEMINI_API_KEY="your-api-key-here"
```

**Windows**:
```powershell
$env:GEMINI_API_KEY="your-api-key-here"
```

#### Verify Backend Installation

```bash
python -c "import backend; print('Backend installed successfully')"
```

### 3. Frontend Setup

```bash
cd ../frontend  # From backend directory
npm install
```

**Key Dependencies**:
- `next` - React framework
- `react` - UI library
- `leaflet` - Map library
- `react-leaflet` - React bindings for Leaflet
- `@dnd-kit/core` - Drag and drop

#### Configure Frontend (Optional)

Edit `frontend/.env.local` if backend is not on default port:

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

---

## Running Locally

### Start Backend

From `backend/` directory with venv activated:

```bash
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```

**Flags**:
- `--reload`: Auto-restart on code changes
- `--host 0.0.0.0`: Allow external connections
- `--port 8000`: Port number

**Expected Output**:
```
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Application startup complete.
```

**Verify**:
- Open http://localhost:8000/docs → Should show FastAPI Swagger UI
- Check http://localhost:8000/api/catalog → Should return GeoJSON catalog

### Start Frontend

From `frontend/` directory:

```bash
npm run dev
```

**Expected Output**:
```
▲ Next.js 15.1.3
- Local: http://localhost:3000
- Ready in 2.1s
```

**Verify**:
- Open http://localhost:3000 → Should show GeoQuery chat interface

---

## Database Setup

### DuckDB Initialization

**Automatic**: Database is created in-memory on first query.

**Manual Test**:
```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
print(f"Loaded tables: {list(engine.loaded_tables.keys())}")
```

### Load Initial Datasets

Datasets are loaded lazily (on-demand). To pre-load common datasets:

```python
from backend.core.geo_engine import get_geo_engine

engine = get_geo_engine()
engine.ensure_table_loaded("pan_admin1")  # Provinces
engine.ensure_table_loaded("panama_healthsites_geojson")  # Hospitals
```

### Generate Embeddings

Required for semantic search:

```bash
cd backend
python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
```

This generates `backend/data/embeddings.npy` (cached for future use).

---

## Directory Structure After Setup

```
GeoQuery/
├── backend/
│   ├── venv/               # Virtual environment (created)
│   ├── .env                # Environment variables (created)
│   ├── data/
│   │   ├── embeddings.npy  # Generated embeddings (created)
│   │   ├── catalog.json    # Dataset registry (existing)
│   │   └── osm/            # GeoJSON datasets (existing)
│   └──
├── frontend/
│   ├── node_modules/       # npm packages (created)
│   ├── .next/              # Build output (created)
│   └──
└──
```

---

## Common Issues & Troubleshooting

### Backend Issues

#### Issue: "ModuleNotFoundError: No module named 'backend'"

**Cause**: Virtual environment not activated or package not installed.

**Solution**:
```bash
source venv/bin/activate  # Activate venv
pip install -e .          # Install package
```

#### Issue: "duckdb.IOException: No files found that match the pattern"

**Cause**: GeoJSON file missing or incorrect path in catalog.json.

**Solution**:
1. Check file exists: `ls backend/data/osm/hospitals.geojson`
2. Verify path in `catalog.json`
3. Download missing data: `python backend/scripts/download_geofabrik.py`

#### Issue: "google.api_core.exceptions.PermissionDenied: API key not valid"

**Cause**: Invalid or missing GEMINI_API_KEY.

**Solution**:
```bash
export GEMINI_API_KEY="your-actual-api-key"
# Restart backend
```

#### Issue: "Module 'sentence_transformers' has no attribute 'SentenceTransformer'"

**Cause**: Corrupted installation.

**Solution**:
```bash
pip uninstall sentence-transformers
pip install sentence-transformers --no-cache-dir
```

### Frontend Issues

#### Issue: "Error: Cannot find module 'next'"

**Cause**: npm packages not installed.

**Solution**:
```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
```

#### Issue: "Failed to fetch from localhost:8000"

**Cause**: Backend not running or CORS issue.

**Solution**:
1. Verify backend is running: `curl http://localhost:8000/api/catalog`
2. Check CORS settings in `backend/main.py`
3. Verify `NEXT_PUBLIC_API_URL` in frontend `.env.local`

#### Issue: "Map tiles not loading"

**Cause**: Network issue or ad blocker.

**Solution**:
1. Check internet connection
2. Disable ad blocker for localhost
3. Alternative tile server in `MapViewer.tsx`:
   ```typescript
   url="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
   ```

### General Issues

#### Issue: Port 8000 already in use

**Solution**:
```bash
# Find process using port
lsof -ti:8000

# Kill process
kill -9 $(lsof -ti:8000)

# Or use different port
uvicorn backend.main:app --port 8001
```

#### Issue: Out of memory errors

**Cause**: Loading too many large datasets.

**Solution**:
1. Reduce dataset size (filter before loading)
2. Increase system RAM
3. Use query limits: `LIMIT 10000`

---

## Development Workflow

### Code Changes

**Backend**:
- Python files auto-reload with `--reload` flag
- Changes in `core/`, `services/`, `api/` take effect immediately

**Frontend**:
- Hot Module Replacement (HMR) enabled
- Changes in `components/`, `app/` reload automatically

### Adding New Datasets

1. **Add GeoJSON file** to appropriate directory (e.g., `backend/data/osm/`)

2. **Update catalog.json**:
   ```json
   "my_new_dataset": {
     "path": "osm/my_new_dataset.geojson",
     "description": "Description for display",
     "semantic_description": "Detailed description for AI",
     "categories": ["infrastructure"],
     "tags": ["roads", "transport"]
   }
   ```

3. **Regenerate embeddings**:
   ```bash
   rm backend/data/embeddings.npy
   python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
   ```

4. **Test**: Query for the new dataset

See [docs/backend/SCRIPTS.md](docs/backend/SCRIPTS.md) for data ingestion scripts.

### Testing API Endpoints

**Using curl**:
```bash
# Get catalog
curl http://localhost:8000/api/catalog

# Query chat endpoint
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Show me provinces", "history": []}'
```

**Using Swagger UI**:
- Open http://localhost:8000/docs
- Try endpoints interactively

---

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GEMINI_API_KEY` | ✅ Yes | - | Google AI API key |
| `PORT` | ❌ No | 8000 | Backend server port |
| `HOST` | ❌ No | 0.0.0.0 | Backend host |
| `LOG_LEVEL` | ❌ No | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `DATABASE_PATH` | ❌ No | :memory: | DuckDB database path (use for persistence) |

---

## IDE Setup

### VS Code

**Recommended Extensions**:
- Python (`ms-python.python`)
- Pylance (`ms-python.vscode-pylance`)
- ESLint (`dbaeumer.vscode-eslint`)
- Prettier (`esbenp.prettier-vscode`)

**Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./backend/venv/bin/python",
  "python.linting.enabled": true,
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,
  "[typescript]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}
```

### PyCharm

1. **Set Python Interpreter**: Settings → Project → Python Interpreter → Add → Existing Environment → `backend/venv/bin/python`
2. **Enable FastAPI**: Settings → Languages & Frameworks → FastAPI
3. **Configure Run**: Run → Edit Configurations → Add → Python → Script path: `backend/main.py`

---

## Next Steps

- ✅ **Verify installation** by running a test query
- 📖 **Read [ARCHITECTURE.md](../ARCHITECTURE.md)** to understand the system
- 🔧 **Explore [docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** for component details
- 📊 **Review [docs/data/DATASET_SOURCES.md](docs/data/DATASET_SOURCES.md)** for available data
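
For the test query mentioned in the first step, a minimal Python sketch using only the standard library could look like the following. It assumes the backend is running at `http://localhost:8000` and reuses the `/api/chat` payload from the curl example in this guide. The helpers `build_chat_request` and `run_test_query` are hypothetical names, and the response is returned as raw text because its format is not specified here.

```python
import json
import urllib.request

# Assumed default backend address from this guide; adjust if PORT or HOST differ.
BASE_URL = "http://localhost:8000"


def build_chat_request(message, base_url=BASE_URL):
    """Build a POST request for /api/chat with the payload shape of the curl example."""
    body = json.dumps({"message": message, "history": []}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_test_query(message="Show me provinces", timeout=60):
    """Send the query and return (HTTP status, raw response text)."""
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        status, text = run_test_query()
        print(status, text[:500])
    except OSError as exc:  # URLError and connection failures are OSError subclasses
        print(f"Backend not reachable: {exc}")
```

A 200 status suggests that the backend, the API key, and the dataset loading path are all working together; a connection error usually means the backend is not running.
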