Commit 8f4385f by Daniel Wiesmann (parent: 6fcda3c)

Rename demo ap and better docs

- README.md +39 -34
- demo_app.py → gazet_demo.py +0 -0

README.md CHANGED

# Gazet

Lean natural-language geocoder with GIS operations over Overture and Natural Earth parquet datasets.

Gazet is built to be easy to package and minimal to set up, and tries to push the limits of how small the setup for an LLM-driven data application can be. It is designed to work with small language models and Parquet files.

The name is inspired by the word [gazetteer](https://en.wikipedia.org/wiki/Gazetteer): a geographical dictionary or directory used in conjunction with a map or atlas.

## Local setup

### Python setup

Install the Python dependencies with [uv](https://docs.astral.sh/uv/):

```bash
uv sync --extra dev --extra demo
```

### Data preparation

1. Download the Overture divisions data.
2. Download the 10m physical layer from [Natural Earth](https://www.naturalearthdata.com/downloads/10m-physical-vectors/).
3. Unzip the Natural Earth archive.
4. Convert the Natural Earth data to Parquet.

Example for downloading the Overture divisions data:

```bash
aws s3 sync --no-sign-request s3://overturemaps-us-west-2/release/2026-02-18.0/theme=divisions/type=division_area/ data/overture/divisions_area
```

Example for running the Natural Earth conversion script:

```bash
python -m ingest.convert_natural_earth ~/Downloads/10m_physical
```

### Based on Ollama

For now, Gazet relies on [Ollama](https://ollama.com/). For remote (cloud) models, make sure you are logged in to Ollama.

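Because Gazet depends on a running Ollama instance, a quick pre-flight check can save debugging time. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally available models, on the default address `http://localhost:11434`. `MODEL` is a placeholder (the model Gazet actually uses is configured in `config.py`), and `model_names` / `has_model` are hypothetical helpers, not part of the gazet package.

```python
import json
from typing import Any, Dict, Set
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address
MODEL = "your-model:tag"  # hypothetical placeholder; Gazet's model is set in config.py


def model_names(tags_response: Dict[str, Any]) -> Set[str]:
    """Collect model names from an /api/tags response body."""
    return {model["name"] for model in tags_response.get("models", [])}


def has_model(model: str, base_url: str = OLLAMA_URL) -> bool:
    """Ask the local Ollama server which models it has and look for `model`."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model in model_names(json.load(response))
```

`has_model(MODEL)` returns `True` or `False` when the server answers and raises `urllib.error.URLError` when nothing is listening on that address. It does not check whether you are logged in for cloud models.
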
## Usage

```bash
python -m gazet
# then GET http://localhost:8000/search?q=Border%20between%20Loja%20and%20Piura
```

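The endpoint can also be called from Python using only the standard library. In this sketch, `search_url`, `search`, and `summarize` are hypothetical helper names; it assumes the API listens on `localhost:8000` and that a feature may carry a `name` property, which the README does not specify.

```python
import json
from typing import Any, Dict, List
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the /search URL; quote_via=quote encodes spaces as %20."""
    return f"{base_url}/search?{urlencode({'q': query}, quote_via=quote)}"


def search(query: str, timeout: float = 120.0) -> Dict[str, Any]:
    """GET /search and decode the GeoJSON FeatureCollection (needs a running server)."""
    with urlopen(search_url(query), timeout=timeout) as response:
        return json.load(response)


def summarize(collection: Dict[str, Any]) -> List[str]:
    """One line per feature: geometry type plus its 'name' property (assumed key)."""
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    lines = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}
        lines.append(f"{geometry.get('type', 'null')}: {properties.get('name', '<unnamed>')}")
    return lines
```

`search_url("Border between Loja and Piura")` produces the same URL as the example above. With the server from `python -m gazet` running, `summarize(search("Border between Loja and Piura"))` prints a compact overview of the returned features.
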

### API + Streamlit demo

```bash
uv run uvicorn gazet.api:app --reload  # API on :8000
uv run streamlit run gazet_demo.py     # demo UI
```

## Modules

| Module | Contents |
| --- | --- |
| `config.py` | data paths, model name, SQL schema description |
| `types.py` | `SUBTYPES`, `COUNTRIES`, `Place`, `PlacesResult` |
| `lm.py` | DSPy signatures + LM init (`extract`, `write_sql`) |
| `search.py` | fuzzy search against `divisions_area` / `natural_earth` |
| `sql.py` | code-act SQL generation loop |
| `export.py` | GeoJSON FeatureCollection writer |
| `api.py` | FastAPI app with `/search?q=...` returning GeoJSON FeatureCollection |

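`search.py` is described only as fuzzy search against `divisions_area` and `natural_earth`. As an illustration of the idea, not the module's implementation, the sketch below ranks candidate names by `difflib.SequenceMatcher` similarity after normalizing case and whitespace. The in-memory `names` list and the `fuzzy_search` helper are assumptions; the real module queries the Parquet-backed tables and may use a different similarity measure.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so formatting differences don't count."""
    return " ".join(name.lower().split())


def fuzzy_search(query: str, names: List[str], limit: int = 3,
                 cutoff: float = 0.6) -> List[Tuple[str, float]]:
    """Return up to `limit` (name, score) pairs scoring >= cutoff, best first."""
    q = normalize(query)
    scored = [(name, SequenceMatcher(None, q, normalize(name)).ratio()) for name in names]
    matches = [pair for pair in scored if pair[1] >= cutoff]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


names = ["Loja", "Piura", "Lima", "Quito", "Paita"]  # illustrative candidates
print(fuzzy_search("Piuraa", names))  # a misspelled query still ranks "Piura" first
```

The `cutoff` trades recall for precision: lowering it surfaces more loosely related names, which a downstream LM step would then have to disambiguate.
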
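The `sql.py` entry mentions a code-act SQL generation loop. One common shape for such a loop is to ask the model for SQL, execute it, and on failure pass the error message back so the next attempt can correct itself. The sketch assumes a hypothetical `generate_sql` callable standing in for the LM call (for example the `write_sql` signature in `lm.py`) and uses an in-memory SQLite database in place of the real query engine; the retry budget is arbitrary.

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical LM stand-in: (question, previous_sql, previous_error) -> SQL text.
SqlGenerator = Callable[[str, Optional[str], Optional[str]], str]


def code_act_sql(question: str, conn: sqlite3.Connection,
                 generate_sql: SqlGenerator, max_steps: int = 3) -> List[tuple]:
    """Generate SQL, execute it, and feed errors back until a query succeeds."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_steps):
        sql = generate_sql(question, sql, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # the next attempt sees the failed SQL and this message
    raise RuntimeError(f"no valid SQL after {max_steps} attempts; last error: {error}")


# Toy data and a fake generator that gets the column name wrong once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT)")
conn.executemany("INSERT INTO places VALUES (?)", [("Piura",), ("Loja",)])


def fake_generator(question: str, previous_sql: Optional[str],
                   previous_error: Optional[str]) -> str:
    if previous_error is None:
        return "SELECT nme FROM places"  # deliberate typo: fails on the first step
    return "SELECT name FROM places ORDER BY name"


print(code_act_sql("List all places", conn, fake_generator))  # [('Loja',), ('Piura',)]
```

Returning the database error verbatim gives a small model concrete feedback (here `no such column: nme`) instead of asking it to guess what went wrong.
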
## Design notes

- `api.py` exposes GET `/search?q=<query>`; returns GeoJSON FeatureCollection and logs intermediate output.

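The FeatureCollection returned by `/search` follows the GeoJSON structure (RFC 7946): a top-level object with `"type": "FeatureCollection"` and a `features` array, where each feature holds a `geometry` (possibly `null`) and a `properties` object. The sketch below assembles that structure and logs the feature count as an example of intermediate output. `to_feature_collection` is a hypothetical helper, and the `(geometry, properties)` input shape is an assumption rather than the actual interface of `export.py`.

```python
import json
import logging
from typing import Any, Dict, Iterable, Optional, Tuple

logger = logging.getLogger("gazet_sketch")  # hypothetical logger name

# Assumed input shape: (GeoJSON geometry dict or None, properties dict).
Row = Tuple[Optional[Dict[str, Any]], Dict[str, Any]]


def to_feature_collection(rows: Iterable[Row]) -> Dict[str, Any]:
    """Wrap (geometry, properties) pairs as GeoJSON Features in a FeatureCollection."""
    features = [
        {"type": "Feature", "geometry": geometry, "properties": dict(properties)}
        for geometry, properties in rows
    ]
    logger.info("returning %d feature(s)", len(features))  # intermediate output
    return {"type": "FeatureCollection", "features": features}


logging.basicConfig(level=logging.INFO)
collection = to_feature_collection([
    ({"type": "Point", "coordinates": [0.0, 0.0]}, {"name": "example point"}),
    (None, {"name": "no geometry"}),
])
print(json.dumps(collection))
```

Keeping the response a plain dict makes it directly serializable by FastAPI and readable by any GeoJSON-aware client.
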

`demo_app.py` → `gazet_demo.py` RENAMED

File without changes