Daniel Wiesmann committed on
Commit 8f4385f · 1 Parent(s): 6fcda3c

Rename demo app and better docs

Files changed (2)
  1. README.md +39 -34
  2. demo_app.py → gazet_demo.py +0 -0
README.md CHANGED
@@ -1,68 +1,73 @@
- # gazet

- Lean natural-language geocoder with GIS operations over Overture and Natural Earth parquet datasets. In an industry trending toward ever-larger models and heavier infrastructure, gazet takes the opposite path: small language models, DuckDB, and local Parquet files - no PostGIS, no cloud geocoding APIs, no bloat.

- Name inspired by [Gazetteer](https://en.wikipedia.org/wiki/Gazetteer). A gazetteer is a geographical dictionary or directory used in conjunction with a map or atlas.

- ## Modules
-
- | Module | Contents |
- | --- | --- |
- | `config.py` | data paths, model name, SQL schema description |
- | `types.py` | `SUBTYPES`, `COUNTRIES`, `Place`, `PlacesResult` |
- | `lm.py` | DSPy signatures + LM init (`extract`, `write_sql`) |
- | `search.py` | fuzzy search against `divisions_area` / `natural_earth` |
- | `sql.py` | code-act SQL generation loop |
- | `export.py` | GeoJSON FeatureCollection writer |
- | `api.py` | FastAPI app with `/search?q=...` returning GeoJSON FeatureCollection |
 
  ## Local setup

- Install Python dependencies

  ```bash
  uv sync --extra dev --extra demo
  ```

- ### Based on ollama

- For now, gazet relies on ollama. For remote (cloud) models,
- ensure you are logged into Ollama.

- ## Usage

  ```bash
- python -m gazet
- # then GET http://localhost:8000/search?q=Border%20between%20Loja%20and%20Piura
  ```

- ### API + Streamlit demo

  ```bash
- uv run uvicorn gazet.api:app --reload # API on :8000
- uv run streamlit run demo_app.py # demo UI
  ```
 
- ## Data preparation

- 1. Download Overture divisions data
- 2. Download the 10m physical layer from [Natural Earth](https://www.naturalearthdata.com/downloads/10m-physical-vectors/)
- 3. Unzip the data
- 4. Convert Natural Earth data to Parquet

- Example for downloading Overture

  ```bash
- aws s3 sync \
-   s3://overturemaps-us-west-2/release/2026-02-18.0/theme=divisions/type=division_area/ data/overture/divisions_area
  ```

- Example for running the conversion script for Natural Earth

  ```bash
- python -m ingest.convert_natural_earth ~/Downloads/10m_physical
  ```

  ## Design notes

  - `api.py` exposes GET `/search?q=<query>`; returns GeoJSON FeatureCollection and logs intermediate output.
 
+ # Gazet

+ Lean natural-language geocoder with GIS operations over Overture and Natural Earth parquet datasets.

+ Gazet is built to be easily packageable and minimal in setup, pushing the boundaries of how small the setup for LLM-driven data applications can be. It is built for working with small language models and Parquet files.

+ The name is inspired by [Gazetteer](https://en.wikipedia.org/wiki/Gazetteer). A gazetteer is a geographical dictionary or directory used in conjunction with a map or atlas.

  ## Local setup

+ ### Python setup
+
+ Install Python dependencies using [uv](https://docs.astral.sh/uv/):

  ```bash
  uv sync --extra dev --extra demo
  ```
 
+ ### Data preparation

+ 1. Download Overture divisions data
+ 2. Download the 10m physical layer from [Natural Earth](https://www.naturalearthdata.com/downloads/10m-physical-vectors/)
+ 3. Unzip the data
+ 4. Convert Natural Earth data to Parquet

+ Example for downloading Overture

  ```bash
+ aws s3 sync \
+   s3://overturemaps-us-west-2/release/2026-02-18.0/theme=divisions/type=division_area/ data/overture/divisions_area
  ```

+ Example for running the conversion script for Natural Earth

  ```bash
+ python -m ingest.convert_natural_earth ~/Downloads/10m_physical
  ```
 
+ ### Based on ollama

+ For now, gazet relies on [ollama](https://ollama.com/). For remote (cloud) models, ensure you are logged into Ollama.

+ ## Usage

  ```bash
+ python -m gazet
+ # then GET http://localhost:8000/search?q=Border%20between%20Loja%20and%20Piura
  ```

+ ### API + Streamlit demo

  ```bash
+ uv run uvicorn gazet.api:app --reload # API on :8000
+ uv run streamlit run gazet_demo.py # demo UI
  ```
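Editor's note: the `/search` request from the usage section can also be issued from Python. The following is a minimal sketch, not part of the commit. It assumes the API is reachable at `http://localhost:8000` as in the commands above; the helper names `build_search_url` and `search` are hypothetical and do not come from the gazet code base.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API runs locally on port 8000, as in the README commands.
BASE_URL = "http://localhost:8000"


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Return the /search URL with the query percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    params = urllib.parse.urlencode({"q": query}, quote_via=urllib.parse.quote)
    return f"{base_url}/search?{params}"


def search(query: str, base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """GET /search and decode the JSON body (expected: a GeoJSON FeatureCollection)."""
    url = build_search_url(query, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `search("Border between Loja and Piura")` should return the decoded response as a dictionary. The timeout of 60 seconds is an arbitrary choice for the sketch, picked because a language model call is involved.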
 
+
+
+ ## Modules
+
+ | Module | Contents |
+ | --- | --- |
+ | `config.py` | data paths, model name, SQL schema description |
+ | `types.py` | `SUBTYPES`, `COUNTRIES`, `Place`, `PlacesResult` |
+ | `lm.py` | DSPy signatures + LM init (`extract`, `write_sql`) |
+ | `search.py` | fuzzy search against `divisions_area` / `natural_earth` |
+ | `sql.py` | code-act SQL generation loop |
+ | `export.py` | GeoJSON FeatureCollection writer |
+ | `api.py` | FastAPI app with `/search?q=...` returning GeoJSON FeatureCollection |
+
  ## Design notes

  - `api.py` exposes GET `/search?q=<query>`; returns GeoJSON FeatureCollection and logs intermediate output.
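Editor's note: for readers unfamiliar with the response format, the sketch below shows the general shape of a GeoJSON FeatureCollection. It is an illustration only and does not reproduce `export.py`. The function name `feature_collection`, the input layout, and the placeholder point and property are all assumptions made for this example.

```python
import json


def feature_collection(places: list[dict]) -> dict:
    """Wrap geometry/properties pairs in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": place["geometry"],
                "properties": place.get("properties", {}),
            }
            for place in places
        ],
    }


# Placeholder input: a made-up point and property, not real gazet output.
example = feature_collection(
    [
        {
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
            "properties": {"name": "example"},
        }
    ]
)
print(json.dumps(example, indent=2))
```

Each feature pairs one geometry with a free-form `properties` object, so a client can read `features[i]["geometry"]` without knowing which properties the server attaches.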
demo_app.py → gazet_demo.py RENAMED
File without changes