Commit d0b2b59 (verified), committed by saunteringcat · 1 parent: 615f5da

Update README.md

Update Model Card

Files changed (1)
  1. README.md +38 -5
README.md CHANGED
@@ -2,11 +2,44 @@
  license: mit
  ---

- # Whereabouts databases

- A collection of geocoding databases for use by the [whereabouts](https://www.github.com/ajl2718/whereabouts) geocoding package in Python.

- This repository currently includes databases for the following countries:
- - Australia

- The file format is `<country_abbreviation>_<states>_<size>`
+ # Whereabouts

+ Whereabouts is a geocoding package in Python that implements some clever record linkage algorithms in SQL using DuckDB. The package itself
+ is available at [whereabouts](https://github.com/ajl2718/whereabouts) and can be installed via

+ ```
+ pip install whereabouts
+ ```

+ ## Installation of reference databases
+ Once the package is installed, you will need to install a geocoding database, which has been built from a country's or region's address data.
+ This repo contains a collection of these databases for different countries and regions. Currently it has files for:
+
+ - Australia (whole of country)
+ - Victoria, Australia
+ - New South Wales, Australia
+
+ More are being added as I get around to cleaning the data and creating the corresponding databases. The file name format is
+ `<country_abbreviation>_<states>_<size>` where `<size>` is either `sm` or `lg` depending on whether the inverted index has been created using
+ pairs of consecutive tokens or trigrams, respectively. The large databases can handle lower-quality address data at the expense of speed.
+
+ Example (install the small Australian geocoding database):
+
+ ```
+ python -m whereabouts download au_all_sm
+ ```
+
+ ## Start geocoding
+
+ Once you have installed the package and a database you can start geocoding your data.
+
+ ```
+ from whereabouts.Matcher import Matcher
+
+ addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']
+
+ matcher = Matcher(db_name='au_all_sm')
+ matcher.geocode(addresslist, how='standard')
+ ```
+ ## References
+ The algorithm is based on the following paper: https://arxiv.org/abs/1708.01402
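
The database naming convention and the `sm`/`lg` distinction described in the README can be illustrated with a minimal sketch. This is not code from the whereabouts package: `parse_db_name`, `token_pairs`, and `char_trigrams` are hypothetical helpers written for illustration. Two assumptions are made: that every database name splits into exactly three underscore-separated parts (inferred only from the `<country_abbreviation>_<states>_<size>` pattern), and that "trigrams" means overlapping three-character substrings.

```python
def parse_db_name(name):
    """Split a database name such as 'au_all_sm' into its three parts.

    Assumes exactly three underscore-separated fields (hypothetical helper,
    not part of the whereabouts API).
    """
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected <country>_<states>_<size>, got {name!r}")
    country, states, size = parts
    if size not in ("sm", "lg"):
        raise ValueError(f"size must be 'sm' or 'lg', got {size!r}")
    return {"country": country, "states": states, "size": size}


def token_pairs(address):
    """Return pairs of consecutive whitespace-separated tokens."""
    tokens = address.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def char_trigrams(text):
    """Return overlapping three-character substrings.

    Assumption: this is only one possible reading of 'trigrams' in the README.
    """
    normalised = " ".join(text.lower().split())
    return [normalised[i:i + 3] for i in range(len(normalised) - 2)]


if __name__ == "__main__":
    print(parse_db_name("au_all_sm"))
    print(token_pairs("643-645 sydney road brsunwick"))
    # Trigrams shared by the misspelled and the correct suburb name.
    shared = set(char_trigrams("brsunwick")) & set(char_trigrams("brunswick"))
    print(sorted(shared))
```

Under these assumptions, the token pair `road brsunwick` from the README's misspelled example does not match `road brunswick`, while the character trigrams of the two spellings still overlap (`ick` and `wic`). That is one way to see why a trigram-based index could cope with lower-quality input at the expense of speed.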