GLiNER AU PII — Australian Organisations & Locations

Fine-tuned extension of knowledgator/gliner-pii-large-v1.0 for detecting Australian organisations, government agencies, and locations in open-text survey responses.

The base model performs well on general PII but lacks coverage of Australian-specific entities — remote Indigenous place names, Australian-only companies, state government agencies, and AU-specific acronyms. This model addresses those gaps.

Entity Labels

Label	Description	Examples
`AU_ORGANISATION`	Private companies, banks, retailers, universities, NFPs	Guzman y Gomez, Bapcor, TAFE Queensland, Lifeline Australia
`AU_GOV_AGENCY`	Government departments, regulatory bodies, public services	TfNSW, SA Department for Education, Fair Work Commission, APRA
`AU_LOCATION`	Suburbs, cities, towns, states, territories	Nhulunbuy, Warakurna, Indooroopilly, Mount Kuring-gai

Usage

from gliner import GLiNER

model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")

labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]

text = "I switched from HCF to Teachers Health and it made no difference in Yugambeh."
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"[{entity['label']}] '{entity['text']}' ({entity['score']:.2f})")

PII Masking

def mask_pii(text: str, model, threshold: float = 0.5) -> str:
    """Replace each detected entity span with its label in square brackets."""
    labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]
    entities = model.predict_entities(text, labels, threshold=threshold)
    # Replace from the end of the string so earlier offsets stay valid
    entities = sorted(entities, key=lambda x: x["start"], reverse=True)
    for entity in entities:
        text = text[:entity["start"]] + f"[{entity['label']}]" + text[entity["end"]:]
    return text

mask_pii("I work at Bapcor in Caringbah.", model)
# → "I work at [AU_ORGANISATION] in [AU_LOCATION]."

Training

Base model: knowledgator/gliner-pii-large-v1.0
Training samples: ~5000 synthetic Australian survey responses
Epochs: 5
Learning rate: 5e-6
Framework: GLiNER 0.2.26

Training data was synthetically generated using OpenAI and consists of realistic Australian workplace, healthcare, government services, and community survey responses containing labelled entity spans.

Entity Coverage

Training data prioritised entities the base model handles poorly:

Indigenous place names: Nhulunbuy, Yuendumu, Warakurna, Wadeye and ~330 others
Ambiguous locations that read as common words or names: Eden, Kelso, Hope Island, Swan Hill
Unusual multi-word suburbs: Mount Kuring-gai, Tea Tree Gully, Narre Warren
AU-only organisations: ~440 companies, law firms, health funds, and universities with no global footprint
Government and corporate acronyms used standalone: TfNSW, APRA, ASIC, AEC, AAMI

Intended Use

This model is intended for masking PII in Australian open-text survey responses before analysis or storage. It is designed to complement rule-based detectors (e.g. Microsoft Presidio), which handle structured identifiers such as TFN, ABN, and Medicare numbers, as well as Australian phone and address formats.

Use this model for the unstructured entity types — names of places and organisations — that regex cannot reliably catch.
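To illustrate the rule-based side of that split, here is a minimal sketch that detects ABNs with the standard library `re` module and the ABN checksum, as a stand-in for a Presidio recogniser. The helper names and the `AU_ABN` label are hypothetical and not part of this model or of Presidio; the output dicts simply mirror the shape returned by `predict_entities` so the two sources can be combined later.

```python
import re

# Candidate ABNs: 11 digits, optionally grouped as 2-3-3-3 with spaces.
ABN_PATTERN = re.compile(r"\b\d{2}\s?\d{3}\s?\d{3}\s?\d{3}\b")
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(candidate: str) -> bool:
    """ABN checksum: subtract 1 from the first digit, take the weighted
    sum of all 11 digits, and require it to be divisible by 89."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 11:
        return False
    digits[0] -= 1
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def find_abns(text: str) -> list:
    """Return checksum-valid ABN spans as dicts shaped like model entities."""
    return [
        {
            "start": m.start(),
            "end": m.end(),
            "text": m.group(),
            "label": "AU_ABN",  # hypothetical label, not produced by the model
            "score": 1.0,       # fixed score: the rule either matches or not
        }
        for m in ABN_PATTERN.finditer(text)
        if is_valid_abn(m.group())
    ]


rule_entities = find_abns("Our ABN is 51 824 753 556, based in Caringbah.")
```

The checksum step matters because an 11-digit pattern alone would also match order numbers and similar strings; the place name in the same sentence is left for the model to catch.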

Limitations

v1 model — trained on ~5000 samples. Coverage of rare entities will improve in subsequent versions.
Combine with Presidio regex rules for structured Australian identifiers (TFN, ABN, Medicare, postcodes).
Threshold tuning is recommended for your specific survey context. Start at 0.5 and raise to 0.65–0.7 to reduce false positives, at the cost of missing some lower-confidence entities.
Not evaluated on formal NER benchmarks — optimised for Australian survey response text specifically.
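One way to approach the threshold tuning mentioned above is to hand-label a small sample of your own responses and compare precision and recall at a few cut-offs. The sketch below is illustrative only: `precision_recall` and `sweep_thresholds` are hypothetical helpers, the predictions and gold spans are made-up values for a single response, and it assumes that filtering stored predictions by `score` gives the same result as re-running the model at that threshold.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end, label) triples."""
    pred_set = {(e["start"], e["end"], e["label"]) for e in predicted}
    gold_set = set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 1.0
    recall = true_positives / len(gold_set) if gold_set else 1.0
    return precision, recall


def sweep_thresholds(predictions, gold, thresholds=(0.5, 0.6, 0.7)):
    """Score the same predictions at several cut-offs."""
    results = {}
    for threshold in thresholds:
        kept = [e for e in predictions if e["score"] >= threshold]
        results[threshold] = precision_recall(kept, gold)
    return results


# Made-up predictions for "I work at Bapcor in Caringbah.", collected once
# at a low threshold; the third entry is a deliberate false positive.
predictions = [
    {"start": 10, "end": 16, "label": "AU_ORGANISATION", "score": 0.91},
    {"start": 20, "end": 29, "label": "AU_LOCATION", "score": 0.88},
    {"start": 2, "end": 6, "label": "AU_LOCATION", "score": 0.55},
]
gold = [(10, 16, "AU_ORGANISATION"), (20, 29, "AU_LOCATION")]

results = sweep_thresholds(predictions, gold)
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

For a real evaluation you would aggregate counts across many responses rather than a single one, and choose the threshold based on whether missed PII or over-masking is the costlier error in your setting.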

Combined Usage with Base Model

For broader PII coverage, run both models and merge the results:

from gliner import GLiNER

# Base model handles names, emails, phones, dates, generic PII
base_model = GLiNER.from_pretrained("knowledgator/gliner-pii-large-v1.0")

# This model handles AU organisations and locations
au_model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")

base_labels = ["person", "email", "phone", "date of birth", "address"]
au_labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]

# Any survey response; this sample is illustrative
text = "I work at Bapcor in Caringbah."

base_entities = base_model.predict_entities(text, base_labels, threshold=0.5)
au_entities = au_model.predict_entities(text, au_labels, threshold=0.5)

# Plain concatenation: a span flagged by both models will appear twice
all_entities = base_entities + au_entities
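Because the two models run independently, the concatenated list can contain overlapping spans, which would corrupt offsets during masking. The following is a minimal sketch of one possible resolution strategy: keep the highest-scoring span wherever spans overlap. `merge_entities` and `mask_entities` are hypothetical helpers, not part of GLiNER, and the entity dicts are hand-written stand-ins for model output with the same `start`, `end`, `label`, and `score` keys.

```python
def merge_entities(*entity_lists):
    """Combine entity lists, keeping the highest-scoring span on overlap."""
    candidates = [ent for entities in entity_lists for ent in entities]
    # Visit high-confidence spans first so they win any conflict.
    candidates.sort(key=lambda ent: ent["score"], reverse=True)
    kept = []
    for ent in candidates:
        overlaps = any(
            ent["start"] < other["end"] and other["start"] < ent["end"]
            for other in kept
        )
        if not overlaps:
            kept.append(ent)
    return sorted(kept, key=lambda ent: ent["start"])


def mask_entities(text, entities):
    """Replace non-overlapping spans with their labels, right to left."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['label']}]" + text[ent["end"]:]
    return text


text = "I work at Bapcor in Caringbah."

# Hand-written stand-ins for the two models' outputs.
base_entities = [
    {"start": 10, "end": 16, "text": "Bapcor", "label": "address", "score": 0.55},
]
au_entities = [
    {"start": 10, "end": 16, "text": "Bapcor", "label": "AU_ORGANISATION", "score": 0.91},
    {"start": 20, "end": 29, "text": "Caringbah", "label": "AU_LOCATION", "score": 0.88},
]

all_entities = merge_entities(base_entities, au_entities)
masked = mask_entities(text, all_entities)
```

Preferring the higher score is only one policy; depending on your data you might instead prefer the longer span or always favour one model for particular labels.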

Version History

Version	Training samples	Notes
v1	~5000	Initial release — core AU entity coverage
