GLiNER AU PII: Australian Organisations & Locations

Fine-tuned extension of knowledgator/gliner-pii-large-v1.0 for detecting Australian organisations, government agencies, and locations in open-text survey responses.

The base model performs well on general PII but lacks coverage of Australian-specific entities: remote Indigenous place names, Australian-only companies, state government agencies, and AU-specific acronyms. This model addresses those gaps.

Entity Labels

| Label | Description | Examples |
|---|---|---|
| AU_ORGANISATION | Private companies, banks, retailers, universities, NFPs | Guzman y Gomez, Bapcor, TAFE Queensland, Lifeline Australia |
| AU_GOV_AGENCY | Government departments, regulatory bodies, public services | TfNSW, SA Department for Education, Fair Work Commission, APRA |
| AU_LOCATION | Suburbs, cities, towns, states, territories | Nhulunbuy, Warakurna, Indooroopilly, Mount Kuring-gai |

Usage

from gliner import GLiNER

model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")

labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]

text = "I switched from HCF to Teachers Health and it made no difference in Yugambeh."
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"[{entity['label']}] '{entity['text']}' ({entity['score']:.2f})")

PII Masking

def mask_pii(text: str, model, threshold: float = 0.5) -> str:
    labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]
    entities = model.predict_entities(text, labels, threshold=threshold)
    # Replace from the end of the string backwards so earlier offsets stay valid.
    entities = sorted(entities, key=lambda x: x["start"], reverse=True)
    for entity in entities:
        text = text[:entity["start"]] + f"[{entity['label']}]" + text[entity["end"]:]
    return text

mask_pii("I work at Bapcor in Caringbah.", model)
# → "I work at [AU_ORGANISATION] in [AU_LOCATION]."

Training

  • Base model: knowledgator/gliner-pii-large-v1.0
  • Training samples: ~5000 synthetic Australian survey responses
  • Epochs: 5
  • Learning rate: 5e-6
  • Framework: GLiNER 0.2.26

Training data was generated synthetically with OpenAI models and consists of realistic Australian workplace, healthcare, government services, and community survey responses containing labelled entity spans.

Entity Coverage

Training data prioritised entities the base model handles poorly:

  • Indigenous place names: Nhulunbuy, Yuendumu, Warakurna, Wadeye and ~330 others
  • Ambiguous location names: Eden, Kelso, Hope Island, Swan Hill
  • Unusual multi-word suburbs: Mount Kuring-gai, Tea Tree Gully, Narre Warren
  • AU-only organisations: ~440 companies, law firms, health funds, universities with no global footprint
  • Government acronyms used standalone: TfNSW, APRA, ASIC, AEC, AAMI

Intended Use

Masking PII in Australian open-text survey responses before analysis or storage. Designed to complement rule-based detectors (e.g. Microsoft Presidio) which handle structured identifiers like TFN, ABN, Medicare numbers, and Australian phone/address formats.

Use this model for the unstructured entity types β€” names of places and organisations β€” that regex cannot reliably catch.
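As an illustration of the rule-based side of that split, here is a minimal sketch of a structured-identifier detector for ABNs. It uses only the standard library `re` module and is not Presidio's API. The names `ABN_PATTERN`, `is_valid_abn` and `mask_abns` are hypothetical, and the pattern only covers 11 digits written together or grouped as 2-3-3-3 with single spaces. The checksum follows the published ABN rule (subtract 1 from the first digit, apply the weights, test divisibility by 89).

```python
import re

# Hypothetical pattern: 11 digits, optionally grouped as "NN NNN NNN NNN".
ABN_PATTERN = re.compile(r"\b\d{2} ?\d{3} ?\d{3} ?\d{3}\b")
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(candidate: str) -> bool:
    """Check an 11-digit candidate against the ABN weighted checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the rule subtracts 1 from the leading digit
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def mask_abns(text: str) -> str:
    """Replace checksum-valid ABNs with [ABN]; leave other digit runs alone."""
    def replace(match: re.Match) -> str:
        return "[ABN]" if is_valid_abn(match.group()) else match.group()

    return ABN_PATTERN.sub(replace, text)
```

Validating the checksum before masking keeps the rule from firing on arbitrary 11-digit strings such as order numbers. TFN, Medicare and phone formats would each need their own pattern.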

Limitations

  • v1 model, trained on ~5000 samples. Coverage of rare entities will improve in subsequent versions.
  • Combine with Presidio regex rules for structured Australian identifiers (TFN, ABN, Medicare, postcodes).
  • Threshold tuning recommended for your specific survey context. Start at 0.5, raise to 0.65–0.7 to reduce false positives.
  • Not evaluated on formal NER benchmarks; optimised for Australian survey response text specifically.
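One way to approach the threshold tuning mentioned above is to predict once at a low threshold on a small hand-labelled sample, then filter by score at each candidate threshold and compare against the gold spans. The sketch below assumes entity dicts with the `start`, `end`, `label` and `score` keys used elsewhere in this card. The function name `sweep_thresholds` and the exact-match scoring are illustrative choices, not part of GLiNER.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


def sweep_thresholds(
    predictions: List[dict],
    gold: Set[Span],
    thresholds: List[float],
) -> Dict[float, Tuple[float, float]]:
    """Return {threshold: (precision, recall)} using exact span-and-label matches."""
    results = {}
    for t in thresholds:
        kept = {
            (p["start"], p["end"], p["label"])
            for p in predictions
            if p["score"] >= t
        }
        true_pos = len(kept & gold)
        # By convention, report 1.0 when there is nothing to be wrong about.
        precision = true_pos / len(kept) if kept else 1.0
        recall = true_pos / len(gold) if gold else 1.0
        results[t] = (precision, recall)
    return results
```

For PII masking, recall usually matters more than precision, so a threshold that drops true entities is a worse trade than one that over-masks a few ordinary words.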

Combined Usage with Base Model

For full PII coverage, run both models and merge results:

from gliner import GLiNER

# Base model handles names, emails, phones, dates, generic PII
base_model = GLiNER.from_pretrained("knowledgator/gliner-pii-large-v1.0")

# This model handles AU organisations and locations
au_model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")

base_labels = ["person", "email", "phone", "date of birth", "address"]
au_labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]

# `text` is the survey response to scan, as in the Usage example
base_entities = base_model.predict_entities(text, base_labels, threshold=0.5)
au_entities = au_model.predict_entities(text, au_labels, threshold=0.5)

all_entities = base_entities + au_entities
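Concatenating the two lists can leave overlapping spans, for example a base-model "address" that contains an AU_LOCATION, and the masking function in this card assumes spans do not overlap. A minimal sketch of one possible resolution strategy follows: keep the highest-scoring entity among any overlapping group. The helper name `resolve_overlaps` is hypothetical, and preferring the higher score is an assumption. Preferring the longer span would be an equally reasonable rule.

```python
from typing import List


def resolve_overlaps(entities: List[dict]) -> List[dict]:
    """Keep the highest-scoring entity wherever spans overlap."""
    kept: List[dict] = []
    # Visit high-confidence entities first so they claim their span.
    for entity in sorted(entities, key=lambda e: e["score"], reverse=True):
        overlaps = any(
            entity["start"] < other["end"] and other["start"] < entity["end"]
            for other in kept
        )
        if not overlaps:
            kept.append(entity)
    # Return in reading order for downstream masking or display.
    return sorted(kept, key=lambda e: e["start"])
```

Calling `resolve_overlaps(all_entities)` before masking gives a list of non-overlapping spans, so the right-to-left replacement stays correct.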

Version History

| Version | Training samples | Notes |
|---|---|---|
| v1 | ~5000 | Initial release: core AU entity coverage |