GLiNER AU PII: Australian Organisations & Locations
Fine-tuned extension of knowledgator/gliner-pii-large-v1.0 for detecting Australian organisations, government agencies, and locations in open-text survey responses.
The base model performs well on general PII but lacks coverage of Australian-specific entities: remote Indigenous place names, Australian-only companies, state government agencies, and AU-specific acronyms. This model addresses those gaps.
Entity Labels
| Label | Description | Examples |
|---|---|---|
| AU_ORGANISATION | Private companies, banks, retailers, universities, NFPs | Guzman y Gomez, Bapcor, TAFE Queensland, Lifeline Australia |
| AU_GOV_AGENCY | Government departments, regulatory bodies, public services | TfNSW, SA Department for Education, Fair Work Commission, APRA |
| AU_LOCATION | Suburbs, cities, towns, states, territories | Nhulunbuy, Warakurna, Indooroopilly, Mount Kuring-gai |
Usage
from gliner import GLiNER
model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")
labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]
text = "I switched from HCF to Teachers Health and it made no difference in Yugambeh."
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(f"[{entity['label']}] '{entity['text']}' ({entity['score']:.2f})")
PII Masking
def mask_pii(text: str, model, threshold: float = 0.5) -> str:
    labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]
    entities = model.predict_entities(text, labels, threshold=threshold)
    # Replace from the end of the string so earlier character offsets stay valid
    entities = sorted(entities, key=lambda x: x["start"], reverse=True)
    for entity in entities:
        text = text[:entity["start"]] + f"[{entity['label']}]" + text[entity["end"]:]
    return text

mask_pii("I work at Bapcor in Caringbah.", model)
# → "I work at [AU_ORGANISATION] in [AU_LOCATION]."
Training
- Base model: knowledgator/gliner-pii-large-v1.0
- Training samples: ~5000 synthetic Australian survey responses
- Epochs: 5
- Learning rate: 5e-6
- Framework: GLiNER 0.2.26
Training data was synthetically generated using OpenAI and consists of realistic Australian workplace, healthcare, government services, and community survey responses containing labelled entity spans.
Entity Coverage
Training data prioritised entities the base model handles poorly:
- Indigenous place names: Nhulunbuy, Yuendumu, Warakurna, Wadeye and ~330 others
- Ambiguous locations that read as ordinary words or names: Eden, Kelso, Hope Island, Swan Hill
- Unusual multi-word suburbs: Mount Kuring-gai, Tea Tree Gully, Narre Warren
- AU-only organisations: ~440 companies, law firms, health funds, universities with no global footprint
- Acronyms used standalone: TfNSW, APRA, ASIC, AEC (government agencies) and AAMI (an insurer)
Intended Use
Masking PII in Australian open-text survey responses before analysis or storage. Designed to complement rule-based detectors (e.g. Microsoft Presidio) which handle structured identifiers like TFN, ABN, Medicare numbers, and Australian phone/address formats.
Use this model for the unstructured entity types (names of places and organisations) that regex cannot reliably catch.
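To make the division of labour concrete, here is a minimal sketch of the rule-based side using only Python's `re` module. The patterns, the `AU_ABN` / `AU_TFN` label names, and the `find_structured_ids` helper are illustrative assumptions, not Presidio's recognisers. They match digit layouts only and do no checksum validation. The matches are shaped like GLiNER entity dicts so they can be combined with this model's output.

```python
import re

# Illustrative patterns only: assumptions for this sketch, not Presidio's
# recognisers, and with no checksum validation.
# ABN: 11 digits, commonly written 2-3-3-3. TFN: 8 or 9 digits, commonly 3-3-3.
STRUCTURED_PATTERNS = {
    "AU_ABN": re.compile(r"\b\d{2}\s?\d{3}\s?\d{3}\s?\d{3}\b"),
    "AU_TFN": re.compile(r"\b\d{3}\s?\d{3}\s?\d{2,3}\b"),
}


def find_structured_ids(text: str) -> list:
    """Return regex matches as dicts shaped like GLiNER entities."""
    spans = []
    for label, pattern in STRUCTURED_PATTERNS.items():
        for match in pattern.finditer(text):
            # Earlier patterns win, so the tail of an ABN is not re-read as a TFN.
            clashes = any(
                match.start() < s["end"] and s["start"] < match.end() for s in spans
            )
            if not clashes:
                spans.append({
                    "start": match.start(),
                    "end": match.end(),
                    "text": match.group(),
                    "label": label,
                    "score": 1.0,  # rule-based match, so no model confidence
                })
    return sorted(spans, key=lambda s: s["start"])


# Made-up identifiers for demonstration
sample = "Our ABN is 12 345 678 901 and my TFN is 123 456 789."
for span in find_structured_ids(sample):
    print(span["label"], repr(span["text"]))
```

In a real pipeline a maintained rule-based detector would replace these hand-written patterns. The point here is only that structured identifiers and free-text entity names are found by different mechanisms and then combined.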
Limitations
- v1 model trained on ~5000 samples. Coverage of rare entities will improve in subsequent versions.
- Combine with Presidio regex rules for structured Australian identifiers (TFN, ABN, Medicare, postcodes).
- Threshold tuning is recommended for your specific survey context. Start at 0.5 and raise to 0.65 to 0.7 to reduce false positives.
- Not evaluated on formal NER benchmarks; optimised specifically for Australian survey response text.
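
One way to approach the threshold tuning mentioned above is to predict once at the lowest threshold you would consider, then count how many entities survive at each stricter cut-off. The sketch below assumes that filtering by `score` approximates re-running the model at a higher threshold. The `sweep_thresholds` helper and the sample scores are hypothetical and exist only to show the shape of the comparison.

```python
from collections import Counter


def sweep_thresholds(entities, thresholds=(0.5, 0.6, 0.65, 0.7)):
    """Count surviving entities per label at each candidate threshold."""
    report = {}
    for threshold in thresholds:
        kept = [e for e in entities if e["score"] >= threshold]
        report[threshold] = Counter(e["label"] for e in kept)
    return report


# Hand-written predictions standing in for model.predict_entities(..., threshold=0.5)
predictions = [
    {"text": "Bapcor", "label": "AU_ORGANISATION", "score": 0.91},
    {"text": "Caringbah", "label": "AU_LOCATION", "score": 0.84},
    {"text": "Hope", "label": "AU_LOCATION", "score": 0.56},
]

for threshold, counts in sweep_thresholds(predictions).items():
    print(threshold, dict(counts))
```

Reviewing the entities that drop out between two thresholds on a sample of your own responses shows whether a higher cut-off removes false positives or real entities.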
Combined Usage with Base Model
For full PII coverage, run both models and merge results:
from gliner import GLiNER

# Base model handles names, emails, phones, dates, generic PII
base_model = GLiNER.from_pretrained("knowledgator/gliner-pii-large-v1.0")
# This model handles AU organisations, government agencies, and locations
au_model = GLiNER.from_pretrained("cutaa/gliner-au-pii-v1")

base_labels = ["person", "email", "phone", "date of birth", "address"]
au_labels = ["AU_ORGANISATION", "AU_GOV_AGENCY", "AU_LOCATION"]

# Example input, reused from the Usage section
text = "I switched from HCF to Teachers Health and it made no difference in Yugambeh."

base_entities = base_model.predict_entities(text, base_labels, threshold=0.5)
au_entities = au_model.predict_entities(text, au_labels, threshold=0.5)
# Plain concatenation: overlapping spans from the two models are not de-duplicated
all_entities = base_entities + au_entities
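Concatenating the two result lists can leave two spans covering the same characters, for example when both models tag the same place name. The following is a minimal sketch of one possible merge policy: keep the highest-scoring span and drop anything that overlaps it. The `merge_entities` helper and the sample predictions are hypothetical. Scores from two different models are not strictly comparable, so treat this as a starting point rather than a recommended rule.

```python
def merge_entities(*entity_lists):
    """Greedily keep the highest-scoring spans that do not overlap."""
    candidates = [e for entities in entity_lists for e in entities]
    candidates.sort(key=lambda e: e["score"], reverse=True)
    kept = []
    for cand in candidates:
        # Two spans are disjoint when one ends before the other starts.
        if all(cand["end"] <= k["start"] or cand["start"] >= k["end"] for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda e: e["start"])


# Hand-written predictions for: "Jane Smith from Teachers Health moved to Eden."
base_entities = [
    {"start": 0, "end": 10, "text": "Jane Smith", "label": "person", "score": 0.95},
    {"start": 41, "end": 45, "text": "Eden", "label": "address", "score": 0.55},
]
au_entities = [
    {"start": 16, "end": 31, "text": "Teachers Health", "label": "AU_ORGANISATION", "score": 0.90},
    {"start": 41, "end": 45, "text": "Eden", "label": "AU_LOCATION", "score": 0.80},
]

for entity in merge_entities(base_entities, au_entities):
    print(entity["label"], repr(entity["text"]))
```

Resolving overlaps before masking matters because replacing two overlapping spans in the same string would corrupt the character offsets of the second replacement.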
Version History
| Version | Training samples | Notes |
|---|---|---|
| v1 | ~5000 | Initial release: core AU entity coverage |