PII NER - Singapore

Fine-tuned BERT model for detecting Personally Identifiable Information (PII) in Singapore-context English text. Optimised for use with ONNX Runtime Web (browser inference via the Local Presidio extension).

Labels

Label   Description
PER     Person name
ID      NRIC / FIN (Foreign Identification Number, not the FIN label below) / passport number
PHONE   Phone number
EMAIL   Email address
ADDR    Physical address
ORG     Organisation name
FIN     Financial info (bank account, credit card)
DOB     Date of birth
REG     Registration number (vehicle, company)
MISC    Other PII

Usage (Transformers.js)

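// Run as an ES module, since the snippet uses top-level await.
import { pipeline } from '@xenova/transformers';
// Fetches the model from the Hugging Face Hub on first use.
const ner = await pipeline('token-classification', 'ohhsj/pii-ner-singapore');
// The result is an array of token-level predictions (label, score, word).
const result = await ner('John Tan NRIC S1234567D lives at 123 Orchard Road.');
console.log(result);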
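
The pipeline returns one prediction per token, so a downstream redactor usually needs to merge them into whole entities. The sketch below shows one way to do that. It assumes BIO-style labels (B-PER, I-PER) and WordPiece continuation pieces marked with a leading "##"; neither is confirmed by this card, and groupEntities is a hypothetical helper, not part of Transformers.js.

```javascript
// Split "B-PER" into ["B", "PER"]. Labels without a prefix are treated
// as continuations so that adjacent tokens of the same type still merge.
function splitLabel(label) {
  if (label === 'O') return ['O', null];
  const dash = label.indexOf('-');
  return dash === -1
    ? ['I', label]
    : [label.slice(0, dash), label.slice(dash + 1)];
}

// Collapse the accumulated scores into a single conservative score
// (the lowest token score in the span).
function finish(span) {
  return {
    label: span.label,
    text: span.text,
    score: Math.min(...span.scores),
  };
}

// Merge token-level predictions into entity spans.
// ASSUMPTION: BIO labels and "##" continuation pieces (see lead-in).
function groupEntities(tokens) {
  const spans = [];
  let current = null;
  for (const tok of tokens) {
    const [prefix, type] = splitLabel(tok.entity);
    const isPiece = tok.word.startsWith('##');
    const word = isPiece ? tok.word.slice(2) : tok.word;
    const continues =
      current !== null &&
      type === current.label &&
      (prefix === 'I' || isPiece);
    if (continues) {
      // Word pieces are glued on; whole words are joined with a space.
      current.text += isPiece ? word : ' ' + word;
      current.scores.push(tok.score);
    } else {
      if (current !== null) spans.push(finish(current));
      current =
        type === null ? null : { label: type, text: word, scores: [tok.score] };
    }
  }
  if (current !== null) spans.push(finish(current));
  return spans;
}
```

Calling groupEntities(result) on the output above would then yield one object per entity. The text is rebuilt by joining words with spaces, which is lossy for emails and similar strings; if the tokens carry character offsets, slicing the original input is more faithful.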

Usage (ONNX Runtime directly)

The onnx/model_quantized.onnx file can be loaded directly with ONNX Runtime Web for browser inference without the Transformers.js wrapper.
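
Without the wrapper, tokenisation and decoding have to be done by hand. As a rough sketch of the decoding half: once an ONNX Runtime Web session has run, the output tensor exposes a flat data buffer and its dims, and each token's label is the argmax over the label axis. The output shape [1, seqLen, numLabels] and the label_map.json layout (string ID to label name) are assumptions here, and decodeLogits is a hypothetical helper.

```javascript
// Decode a flat logits buffer of shape [1, seqLen, numLabels] into one
// label per token.
// ASSUMPTION: id2label maps the string form of a label ID to its name,
// e.g. { "0": "O", "1": "B-PER" }; the real label_map.json may differ.
function decodeLogits(logits, seqLen, numLabels, id2label) {
  if (logits.length !== seqLen * numLabels) {
    throw new Error('logits length does not match seqLen * numLabels');
  }
  const decoded = [];
  for (let t = 0; t < seqLen; t++) {
    const offset = t * numLabels;

    // Argmax over the label axis for token t.
    let best = 0;
    for (let k = 1; k < numLabels; k++) {
      if (logits[offset + k] > logits[offset + best]) best = k;
    }

    // Softmax probability of the winning label, computed relative to
    // the maximum logit for numerical stability.
    const max = logits[offset + best];
    let sum = 0;
    for (let k = 0; k < numLabels; k++) {
      sum += Math.exp(logits[offset + k] - max);
    }

    decoded.push({ label: id2label[String(best)], score: 1 / sum });
  }
  return decoded;
}
```

With a tensor named output from the session, the call would look like decodeLogits(output.data, output.dims[1], output.dims[2], labelMap). Special tokens such as [CLS] and [SEP] also receive a label and should be dropped before use.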

Training Details

Parameter       Value
Base model      dslim/bert-base-NER
Training data   Synthetic Singapore PII sentences
Sentences       2,500
Epochs          4
Batch size      32
Max length      256 tokens
GPU             NVIDIA A100 (RunPod serverless)
Quantisation    int8 dynamic (ONNX Runtime)

Evaluation (synthetic test set)

Entity   Precision   Recall   F1
ADDR     1.000       1.000    1.000
DOB      1.000       1.000    1.000
EMAIL    1.000       1.000    1.000
FIN      1.000       1.000    1.000
ID       1.000       1.000    1.000
ORG      1.000       1.000    1.000
PER      1.000       1.000    1.000
PHONE    1.000       1.000    1.000
REG      1.000       1.000    1.000
avg      1.000       1.000    1.000

Note: Evaluated on synthetic data generated by the same pipeline as the training set, so the perfect scores reflect in-distribution performance only. Real-world performance will vary. Use regex detectors for structured fields (NRIC, phone, email) and this model for unstructured text (names, addresses, organisations).
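
A regex pass for the structured fields might look like the sketch below. The patterns are illustrative assumptions rather than the detectors used by the extension: they check shape only (the NRIC checksum letter is not verified), and the phone pattern covers only eight-digit numbers starting with 6, 8, or 9. PATTERNS and findStructuredPII are hypothetical names.

```javascript
// Illustrative shape-only patterns; not exhaustive and not validated
// against real data.
const PATTERNS = {
  // NRIC/FIN: prefix letter (S, T, F, G, M), seven digits, one letter.
  ID: /\b[STFGM]\d{7}[A-Z]\b/g,
  // Singapore phone: optional +65, then eight digits starting 6, 8 or 9,
  // optionally split 4-4 by a space or hyphen.
  PHONE: /(?:\+65[\s-]?|\b)[689]\d{3}[\s-]?\d{4}\b/g,
  // Deliberately simple email shape.
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
};

// Return every match with its label and character offsets, ordered by
// position in the text.
function findStructuredPII(text) {
  const hits = [];
  for (const [label, re] of Object.entries(PATTERNS)) {
    for (const m of text.matchAll(re)) {
      hits.push({
        label,
        text: m[0],
        start: m.index,
        end: m.index + m[0].length,
      });
    }
  }
  return hits.sort((a, b) => a.start - b.start);
}
```

The regex hits can be merged with the model's PER, ADDR, and ORG predictions, with the regex result taking priority wherever the two overlap on a structured field.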

Files

onnx/model_quantized.onnx   104 MB   int8-quantised, for browser inference
label_map.json               mapping of label IDs to entity names
config.json                  model config (dslim/bert-base-NER base)
tokenizer.json               tokenizer (WordPiece, bert-base-cased vocab)
tokenizer_config.json
special_tokens_map.json
vocab.txt