PII NER - Singapore
Fine-tuned BERT model for detecting Personally Identifiable Information (PII) in Singapore-context English text. Optimised for use with ONNX Runtime Web (browser inference via the Local Presidio extension).
Labels
| Label | Description |
|---|---|
| PER | Person name |
| ID | NRIC / FIN (Foreign Identification Number) / passport number |
| PHONE | Phone number |
| EMAIL | Email address |
| ADDR | Physical address |
| ORG | Organisation name |
| FIN | Financial info (bank account, credit card) |
| DOB | Date of birth |
| REG | Registration number (vehicle, company) |
| MISC | Other PII |
Usage (Transformers.js)
```javascript
import { pipeline } from '@xenova/transformers';
const ner = await pipeline('token-classification', 'ohhsj/pii-ner-singapore');
const result = await ner('John Tan NRIC S1234567D lives at 123 Orchard Road.');
console.log(result);
```
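The pipeline returns one prediction per token, so names and addresses that span several tokens usually need to be merged before redaction. The following is a minimal sketch of that merging step, not part of this repository. It assumes each item carries `entity`, `word`, `score` and `index` fields, that labels follow a BIO scheme (`B-PER`, `I-PER`), and that WordPiece continuation pieces start with `##`; none of these are confirmed above, and `groupEntities` is a hypothetical helper name.

```javascript
// Merge token-level predictions into entity spans.
// ASSUMPTIONS (not confirmed by this model card): BIO-style tags such as
// "B-PER" / "I-PER", WordPiece "##" continuation pieces, and a 1-based
// `index` giving each token's position in the input sequence.
function groupEntities(tokens) {
  const entities = [];
  let current = null;

  for (const tok of tokens) {
    if (tok.entity === 'O') {
      current = null; // an explicit "outside" tag always closes the open span
      continue;
    }

    // Split "B-PER" into prefix "B" and type "PER"; labels without a
    // prefix are treated as the start of a new span.
    const dash = tok.entity.indexOf('-');
    const prefix = dash === -1 ? 'B' : tok.entity.slice(0, dash);
    const type = dash === -1 ? tok.entity : tok.entity.slice(dash + 1);
    const isSubword = tok.word.startsWith('##');

    // A token extends the open span only if it is adjacent, has the same
    // type, and is either an I- tag or a WordPiece continuation.
    const continues =
      current !== null &&
      current.type === type &&
      tok.index === current.lastIndex + 1 &&
      (prefix === 'I' || isSubword);

    if (continues) {
      current.text += isSubword ? tok.word.slice(2) : ' ' + tok.word;
      current.scores.push(tok.score);
      current.lastIndex = tok.index;
    } else {
      current = {
        type,
        text: isSubword ? tok.word.slice(2) : tok.word,
        scores: [tok.score],
        lastIndex: tok.index,
      };
      entities.push(current);
    }
  }

  // Report the weakest token score as the span score (a conservative choice).
  return entities.map(({ type, text, scores }) => ({
    type,
    text,
    score: Math.min(...scores),
  }));
}
```

Joining whole-word tokens with a single space only approximates the original text. If exact character offsets are needed, they would have to come from the tokenizer rather than from this reconstruction.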
Usage (ONNX Runtime directly)
The onnx/model_quantized.onnx file can be loaded directly with ONNX Runtime Web
for browser inference without the Transformers.js wrapper.
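Without the Transformers.js wrapper, tokenisation and label decoding fall to the caller. The sketch below covers only the decoding step: turning the raw logits into label names. It assumes the model output is a tensor of shape `[1, seqLen, numLabels]` flattened in row-major order, and that `label_map.json` can be read into an object keyed by label ID; both are assumptions, and `decodeLogits` and the label names in the usage example are hypothetical.

```javascript
// Pick the highest-scoring label for each token position.
// ASSUMPTIONS: `logits` is a flat, row-major array of length
// seqLen * numLabels (as a [1, seqLen, numLabels] output tensor would be),
// and `id2label` maps label IDs to names, e.g. { "0": "O", "1": "B-PER" }.
// The actual layout of label_map.json is not documented here.
function decodeLogits(logits, seqLen, id2label) {
  const numLabels = logits.length / seqLen;
  if (!Number.isInteger(numLabels) || numLabels < 1) {
    throw new Error('logits length must be a positive multiple of seqLen');
  }

  const labels = [];
  for (let t = 0; t < seqLen; t++) {
    const offset = t * numLabels;
    let best = 0;
    for (let k = 1; k < numLabels; k++) {
      if (logits[offset + k] > logits[offset + best]) best = k;
    }
    labels.push(id2label[best]);
  }
  return labels;
}

// Usage with made-up values: two tokens, three labels.
const exampleLabels = decodeLogits(
  [0.1, 2.0, 0.3, 1.5, 0.2, 0.1],
  2,
  { 0: 'O', 1: 'B-PER', 2: 'I-PER' },
);
console.log(exampleLabels);
```

Positions belonging to special tokens such as `[CLS]`, `[SEP]` and padding would normally be dropped before the labels are mapped back to words.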
Training Details
| Parameter | Value |
|---|---|
| Base model | dslim/bert-base-NER |
| Training data | Synthetic Singapore PII sentences |
| Sentences | 2,500 |
| Epochs | 4 |
| Batch size | 32 |
| Max length | 256 tokens |
| GPU | NVIDIA A100 (RunPod serverless) |
| Quantisation | int8 dynamic (ONNX Runtime) |
Evaluation (synthetic test set)
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| ADDR | 1.000 | 1.000 | 1.000 |
| DOB | 1.000 | 1.000 | 1.000 |
| EMAIL | 1.000 | 1.000 | 1.000 |
| FIN | 1.000 | 1.000 | 1.000 |
| ID | 1.000 | 1.000 | 1.000 |
| ORG | 1.000 | 1.000 | 1.000 |
| PER | 1.000 | 1.000 | 1.000 |
| PHONE | 1.000 | 1.000 | 1.000 |
| REG | 1.000 | 1.000 | 1.000 |
| avg | 1.000 | 1.000 | 1.000 |
Note: Evaluated on synthetic data generated by the same pipeline as the training data, so real-world performance will vary. Use regex detectors for structured fields (NRIC, phone, email) and this model for unstructured text (names, addresses, organisations).
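As an illustration of the regex side of that split, here is a minimal sketch of format-only detectors for the three structured fields. The patterns are assumptions chosen for illustration, not the ones used by any particular extension: the NRIC/FIN pattern checks shape only (prefix letter, seven digits, suffix letter) and does not validate the checksum, and the phone pattern assumes 8-digit Singapore numbers with an optional `+65` prefix. `PATTERNS` and `detectStructured` are hypothetical names.

```javascript
// Format-only patterns for structured PII. These are illustrative
// assumptions: no checksum validation, no check against real allocations.
const PATTERNS = {
  // NRIC / FIN shape: prefix letter, 7 digits, suffix letter.
  ID: /\b[STFGM]\d{7}[A-Z]\b/g,
  // 8-digit number starting with 3, 6, 8 or 9, optional +65 prefix,
  // optional space or hyphen after the fourth digit.
  PHONE: /(?<![\w+])(?:\+65[ -]?)?[3689]\d{3}[ -]?\d{4}(?!\d)/g,
  // Deliberately simple email shape, not a full RFC 5322 matcher.
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
};

// Return every pattern hit with its character offsets, ordered by position.
function detectStructured(text) {
  const hits = [];
  for (const [type, pattern] of Object.entries(PATTERNS)) {
    for (const m of text.matchAll(pattern)) {
      hits.push({ type, text: m[0], start: m.index, end: m.index + m[0].length });
    }
  }
  return hits.sort((a, b) => a.start - b.start);
}

const hits = detectStructured(
  'John Tan NRIC S1234567D lives at 123 Orchard Road. ' +
  'Call +65 9123 4567 or email john.tan@example.com.',
);
console.log(hits);
```

Because these patterns match on shape alone, they can produce false positives (for example, the first eight digits of a longer number written in groups of four), so the hits are best treated as candidates alongside the model's predictions.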
Files
| File | Description |
|---|---|
| onnx/model_quantized.onnx | 104 MB, int8-quantised, for browser inference |
| label_map.json | Mapping of label IDs to entity names |
| config.json | Model config (dslim/bert-base-NER base) |
| tokenizer.json | Tokenizer (WordPiece, bert-base-cased vocab) |
| tokenizer_config.json | |
| special_tokens_map.json | |
| vocab.txt | |