ai4privacy/open-pii-masking-500k-ai4privacy
How to use barflyman/multilang-pii-ner-ONNX with Transformers.js:
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';
// Allocate pipeline
const pipe = await pipeline('token-classification', 'barflyman/multilang-pii-ner-ONNX');
This is an ONNX version of Ar86Bat/multilang-pii-ner. It was automatically converted and uploaded using a Hugging Face Space.
See the pipeline documentation for token-classification: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TokenClassificationPipeline
A multilingual transformer model (xlm-roberta-base) fine-tuned for Named Entity Recognition (NER) to detect and mask Personally Identifiable Information (PII) in text across English, German, Italian, and French.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
model_id = "Ar86Bat/multilang-pii-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
text = "John Doe was born on 12/12/1990 and lives in Berlin."
results = nlp(text)
print(results)
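The card describes the model as detecting and masking PII, but the snippet above only prints the detected spans. The following is a minimal sketch of the masking step, assuming the aggregated pipeline output is a list of dicts with `entity_group`, `score`, `start`, and `end` keys (the shape produced with `aggregation_strategy="simple"`). The `mask_pii` helper, its `min_score` parameter, and the hand-written example spans are hypothetical illustrations, not part of the original repository, and real predictions may differ.

```python
from typing import Dict, List


def mask_pii(text: str, entities: List[Dict], min_score: float = 0.0) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    masked = text
    # Work from the end of the string so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        # Skip low-confidence spans; entries without a score are always masked.
        if ent.get("score", 1.0) < min_score:
            continue
        label = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + label + masked[ent["end"]:]
    return masked


# Hand-written spans in the pipeline's output shape, for illustration only.
text = "John Doe was born on 12/12/1990 and lives in Berlin."
example_entities = [
    {"entity_group": "GIVENNAME", "score": 0.99, "start": 0, "end": 4},
    {"entity_group": "SURNAME", "score": 0.98, "start": 5, "end": 8},
    {"entity_group": "DATE", "score": 0.99, "start": 21, "end": 31},
    {"entity_group": "CITY", "score": 0.99, "start": 45, "end": 51},
]
print(mask_pii(text, example_entities))
```

With a loaded model, `mask_pii(text, nlp(text))` would apply the same replacement to the actual predictions. Raising `min_score` trades recall for fewer false positives, which may matter for the lower-scoring labels.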
AGE, BUILDINGNUM, CITY, DATE, EMAIL, GIVENNAME, STREET, TELEPHONENUM, TIME
EMAIL and DATE (F1 ≈ 0.999)
DRIVERLICENSENUM (F1 ≈ 0.85), GENDER (F1 ≈ 0.83), PASSPORTNUM (F1 ≈ 0.88), SURNAME (F1 ≈ 0.85), SEX (F1 ≈ 0.84)
model/ directory.
num_train_epochs=2 # Total number of training epochs
per_device_train_batch_size=32 # Batch size for training
per_device_eval_batch_size=32 # Batch size for evaluation
If you use this model, please cite the repository:
@misc{ar86bat_multilang_pii_ner_2025,
author = {Arif Hizlan},
title = {Multilingual PII NER},
year = {2025},
howpublished = {\url{https://huggingface.co/Ar86Bat/multilang-pii-ner}}
}
https://github.com/Ar86Bat/multilang-pii-ner
MIT License