---
language:
  - en
  - ru
  - uk
tags:
  - token-classification
  - ner
  - pii
  - xlm-roberta
  - transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
  - name: pii-ner-nemotron
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        metrics:
          - type: f1
            value: 0.9768405285513023
---

pii-ner-nemotron

Model summary

PII (personally identifiable information) NER model for multilingual entity extraction from English, Russian, and Ukrainian text, trained on the Nemotron dataset.

Base model: xlm-roberta-large
Repository: scanpatch/pii-ner-nemotron
Training run name: pii-ner-nemotron
Export timestamp (UTC): 2025-12-29T12:06:13.731145+00:00

Labels

Entity types

address
address_apartment
address_building
address_city
address_country
address_district
address_geolocation
address_house
address_postal_code
address_region
address_street
date
document_number
email
first_name
ip
last_name
middle_name
military_individual_number
mobile_phone
name
name_initials
nickname
organization
snils
tin
vehicle_number
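The list above names entity types, not the raw per-token tags. Token-classification checkpoints usually store their actual tag set in the config's `id2label` mapping, often with scheme prefixes such as `B-`/`I-`. This card does not state which tagging scheme is used, so the sketch below assumes an IOB2- or BIOES-style prefix and simply strips it. `entity_types` is a hypothetical helper, and the example mapping is made up for illustration.

```python
from typing import Mapping

# Tag prefixes used by common span-labelling schemes (IOB2, BIOES/BILOU).
# Which scheme this checkpoint uses is an assumption to verify against id2label.
_PREFIXES = {"B", "I", "E", "S", "L", "U"}


def entity_types(id2label: Mapping[int, str]) -> list[str]:
    """Collapse token-level tags like "B-email"/"I-email" into entity types."""
    types = set()
    for tag in id2label.values():
        if tag == "O":  # the "outside" tag marks non-entity tokens
            continue
        prefix, sep, name = tag.partition("-")
        # Strip a scheme prefix if present; otherwise keep the tag unchanged.
        types.add(name if sep and prefix in _PREFIXES else tag)
    return sorted(types)


# Hand-written stand-in for a real config's id2label (hypothetical values).
example_id2label = {0: "O", 1: "B-email", 2: "I-email", 3: "B-address_city", 4: "I-address_city"}
print(entity_types(example_id2label))  # → ['address_city', 'email']
```

To check against the published checkpoint, pass `AutoConfig.from_pretrained("scanpatch/pii-ner-nemotron").id2label` (from `transformers`) instead of the example mapping.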

Evaluation

| Metric | Value |
|---|---|
| `test_f1` | `0.9768405285513023` |
| `test_precision` | `0.9734942064790006` |
| `test_recall` | `0.9802099354987895` |
| `test_accuracy` | `0.9977181928808507` |
| `train_runtime` (seconds) | `1693.5057` |
| `train_samples_per_second` | `238.116` |
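As a sanity check, the reported `test_f1` agrees with the harmonic mean of `test_precision` (P) and `test_recall` (R), F1 = 2PR / (P + R). The snippet below recomputes it from the two table values only; it assumes the standard F1 definition and does not re-run any evaluation.

```python
# Recompute F1 as the harmonic mean of the reported test precision and recall.
precision = 0.9734942064790006
recall = 0.9802099354987895

f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # → 0.9768
```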

How to use

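```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans,
# so each result carries entity_group, score, word, start and end fields.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Prints a list of dicts, one per detected PII span.
print(ner(text))
```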
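Because the aggregated pipeline output includes character offsets (`start`, `end`) for each span, it can drive simple redaction. The following is a minimal sketch: `redact` and its `min_score` threshold are hypothetical helpers, not part of this repository or of `transformers`, the sample entities are hand-written rather than real model output, and the sketch assumes `entity_group` values match the entity types listed under Labels.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], min_score: float = 0.5) -> str:
    """Replace each detected span with a placeholder such as [EMAIL].

    `entities` is expected to look like the list of dicts returned by a
    token-classification pipeline with an aggregation strategy, i.e. each
    item has "entity_group", "score", "start" and "end" keys.
    """
    # Keep only confident predictions, then walk spans right-to-left so that
    # replacing one span never shifts the character offsets of earlier ones.
    kept = [e for e in entities if e["score"] >= min_score]
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Hand-written example entities (NOT real model output), shaped like the
# pipeline's aggregated results for the sample sentence above.
sample_text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
sample_entities = [
    {"entity_group": "email", "score": 0.99, "start": 14, "end": 30},
    {"entity_group": "mobile_phone", "score": 0.98, "start": 47, "end": 64},
]
print(redact(sample_text, sample_entities))
# → Contact me at [EMAIL] and my phone is [MOBILE_PHONE].
```

With the real model, `redact(text, ner(text))` would apply the same logic to actual predictions. Replacing spans from right to left keeps the remaining offsets valid without recomputing them.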