metadata
language:
  - en
  - ru
  - uk
tags:
  - token-classification
  - ner
  - pii
  - xlm-roberta
  - transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
  - name: pii-ner-nemotron
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        metrics:
          - type: f1
            value: 0.9768405285513023

pii-ner-nemotron

Model summary

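A token-classification (NER) model that extracts personally identifiable information (PII) from English, Russian, and Ukrainian text. It is fine-tuned from xlm-roberta-large on the nemotron dataset.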

  • Base model: xlm-roberta-large
  • Repository: scanpatch/pii-ner-nemotron
  • Training run name: pii-ner-nemotron
  • Export timestamp (UTC): 2025-12-29T12:06:13.731145+00:00

Labels

Entity types

  • address
  • address_apartment
  • address_building
  • address_city
  • address_country
  • address_district
  • address_geolocation
  • address_house
  • address_postal_code
  • address_region
  • address_street
  • date
  • document_number
  • email
  • first_name
  • ip
  • last_name
  • middle_name
  • military_individual_number
  • mobile_phone
  • name
  • name_initials
  • nickname
  • organization
  • snils
  • tin
  • vehicle_number
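Many of the entity types above share the address prefix, so downstream code may want to treat them as one family, for example when applying a single redaction policy to every address fragment. The sketch below is one hypothetical way to do that. The helper name `family` and the grouping rule are assumptions made for illustration and are not part of the model. Stripping a `B-`/`I-` prefix is likewise an assumption, since the card does not state the tagging scheme.

```python
from collections import defaultdict

# Entity types copied from the list above.
ENTITY_TYPES = [
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
]


def family(label: str) -> str:
    """Map a label to a coarse family (hypothetical helper)."""
    # Assumption: raw tags may carry a BIO prefix such as "B-" or "I-".
    if label[:2] in ("B-", "I-"):
        label = label[2:]
    # All address_* labels collapse into a single "address" family.
    return "address" if label.startswith("address") else label


groups = defaultdict(list)
for entity_type in ENTITY_TYPES:
    groups[family(entity_type)].append(entity_type)

print(f"{len(ENTITY_TYPES)} entity types in {len(groups)} families")
print(groups["address"])
```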

Evaluation

Metric                      Value
test_f1                     0.9768405285513023
test_precision              0.9734942064790006
test_recall                 0.9802099354987895
test_accuracy               0.9977181928808507
train_runtime               1693.5057
train_samples_per_second    238.116
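As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported test_f1 can be recomputed from the other two values. The snippet below does that and also multiplies the runtime by the throughput to estimate how many training samples were processed. That second figure assumes train_runtime is in seconds and that the throughput is averaged over the whole run, as the transformers Trainer usually reports it. The card does not confirm this.

```python
# Values copied from the evaluation metrics above.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"recomputed F1: {f1:.6f}")
print(f"matches reported value: {abs(f1 - reported_f1) < 1e-6}")

# Assumption: runtime is in seconds and throughput covers the whole run,
# so the product estimates samples processed across all epochs.
train_runtime = 1693.5057
samples_per_second = 238.116
samples_processed = train_runtime * samples_per_second
print(f"samples processed: {samples_processed:,.0f}")
```

Under those assumptions the run processed roughly 403,000 samples in about 28 minutes.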

How to use

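from transformers import pipeline

# "simple" aggregation groups adjacent tokens with the same label into entity spans.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Each result is a dict with entity_group, score, word, start, and end.
print(ner(text))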
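A common next step is to mask the detected spans. The sketch below is a minimal example that replaces each span with a placeholder using the start and end character offsets the pipeline returns. The `redact` helper and its `min_score` threshold are hypothetical. The sample entities are hand-written in the pipeline's output shape so the example runs without downloading the model. They are not actual model output.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


sample_text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Illustrative entities only: offsets match sample_text, scores are made up.
sample_entities = [
    {"entity_group": "email", "score": 0.99,
     "word": "test@example.com", "start": 14, "end": 30},
    {"entity_group": "mobile_phone", "score": 0.98,
     "word": "+380 67 123 45 67", "start": 47, "end": 64},
]

print(redact(sample_text, sample_entities))
# prints: Contact me at [EMAIL] and my phone is [MOBILE_PHONE].
```

With the real model, pass `ner(text)` as the `entities` argument. Raising `min_score` trades recall for fewer false redactions.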