---
language:
- en
- ru
- uk
tags:
- token-classification
- ner
- pii
- xlm-roberta
- transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
- name: pii-ner-nemotron
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.9768405285513023
---

# pii-ner-nemotron

## Model summary

A token-classification (NER) model, fine-tuned from `xlm-roberta-large` on the Nemotron dataset, that extracts PII entities from English, Russian, and Ukrainian text.

- **Base model:** `xlm-roberta-large`
- **Repository:** `scanpatch/pii-ner-nemotron`
- **Training run name:** `pii-ner-nemotron`
- **Export timestamp (UTC):** `2025-12-29T12:06:13.731145+00:00`

## Labels

### Entity types
- `address`
- `address_apartment`
- `address_building`
- `address_city`
- `address_country`
- `address_district`
- `address_geolocation`
- `address_house`
- `address_postal_code`
- `address_region`
- `address_street`
- `date`
- `document_number`
- `email`
- `first_name`
- `ip`
- `last_name`
- `middle_name`
- `military_individual_number`
- `mobile_phone`
- `name`
- `name_initials`
- `nickname`
- `organization`
- `snils`
- `tin`
- `vehicle_number`
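
Several of the entity types above are fine-grained address components. If a downstream consumer only needs a coarse category, the subtypes can be collapsed after prediction. The snippet below is a minimal sketch of one way to do that; `coarse_label` is a hypothetical helper, not part of this model or of `transformers`, and the grouping rule is an assumption chosen for illustration.

```python
# Entity types copied from the list above.
ENTITY_TYPES = [
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
]


def coarse_label(label: str) -> str:
    """Collapse `address_*` subtypes into `address` (hypothetical grouping rule)."""
    return "address" if label.startswith("address_") else label


# Distinct coarse categories, in first-seen order.
coarse_types = list(dict.fromkeys(coarse_label(t) for t in ENTITY_TYPES))
print(len(ENTITY_TYPES), "fine-grained types ->", len(coarse_types), "coarse types")
```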

## Evaluation

| Metric | Value |
|---|---:|
| `test_f1` | `0.9768405285513023` |
| `test_precision` | `0.9734942064790006` |
| `test_recall` | `0.9802099354987895` |
| `test_accuracy` | `0.9977181928808507` |
| `train_runtime` (seconds) | `1693.5057` |
| `train_samples_per_second` | `238.116` |
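
As a quick consistency check on the table, F1 should equal the harmonic mean of precision and recall, assuming all three were computed over the same set of predictions. The sketch below recomputes it from the reported values; the variable names are illustrative only.

```python
# Values copied from the evaluation table above.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"matches reported value: {abs(f1 - reported_f1) < 1e-6}")
```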

## How to use

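```python
from transformers import pipeline

# Load the model as a token-classification pipeline.
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Each result is a dict with `entity_group`, `score`, `word`, `start`, and `end`.
print(ner(text))
```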
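
A common next step is to mask the detected spans. The snippet below is a minimal sketch, not part of this repository: `redact` and `make_entity` are hypothetical helpers, the `min_score` threshold of 0.5 is an arbitrary assumption, and `sample_entities` is hand-written in the shape the pipeline returns with an aggregation strategy, not real model output. It assumes the spans do not overlap.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Expects dicts with `entity_group`, `score`, `start`, and `end` keys.
    Assumes spans do not overlap.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    # Work from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


def make_entity(text: str, word: str, label: str, score: float) -> Dict:
    """Build a hand-written entity dict (illustration only)."""
    start = text.index(word)
    return {
        "entity_group": label,
        "score": score,
        "word": word,
        "start": start,
        "end": start + len(word),
    }


text = "Contact me at test@example.com and my phone is +380 67 123 45 67."

# Hand-written stand-in for `ner(text)`; the scores are made up.
sample_entities = [
    make_entity(text, "test@example.com", "email", 0.99),
    make_entity(text, "+380 67 123 45 67", "mobile_phone", 0.98),
]

print(redact(text, sample_entities))
```

In real use, pass `ner(text)` in place of `sample_entities`. Replacing spans from right to left avoids recomputing offsets after each substitution.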