---
language:
- en
- ru
- uk
tags:
- token-classification
- ner
- pii
- xlm-roberta
- transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
- name: pii-ner-nemotron
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.9768405285513023
---

# pii-ner-nemotron

## Model summary

A named-entity recognition (NER) model for extracting personally identifiable information (PII) from multilingual text, trained on the nemotron dataset. The card metadata lists English, Russian, and Ukrainian as languages.

- **Base model:** `xlm-roberta-large`
- **Repository:** `scanpatch/pii-ner-nemotron`
- **Training run name:** `pii-ner-nemotron`
- **Export timestamp (UTC):** `2025-12-29T12:06:13.731145+00:00`

## Labels

### Entity types

- `address`
- `address_apartment`
- `address_building`
- `address_city`
- `address_country`
- `address_district`
- `address_geolocation`
- `address_house`
- `address_postal_code`
- `address_region`
- `address_street`
- `date`
- `document_number`
- `email`
- `first_name`
- `ip`
- `last_name`
- `middle_name`
- `military_individual_number`
- `mobile_phone`
- `name`
- `name_initials`
- `nickname`
- `organization`
- `snils`
- `tin`
- `vehicle_number`

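Downstream code sometimes only needs to know that a span is an address, not which part of one. The following is a minimal sketch of collapsing the `address_*` subtypes into a single bucket. The helper `coarse_type` and the grouping rule are illustrative assumptions, not something the model or its training run defines.

```python
from collections import defaultdict

# Entity types copied from the list on this card.
ENTITY_TYPES = [
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
]


def coarse_type(label: str) -> str:
    """Hypothetical helper: map 'address' and 'address_*' to 'address'.

    Every other label is returned unchanged.
    """
    if label == "address" or label.startswith("address_"):
        return "address"
    return label


# Group the fine-grained labels under their coarse bucket.
groups = defaultdict(list)
for label in ENTITY_TYPES:
    groups[coarse_type(label)].append(label)

print(len(ENTITY_TYPES))       # 27 entity types in total
print(len(groups["address"]))  # 11 of them are address-related
```

The same pattern could be applied to the name-related labels if an application needs it.
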
## Evaluation

| Metric | Value |
|---|---:|
| `test_f1` | `0.9768405285513023` |
| `test_precision` | `0.9734942064790006` |
| `test_recall` | `0.9802099354987895` |
| `test_accuracy` | `0.9977181928808507` |
| `train_runtime` | `1693.5057` |
| `train_samples_per_second` | `238.116` |

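As a quick consistency check, the reported F1 can be recomputed as the harmonic mean of the reported precision and recall. The snippet below only uses the numbers from the table. How the scores were aggregated (for example, at entity level or token level) is not stated on this card, so nothing here should be read as describing the evaluation procedure.

```python
# Values copied from the evaluation table.
precision = 0.9734942064790006  # test_precision
recall = 0.9802099354987895     # test_recall
reported_f1 = 0.9768405285513023  # test_f1

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")                     # 0.9768
print(abs(f1 - reported_f1) < 1e-6)    # True
```
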
## How to use

```python
from transformers import pipeline

# aggregation_strategy="simple" groups adjacent tokens of the same
# entity type into a single span.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
print(ner(text))
```

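With an aggregation strategy set, the pipeline returns a list of dictionaries that include `entity_group`, `score`, `start`, and `end`. A common next step is to mask the detected spans. The sketch below is one possible way to do that. The `redact` helper and its `min_score` threshold are hypothetical, and the sample predictions are hand-written in the pipeline's output shape rather than real model output. It assumes spans do not overlap.

```python
from typing import Any


def redact(text: str, entities: list[dict[str, Any]], min_score: float = 0.5) -> str:
    """Hypothetical helper: replace each predicted span with [ENTITY_TYPE]."""
    # Drop low-confidence predictions (the threshold is an arbitrary choice).
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from the end of the string so earlier offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


def fake_entity(text: str, value: str, label: str, score: float) -> dict[str, Any]:
    """Build a hand-written prediction in the pipeline's output shape."""
    start = text.index(value)
    return {
        "entity_group": label,
        "score": score,
        "word": value,
        "start": start,
        "end": start + len(value),
    }


sample_text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Illustrative predictions only; the real model may return different spans.
sample_entities = [
    fake_entity(sample_text, "test@example.com", "email", 0.99),
    fake_entity(sample_text, "+380 67 123 45 67", "mobile_phone", 0.98),
]

redacted = redact(sample_text, sample_entities)
print(redacted)
# Contact me at [EMAIL] and my phone is [MOBILE_PHONE].
```

In real use, `sample_entities` would be replaced by the output of `ner(text)` from the example above.
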