---
language:
- en
- ru
- uk
tags:
- token-classification
- ner
- pii
- xlm-roberta
- transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
- name: pii-ner-nemotron
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.9768405285513023
---
# pii-ner-nemotron
## Model summary
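A named entity recognition (NER) model that extracts personally identifiable information (PII) from multilingual text (English, Russian, and Ukrainian). It is fine-tuned from `xlm-roberta-large` on the nemotron dataset.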
- **Base model:** `xlm-roberta-large`
- **Repository:** `scanpatch/pii-ner-nemotron`
- **Training run name:** `pii-ner-nemotron`
- **Export timestamp (UTC):** `2025-12-29T12:06:13.731145+00:00`
## Labels
### Entity types
- `address`
- `address_apartment`
- `address_building`
- `address_city`
- `address_country`
- `address_district`
- `address_geolocation`
- `address_house`
- `address_postal_code`
- `address_region`
- `address_street`
- `date`
- `document_number`
- `email`
- `first_name`
- `ip`
- `last_name`
- `middle_name`
- `military_individual_number`
- `mobile_phone`
- `name`
- `name_initials`
- `nickname`
- `organization`
- `snils`
- `tin`
- `vehicle_number`
## Evaluation
| Metric | Value |
|---|---:|
| `test_f1` | `0.9768405285513023` |
| `test_precision` | `0.9734942064790006` |
| `test_recall` | `0.9802099354987895` |
| `test_accuracy` | `0.9977181928808507` |
| `train_runtime` | `1693.5057` |
| `train_samples_per_second` | `238.116` |
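
The reported `test_f1` is consistent with the harmonic mean of `test_precision` and `test_recall`. The following is a minimal sketch of that check; the variable and function names are illustrative, and only the three numeric values come from the table above.

```python
# Values copied from the evaluation table.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


computed_f1 = f1_score(precision, recall)
print(f"computed F1: {computed_f1:.6f}")
print(f"difference from reported F1: {abs(computed_f1 - reported_f1):.2e}")
```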
## How to use
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
print(ner(text))
```
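
With an aggregation strategy set, the pipeline returns a list of dictionaries with `entity_group`, `score`, `word`, `start`, and `end` keys. One possible way to use that output is to mask the detected spans. The sketch below is not part of the model's own tooling: `redact`, `min_score`, and the hand-written `fake_entities` list are hypothetical, and the labels and scores in that list are stand-ins rather than real model predictions.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text


sample = "Contact me at test@example.com and my phone is +380 67 123 45 67."


def fake_entity(label, value, score):
    """Build a stand-in entity dict shaped like aggregated pipeline output."""
    start = sample.index(value)
    return {
        "entity_group": label,
        "score": score,
        "word": value,
        "start": start,
        "end": start + len(value),
    }


# Assumed predictions for illustration only; real output comes from ner(sample).
fake_entities = [
    fake_entity("email", "test@example.com", 0.99),
    fake_entity("mobile_phone", "+380 67 123 45 67", 0.98),
]

redacted = redact(sample, fake_entities)
print(redacted)
```

In practice `fake_entities` would be replaced with `ner(sample)`. Raising `min_score` trades recall for fewer false redactions; the default of `0.5` here is an arbitrary choice, not a tuned threshold.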