---
language:
- en
- ru
- uk
tags:
- token-classification
- ner
- pii
- xlm-roberta
- transformers
library_name: transformers
license: apache-2.0
base_model: xlm-roberta-large
pipeline_tag: token-classification
model-index:
- name: pii-ner-nemotron
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.9768405285513023
---
# pii-ner-nemotron
## Model summary
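A named entity recognition (NER) model for extracting personally identifiable information (PII) from English, Russian, and Ukrainian text. It is a token-classification model fine-tuned from `xlm-roberta-large` on the Nemotron dataset.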
- **Base model:** `xlm-roberta-large`
- **Repository:** `scanpatch/pii-ner-nemotron`
- **Training run name:** `pii-ner-nemotron`
- **Export timestamp (UTC):** `2025-12-29T12:06:13.731145+00:00`
## Labels
### Entity types
- `address`
- `address_apartment`
- `address_building`
- `address_city`
- `address_country`
- `address_district`
- `address_geolocation`
- `address_house`
- `address_postal_code`
- `address_region`
- `address_street`
- `date`
- `document_number`
- `email`
- `first_name`
- `ip`
- `last_name`
- `middle_name`
- `military_individual_number`
- `mobile_phone`
- `name`
- `name_initials`
- `nickname`
- `organization`
- `snils`
- `tin`
- `vehicle_number`
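
Several of the types above are fine-grained address parts. When a downstream task only needs to know that a span is an address, the subtypes can be collapsed. The following is a minimal sketch, not part of the model: `ENTITY_TYPES` is copied from the list above, and `coarse_type` is a hypothetical helper that assumes the `address_` prefix is the only grouping you want.

```python
# Entity types copied from the list above.
ENTITY_TYPES = [
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
]


def coarse_type(entity_type: str) -> str:
    """Hypothetical helper: map address_* subtypes to "address", keep the rest."""
    return "address" if entity_type.startswith("address_") else entity_type


# Subtypes that would be merged into the generic "address" type.
address_subtypes = [t for t in ENTITY_TYPES if t.startswith("address_")]
print(address_subtypes)
print(sorted({coarse_type(t) for t in ENTITY_TYPES}))
```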
## Evaluation
| Metric | Value |
|---|---:|
| `test_f1` | `0.9768405285513023` |
| `test_precision` | `0.9734942064790006` |
| `test_recall` | `0.9802099354987895` |
| `test_accuracy` | `0.9977181928808507` |
| `train_runtime` (seconds) | `1693.5057` |
| `train_samples_per_second` | `238.116` |
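
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported `test_precision` and `test_recall`; the result should match the reported `test_f1` up to floating-point rounding, assuming all three values come from the same averaging scheme.

```python
# Values copied from the evaluation table above.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"difference from reported: {abs(f1 - reported_f1):.2e}")
```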
## How to use
```python
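from transformers import pipeline

# Load the model as a token-classification pipeline.
# aggregation_strategy="simple" merges sub-word tokens into entity spans.
ner = pipeline(
    "token-classification",
    model="scanpatch/pii-ner-nemotron",
    aggregation_strategy="simple",
)

text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
print(ner(text))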
```
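
With an aggregation strategy set, the pipeline returns a list of dicts that include `entity_group`, `score`, `start`, and `end`. One common follow-up is to mask the detected spans. The sketch below is one possible way to do that: `redact` and its `min_score` threshold are hypothetical and not part of this repository, and `sample` is hand-written data in the pipeline's output shape, not real model output.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
email, phone = "test@example.com", "+380 67 123 45 67"

# Hand-written stand-in for ner(text); scores are illustrative only.
sample = [
    {"entity_group": "email", "score": 0.99,
     "start": text.index(email), "end": text.index(email) + len(email)},
    {"entity_group": "mobile_phone", "score": 0.98,
     "start": text.index(phone), "end": text.index(phone) + len(phone)},
]

print(redact(text, sample))
```

In real use, pass `ner(text)` in place of `sample`. The threshold is a design choice: raising `min_score` leaves more uncertain spans unmasked, which trades recall for fewer false redactions.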