	---
	language:
	- en
	- ru
	- uk
	tags:
	- token-classification
	- ner
	- pii
	- xlm-roberta
	- transformers
	library_name: transformers
	license: apache-2.0
	base_model: xlm-roberta-large
	pipeline_tag: token-classification
	model-index:
	- name: pii-ner-nemotron
	  results:
	  - task:
	      type: token-classification
	      name: Named Entity Recognition
	    metrics:
	    - type: f1
	      value: 0.9768405285513023
	---

	# pii-ner-nemotron

	## Model summary

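	A named entity recognition (NER) model that extracts personally identifiable information (PII) from English, Russian, and Ukrainian text. It is fine-tuned from `xlm-roberta-large` on the nemotron dataset.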

	- Base model: `xlm-roberta-large`
	- Repository: `scanpatch/pii-ner-nemotron`
	- Training run name: `pii-ner-nemotron`
	- Export timestamp (UTC): `2025-12-29T12:06:13.731145+00:00`

	## Labels

	### Entity types
	- `address`
	- `address_apartment`
	- `address_building`
	- `address_city`
	- `address_country`
	- `address_district`
	- `address_geolocation`
	- `address_house`
	- `address_postal_code`
	- `address_region`
	- `address_street`
	- `date`
	- `document_number`
	- `email`
	- `first_name`
	- `ip`
	- `last_name`
	- `middle_name`
	- `military_individual_number`
	- `mobile_phone`
	- `name`
	- `name_initials`
	- `nickname`
	- `organization`
	- `snils`
	- `tin`
	- `vehicle_number`
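
Downstream code often needs only a subset of these types. The following is a minimal sketch, assuming predictions arrive as the list of dicts that the `transformers` token-classification pipeline returns when an aggregation strategy is set (keys `entity_group`, `score`, `word`, `start`, `end`) and that `entity_group` values match the names listed above. `ENTITY_TYPES`, `filter_entities`, the `0.5` default threshold, and the sample predictions are hypothetical and exist only for illustration.

```python
# Entity types copied from the list above.
ENTITY_TYPES = frozenset({
    "address", "address_apartment", "address_building", "address_city",
    "address_country", "address_district", "address_geolocation",
    "address_house", "address_postal_code", "address_region",
    "address_street", "date", "document_number", "email", "first_name",
    "ip", "last_name", "middle_name", "military_individual_number",
    "mobile_phone", "name", "name_initials", "nickname", "organization",
    "snils", "tin", "vehicle_number",
})


def filter_entities(predictions, keep, min_score=0.5):
    """Return predictions whose type is in `keep` and whose score >= min_score."""
    unknown = set(keep) - ENTITY_TYPES
    if unknown:
        # Fail early on typos such as "phone" instead of "mobile_phone".
        raise ValueError(f"Unknown entity types: {sorted(unknown)}")
    return [
        p for p in predictions
        if p["entity_group"] in keep and p["score"] >= min_score
    ]


# Hand-written sample in the pipeline's output shape (not real model output).
sample = [
    {"entity_group": "email", "score": 0.99, "word": "test@example.com", "start": 14, "end": 30},
    {"entity_group": "first_name", "score": 0.42, "word": "Ann", "start": 0, "end": 3},
]
contacts = filter_entities(sample, keep={"email", "mobile_phone", "first_name"})
print(contacts)
```

Validating the requested types against the full label set catches misspelled names before they silently filter everything out.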

	## Evaluation

	| Metric | Value |
	|---|---:|
	| `test_f1` | `0.9768405285513023` |
	| `test_precision` | `0.9734942064790006` |
	| `test_recall` | `0.9802099354987895` |
	| `test_accuracy` | `0.9977181928808507` |
	| `train_runtime` (seconds) | `1693.5057` |
	| `train_samples_per_second` | `238.116` |
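
As a quick sanity check on the table, F1 can be recomputed as the harmonic mean of precision and recall, and the two training figures can be multiplied to estimate how many samples were processed. This is a sketch under two assumptions not stated in the card: that precision, recall, and F1 are overall (micro-averaged) scores, and that `train_runtime` is in seconds, as the `transformers` `Trainer` reports it.

```python
# Values copied from the evaluation table above.
precision = 0.9734942064790006
recall = 0.9802099354987895
reported_f1 = 0.9768405285513023

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"recomputed F1: {f1:.6f} (reported: {reported_f1:.6f})")

# Assumption: runtime is in seconds, so runtime * throughput approximates
# the total number of training samples processed across all epochs.
train_runtime_s = 1693.5057
samples_per_second = 238.116
samples_seen = train_runtime_s * samples_per_second
print(f"samples processed during training: ~{samples_seen:,.0f}")
```

The sample count covers every pass over the data, so it is not the dataset size unless training ran for a single epoch.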

	## How to use

	```python
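	from transformers import pipeline

	# Load the model into a token-classification pipeline.
	# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
	ner = pipeline(
	    "token-classification",
	    model="scanpatch/pii-ner-nemotron",
	    aggregation_strategy="simple",
	)

	text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
	# Each result is a dict with entity_group, score, word, start, and end.
	print(ner(text))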
	```
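
A common next step is to mask the detected spans. The sketch below assumes the pipeline output carries character offsets in `start` and `end` (as it does when an aggregation strategy is set) and that spans do not overlap. `redact` and the hand-written `entities` list are hypothetical, so the example runs without downloading the model; in practice you would pass `ner(text)` instead.

```python
def redact(text, entities):
    """Replace each predicted span with a [TYPE] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


text = "Contact me at test@example.com and my phone is +380 67 123 45 67."
# Hand-written spans in the pipeline's output shape (not real model output).
entities = [
    {"entity_group": "email", "start": 14, "end": 30},
    {"entity_group": "mobile_phone", "start": 47, "end": 64},
]
print(redact(text, entities))
# Contact me at [EMAIL] and my phone is [MOBILE_PHONE].
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution.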