	---
	language: fa
	pipeline_tag: token-classification
	library_name: transformers
	---

	# QomSSLab/Anonymizer-v2

	This repository hosts an XLM-RoBERTa model with a token-classification head, trained to tag Persian (`fa`) text with the entity labels listed below.

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

	model_id = "QomSSLab/Anonymizer-v2"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForTokenClassification.from_pretrained(model_id)
	tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

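	# Sample Persian input; the string means "An example of a Persian input".
	text = "مثال از یک ورودی فارسی"
	# With aggregation_strategy="simple", each result is a dict with
	# entity_group, score, word, start, and end.
	for entity in tagger(text):
	    print(entity)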
	```
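Since the model is published as an anonymizer, a natural next step is to replace each detected span with a placeholder. The sketch below is one possible way to do that and is not part of this repository: `redact` is a hypothetical helper, and the sample text and entities are hand-written stand-ins for what `tagger(text)` returns (dicts carrying `entity_group`, `start`, and `end`), so the block runs without downloading the model.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    # Walk the spans from right to left so that earlier character
    # offsets remain valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written stand-in for pipeline output; real output also has
# "score" and "word" keys, which this helper does not need.
sample_text = "Ali works at Acme."
sample_entities = [
    {"entity_group": "PERSON", "start": 0, "end": 3},
    {"entity_group": "ORG", "start": 13, "end": 17},
]

print(redact(sample_text, sample_entities))  # [PERSON] works at [ORG].
```

With the real pipeline, the call would be `redact(text, tagger(text))`. Filtering entities by `score` before redacting is a further option, at the cost of leaving low-confidence spans untouched.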

	## Labels

	- `ACOUNT`
	- `ADDRESS`
	- `AMOUNT`
	- `DATE`
	- `DOCUMENT_ID`
	- `ID`
	- `JOB`
	- `O`
	- `ORG`
	- `ORG_BRANCH`
	- `PERSON`

	## Metrics

	### Validation Metrics

	- Precision: 0.9789
	- Recall: 0.9731
	- F1: 0.9760
	- Accuracy: 0.9932

	### Per-label Breakdown

	| Label | Precision | Recall | F1 | Support |
	| --- | --- | --- | --- | --- |
	| ACOUNT | 1.0000 | 1.0000 | 1.0000 | 0 |
	| ADDRESS | 0.9944 | 0.9958 | 0.9951 | 712 |
	| AMOUNT | 1.0000 | 1.0000 | 1.0000 | 41 |
	| DATE | 0.9913 | 0.9785 | 0.9849 | 233 |
	| DOCUMENT_ID | 1.0000 | 1.0000 | 1.0000 | 427 |
	| ID | 1.0000 | 1.0000 | 1.0000 | 75 |
	| JOB | 0.8919 | 0.4783 | 0.6226 | 69 |
	| O | 0.9957 | 0.9972 | 0.9965 | 8359 |
	| ORG | 0.8509 | 0.9327 | 0.8899 | 104 |
	| ORG_BRANCH | 0.9656 | 1.0000 | 0.9825 | 281 |
	| PERSON | 0.9983 | 1.0000 | 0.9991 | 587 |
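
The F1 values above can be sanity-checked against the reported precision and recall. The sketch below assumes F1 here is the usual harmonic mean of precision and recall; the model card does not state how the scores were computed, and `f1_score` is a hypothetical helper written for this check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall validation metrics and the JOB row, copied from the tables above.
overall = f1_score(0.9789, 0.9731)
job = f1_score(0.8919, 0.4783)

print(f"overall F1 = {overall:.4f}")
print(f"JOB F1     = {job:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the table in the last digit. The JOB row shows how a low recall pulls F1 down even when precision is fairly high.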