
language: fa
pipeline_tag: token-classification
library_name: transformers

QomSSLab/Anonymizer-v2

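This repository hosts an XLM-RoBERTa model with a token-classification head, fine-tuned to tag entities in Persian (fa) text. The entity types it predicts are listed under Labels.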

Usage

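from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub.
model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
# aggregation_strategy="simple" groups adjacent tokens with the same label into a single entity.
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Sample Persian input ("An example of a Persian input").
text = "مثال از یک ورودی فارسی"
# Each result is a dict with the entity group, score, matched text, and character offsets.
for entity in tagger(text):
    print(entity)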
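
The tagger output can be turned into redacted text by replacing each detected span with a placeholder. The following is a minimal sketch, not part of this repository: `redact` and `min_score` are hypothetical names, and it assumes the entities carry the `entity_group`, `score`, `start`, and `end` keys that the pipeline returns with `aggregation_strategy="simple"` and a fast tokenizer. The sample entities are hand-written for illustration, not real model output, and overlapping spans are not handled.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], min_score: float = 0.0) -> str:
    """Replace each detected span in `text` with a `[LABEL]` placeholder."""
    # Keep only predictions at or above the score threshold.
    kept = [e for e in entities if e["score"] >= min_score]
    # Work from right to left so earlier character offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written entities in the pipeline's output shape (illustration only).
sample_text = "Ali works at Acme."
sample_entities = [
    {"entity_group": "PERSON", "score": 0.99, "start": 0, "end": 3},
    {"entity_group": "ORG", "score": 0.95, "start": 13, "end": 17},
]
print(redact(sample_text, sample_entities))  # [PERSON] works at [ORG].
```

With the real model, the same call would be `redact(text, tagger(text))`. Raising `min_score` trades recall for precision, which may matter for lower-scoring labels.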

Labels

ACOUNT
ADDRESS
AMOUNT
DATE
DOCUMENT_ID
ID
JOB
O
ORG
ORG_BRANCH
PERSON

Metrics

Validation Metrics

Precision: 0.9789
Recall: 0.9731
F1: 0.9760
Accuracy: 0.9932
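
As a quick consistency check, the reported F1 can be recomputed from the precision and recall above as their harmonic mean. This is a small sketch; `f1_score` is a helper defined here, and how the overall figures were averaged across labels is not stated in this card.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall validation precision and recall as reported above.
overall_f1 = f1_score(0.9789, 0.9731)
print(f"{overall_f1:.4f}")  # 0.9760
```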

Per-label Breakdown

Label	Precision	Recall	F1	Support
ACOUNT	1.0000	1.0000	1.0000	0
ADDRESS	0.9944	0.9958	0.9951	712
AMOUNT	1.0000	1.0000	1.0000	41
DATE	0.9913	0.9785	0.9849	233
DOCUMENT_ID	1.0000	1.0000	1.0000	427
ID	1.0000	1.0000	1.0000	75
JOB	0.8919	0.4783	0.6226	69
O	0.9957	0.9972	0.9965	8359
ORG	0.8509	0.9327	0.8899	104
ORG_BRANCH	0.9656	1.0000	0.9825	281
PERSON	0.9983	1.0000	0.9991	587
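
The per-label rows can be scanned programmatically to see where the model is weakest. The sketch below copies the table values into a list and flags labels with no validation examples or with F1 below a threshold; the helper names and the 0.95 threshold are arbitrary choices for illustration, not values from this repository. Note that a label with a support of 0 has no validation examples behind its scores.

```python
# (label, precision, recall, f1, support), copied from the per-label table.
ROWS = [
    ("ACOUNT", 1.0000, 1.0000, 1.0000, 0),
    ("ADDRESS", 0.9944, 0.9958, 0.9951, 712),
    ("AMOUNT", 1.0000, 1.0000, 1.0000, 41),
    ("DATE", 0.9913, 0.9785, 0.9849, 233),
    ("DOCUMENT_ID", 1.0000, 1.0000, 1.0000, 427),
    ("ID", 1.0000, 1.0000, 1.0000, 75),
    ("JOB", 0.8919, 0.4783, 0.6226, 69),
    ("O", 0.9957, 0.9972, 0.9965, 8359),
    ("ORG", 0.8509, 0.9327, 0.8899, 104),
    ("ORG_BRANCH", 0.9656, 1.0000, 0.9825, 281),
    ("PERSON", 0.9983, 1.0000, 0.9991, 587),
]


def unsupported_labels(rows):
    """Labels with no validation examples (support of 0)."""
    return [label for label, _p, _r, _f1, support in rows if support == 0]


def weak_labels(rows, f1_threshold=0.95):
    """Labels with validation examples whose F1 falls below the threshold."""
    return [
        label
        for label, _p, _r, f1, support in rows
        if support > 0 and f1 < f1_threshold
    ]


print(unsupported_labels(ROWS))  # ['ACOUNT']
print(weak_labels(ROWS))         # ['JOB', 'ORG']
```

JOB stands out because of its low recall (0.4783), so JOB mentions are the most likely to be missed.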