metadata
language: fa
pipeline_tag: token-classification
library_name: transformers

QomSSLab/Anonymizer-v2

This repository hosts an XLM-RoBERTa model with a token-classification head, trained to tag the entity labels listed below in Persian (fa) text.

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

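text = "مثال از یک ورودی فارسی"  # Persian for "an example of a Persian input"
for entity in tagger(text):
    # With aggregation_strategy="simple", each entity is a dict with
    # entity_group, score, word, start, and end.
    print(entity)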
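
The tagger output can be turned into anonymized text by replacing each tagged span with a placeholder. The following is a minimal sketch, not part of this repository: the `redact` helper, its `min_score` threshold, and the `[LABEL]` placeholder format are hypothetical choices for illustration. It assumes the entities carry `start` and `end` character offsets, as the aggregated pipeline output does when a fast tokenizer is used.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each tagged span in `text` with a [LABEL] placeholder.

    Hypothetical helper: `min_score` and the placeholder format are
    illustrative assumptions, not defined by the model card.
    """
    # Process spans right to left so that earlier character offsets
    # remain valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["score"] < min_score:
            continue  # leave low-confidence spans untouched
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written spans in the aggregated pipeline output format.
# These are made-up values for illustration, not real model predictions.
sample_text = "Ali works at Acme."
sample_entities = [
    {"entity_group": "PERSON", "score": 0.99, "word": "Ali", "start": 0, "end": 3},
    {"entity_group": "ORG", "score": 0.97, "word": "Acme", "start": 13, "end": 17},
]
print(redact(sample_text, sample_entities))  # [PERSON] works at [ORG].
```

With the real model, `redact(text, tagger(text))` would apply the same replacement to the pipeline's predictions.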

Labels

  • ACOUNT
  • ADDRESS
  • AMOUNT
  • DATE
  • DOCUMENT_ID
  • ID
  • JOB
  • O
  • ORG
  • ORG_BRANCH
  • PERSON

Metrics

Validation Metrics

  • Precision: 0.9789
  • Recall: 0.9731
  • F1: 0.9760
  • Accuracy: 0.9932

Per-label Breakdown

Label        Precision  Recall  F1      Support
ACOUNT       1.0000     1.0000  1.0000  0
ADDRESS      0.9944     0.9958  0.9951  712
AMOUNT       1.0000     1.0000  1.0000  41
DATE         0.9913     0.9785  0.9849  233
DOCUMENT_ID  1.0000     1.0000  1.0000  427
ID           1.0000     1.0000  1.0000  75
JOB          0.8919     0.4783  0.6226  69
O            0.9957     0.9972  0.9965  8359
ORG          0.8509     0.9327  0.8899  104
ORG_BRANCH   0.9656     1.0000  0.9825  281
PERSON       0.9983     1.0000  0.9991  587
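
When reading the per-label table above, it can help to check that each F1 value matches the harmonic mean of its precision and recall, and to flag labels whose scores deserve caution. The sketch below copies the numbers from the table. The `audit` helper, the 0.9 F1 floor, and the 1e-3 tolerance are arbitrary assumptions for illustration, not part of the reported evaluation.

```python
from typing import Dict, List, Tuple

# (precision, recall, f1, support), copied from the per-label table above.
PER_LABEL: Dict[str, Tuple[float, float, float, int]] = {
    "ACOUNT": (1.0000, 1.0000, 1.0000, 0),
    "ADDRESS": (0.9944, 0.9958, 0.9951, 712),
    "AMOUNT": (1.0000, 1.0000, 1.0000, 41),
    "DATE": (0.9913, 0.9785, 0.9849, 233),
    "DOCUMENT_ID": (1.0000, 1.0000, 1.0000, 427),
    "ID": (1.0000, 1.0000, 1.0000, 75),
    "JOB": (0.8919, 0.4783, 0.6226, 69),
    "O": (0.9957, 0.9972, 0.9965, 8359),
    "ORG": (0.8509, 0.9327, 0.8899, 104),
    "ORG_BRANCH": (0.9656, 1.0000, 0.9825, 281),
    "PERSON": (0.9983, 1.0000, 0.9991, 587),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def audit(rows, f1_floor: float = 0.9, tol: float = 1e-3):
    """Hypothetical helper: return (no_support, weak, mismatched) label lists."""
    # Labels with zero validation examples: their scores carry no information.
    no_support: List[str] = [k for k, (_, _, _, n) in rows.items() if n == 0]
    # Labels with examples whose reported F1 falls below the chosen floor.
    weak: List[str] = [k for k, (_, _, f, n) in rows.items() if n > 0 and f < f1_floor]
    # Labels whose reported F1 disagrees with the recomputed value beyond `tol`.
    mismatched: List[str] = [
        k for k, (p, r, f, _) in rows.items() if abs(f1_score(p, r) - f) > tol
    ]
    return no_support, weak, mismatched


no_support, weak, mismatched = audit(PER_LABEL)
print("no validation examples:", no_support)  # ['ACOUNT']
print("F1 below 0.9:", weak)                  # ['JOB', 'ORG']
print("F1 inconsistent with P/R:", mismatched)  # []
```

Under these assumptions, every reported F1 is consistent with its precision and recall to within rounding. ACOUNT has a support of 0, so its perfect scores say nothing about how the model handles that label. JOB stands out with a recall of 0.4783, meaning that roughly half of the JOB tokens in the validation set were missed.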