---
language: fa
pipeline_tag: token-classification
library_name: transformers
---

# QomSSLab/Anonymizer-v2

This repository hosts an XLM-RoBERTa model with a token-classification head, trained to tag the entity types listed under Labels in Persian (`fa`) text.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Sample input; the Persian string means "An example of a Persian input"
text = "مثال از یک ورودی فارسی"
for entity in tagger(text):
    print(entity)
```

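
The tagger output can be turned into anonymized text by replacing each detected span with a placeholder. The following is a minimal sketch, not part of this repository: `redact` is a hypothetical helper, the `[LABEL]` placeholder format is an illustrative choice, and the spans are written by hand in the shape the pipeline returns with `aggregation_strategy="simple"` (dicts with `entity_group`, `start`, and `end`), so the snippet runs without downloading the model. The English sample sentence stands in for real Persian input.

```python
def redact(text, entities):
    """Replace each tagged span in `text` with a bracketed label placeholder."""
    # Work right to left so earlier character offsets stay valid after each substitution.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written spans (assumed for illustration), mimicking the pipeline's output keys.
sample = "Ali Rezaei works at Bank Melli."
spans = [
    {"entity_group": "PERSON", "start": 0, "end": 10},
    {"entity_group": "ORG", "start": 20, "end": 30},
]
print(redact(sample, spans))
```

With a loaded model, the same helper could be called as `redact(text, tagger(text))`.
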
## Labels

- `ACOUNT`
- `ADDRESS`
- `AMOUNT`
- `DATE`
- `DOCUMENT_ID`
- `ID`
- `JOB`
- `O`
- `ORG`
- `ORG_BRANCH`
- `PERSON`

## Metrics

### Validation Metrics

- Precision: 0.9789
- Recall: 0.9731
- F1: 0.9760
- Accuracy: 0.9932

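
As a quick sanity check on the validation metrics above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the two reported values; `f1_score` is a hypothetical helper written for this illustration, not a function from this repository or from `transformers`.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported validation precision and recall from the list above.
print(round(f1_score(0.9789, 0.9731), 4))
```

The result agrees with the reported F1 of 0.9760 to four decimal places.
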
### Per-label Breakdown

| Label | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| ACOUNT | 1.0000 | 1.0000 | 1.0000 | 0 |
| ADDRESS | 0.9944 | 0.9958 | 0.9951 | 712 |
| AMOUNT | 1.0000 | 1.0000 | 1.0000 | 41 |
| DATE | 0.9913 | 0.9785 | 0.9849 | 233 |
| DOCUMENT_ID | 1.0000 | 1.0000 | 1.0000 | 427 |
| ID | 1.0000 | 1.0000 | 1.0000 | 75 |
| JOB | 0.8919 | 0.4783 | 0.6226 | 69 |
| O | 0.9957 | 0.9972 | 0.9965 | 8359 |
| ORG | 0.8509 | 0.9327 | 0.8899 | 104 |
| ORG_BRANCH | 0.9656 | 1.0000 | 0.9825 | 281 |
| PERSON | 0.9983 | 1.0000 | 0.9991 | 587 |
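
The per-label rows above can be aggregated to cross-check the summary figures. The sketch below assumes the rows are token-level counts from a single-label classification, in which case the support-weighted recall should equal overall accuracy; that interpretation is an assumption, not something this card states. It also flags labels with zero support: `ACOUNT` has no validation examples, so its 1.0000 scores are not backed by any data.

```python
# (label, recall, support) copied from the per-label table.
rows = [
    ("ACOUNT", 1.0000, 0),
    ("ADDRESS", 0.9958, 712),
    ("AMOUNT", 1.0000, 41),
    ("DATE", 0.9785, 233),
    ("DOCUMENT_ID", 1.0000, 427),
    ("ID", 1.0000, 75),
    ("JOB", 0.4783, 69),
    ("O", 0.9972, 8359),
    ("ORG", 0.9327, 104),
    ("ORG_BRANCH", 1.0000, 281),
    ("PERSON", 1.0000, 587),
]

total = sum(support for _, _, support in rows)
# Weight each label's recall by how many examples it has.
weighted_recall = sum(recall * support for _, recall, support in rows) / total
# Labels with no validation examples cannot be evaluated meaningfully.
zero_support = [label for label, _, support in rows if support == 0]

print(total, round(weighted_recall, 4), zero_support)
```

Under that assumption the weighted recall comes out at about 0.9932, in line with the reported accuracy. The `JOB` row, with recall 0.4783 on 69 examples, is the weakest label in the table.
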