---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---
# Zefty/distilbert-ner-email-org
distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER) on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
The model is fine-tuned specifically to identify organization (ORG) entities, so it cannot identify the location (LOC), person (PER), and miscellaneous (MISC) entities that are available in the original model.
It was fine-tuned for a personal side project of mine that extracts company names from job application emails.
# How to use
This model can be utilized with the Transformers pipeline for NER.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "Thank you for Applying to Amazon!"
ner_results = nlp(example)
print(ner_results)
```
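With `aggregation_strategy="first"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. Company names can then be pulled out with a small filter — a minimal sketch (the `min_score` threshold is my own choice, not part of the model):

```python
def org_names(ner_results, min_score=0.5):
    """Return the text of ORG spans from aggregated pipeline output."""
    return [
        r["word"]
        for r in ner_results
        if r["entity_group"] == "ORG" and r["score"] >= min_score
    ]
```

For the example above, this reduces the raw pipeline output to just the company mentions, which is all the side project needs.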
# Training data
This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
Instead of using the full tag set from the CoNLL-2003 English dataset, this dataset only includes the ORG tags.
|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization immediately after another organization|
|I-ORG|Organization|
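The tags in the table above can be decoded into organization spans with a short helper — a minimal sketch assuming the IOB-style scheme described there (the token/tag lists below are an illustrative example, not actual model output):

```python
def extract_orgs(tokens, tags):
    """Group consecutive ORG-tagged tokens into organization names."""
    orgs, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-ORG":
            # B-ORG marks a new organization starting right after another one
            if current:
                orgs.append(" ".join(current))
            current = [tok]
        elif tag == "I-ORG":
            current.append(tok)
        else:  # "O": close any open organization span
            if current:
                orgs.append(" ".join(current))
                current = []
    if current:
        orgs.append(" ".join(current))
    return orgs

tokens = ["Thank", "you", "for", "applying", "to", "Amazon", "!"]
tags = ["O", "O", "O", "O", "O", "I-ORG", "O"]
extract_orgs(tokens, tags)  # ["Amazon"]
```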
# Eval results
|Metric|Score|
| -------- | ------- |
|Loss|0.0899|
|Precision|0.7111|
|Recall|0.8205|
|F1|0.7619|
|Accuracy|0.9761|