---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---

# Zefty/distilbert-ner-email-org

distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER), trained on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich). The model is fine-tuned specifically to recognize the organization (ORG) entity type, so it CANNOT identify the location (LOC), person (PER), or miscellaneous (MISC) entities available in the original model.

This model was fine-tuned for a personal side project of mine: extracting company names from job application emails.

# How to use

This model can be used with the Transformers pipeline for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

example = "Thank you for Applying to Amazon!"

ner_results = nlp(example)
print(ner_results)
```

# Training data

This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich). Instead of using the full tag set from the CoNLL-2003 English dataset, this dataset only includes the ORG tags.

|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization, right after another organization|
|I-ORG|Inside an organization|

# Eval results

|Metric|Score|
| -------- | ------- |
|Loss|0.0899|
|Precision|0.7111|
|Recall|0.8205|
|F1|0.7619|
|Accuracy|0.9761|
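# Post-processing the pipeline output

For the side project described above, the raw pipeline output still needs to be reduced to a list of company names. A minimal sketch of one way to do this is shown below; the `sample` dicts mimic the shape returned by the `"ner"` pipeline with `aggregation_strategy="first"`, but the scores and character spans are made up for illustration, not taken from an actual run of this model.

```python
def extract_companies(ner_results, min_score=0.5):
    """Collect the distinct ORG entity strings from aggregated NER pipeline output."""
    companies = []
    for entity in ner_results:
        # Keep confident ORG spans only; this model emits no other entity types,
        # but filtering defensively keeps the helper reusable.
        if entity["entity_group"] == "ORG" and entity["score"] >= min_score:
            name = entity["word"].strip()
            if name not in companies:  # de-duplicate repeated mentions
                companies.append(name)
    return companies

# Illustrative output in the shape the pipeline returns (values are hypothetical).
sample = [
    {"entity_group": "ORG", "score": 0.98, "word": "Amazon", "start": 26, "end": 32},
    {"entity_group": "ORG", "score": 0.97, "word": "Amazon", "start": 60, "end": 66},
]
print(extract_companies(sample))  # ['Amazon']
```

The `min_score` threshold is an assumption you may want to tune against your own emails, since the precision reported below is around 0.71.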