---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---
# Zefty/distilbert-ner-email-org

distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER), trained on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
The model is fine-tuned to identify only the organization (ORG) entity, so it CANNOT identify the location (LOC), person (PER), or miscellaneous (MISC) entities that the original model supports.
I fine-tuned it for a personal side project that extracts company names from job application emails.


# How to use
This model can be used with the Transformers pipeline for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "Thank you for Applying to Amazon!"

ner_results = nlp(example)
print(ner_results)
```
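
Because the pipeline above is created with an `aggregation_strategy`, each result should be a dict with keys such as `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one way to reduce that list to plain company names. The helper `extract_orgs`, the `min_score` threshold, and the sample entry (including its score) are illustrative assumptions, not real output from this model.

```python
def extract_orgs(ner_results, min_score=0.5):
    """Return unique ORG strings in order of first appearance.

    `ner_results` is expected to look like the aggregated output of a
    Transformers NER pipeline; `min_score` is an arbitrary cutoff.
    """
    orgs = []
    for entity in ner_results:
        # Skip anything that is not tagged as an organization.
        if entity.get("entity_group") != "ORG":
            continue
        # Skip low-confidence predictions.
        if entity.get("score", 0.0) < min_score:
            continue
        word = entity["word"].strip()
        if word and word not in orgs:
            orgs.append(word)
    return orgs


# Hypothetical result for "Thank you for Applying to Amazon!"
# (the score is made up for illustration).
sample_results = [
    {"entity_group": "ORG", "score": 0.98, "word": "Amazon", "start": 26, "end": 32},
]
print(extract_orgs(sample_results))
```

In practice you would pass `ner_results` from the snippet above instead of `sample_results`, and tune `min_score` against your own emails.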

# Training data
This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
Unlike the CoNLL-2003 English dataset, which labels four entity types, this dataset is labeled with the ORG entity only.

|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization right after another organization|
|I-ORG|Organization|
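
To make the tag scheme concrete, here is a minimal sketch of how per-token tags from the table can be merged back into organization spans. The function `tags_to_spans` and the example tokens and tags are hypothetical; the sketch treats an `I-ORG` with no open span as the start of a new one, so it tolerates either convention for `B-ORG`.

```python
def tags_to_spans(tokens, tags):
    """Merge O / B-ORG / I-ORG token tags into organization strings."""
    spans = []
    current = []  # tokens of the organization currently being built
    for token, tag in zip(tokens, tags):
        if tag == "B-ORG":
            # B-ORG always starts a new organization.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG":
            # Continue the open organization, or start one if none is open.
            current.append(token)
        else:
            # "O" closes any open organization.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hand-written example, not model output.
example_tokens = ["Thank", "you", "for", "applying", "to", "Amazon", "Web", "Services", "!"]
example_tags = ["O", "O", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(tags_to_spans(example_tokens, example_tags))
```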

# Eval results
|Metric|Score|
| -------- | ------- |
|Loss|0.0898725837469101|
|Precision|0.7111111111111111|
|Recall|0.8205128205128205|
|F1|0.7619047619047619|
|Accuracy|0.9760986309658876|
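
As a quick sanity check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two values in the table. The variable names below are just for illustration.

```python
# Values copied from the table above.
precision = 0.7111111111111111
recall = 0.8205128205128205

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```

The result should agree with the reported F1 up to floating-point rounding.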