Zefty committed on
Commit 0845cae · verified · 1 Parent(s): c6e177e

Create README.md

  1. README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - dslim/distilbert-NER
+ tags:
+ - email
+ - org
+ ---
+ # Zefty/distilbert-ner-email-org
+
+ distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER) on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
+ The model is fine-tuned specifically to identify organization (ORG) entities, so it CANNOT identify location (LOC), person (PER), or miscellaneous (MISC) entities, which are available in the original model.
+ I fine-tuned it for a personal side project that extracts company names from job application emails.
+
+
+ # How to use
+ This model can be used with the Transformers `pipeline` for NER.
+
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned tokenizer and model from the Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
+ model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")
+
+ # aggregation_strategy="first" groups sub-word tokens into words and
+ # labels each word with the tag of its first token
+ nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
+ example = "Thank you for Applying to Amazon!"
+
+ ner_results = nlp(example)
+ print(ner_results)
+ ```
+
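To turn the pipeline output into a plain list of company names, a small post-processing step along these lines may help. This is only a sketch: the helper `extract_orgs`, the `min_score` threshold, and the hand-written `sample_results` are assumptions for illustration, not part of the model or of the original README. It relies on the aggregated pipeline returning dictionaries with `entity_group`, `score`, and `word` keys.

```python
from typing import Dict, List


def extract_orgs(ner_results: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return unique ORG names in order of first appearance (hypothetical helper)."""
    seen = set()
    orgs = []
    for entity in ner_results:
        # Keep only organization entities above the (assumed) score threshold
        if entity.get("entity_group") != "ORG":
            continue
        if entity.get("score", 0.0) < min_score:
            continue
        name = entity["word"].strip()
        # De-duplicate case-insensitively, keeping the first spelling seen
        if name and name.lower() not in seen:
            seen.add(name.lower())
            orgs.append(name)
    return orgs


# Hand-written stand-in for pipeline output; the scores are NOT real model scores
sample_results = [
    {"entity_group": "ORG", "score": 0.99, "word": "Amazon", "start": 26, "end": 32},
    {"entity_group": "ORG", "score": 0.30, "word": "Applying", "start": 14, "end": 22},
    {"entity_group": "ORG", "score": 0.95, "word": "amazon", "start": 40, "end": 46},
]
print(extract_orgs(sample_results))
```

In real use, `ner_results` from the snippet above would be passed in place of `sample_results`; the threshold would need tuning on actual emails.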
+ # Training data
+ This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
+ Instead of the full label set of the CoNLL-2003 English dataset, this dataset is labeled with the ORG tags only.
+
+ |Abbreviation|Description|
+ | -------- | ------- |
+ |O|Outside of a named entity|
+ |B-ORG|Beginning of an organization right after another organization|
+ |I-ORG|Organization|
+
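As an illustration of how the three tags combine into entities, the sketch below groups a per-token tag sequence into organization spans. It is an assumption-laden example, not the model's own decoding code: `org_spans` is a hypothetical helper, the tokens and tags are made up, and an `I-ORG` that follows `O` is treated as the start of a new span, which matches the table's description of `B-ORG` as marking an organization that directly follows another one.

```python
from typing import List, Tuple


def org_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Group O / B-ORG / I-ORG tags into half-open (start, end) token spans."""
    spans = []
    start = None  # index where the current organization began, if any
    for i, tag in enumerate(tags):
        if tag == "B-ORG":
            # B-ORG always opens a new span, closing any span in progress
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I-ORG":
            # I-ORG continues a span, or opens one if none is in progress
            if start is None:
                start = i
        else:
            # Any other tag (here: O) closes the span in progress
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Made-up example; "Acme Corp" is a placeholder organization
tokens = ["Thank", "you", "for", "applying", "to", "Acme", "Corp", "!"]
tags = ["O", "O", "O", "O", "O", "B-ORG", "I-ORG", "O"]
names = [" ".join(tokens[s:e]) for s, e in org_spans(tags)]
print(names)
```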
+ # Eval results
+ |Metric|Score|
+ | -------- | ------- |
+ |Loss|0.0898725837469101|
+ |Precision|0.7111111111111111|
+ |Recall|0.8205128205128205|
+ |F1|0.7619047619047619|
+ |Accuracy|0.9760986309658876|
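As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The snippet below is a minimal sketch of that arithmetic; the function name `f1_score` is chosen here for illustration and is not tied to any particular evaluation library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Eval results table
precision = 0.7111111111111111
recall = 0.8205128205128205
f1 = f1_score(precision, recall)
print(round(f1, 4))
```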