---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---
# Zefty/distilbert-ner-email-org

distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER) on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
The model is fine-tuned specifically to identify the organization (ORG) entity, so it cannot identify the location (LOC), person (PER), or miscellaneous (MISC) entities that are available in the original model.
It was built for a personal side project: extracting company names from job application emails.
|
# How to use

This model can be used with the Transformers `pipeline` for NER.
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "Thank you for Applying to Amazon!"

ner_results = nlp(example)
print(ner_results)
```
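The pipeline returns a list of dicts, one per detected entity span, with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A minimal post-processing sketch for pulling out just the company names (the sample output below is hypothetical, shown only to illustrate the structure):

```python
# Hypothetical pipeline output for the example sentence above;
# the real score and character offsets will differ.
ner_results = [
    {"entity_group": "ORG", "score": 0.99, "word": "Amazon", "start": 26, "end": 32},
]

# Keep only organization entities and extract their surface strings.
orgs = [r["word"] for r in ner_results if r["entity_group"] == "ORG"]
print(orgs)  # ['Amazon']
```

Since this model only predicts ORG labels, the filter is mostly a safeguard, but it keeps the code correct if you swap in a multi-entity model later.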
|
# Training data

This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
Instead of using the full label set from the CoNLL-2003 English dataset, this dataset only includes the ORG labels.
|
|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization right after another organization|
|I-ORG|Organization|
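The IOB tags above can be decoded back into organization spans. A minimal sketch of that decoding (not part of the model's own tooling, just an illustration of the tagging scheme):

```python
def decode_org_spans(tokens, tags):
    """Group consecutive ORG-tagged tokens into organization names."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ORG" or (tag == "I-ORG" and not current):
            # B-ORG starts a new organization (flush any open span first);
            # a bare I-ORG also opens a span when none is in progress.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG":
            current.append(token)
        else:  # "O": close any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Thank", "you", "for", "applying", "to", "Amazon", "Web", "Services", "!"]
tags = ["O", "O", "O", "O", "O", "I-ORG", "I-ORG", "I-ORG", "O"]
print(decode_org_spans(tokens, tags))  # ['Amazon Web Services']
```

In practice the `aggregation_strategy` argument of the pipeline performs this grouping for you; the function above only makes the B-/I- convention concrete.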
|
# Eval results

|Metric|Score|
| -------- | ------- |
|Loss|0.0899|
|Precision|0.7111|
|Recall|0.8205|
|F1|0.7619|
|Accuracy|0.9761|
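As a sanity check, the reported F1 is the harmonic mean of the reported precision and recall, which can be recomputed directly (values rounded to four decimal places):

```python
# Reported entity-level metrics.
precision = 0.7111
recall = 0.8205

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7619
```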