---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---

# Zefty/distilbert-ner-email-org

distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER), trained on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich). The model is fine-tuned specifically to recognize the organization (ORG) entity type, so it CANNOT identify the location (LOC), person (PER), or miscellaneous (MISC) entities available in the original model.

This model was fine-tuned for a personal side project of mine: extracting company names from job application emails.

# How to use

This model can be used with the Transformers pipeline for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

example = "Thank you for Applying to Amazon!"

ner_results = nlp(example)
print(ner_results)
```

# Training data

This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich). Instead of using the full tag set from the CoNLL-2003 English dataset, this dataset only includes the ORG tags.

|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization, right after another organization|
|I-ORG|Inside an organization|

# Eval results

|Metric|Score|
| -------- | ------- |
|Loss|0.0899|
|Precision|0.7111|
|Recall|0.8205|
|F1|0.7619|
|Accuracy|0.9761|
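# Post-processing the pipeline output

For the side project described above, the raw pipeline output still needs to be reduced to a list of company names. A minimal sketch of one way to do this is shown below; the `sample` dicts mimic the shape returned by the `"ner"` pipeline with `aggregation_strategy="first"`, but the scores and character spans are made up for illustration, not taken from an actual run of this model.

```python
def extract_companies(ner_results, min_score=0.5):
    """Collect the distinct ORG entity strings from aggregated NER pipeline output."""
    companies = []
    for entity in ner_results:
        # Keep confident ORG spans only; this model emits no other entity types,
        # but filtering defensively keeps the helper reusable.
        if entity["entity_group"] == "ORG" and entity["score"] >= min_score:
            name = entity["word"].strip()
            if name not in companies:  # de-duplicate repeated mentions
                companies.append(name)
    return companies

# Illustrative output in the shape the pipeline returns (values are hypothetical).
sample = [
    {"entity_group": "ORG", "score": 0.98, "word": "Amazon", "start": 26, "end": 32},
    {"entity_group": "ORG", "score": 0.97, "word": "Amazon", "start": 60, "end": 66},
]
print(extract_companies(sample))  # ['Amazon']
```

The `min_score` threshold is an assumption you may want to tune against your own emails, since the precision reported below is around 0.71.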