---
license: apache-2.0
language:
- en
base_model:
- dslim/distilbert-NER
tags:
- email
- org
---
# Zefty/distilbert-ner-email-org

distilbert-ner-email-org is a fine-tuned version of [dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER) on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
The model is fine-tuned specifically to identify the organization (ORG) entity, so it cannot identify the location (LOC), person (PER), or miscellaneous (MISC) entities that are available in the original model.
It was built for a personal side project: extracting company names from job application emails.
|
# How to use

This model can be used with the Transformers `pipeline` for NER.
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Zefty/distilbert-ner-email-org")
model = AutoModelForTokenClassification.from_pretrained("Zefty/distilbert-ner-email-org")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "Thank you for Applying to Amazon!"

ner_results = nlp(example)
print(ner_results)
```
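The pipeline returns a list of dicts, one per detected entity span, with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A minimal post-processing sketch for pulling out just the company names (the sample output below is hypothetical, shown only to illustrate the structure):

```python
# Hypothetical pipeline output for the example sentence above;
# the real score and character offsets will differ.
ner_results = [
    {"entity_group": "ORG", "score": 0.99, "word": "Amazon", "start": 26, "end": 32},
]

# Keep only organization entities and extract their surface strings.
orgs = [r["word"] for r in ner_results if r["entity_group"] == "ORG"]
print(orgs)  # ['Amazon']
```

Since this model only predicts ORG labels, the filter is mostly a safeguard, but it keeps the code correct if you swap in a multi-entity model later.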
|
# Training data

This model was fine-tuned on a set of [job application emails](https://www.kaggle.com/datasets/rasho330/job-application-email-anonymized-and-feature-rich).
Instead of using the full label set from the CoNLL-2003 English dataset, this dataset only includes the ORG labels.
|
|Abbreviation|Description|
| -------- | ------- |
|O|Outside of a named entity|
|B-ORG|Beginning of an organization right after another organization|
|I-ORG|Organization|
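The IOB tags above can be decoded back into organization spans. A minimal sketch of that decoding (not part of the model's own tooling, just an illustration of the tagging scheme):

```python
def decode_org_spans(tokens, tags):
    """Group consecutive ORG-tagged tokens into organization names."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ORG" or (tag == "I-ORG" and not current):
            # B-ORG starts a new organization (flush any open span first);
            # a bare I-ORG also opens a span when none is in progress.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG":
            current.append(token)
        else:  # "O": close any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Thank", "you", "for", "applying", "to", "Amazon", "Web", "Services", "!"]
tags = ["O", "O", "O", "O", "O", "I-ORG", "I-ORG", "I-ORG", "O"]
print(decode_org_spans(tokens, tags))  # ['Amazon Web Services']
```

In practice the `aggregation_strategy` argument of the pipeline performs this grouping for you; the function above only makes the B-/I- convention concrete.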
|
# Eval results

|Metric|Score|
| -------- | ------- |
|Loss|0.0899|
|Precision|0.7111|
|Recall|0.8205|
|F1|0.7619|
|Accuracy|0.9761|
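As a sanity check, the reported F1 is the harmonic mean of the reported precision and recall, which can be recomputed directly (values rounded to four decimal places):

```python
# Reported entity-level metrics.
precision = 0.7111
recall = 0.8205

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7619
```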