Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset
Paper: arXiv:2310.10118
How to use compnet-renard/bert-base-cased-NER-reranker with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("compnet-renard/bert-base-cased-NER-reranker")
model = AutoModelForSequenceClassification.from_pretrained("compnet-renard/bert-base-cased-NER-reranker")

How to use compnet-renard/bert-base-cased-NER-reranker with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("compnet-renard/bert-base-cased-NER-reranker")
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)

A BERT model trained on the synthetic literary NER context retrieval dataset of Amalvy et al., 2023 (arXiv).
To use this model, construct a text of the form NER-sentence [SEP] context-sentence. The model should predict the positive class if context-sentence is useful for performing NER on NER-sentence, and the negative class otherwise.
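The input format described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `build_input`, `rank_contexts`, and `make_model_scorer` are hypothetical, and the sketch assumes that the model has two output labels with index 1 as the positive class (check `model.config.id2label` before relying on this). It also follows the card literally by writing `[SEP]` into a single string; whether the training script instead passed the two sentences as a tokenizer sentence pair is not stated here.

```python
from typing import Callable, List, Sequence, Tuple

SEP = "[SEP]"  # BERT separator token, written literally as the card describes


def build_input(ner_sentence: str, context_sentence: str) -> str:
    """Join two sentences in the form `NER-sentence [SEP] context-sentence`."""
    return f"{ner_sentence.strip()} {SEP} {context_sentence.strip()}"


def rank_contexts(
    ner_sentence: str,
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every candidate context and return the top_k, highest score first."""
    texts = [build_input(ner_sentence, c) for c in candidates]
    scores = score_fn(texts)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def make_model_scorer(
    model_name: str = "compnet-renard/bert-base-cased-NER-reranker",
) -> Callable[[List[str]], List[float]]:
    """Build a score_fn backed by the reranker (downloads the weights when called)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score_fn(texts: List[str]) -> List[float]:
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            logits = model(**batch).logits
        # ASSUMPTION: two labels, with index 1 being the positive ("useful") class.
        return torch.softmax(logits, dim=-1)[:, 1].tolist()

    return score_fn
```

With the model weights available, `rank_contexts(sentence, candidate_sentences, make_model_scorer(), top_k=3)` would return the three candidate sentences the model considers most useful, each paired with its positive-class probability. Keeping the scorer as a separate callable makes the ranking logic easy to exercise without loading the model.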
The model obtains 98.34 F1 on the synthetic test set. See Amalvy et al., 2023 for details about NER performance gains when using this retriever model to assist an NER model at inference.
See the training script here.
If you use this model in your research, please cite:
@InProceedings{Amalvy2023,
title = {Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset},
author = {Amalvy, A. and Labatut, V. and Dufour, R.},
booktitle = {2023 Conference on Empirical Methods in Natural Language Processing},
year = {2023},
doi = {10.18653/v1/2023.emnlp-main.642},
pages = {10372-10382},
}