Multilingual Autoregressive Entity Linking
Paper
•
2103.12528
•
Published
emanuelaboros/historic-nel
The model is based on mGENRE (multilingual Generative ENtity REtrieval) proposed by De Cao et al, a sequence-to-sequence architecture for entity disambiguation based on mBART. It uses constrained generation to output entity names mapped to Wikidata/QIDs.
This model was adapted for historical texts and fine-tuned on the HIPE-2022 dataset, which includes a variety of historical document types and languages.
from transformers import AutoTokenizer, pipeline
NEL_MODEL_NAME = "emanuelaboros/historic-nel"
nel_tokenizer = AutoTokenizer.from_pretrained(NEL_MODEL_NAME)
nel_pipeline = pipeline("generic-nel", model=NEL_MODEL_NAME,
tokenizer=nel_tokenizer,
trust_remote_code=True,
device='cpu')
sentence = "Le 0ctobre 1894, [START] Dreyfvs [END] est arrêté à Paris, accusé d'espionnage pour l'Allemagne — un événement qui déch1ra la société fr4nçaise pendant des années."
print(nel_pipeline(sentence))