Model Card for `emanuelaboros/historic-nel`

The model is based on mGENRE (multilingual Generative ENtity REtrieval), proposed by De Cao et al., a sequence-to-sequence architecture for entity disambiguation built on mBART. It uses constrained generation to output entity names, which are then mapped to Wikidata QIDs.

This model was adapted for historical texts and fine-tuned on the HIPE-2022 dataset, which includes a variety of historical document types and languages.
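The constrained generation mentioned above can be illustrated with a small, self-contained sketch. It is not the model's actual decoding code: the `Trie` class, the character-level tokens, the toy scorer, and the four-entry catalogue are all assumptions made for illustration. The idea shown is that, at each step, the decoder may only pick a token that keeps the output a prefix of some known entity name, so a noisy input cannot produce a name outside the catalogue.

```python
from typing import Callable, Dict, List, Sequence

EOS = "</s>"  # end-of-sequence token for this toy example


class Trie:
    """Prefix tree over the token sequences of valid entity names."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}

    def add(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, Trie())

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Tokens that may follow `prefix`; empty if the prefix is unknown."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []
            node = node.children[tok]
        return sorted(node.children)


def constrained_greedy_decode(
    trie: Trie,
    score: Callable[[List[str], str], float],
    max_len: int = 32,
) -> List[str]:
    """Greedy decoding restricted to continuations that exist in the trie."""
    out: List[str] = []
    for _ in range(max_len):
        allowed = trie.allowed_next(out)
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(out, tok))
        out.append(best)
        if best == EOS:
            break
    return out


# Toy catalogue of entity names and their Wikidata QIDs (illustrative subset).
CATALOGUE = {"Paris": "Q90", "Poland": "Q36", "Portugal": "Q45", "France": "Q142"}

trie = Trie()
for entity_name in CATALOGUE:
    trie.add(list(entity_name) + [EOS])  # character-level tokens for simplicity


def make_toy_scorer(noisy: str) -> Callable[[List[str], str], float]:
    """Hypothetical stand-in for a model: prefers copying the noisy mention."""
    def score(prefix: List[str], tok: str) -> float:
        pos = len(prefix)
        return 1.0 if pos < len(noisy) and noisy[pos] == tok else 0.0
    return score


tokens = constrained_greedy_decode(trie, make_toy_scorer("Par1s"))
name = "".join(t for t in tokens if t != EOS)
print(name, CATALOGUE[name])  # Paris Q90
```

The scorer would like to emit the OCR error `1`, but the trie only allows `i` after `Par`, so the output stays a valid catalogue entry. In the Transformers library, a constraint of this kind can be passed to `generate` through its `prefix_allowed_tokens_fn` argument; whether this model's pipeline does exactly that is not stated here.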

How to Use

from transformers import AutoTokenizer, pipeline

NEL_MODEL_NAME = "emanuelaboros/historic-nel"
nel_tokenizer = AutoTokenizer.from_pretrained(NEL_MODEL_NAME)

# "generic-nel" is a custom pipeline task shipped with the model repository,
# so trust_remote_code=True is required to load it.
nel_pipeline = pipeline("generic-nel", model=NEL_MODEL_NAME,
                        tokenizer=nel_tokenizer,
                        trust_remote_code=True,
                        device='cpu')

# The mention to link is wrapped in [START] ... [END] markers.
sentence = "Le 0ctobre 1894, [START] Dreyfvs [END] est arrêté à Paris, accusé d'espionnage pour l'Allemagne — un événement qui déch1ra la société fr4nçaise pendant des années."
print(nel_pipeline(sentence))
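When mentions come from an upstream NER step as character offsets, the `[START]` / `[END]` markers used in the example sentence have to be inserted before calling the pipeline. A minimal sketch of such a helper follows. `mark_mention` is a hypothetical name and is not part of the model repository; the marker spacing simply mirrors the example above.

```python
def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in [START] ... [END] markers (hypothetical helper)."""
    if not 0 <= start < end <= len(text):
        raise ValueError("offsets must satisfy 0 <= start < end <= len(text)")
    return f"{text[:start]}[START] {text[start:end]} [END]{text[end:]}"


raw = "Dreyfvs est arrêté à Paris."

# Mark the first mention using explicit character offsets.
marked_person = mark_mention(raw, 0, 7)
print(marked_person)  # [START] Dreyfvs [END] est arrêté à Paris.

# Mark another mention by locating it in the text first.
offset = raw.index("Paris")
marked_place = mark_mention(raw, offset, offset + len("Paris"))
print(marked_place)  # Dreyfvs est arrêté à [START] Paris [END].
```

Each marked string can then be passed to `nel_pipeline` in the same way as `sentence` above, one marked mention per input as in the original example.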

Model size: 0.6B params (Safetensors, tensor type F32)

Papers for emanuelaboros/historic-nel

Multilingual Autoregressive Entity Linking. arXiv:2103.12528, published Mar 23, 2021.

Multilingual Denoising Pre-training for Neural Machine Translation. arXiv:2001.08210, published Jan 22, 2020.
