This token-classification model aims to perform Named Entity Recognition on German-Austrian historical documents.
The model was trained on 10,319 samples of tagged entities provided by https://nerdpool-api.acdh-dev.oeaw.ac.at/.
It is trained to identify entities in the Minutes of the Austrian Council of Ministers.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("demigrigo/mpr_bert_german_ner")
tokenizer = AutoTokenizer.from_pretrained("demigrigo/mpr_bert_german_ner")
# aggregation_strategy="average" averages sub-token scores within each word before choosing a label
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="average")
text = "Ernennung FML. Peter Zaninis zum Kriegsminister"  # example sentence
print(nlp(text))
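With aggregation_strategy="average", the pipeline averages the label scores of all sub-tokens that belong to one word and then takes the highest-scoring label. The sketch below illustrates only that word-level averaging step in plain Python, without transformers. The label set, the average_word_labels helper and the probability values are hypothetical toy data, not output of this model. The word_ids list mirrors what a fast tokenizer's word_ids() returns (None for special tokens).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label set for illustration only; the real model's labels
# can be read from model.config.id2label.
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def average_word_labels(
    word_ids: List[Optional[int]],
    token_probs: List[List[float]],
) -> List[Tuple[int, str, float]]:
    """Average each word's sub-token probability vectors, then take the argmax label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for wid, probs in zip(word_ids, token_probs):
        if wid is None:  # special tokens such as [CLS]/[SEP] belong to no word
            continue
        if wid not in sums:
            sums[wid] = [0.0] * len(probs)
            counts[wid] = 0
        # Accumulate this sub-token's probabilities into its word's running sum
        sums[wid] = [s + p for s, p in zip(sums[wid], probs)]
        counts[wid] += 1

    result = []
    for wid in sorted(sums):
        avg = [s / counts[wid] for s in sums[wid]]
        best = max(range(len(avg)), key=lambda i: avg[i])
        result.append((wid, LABELS[best], avg[best]))
    return result


# Toy input (made-up numbers): word 0 is one sub-token, word 1 is split into two.
word_ids = [None, 0, 1, 1, None]
token_probs = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # [CLS]
    [0.9, 0.05, 0.02, 0.02, 0.01],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.5, 0.1, 0.1],
    [1.0, 0.0, 0.0, 0.0, 0.0],  # [SEP]
]
for wid, label, score in average_word_labels(word_ids, token_probs):
    print(wid, label, round(score, 2))
```

In this toy case the second sub-token of word 1 would on its own favour I-PER, but after averaging the word as a whole is labelled B-PER. The real pipeline additionally merges consecutive B-/I- words into entity groups.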
Training data from: https://nerdpool-api.acdh-dev.oeaw.ac.at/
The data were transformed into the BIO tagging format required by the original model.
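The card does not describe the raw format of the nerdpool annotations. The following is a minimal sketch of a BIO conversion, assuming (hypothetically) that each annotation is a character-offset span (start, end, label) over the text and that the text is split on whitespace. The spans_to_bio helper and the PER label are illustrative assumptions, not the author's actual preprocessing; aligning word-level tags to BERT sub-tokens would be a separate later step.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (char_start, char_end_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Split text on whitespace and tag each token with B-<label>, I-<label> or O."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        start, end = match.start(), match.end()
        tag = "O"
        for s_start, s_end, label in spans:
            # A token belongs to a span if their character ranges overlap
            if start < s_end and end > s_start:
                # The first token of a span gets B-, later tokens get I-
                tag = ("B-" if start <= s_start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


text = "Ernennung FML. Peter Zaninis zum Kriegsminister"
name_start = text.index("Peter Zaninis")
spans = [(name_start, name_start + len("Peter Zaninis"), "PER")]  # hypothetical label
for token, tag in spans_to_bio(text, spans):
    print(token, tag)
```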
Base model: google-bert/bert-base-cased