How to use marianaossilva/LitBERT-CRF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="marianaossilva/LitBERT-CRF")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("marianaossilva/LitBERT-CRF")
model = AutoModelForMaskedLM.from_pretrained("marianaossilva/LitBERT-CRF")
```
LitBERT-CRF
LitBERT-CRF is a fine-tuned BERT-CRF model designed specifically for Named Entity Recognition (NER) in Portuguese-language literature.
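NER models typically emit one tag per token, and entities are read off by merging consecutive tags. The sketch below assumes the common IOB2 scheme; the card does not document this model's label set, so the tag names (borrowed from HAREM-style categories such as `PESSOA` and `LOCAL`) and the helper `bio_to_spans` are hypothetical and for illustration only.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB2 tags into (entity_type, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins here (a stray I- tag is treated like a B- tag).
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise the tag is an I- tag continuing the current entity.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hypothetical tokens and tags for a sentence about a literary character.
tokens = ["Capitu", "vivia", "no", "Rio", "de", "Janeiro", "."]
tags = ["B-PESSOA", "O", "O", "B-LOCAL", "I-LOCAL", "I-LOCAL", "O"]
for label, s, e in bio_to_spans(tags):
    print(label, " ".join(tokens[s:e]))
# PESSOA Capitu
# LOCAL Rio de Janeiro
```

The Transformers token-classification pipeline can do a similar grouping itself through its `aggregation_strategy` argument; the hand-written version only makes the merging rule explicit.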
Model Details
Model Description
LitBERT-CRF is built on a BERT-CRF architecture that was pre-trained on the brWaC corpus and fine-tuned on the HAREM dataset for Portuguese NER. It also incorporates domain-specific literary data through Masked Language Modeling (MLM), which makes it well suited to identifying named entities in literary texts.
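The MLM objective trains the encoder to recover hidden tokens from their context. As a generic illustration, here is a minimal sketch of the original BERT masking recipe: select about 15% of positions, then replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. The card does not describe the exact masking setup used for LitBERT-CRF, so the probabilities, the mask id, the vocabulary size, and the toy token ids are all assumptions, and `mask_tokens` is a hypothetical helper.

```python
import random
from typing import List, Optional, Tuple


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking. Returns (inputs, labels); a label of -100 marks a
    position with no prediction target (PyTorch's default cross-entropy ignore_index)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the [MASK] id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


ids = list(range(1000, 1020))  # hypothetical token ids for one sentence
inputs, labels = mask_tokens(ids, mask_id=103, vocab_size=30000, rng=random.Random(0))
print("positions to predict:", [i for i, y in enumerate(labels) if y != -100])
```

In practice, the Transformers `DataCollatorForLanguageModeling` (with `mlm=True` and an `mlm_probability` setting) applies this kind of masking to whole batches.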
- Model type: BERT-CRF for NER
- Language: Portuguese
- Fine-tuned from model: BERT-CRF (pre-trained on brWaC, fine-tuned on HAREM)
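In a BERT-CRF tagger, BERT produces per-token emission scores for each label, and a linear-chain CRF layer adds learned label-to-label transition scores. At inference, the highest-scoring label sequence is found with Viterbi decoding. The sketch below shows that decoding step in plain Python; it is a generic illustration, not this model's code, and the three labels, the start and transition scores, and the emissions are invented for the example.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],    # [seq_len][num_labels]
    transitions: Sequence[Sequence[float]],  # [from_label][to_label]
    start: Sequence[float],                  # score of starting in each label
) -> List[int]:
    """Return the label sequence maximizing start + emission + transition scores."""
    num_labels = len(start)
    # score[j]: best total score of any path ending in label j at the current step
    score = [start[j] + emissions[0][j] for j in range(num_labels)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, best_prev = [], []
        for j in range(num_labels):
            i = max(range(num_labels), key=lambda k: score[k] + transitions[k][j])
            best_prev.append(i)
            new_score.append(score[i] + transitions[i][j] + emissions[t][j])
        score = new_score
        backpointers.append(best_prev)
    # Start from the best final label and follow the backpointers.
    path = [max(range(num_labels), key=lambda j: score[j])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-PESSOA", "I-PESSOA"]  # hypothetical label set
start = [0.0, 0.0, -10.0]               # a sequence should not start with I-
transitions = [
    [0.0, 0.0, -10.0],  # from O: O -> I-PESSOA is heavily penalized
    [0.0, 0.0, 0.0],    # from B-PESSOA
    [0.0, 0.0, 0.0],    # from I-PESSOA
]
emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 1.0]]  # per-token argmax would give O, I-PESSOA

path = viterbi_decode(emissions, transitions, start)
print([labels[i] for i in path])  # ['B-PESSOA', 'I-PESSOA']
```

The example shows why the CRF layer matters: a per-token argmax would output the invalid sequence `O`, `I-PESSOA`, while the transition scores steer decoding to a well-formed entity. Full CRF implementations usually also learn end-transition scores and run the recursion on batched tensors.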
Evaluation
Testing Data, Factors & Metrics
Testing Data
PPORTAL_ner dataset
Metrics
- Precision: 0.783
- Recall: 0.774
- F1-score: 0.779
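F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The card does not state how the scores were computed, so the span-matching function below assumes the common exact-match, entity-level convention (a predicted entity counts only if its type and boundaries both match a gold entity). The helpers and the toy spans are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)  # entities with matching type and boundaries
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy example: one entity found exactly, one with wrong boundaries.
gold = {("PESSOA", 0, 1), ("LOCAL", 3, 6)}
pred = {("PESSOA", 0, 1), ("LOCAL", 3, 4)}
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)

# Plugging in the reported (rounded) precision and recall:
print(round(f1_from(0.783, 0.774), 4))  # 0.7785
```

The value recomputed from the rounded precision and recall, about 0.7785, agrees with the reported F1 of 0.779 to within the rounding of the two inputs.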
Citation
BibTeX:
@inproceedings{silva-moro-2024-evaluating,
title = "Evaluating Pre-training Strategies for Literary Named Entity Recognition in {P}ortuguese",
author = "Silva, Mariana O. and
Moro, Mirella M.",
editor = "Gamallo, Pablo and
Claro, Daniela and
Teixeira, Ant{\'o}nio and
Real, Livy and
Garcia, Marcos and
Oliveira, Hugo Gon{\c{c}}alo and
Amaro, Raquel",
booktitle = "Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1",
month = mar,
year = "2024",
address = "Santiago de Compostela, Galicia/Spain",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.propor-1.39",
pages = "384--393",
}
APA:
Mariana O. Silva and Mirella M. Moro. 2024. Evaluating Pre-training Strategies for Literary Named Entity Recognition in Portuguese. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 384–393, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.