nielsaxe/BookTitleNERDutch

Token Classification


nielsaxe committed on Jun 16, 2024

Commit 080053f (verified) · 1 parent: 614b5db

Update README.md

Files changed (1)

README.md +5 -1

README.md CHANGED

@@ -27,12 +27,16 @@ This Named Entity Recognition (NER) model is designed to extract book titles fro
 The model has been fine-tuned and evaluated on a Dutch dataset of 12,535 book reviews from the Leeuwarder Courant, containing 23,529 annotated book titles. The dataset uses the IO tagging scheme. The data was divided into a training set (70%), a validation set (15%), and a test set (15%). Training used the Majority or Minority loss function, and the model achieves an F1 score of 84.3%, a precision of 83.4%, and a recall of 85.2% on the test set.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/661fcac6ccc447675983951b/Ap95lefSlrwJGDg6eupVF.png)
-### Model Description
+## Model Description
 - **Model type:** XLM-RoBERTa
 - **Language(s):** Dutch
 - **Fine-tuned from model:** [FacebookAI/xlm-roberta-large-finetuned-conll03-english](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english)
+## Model Flaws
+- Struggles to identify subtitles of book titles accurately.
+- When a book title is mentioned multiple times within the same review, the model tends to mark it only once, missing subsequent occurrences.
 ## Uses
 This model is intended for extracting book titles from Dutch texts and is particularly useful for text-analysis applications in the literary domain.
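Under the IO tagging scheme named in the README, each token is tagged either I (inside a book title) or O (outside), so a title is a maximal run of consecutive I tokens. The following is a minimal sketch of turning per-token IO tags into title spans. It assumes whitespace-split tokens and the hypothetical tag strings `"I"` and `"O"`; the model's actual label names and tokenization are not given in this commit, and `io_tags_to_spans` is an illustrative helper, not part of any library.

```python
from typing import List, Tuple


def io_tags_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group maximal runs of consecutive "I" tags into (start, end, text) spans.

    `end` is exclusive. The "I"/"O" tag strings are a hypothetical convention.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "I" and start is None:
            start = i  # a new title begins at this token
        elif tag != "I" and start is not None:
            # the current title ended at the previous token
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:  # a title runs to the end of the sequence
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


if __name__ == "__main__":
    tokens = ["Ik", "las", "De", "avonden", "van", "Reve", "opnieuw", "."]
    tags = ["O", "O", "I", "I", "O", "O", "O", "O"]
    print(io_tags_to_spans(tokens, tags))  # [(2, 4, 'De avonden')]
```

One consequence of this design is worth keeping in mind: because IO has no "begin" tag, two titles that appear directly next to each other cannot be separated and are decoded as a single span.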
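As a consistency check on the figures quoted in the README: F1 is the harmonic mean of precision and recall, so a precision of 83.4% and a recall of 85.2% give an F1 of about 84.3%, matching the reported score. The sketch below also computes split sizes for the 12,535 reviews under the stated 70/15/15 division. The README does not say how fractional counts were rounded, so flooring the training and validation sizes and giving the remainder to the test set is an assumption made only for illustration.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(n: int, train_pct: int = 70, val_pct: int = 15) -> Tuple[int, int, int]:
    """Split n items by integer percentages.

    Assumption: training and validation sizes are floored and the test
    set receives the remainder; the actual rounding rule is not documented.
    """
    train = n * train_pct // 100
    val = n * val_pct // 100
    test = n - train - val
    return train, val, test


if __name__ == "__main__":
    print(round(f1_score(0.834, 0.852), 3))  # 0.843
    print(split_sizes(12_535))  # (8774, 1880, 1881)
```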