lukasweber/WG_BERT

Token Classification


lukasweber committed on Feb 27, 2023

Commit 2ba71c4 · 1 Parent(s): a585556

Update README.md

Files changed (1)

README.md +1 -1


@@ -8,7 +8,7 @@ tags:
 WG-BERT (Warranty and Goodwill) is a pretrained encoder based model to analyze automotive entities in automotive-related texts. WG-BERT is trained by continually
 pretraining the BERT language model in the automotive domain by using a corpus of automotive (workshop feedback) texts via the masked language modeling (MLM) approach.
 WG-BERT is further fine-tuned for automotive entity recognition (subtask of Named Entity Recognition (NER)) to extract components and their complaints out of automotive texts.
-The dataset for continual pretraining consists of ~ 4 million sentences.
+The dataset for continual pretraining consists of 1.8 million workshop feedback texts which contain ~4 million sentences.
 The dataset for fine-tuning consists of ~5.500 gold annotated sentences by automotive domain experts.
 We choose as the training architecture the BERT-base-uncased version.