Instructions to use lukasweber/WG_BERT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use lukasweber/WG_BERT with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("token-classification", model="lukasweber/WG_BERT")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("lukasweber/WG_BERT")
    model = AutoModelForTokenClassification.from_pretrained("lukasweber/WG_BERT")
- Notebooks
- Google Colab
- Kaggle
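A token-classification pipeline returns a list of per-entity dictionaries, so a small post-processing step is often useful for collecting the extracted components and complaints. The following is a minimal sketch, not part of the model card: the helper `group_entities`, the `min_score` threshold, and the label names `COMPONENT` and `COMPLAINT` are illustrative assumptions, since the model's actual label set is not stated here. The sample predictions are hand-written to mimic the shape produced when the pipeline is created with `aggregation_strategy="simple"`.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group aggregated token-classification predictions by label.

    `predictions` is assumed to be a list of dicts with "entity_group",
    "word", and "score" keys, as returned by a Transformers
    token-classification pipeline using aggregation_strategy="simple".
    Predictions scoring below `min_score` are dropped.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions: the labels and scores are made up for illustration
# and are NOT real output of lukasweber/WG_BERT.
sample = [
    {"entity_group": "COMPONENT", "word": "brake pedal", "score": 0.97},
    {"entity_group": "COMPLAINT", "word": "squeaks", "score": 0.91},
    {"entity_group": "COMPONENT", "word": "door", "score": 0.32},
]

result = group_entities(sample)
print(result)
```

With the real model, the same helper could be applied to the output of `pipeline("token-classification", model="lukasweber/WG_BERT", aggregation_strategy="simple")` called on a workshop feedback text. The appropriate threshold depends on the model's score distribution and would need to be checked empirically.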
Commit 2ba71c4 · Parent(s): a585556
Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 WG-BERT (Warranty and Goodwill) is a pretrained encoder based model to analyze automotive entities in automotive-related texts. WG-BERT is trained by continually
 pretraining the BERT language model in the automotive domain by using a corpus of automotive (workshop feedback) texts via the masked language modeling (MLM) approach.
 WG-BERT is further fine-tuned for automotive entity recognition (subtask of Named Entity Recognition (NER)) to extract components and their complaints out of automotive texts.
-The dataset for continual pretraining consists of
+The dataset for continual pretraining consists of 1.8 million workshop feedback texts which contain ~4 million sentences.
 The dataset for fine-tuning consists of ~5.500 gold annotated sentences by automotive domain experts.
 We choose as the training architecture the BERT-base-uncased version.