Text classification model for definition recognition in German scientific texts
gbert-base-definition_classification is a German text classification model for the scientific domain, fine-tuned from deepset/gbert-base. It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples containing definitional and non-definitional sentences from German Wikipedia articles. The base model gbert-base was selected because it achieved the best overall score on a related NER task.
The model is designed to classify each sentence as either a definition or a non-definition sentence:
| Text Classification Tag | Text Classification Label | Description |
|---|---|---|
| 0 | NON_DEF_SENTENCE | Sentence is non-definitional |
| 1 | DEF_SENTENCE | Sentence is definitional |
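The tag-to-label mapping in the table corresponds to the usual id2label/label2id convention in transformers model configs. A minimal sketch of that mapping (the dictionaries below mirror the table and are illustrative, not read from the actual model config):

```python
# Illustrative mapping mirroring the table above; in a transformers model
# this lives in the model config as id2label / label2id.
id2label = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}
label2id = {label: tag for tag, label in id2label.items()}

def tag_to_label(tag: int) -> str:
    """Map a numeric classification tag to its human-readable label."""
    return id2label[tag]

print(tag_to_label(1))  # DEF_SENTENCE
```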
Training was conducted with a standard text classification (sequence classification) objective. The model achieves an accuracy of approximately 96% on the evaluation set.
Here are the overall final metrics on the test dataset after 4 epochs of training:
- Accuracy: 0.9630
- Loss: 0.1730
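For reference, accuracy here is simply the fraction of correctly classified test sentences. A minimal sketch with made-up predictions (not the actual test set):

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy example: 27 of 28 correct -> ~0.964, close to the reported 0.963.
preds = [1] * 14 + [0] * 14
gold  = [1] * 14 + [0] * 13 + [1]
print(round(accuracy(preds, gold), 3))  # 0.964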
Usage
```python
from transformers import pipeline

pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

results = pipe([
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
])
print(results)
# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```
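A common use case is filtering the definitional sentences out of a longer text. The helper below accepts any classifier callable that returns the pipeline's output format, so it can be demonstrated with a stub; in practice you would pass the `pipe` object from the snippet above. The threshold value is an assumption, not something the model card specifies:

```python
def extract_definitions(sentences, classify, threshold=0.5):
    """Return the sentences classified as DEF_SENTENCE with score >= threshold.

    `classify` is any callable mapping a list of sentences to a list of
    dicts like {"label": ..., "score": ...} -- e.g. the transformers pipeline.
    """
    results = classify(sentences)
    return [s for s, r in zip(sentences, results)
            if r["label"] == "DEF_SENTENCE" and r["score"] >= threshold]

# Stub classifier standing in for the real pipeline (hypothetical heuristic,
# used only so this sketch runs without downloading the model).
def stub_classify(sentences):
    return [{"label": "DEF_SENTENCE" if "ist ein" in s else "NON_DEF_SENTENCE",
             "score": 0.99} for s in sentences]

sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]
print(extract_definitions(sentences, stub_classify))
```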
Model details
- Base model: deepset/gbert-base
- Training dataset: samirmsallem/wiki_definitions_de_multitask

Evaluation results
- Accuracy on samirmsallem/wiki_definitions_de_multitask (self-reported): 0.963