---
datasets:
- samirmsallem/wiki_definitions_de_multitask
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/wiki_definitions_de_multitask
      type: samirmsallem/wiki_definitions_de_multitask
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9630331753554502
---
## Text classification model for definition recognition in German scientific texts
**gbert-base-definition_classification** is a German text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, consisting of definitional and non-definitional sentences from German Wikipedia articles.
This base model was chosen because it achieved the best overall score in the [NER task](https://huggingface.co/samirmsallem/gbert-base-definitions_ner#model-performance-comparision-on-wiki_definitions_de_multitask).
The model classifies each input sentence as either a definition or a non-definition sentence:
|Text Classification Tag| Text Classification Label | Description |
| :----: | :----: | :----: |
| 0 | NON_DEF_SENTENCE | The text is a non-definitional sentence |
| 1 | DEF_SENTENCE | The text is a definitional sentence |
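
The tag-to-label mapping in the table can be illustrated with a minimal sketch. It assumes the classifier head emits one raw score (logit) per class and that a softmax turns those scores into probabilities, as is typical for single-label classification. The helper name `decode_logits` and the example logits are hypothetical and not part of the model's API.

```python
import math

# Mapping taken from the label table above.
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}


def decode_logits(logits):
    """Turn a pair of raw class scores into (label, probability)."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted tag is the index of the largest probability.
    tag = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[tag], probs[tag]


# Hypothetical logits, chosen only for illustration.
label, score = decode_logits([-2.0, 3.0])
print(label, round(score, 4))
```
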
Training used a standard text classification objective. The model achieves an accuracy of approximately 96% on the evaluation set.
Final metrics on the test dataset after 4 epochs of training:
- **Accuracy**: 0.9630331753554502
- **Loss**: 0.17300711572170258
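
For reference, accuracy here means the fraction of sentences whose predicted label matches the annotated one. The sketch below shows that calculation on a toy example. The function name `accuracy` and the label lists are hypothetical and do not come from the actual test set.

```python
def accuracy(predictions, references):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical toy labels (1 = DEF_SENTENCE, 0 = NON_DEF_SENTENCE).
preds = [1, 0, 1, 1, 0]
golds = [1, 0, 0, 1, 0]
print(accuracy(preds, golds))
```
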
### Usage
```
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

# Classify one definitional and one non-definitional German sentence.
results = pipe([
    'Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
    'Rosen sind rot, Veilchen sind blau.',
])

print(results)
# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```
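
A common follow-up step is to keep only the sentences classified as definitions. The sketch below post-processes the list of `{'label', 'score'}` dictionaries that the pipeline returns. The helper `select_definitions` and its `min_score` threshold are hypothetical, and the `results` list is copied from the usage example above so that the snippet runs without downloading the model.

```python
def select_definitions(sentences, results, min_score=0.5):
    """Keep sentences labeled DEF_SENTENCE with at least min_score confidence."""
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "DEF_SENTENCE" and result["score"] >= min_score
    ]


sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]
# Output copied from the usage example above; no model call is made here.
results = [
    {"label": "DEF_SENTENCE", "score": 0.9995753169059753},
    {"label": "NON_DEF_SENTENCE", "score": 0.999630331993103},
]
print(select_definitions(sentences, results, min_score=0.9))
```

The threshold is a design choice: a higher `min_score` trades recall for precision, and a suitable value would have to be tuned on held-out data.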