---
datasets:
- samirmsallem/wiki_definitions_de_multitask
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/wiki_definitions_de_multitask
      type: samirmsallem/wiki_definitions_de_multitask
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9630331753554502
---

## Text classification model for definition recognition in German scientific texts

**gbert-base-definition_classification** is a German text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, consisting of definitional and non-definitional sentences from Wikipedia articles in German.
gbert-base was selected as the base model because it achieved the best overall score in the [NER task](https://huggingface.co/samirmsallem/gbert-base-definitions_ner#model-performance-comparision-on-wiki_definitions_de_multitask).

The model classifies each input sentence as either a definition or a non-definition sentence:

| Text Classification Tag | Text Classification Label | Description                            |
| :---------------------: | :-----------------------: | :------------------------------------: |
| 0                       | NON_DEF_SENTENCE          | Text is a non-definitional sentence    |
| 1                       | DEF_SENTENCE              | Text is a definitional sentence        |
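
The mapping in the table can be written down as a pair of lookup dictionaries, for example to convert the label strings returned by the pipeline back into integer tags. This is only a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `to_tag` are illustrative and not part of the model or of transformers.

```python
# Illustrative lookup tables mirroring the label table above.
# These names are hypothetical helpers, not part of the model's API.
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}
LABEL2ID = {label: tag for tag, label in ID2LABEL.items()}


def to_tag(label: str) -> int:
    """Convert a label string into its integer tag."""
    return LABEL2ID[label]
```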

Training was conducted using a standard text classification objective. The model achieves an accuracy of approximately 96% on the test set.
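
For illustration, the sketch below computes a softmax cross-entropy loss for a single sentence with two logits. It assumes that the text classification objective mentioned above is cross-entropy over the two labels, which this model card does not state explicitly. The function names and the example logits are made up for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct label index."""
    return -math.log(softmax(logits)[target])


# Made-up logits for [NON_DEF_SENTENCE, DEF_SENTENCE]; not real model output.
example_logits = [-2.0, 3.0]
loss_if_definition = cross_entropy(example_logits, target=1)      # small: confident and correct
loss_if_non_definition = cross_entropy(example_logits, target=0)  # large: confident but wrong
```

The loss is low when the model puts most of its probability on the correct label and grows when it is confidently wrong.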

Here are the overall final metrics on the test dataset after 4 epochs of training:
  - **Accuracy**: 0.9630331753554502
  - **Loss**: 0.17300711572170258


### Usage


```python
from transformers import pipeline

pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

results = pipe(['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
                'Rosen sind rot, Veilchen sind blau.'])
print(results)

# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```
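
When only the definitional sentences are of interest, the pipeline output can be filtered by label and score. The sketch below assumes the output format shown in the usage example (a list of dictionaries with `label` and `score` keys, in the same order as the input). The helper `select_definitions` and the threshold of 0.9 are illustrative choices, not part of the model.

```python
def select_definitions(sentences, results, min_score=0.9):
    """Return the sentences labelled DEF_SENTENCE with at least min_score confidence.

    `results` is expected to line up with `sentences` item by item.
    `min_score` is an arbitrary illustrative threshold.
    """
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "DEF_SENTENCE" and result["score"] >= min_score
    ]


# Sentences and scores copied from the usage example above.
sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]
results = [
    {"label": "DEF_SENTENCE", "score": 0.9995753169059753},
    {"label": "NON_DEF_SENTENCE", "score": 0.999630331993103},
]

print(select_definitions(sentences, results))
# ['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.']
```

In practice, `results` would come from `pipe(sentences)`; the values are hard-coded here so the snippet runs without downloading the model.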