---
datasets:
- samirmsallem/wiki_definitions_de_multitask
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/wiki_definitions_de_multitask
      type: samirmsallem/wiki_definitions_de_multitask
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9630331753554502
---

## Text classification model for definition recognition in German scientific texts

**gbert-base-definition_classification** is a German text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base). It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, consisting of definition and non-definition sentences from German Wikipedia articles.

This base model was selected because it achieved the best overall score in the [NER task](https://huggingface.co/samirmsallem/gbert-base-definitions_ner#model-performance-comparision-on-wiki_definitions_de_multitask).

The model is specifically designed to classify sentences as definition or non-definition sentences:

| Text Classification Tag | Text Classification Label | Description |
| :----: | :----: | :----: |
| 0 | NON_DEF_SENTENCE | The text is a non-definitional sentence |
| 1 | DEF_SENTENCE | The text is a definitional sentence |

Training was conducted using a standard text classification objective. The model achieves an accuracy of approximately 96% on the evaluation set.
Here are the overall final metrics on the test dataset after 4 epochs of training:

- **Accuracy**: 0.9630331753554502
- **Loss**: 0.17300711572170258

### Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

# Classify one definitional and one non-definitional sentence
results = pipe([
    'Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
    'Rosen sind rot, Veilchen sind blau.',
])

print(results)
# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```
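
A common follow-up step is to keep only the sentences classified as definitions. The following is a minimal sketch of such a filter, not part of the model or the `transformers` API: the helper name `keep_definitions` and the `min_score` threshold of 0.9 are illustrative assumptions. To stay runnable without downloading the model, it reuses the output printed in the usage example instead of calling the pipeline again.

```python
# Label name taken from the label table in this model card.
DEFINITION_LABEL = "DEF_SENTENCE"


def keep_definitions(sentences, results, min_score=0.9):
    """Return (sentence, score) pairs classified as definitions.

    `results` is expected in the format returned by the text-classification
    pipeline: one dict with 'label' and 'score' per input sentence.
    `min_score` is a hypothetical confidence cut-off, not a tuned value.
    """
    return [
        (sentence, result["score"])
        for sentence, result in zip(sentences, results)
        if result["label"] == DEFINITION_LABEL and result["score"] >= min_score
    ]


sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]

# Output copied from the usage example; in practice use: results = pipe(sentences)
results = [
    {"label": "DEF_SENTENCE", "score": 0.9995753169059753},
    {"label": "NON_DEF_SENTENCE", "score": 0.999630331993103},
]

definitions = keep_definitions(sentences, results)
for sentence, score in definitions:
    print(f"{score:.4f}  {sentence}")
```

Raising `min_score` trades recall for precision; a suitable value would need to be checked against held-out data for the target corpus.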