Text classification model for definition recognition in German scientific texts

G-SciEdBERT-definition_classification is a German text classification model for the scientific domain, fine-tuned from G-SciEdBERT. It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, consisting of definitional and non-definitional sentences from German Wikipedia articles. The model serves as a point of comparison for gbert-base-definition_classification, which achieves slightly higher accuracy and lower loss.

The model classifies each input sentence as either a definition or a non-definition sentence:

Text Classification Tag	Text Classification Label	Description
0	NON_DEF_SENTENCE	Text is a non-definitional sentence
1	DEF_SENTENCE	Text is a definitional sentence
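
The tag-to-label mapping in the table can be written down as a small lookup. This is a minimal sketch that assumes the model's configuration uses exactly the tags and labels listed above; `label_from_logits` is a hypothetical helper introduced only for illustration and is not part of the model or of transformers.

```python
# Mapping copied from the table above; assumed to match the model's config.
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}
LABEL2ID = {label: tag for tag, label in ID2LABEL.items()}


def label_from_logits(logits):
    """Hypothetical helper: return the label name of the highest-scoring class."""
    best_tag = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best_tag]


# Made-up logits in which class 1 scores higher than class 0.
print(label_from_logits([-1.2, 3.4]))  # prints DEF_SENTENCE
```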

Training used a standard text classification objective. The model achieves an accuracy of approximately 96% on the evaluation set.

Here are the overall final metrics on the test dataset after 4 epochs of training:

Accuracy: 0.9597156398104265
Loss: 0.20282548666000366
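
For readers who want to see how two such numbers are typically derived, here is a minimal sketch. It assumes the loss is the mean cross-entropy over two-class logits, which the card does not state explicitly, and it uses made-up toy logits rather than the model's real outputs. The function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(logits_batch, gold_labels):
    """Return (accuracy, mean cross-entropy loss) for a batch of examples."""
    correct = 0
    total_loss = 0.0
    for logits, gold in zip(logits_batch, gold_labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        correct += int(predicted == gold)
        total_loss += -math.log(probs[gold])
    count = len(gold_labels)
    return correct / count, total_loss / count


# Toy data for illustration only: three sentences, two classes.
toy_logits = [[2.0, -1.0], [-0.5, 1.5], [0.3, 0.2]]
toy_gold = [0, 1, 1]
accuracy, loss = evaluate(toy_logits, toy_gold)
print(accuracy, loss)
```

In this toy batch the third example is misclassified, so two of three predictions are correct.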

Usage

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
pipe = pipeline("text-classification", model="samirmsallem/G-SciEdBERT-definition_classification")

# Classify one definitional and one non-definitional German sentence.
results = pipe(['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
                'Rosen sind rot, Veilchen sind blau.'])
print(results)

# [{'label': 'DEF_SENTENCE', 'score': 0.9990215301513672}, {'label': 'NON_DEF_SENTENCE', 'score': 0.9968277812004089}]
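
A common follow-up step is to keep only the sentences classified as definitions. The sketch below works on the list-of-dicts output shown above, so it runs without loading the model. `select_definitions` and the `0.9` threshold are illustrative assumptions, not something the model card prescribes.

```python
def select_definitions(sentences, results, threshold=0.9):
    """Hypothetical helper: keep sentences labeled DEF_SENTENCE with score >= threshold."""
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "DEF_SENTENCE" and result["score"] >= threshold
    ]


# Inputs and outputs copied from the usage example above.
sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]
results = [
    {"label": "DEF_SENTENCE", "score": 0.9990215301513672},
    {"label": "NON_DEF_SENTENCE", "score": 0.9968277812004089},
]

definitions = select_definitions(sentences, results)
print(definitions)
```

With these inputs only the first sentence is kept. Raising the threshold trades recall for precision, and a suitable value depends on the application.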

Model size: 0.1B params (Safetensors, tensor type F32)

Base model: ai4stem-uga/G-SciEdBERT (this model is a fine-tuned version)

Evaluation results

Accuracy on samirmsallem/wiki_definitions_de_multitask (self-reported): 0.960