---
datasets:
- samirmsallem/wiki_definitions_de_multitask
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/wiki_definitions_de_multitask
      type: samirmsallem/wiki_definitions_de_multitask
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9630331753554502
---

## Text classification model for definition recognition in German scientific texts

**gbert-base-definition_classification** is a German text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, containing definition and non-definition sentences from German Wikipedia articles.
gbert-base was selected as the base model because it achieved the best overall score in the [NER task](https://huggingface.co/samirmsallem/gbert-base-definitions_ner#model-performance-comparision-on-wiki_definitions_de_multitask).

The model is designed to classify each input sentence as either a definition or a non-definition sentence:

| Text Classification Tag | Text Classification Label | Description |
| :----: | :----: | :----: |
| 0 | NON_DEF_SENTENCE | The text is a non-definitional sentence |
| 1 | DEF_SENTENCE | The text is a definitional sentence |

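The label scheme in the table can be written as a small lookup, which is handy when post-processing predictions. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `is_definition` are illustrative and not part of the model or of `transformers`.

```python
# Label scheme copied from the table above (tag -> label).
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}

# Reverse lookup (label -> tag), derived from ID2LABEL.
LABEL2ID = {label: tag for tag, label in ID2LABEL.items()}


def is_definition(label: str) -> bool:
    """Return True if a predicted label string marks a definitional sentence.

    Hypothetical helper for illustration; raises KeyError for unknown labels.
    """
    return LABEL2ID[label] == 1


print(is_definition("DEF_SENTENCE"))
print(is_definition("NON_DEF_SENTENCE"))
```
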
Training was conducted using a standard text classification objective. The model achieves an accuracy of approximately 96% on the evaluation set.

Here are the overall final metrics on the test dataset after 4 epochs of training:

- **Accuracy**: 0.9630331753554502
- **Loss**: 0.17300711572170258

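For reference, accuracy is the fraction of sentences whose predicted tag matches the gold tag. The sketch below shows that computation on toy data; the function name `accuracy` and the example tags are made up for illustration and are not taken from the actual evaluation run.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy example (not real evaluation data): 3 of 4 tags match.
toy_predicted = [1, 0, 1, 1]
toy_gold = [1, 0, 0, 1]
print(accuracy(toy_predicted, toy_gold))  # 0.75
```
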
### Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

# Classify one definitional and one non-definitional sentence.
results = pipe(['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
                'Rosen sind rot, Veilchen sind blau.'])
print(results)

# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```

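A common follow-up step is to keep only the sentences classified as definitions. The sketch below does this for results shaped like the example output above; `extract_definitions` and the `min_score` threshold of 0.9 are assumptions for illustration, not part of the model or of `transformers`. The predictions are copied from the example output, so the snippet runs without downloading the model.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def extract_definitions(
    sentences: List[str],
    predictions: List[Prediction],
    min_score: float = 0.9,  # hypothetical threshold, tune for your data
) -> List[str]:
    """Return sentences labeled DEF_SENTENCE with a score of at least min_score."""
    if len(sentences) != len(predictions):
        raise ValueError("sentences and predictions must have the same length")
    return [
        sentence
        for sentence, pred in zip(sentences, predictions)
        if pred["label"] == "DEF_SENTENCE" and pred["score"] >= min_score
    ]


sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]
# Copied from the example output above instead of calling the pipeline.
predictions = [
    {"label": "DEF_SENTENCE", "score": 0.9995753169059753},
    {"label": "NON_DEF_SENTENCE", "score": 0.999630331993103},
]

definitions = extract_definitions(sentences, predictions)
print(definitions)
```

With a live pipeline, `predictions` would be replaced by the list returned from `pipe(sentences)`.
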