---
datasets:
- samirmsallem/argument_mining_de
language:
- de
metrics:
- accuracy
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/argument_mining_de
      type: samirmsallem/argument_mining_de
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9657534246575342
---
## Text classification model for argument mining and detection
**gbert-base-argument_mining** is a German-language text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
It was trained on a [synthetically created, annotated dataset](https://huggingface.co/datasets/samirmsallem/argument_mining_de) containing the different sentence types that occur in the conclusions of scientific theses and papers.
### Training
The model was fine-tuned for 10 epochs. This repository contains the checkpoint from the fourth epoch, which reached the highest accuracy and the lowest loss:
| epoch | accuracy | loss |
|-------|-------------------|--------------------|
| 1.0 | 0.9315 | 0.3872 |
| 2.0 | 0.9178 | 0.2987 |
| 3.0 | 0.9589 | 0.1519 |
| 4.0 | **0.9658** | **0.1162** |
| 5.0 | 0.9521 | 0.2100 |
| 6.0 | 0.9521 | 0.1979 |
| 7.0 | 0.9521 | 0.2453 |
| 8.0 | 0.9521 | 0.2251 |
| 9.0 | 0.9452 | 0.2225 |
| 10.0 | 0.9521 | 0.2286 |
On this dataset, the model learns to distinguish the sentence classes listed under Text Classification Tags with high accuracy. However, accuracy plateaus and the loss rises after epoch 4, which points to overfitting and may indicate that the dataset is imbalanced or noisy. Future work should train the model on more robust data to obtain more reliable evaluation results.
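
The checkpoint choice described above can be reproduced from the table. The following is a minimal sketch, not the author's training code: the rows are copied from the table, and `best_epoch` is a hypothetical helper that picks the highest accuracy and, as an assumed tie-break, the lower loss.

```python
# (epoch, accuracy, loss) rows copied from the training table above.
results = [
    (1, 0.9315, 0.3872),
    (2, 0.9178, 0.2987),
    (3, 0.9589, 0.1519),
    (4, 0.9658, 0.1162),
    (5, 0.9521, 0.2100),
    (6, 0.9521, 0.1979),
    (7, 0.9521, 0.2453),
    (8, 0.9521, 0.2251),
    (9, 0.9452, 0.2225),
    (10, 0.9521, 0.2286),
]


def best_epoch(rows):
    """Return the row with the highest accuracy; ties go to the lower loss.

    Hypothetical helper for illustration; the tie-break rule is an assumption.
    """
    return max(rows, key=lambda row: (row[1], -row[2]))


epoch, accuracy, loss = best_epoch(results)
print(f"best epoch: {epoch} (accuracy={accuracy:.4f}, loss={loss:.4f})")
# prints: best epoch: 4 (accuracy=0.9658, loss=0.1162)
```

Epochs 5 to 10 all score below epoch 4 on accuracy and above it on loss, so the tie-break does not affect the result here.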
### Text Classification Tags
| Text Classification Tag | Text Classification Label |
| :----: | :----: |
| 0 | CLAIM |
| 1 | COUNTERCLAIM |
| 2 | LINK |
| 3 | CONC |
| 4 | FUT |
| 5 | OTH |
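
To show how the tags above map to labels at inference time, here is a minimal sketch in plain Python. It assumes the classifier emits one logit per tag in the order of the table; `ID2LABEL`, `decode_logits`, and the example logits are hypothetical and made up for illustration. In practice the logits would come from the fine-tuned model, and the label names stored in the model's own configuration may differ from this table.

```python
import math

# Tag-to-label mapping copied from the table above.
ID2LABEL = {
    0: "CLAIM",
    1: "COUNTERCLAIM",
    2: "LINK",
    3: "CONC",
    4: "FUT",
    5: "OTH",
}


def decode_logits(logits):
    """Turn one logit per tag into (tag, label, probability).

    Hypothetical helper: applies a numerically stable softmax and
    returns the most probable tag together with its label.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    tag = max(range(len(probs)), key=probs.__getitem__)
    return tag, ID2LABEL[tag], probs[tag]


# Made-up logits for a single sentence, for illustration only.
example_logits = [0.1, -1.2, 0.3, 2.5, -0.4, 0.0]
tag, label, score = decode_logits(example_logits)
print(tag, label, round(score, 3))
```

With these example logits the highest value sits at index 3, so the sentence is decoded as `CONC`.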