samirmsallem/gbert-base-argument_mining

samirmsallem committed on May 28, 2025

Commit cc90b80 · verified · 1 Parent(s): 631dfb6

Create README.md

Files changed (1)

README.md +64 -0

README.md ADDED

	@@ -0,0 +1,64 @@

+---
+datasets:
+- samirmsallem/argument_mining_de
+language:
+- de
+metrics:
+- accuracy
+base_model:
+- deepset/gbert-base
+pipeline_tag: text-classification
+library_name: transformers
+model-index:
+- name: checkpoints
+  results:
+  - task:
+      name: Text Classification
+      type: text-classification
+    dataset:
+      name: samirmsallem/argument_mining_de
+      type: samirmsallem/argument_mining_de
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.9657534246575342
+---
+## Text classification model for argument mining and detection
+**gbert-base-argument_mining** is a German-language text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
+It was trained on a [synthetically created, annotated dataset](https://huggingface.co/datasets/samirmsallem/argument_mining_de) containing the different sentence types that occur in the conclusions of scientific theses and papers.
+### Training
+The model was fine-tuned for 10 epochs. This repository contains the checkpoint from the fourth epoch, which has the best accuracy and the lowest loss in the table:
+| epoch | accuracy          | loss               |
+|-------|-------------------|--------------------|
+| 1.0   | 0.9315            | 0.3872             |
+| 2.0   | 0.9178            | 0.2987             |
+| 3.0   | 0.9589            | 0.1519             |
+| 4.0   | **0.9658**        | **0.1162**         |
+| 5.0   | 0.9521            | 0.2100             |
+| 6.0   | 0.9521            | 0.1979             |
+| 7.0   | 0.9521            | 0.2453             |
+| 8.0   | 0.9521            | 0.2251             |
+| 9.0   | 0.9452            | 0.2225             |
+| 10.0  | 0.9521            | 0.2286             |
+On this dataset, the model learns to distinguish effectively between the six sentence classes listed in the table below. However, accuracy peaks at epoch 4 and the loss rises again afterwards. This early onset of overfitting may indicate that the dataset is imbalanced or noisy. Future work should train the model on more robust data to obtain more reliable evaluation results.
+### Text Classification Tags
+| Text Classification Tag | Text Classification Label |
+| :---------------------: | :-----------------------: |
+| 0                       | CLAIM                     |
+| 1                       | COUNTERCLAIM              |
+| 2                       | LINK                      |
+| 3                       | CONC                      |
+| 4                       | FUT                       |
+| 5                       | OTH                       |
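
The tag table above can be applied to raw model outputs along the following lines. This is a minimal sketch, not the author's documented usage. It assumes that the repository id is `samirmsallem/gbert-base-argument_mining`, that the checkpoint loads with the standard `transformers` sequence-classification classes, and that the output logits are ordered by tag id 0 to 5. The helper names `softmax`, `decode`, and `classify` are hypothetical. The label mapping is taken from the table rather than from the checkpoint's configuration, since the README does not say whether the configuration carries these names.

```python
import math

# Tag ids and labels copied from the table above.
ID2LABEL = {0: "CLAIM", 1: "COUNTERCLAIM", 2: "LINK", 3: "CONC", 4: "FUT", 5: "OTH"}


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map one row of six logits to (label, probability) via the tag table."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(sentences, model_id="samirmsallem/gbert-base-argument_mining"):
    """Classify German sentences with the fine-tuned checkpoint.

    Assumption: `model_id` is the repository id of this model card.
    Requires `transformers` and `torch`; both are imported lazily so the
    helpers above stay usable without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, 6) under the assumption above
    return [decode(row.tolist()) for row in logits]
```

Calling `classify(["Zukünftige Arbeiten sollten größere Datensätze untersuchen."])` would return a list with one `(label, probability)` pair. The decoding step is kept separate from model loading so that the table mapping can be checked without downloading the checkpoint.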