Welcome to **RoBERTArg**!
This model was trained on ~25k heterogeneous, manually annotated sentences (Stab et al. 2018) on controversial topics to classify text into one of two labels: **NON-ARGUMENT** (0) and **ARGUMENT** (1).
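The 0/1 encoding above follows the usual `id2label`/`label2id` convention; a minimal sketch (the dictionary names are illustrative, not taken from the model's config):

```python
# Label encoding described above: 0 -> NON-ARGUMENT, 1 -> ARGUMENT
id2label = {0: "NON-ARGUMENT", 1: "ARGUMENT"}
label2id = {name: idx for idx, name in id2label.items()}

assert label2id["ARGUMENT"] == 1
```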
**Dataset**
In the dataset (Stab et al. 2018), a sentence is labeled an **ARGUMENT** (~11k sentences) if it gives a relevant reason for supporting or opposing the topic, and a **NON-ARGUMENT** (~14k sentences) if it does not. The authors focus on controversial topics, i.e., topics with an obvious polarity in the possible outcomes, and compile a final set of eight: _abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage_.
| gun control | 325 | 1,889 |
| minimum wage | 325 | 1,346 |

**Model training**

**RoBERTArg** was fine-tuned from the pre-trained RoBERTa (base) model on Hugging Face, using the Hugging Face `Trainer` with the following hyperparameters. The hyperparameters were determined through a hyperparameter search on a 20% validation set.

```python
training_args = TrainingArguments(
    ...
)
```

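The excerpt does not show the search procedure itself; below is a purely illustrative grid-search sketch, with a hypothetical hyperparameter space and a stand-in scoring function in place of actually fine-tuning on the 20% validation split:

```python
import itertools

# Hypothetical search space -- the actual grid used for RoBERTArg is not
# stated in this excerpt.
grid = {
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "per_device_train_batch_size": [16, 32],
    "num_train_epochs": [2, 3],
}

def validation_score(config):
    # Stand-in for fine-tuning with `config` and scoring on the 20%
    # validation split; a real run would train and evaluate the model here.
    # For illustration, this dummy simply prefers a mid-range learning rate.
    return -abs(config["learning_rate"] - 2e-5)

# Enumerate every combination in the grid and keep the best-scoring one.
candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best = max(candidates, key=validation_score)
```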
**Evaluation**
The model was evaluated using 20% of the sentences (80-20 train-test split).

Showing the **confusion matrix** using the 20% of the sentences as the evaluation set:

| ARGUMENT | 2213 | 558 |
| NON-ARGUMENT | 325 | 1790 |
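From these four cells the usual metrics can be derived; a quick check, assuming rows are the actual labels and columns the predicted labels (the table header is not shown in this excerpt):

```python
# Confusion-matrix cells from the table above, treating ARGUMENT as the
# positive class (row = actual, column = predicted is an assumption).
tp, fn = 2213, 558    # actual ARGUMENT
fp, tn = 325, 1790    # actual NON-ARGUMENT

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.3f} p={precision:.3f} r={recall:.3f} f1={f1:.3f}")
# -> acc=0.819 p=0.872 r=0.799 f1=0.834
```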
**Intended Uses & Potential Limitations**
The model can be a starting point for diving into the exciting area of argument mining. But be aware: an argument is a complex, topic-dependent structure that often differs between text types. The model may therefore perform worse on topics and text types not included in the training set.