Upload README.md with huggingface_hub (#2)

Opened by lbourdois

README.md (changed)

@@ -1,63 +1,3 @@

Before (old lines 1–63):

---
language:
widget:
- text: "It has been determined that the amount of greenhouse gases have decreased by almost half because of the prevalence in the utilization of nuclear power."
---

### Welcome to RoBERTArg!

🤖 **Model description**

This model was trained on ~25k heterogeneous, manually annotated sentences on controversial topics (📚 [Stab et al. 2018](https://www.aclweb.org/anthology/D18-1402/)) to classify text into one of two labels: 🏷 **NON-ARGUMENT** (0) and **ARGUMENT** (1).
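The label ids above (0 = NON-ARGUMENT, 1 = ARGUMENT) are what a caller needs to turn the classifier's two output scores into a prediction. The sketch below assumes the model emits one logit per class in that index order and applies a softmax; the helper names `softmax` and `predict_label` and the example logits are hypothetical, not part of the model card.

```python
import math

# Label ids as stated in the model card: 0 = NON-ARGUMENT, 1 = ARGUMENT.
ID2LABEL = {0: "NON-ARGUMENT", 1: "ARGUMENT"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Hypothetical helper: map two class logits to (label, probability).

    Assumes logits[i] is the score for label id i.
    """
    probs = softmax(logits)
    best = probs.index(max(probs))
    return ID2LABEL[best], probs[best]


label, prob = predict_label([-0.4, 1.7])  # hypothetical logits
print(label, round(prob, 3))
```

If the checkpoint's config defines `id2label`, the `transformers` text-classification pipeline performs the same softmax-and-lookup step internally.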

🗃 **Dataset**

The dataset (📚 Stab et al. 2018) consists of sentences labeled as an **ARGUMENT** (\~11k) if they include a relevant reason for supporting or opposing the topic, or as a **NON-ARGUMENT** (\~14k) if they do not. The authors focus on controversial topics, i.e., topics that include "an obvious polarity to the possible outcomes", and compile a final set of eight controversial topics: _abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage_.

| TOPIC | ARGUMENT | NON-ARGUMENT |
|----|----|----|
| abortion | 1,502 | 2,427 |
| school uniforms | 1,274 | 1,734 |
| death penalty | 1,568 | 2,083 |
| marijuana legalization | 1,213 | 1,262 |
| nuclear energy | 1,458 | 2,118 |
| cloning | 1,545 | 1,494 |
| gun control | 1,452 | 1,889 |
| minimum wage | 1,127 | 1,346 |

🏃🏼‍♂️ **Model training**

**RoBERTArg** was fine-tuned from a pre-trained RoBERTa (base) model available on Hugging Face, using the Hugging Face `Trainer` with the following hyperparameters:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./roberta-arg",  # placeholder path; not given in the original card
    num_train_epochs=2,
    learning_rate=2.3102e-06,
    seed=8,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)
```
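As a rough guide to what these settings imply, the number of optimizer steps is the number of batches per epoch times `num_train_epochs`. The card does not state the training-set size, so the sketch below assumes a hypothetical 20,000 training sentences (about 80% of ~25k), a single device, no gradient accumulation, and the `TrainingArguments` default `dataloader_drop_last=False`, which keeps the last partial batch. The function name is illustrative.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                          num_devices=1, grad_accum_steps=1):
    """Estimate optimizer steps for a Trainer-style training loop.

    Batches per epoch round up because the last partial batch is kept
    (no drop_last); gradient accumulation divides the number of updates.
    """
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


# Hypothetical training-set size; the card only gives ~25k sentences in total.
steps = total_optimizer_steps(num_examples=20_000, per_device_batch_size=64, num_epochs=2)
print(steps)  # 626 (313 batches per epoch x 2 epochs)
```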

📊 **Evaluation**

The model was evaluated on a held-out evaluation set (20% of the data):

| Model | Accuracy | F1 (ARGUMENT) | Recall (ARGUMENT) | Recall (NON-ARGUMENT) | Precision (ARGUMENT) | Precision (NON-ARGUMENT) |
|----|----|----|----|----|----|----|
| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |

The **confusion matrix** on the same evaluation set (rows: true label, columns: predicted label):

| True \ Predicted | NON-ARGUMENT | ARGUMENT |
|----|----|----|
| NON-ARGUMENT | 2213 | 558 |
| ARGUMENT | 325 | 1790 |
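The evaluation metrics can be recomputed from the confusion-matrix counts as a consistency check. The sketch below reads the matrix with rows as true labels, columns as predicted labels, and NON-ARGUMENT as the first row and column; under that reading the counts reproduce the reported accuracy, per-class recall and precision, and the ARGUMENT-class F1 to within 0.0001. It also computes a majority-class baseline (always predicting the more frequent label) for comparison. The function name `binary_metrics` is illustrative.

```python
# Counts from the evaluation-set confusion matrix.
# Assumed reading: rows = true label, columns = predicted label,
# with NON-ARGUMENT as the first row/column.
TN, FP = 2213, 558   # true NON-ARGUMENT: predicted NON / predicted ARG
FN, TP = 325, 1790   # true ARGUMENT:     predicted NON / predicted ARG


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, per-class recall/precision, and F1 for the ARGUMENT class."""
    total = tp + fp + fn + tn
    recall_arg = tp / (tp + fn)
    precision_arg = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "f1_arg": 2 * precision_arg * recall_arg / (precision_arg + recall_arg),
        "recall_arg": recall_arg,
        "recall_non": tn / (tn + fp),
        "precision_arg": precision_arg,
        "precision_non": tn / (tn + fn),
        # Accuracy of always predicting the more frequent true label.
        "majority_baseline": max(tn + fp, tp + fn) / total,
    }


for name, value in binary_metrics(TP, FP, FN, TN).items():
    print(f"{name}: {value:.4f}")
```

On these counts the majority-class baseline is about 0.57, so the reported 0.8193 accuracy is well above always predicting NON-ARGUMENT.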

⚠️ **Intended Uses & Potential Limitations**

The model can only be a starting point for diving into the exciting field of argument mining. Be aware, though: an argument is a complex structure with multiple dependencies, so the model may perform less well on topics and text types not included in the training set.

Enjoy and stay tuned! 🚀

🐦 Twitter: [@chklamm](http://twitter.com/chklamm)

After (new lines 1–3):

---
language: en
---