---
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
library_name: transformers
pipeline_tag: text-classification
---

# bert-large-relation14

Fine-tuned BERT model for 14-class classification of discourse relations. It was introduced in the paper [Automatic Slide Generation Using Discourse Relations](https://link.springer.com/chapter/10.1007/978-3-031-36336-8_61) and first released in this repository. The model is uncased: it does not distinguish between "english" and "English".

In the method proposed in the [paper](https://link.springer.com/chapter/10.1007/978-3-031-36336-8_61), this model is used only to classify the discourse relation between the first and second sentences of the summarized text. Relations between the remaining sentences are classified by [this model](https://huggingface.co/teppei727/bert_woco). If you are interested in the proposed method, we recommend looking at that model instead.

# Description

This model classifies the discourse relation between the two sentences of an input sentence pair.

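
As a rough illustration of sentence-pair classification, the sketch below encodes two sentences as a single BERT pair input and returns the highest-scoring label. It assumes the checkpoint follows the standard `transformers` sequence-classification layout and that the pair is passed as `tokenizer(first, second)`; this card does not confirm either point. The function names are hypothetical, and `model_id` is left as a parameter because the Hub ID of this checkpoint is not stated here.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_pair(first, second, model_id):
    """Classify the relation between two sentences (hypothetical helper).

    model_id is the Hub ID or local path of the checkpoint; it is an
    assumption supplied by the caller, not a value taken from this card.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Assumed input format: the two sentences as one BERT sentence-pair input.
    inputs = tokenizer(first, second, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_prediction(logits, model.config.id2label)
```

Calling `classify_pair(first_sentence, second_sentence, model_id)` with the ID of this repository would return a `(label, probability)` tuple. The label names come from the checkpoint's own `id2label` mapping.
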
We are still preparing this model card. Please check back in a few days.

The model was fine-tuned from [bert-large-uncased](https://huggingface.co/bert-large-uncased) on the dataset published in the paper [Automatic Prediction of Discourse Connectives](https://arxiv.org/abs/1702.00992).

The dataset is built from English Wikipedia and has 20 labels, whereas this model predicts 14. The 20-class dataset was restructured into 14 classes to suit our research objective, "automatic slide generation". The connectives and the relation senses they signal are shown below.

| Level 1     | Level 2         | Level 3          | Connectives (20)   |
|-------------|-----------------|------------------|--------------------|
| Temporal    | Synchronous     |                  | meanwhile          |
| Temporal    | Asynchronous    | Precedence       | then,              |
| Temporal    | Asynchronous    | Precedence       | finally,           |
| Temporal    | Asynchronous    | Succession       | by then            |
| Contingency | Cause           | Result           | therefore          |
| Comparison  | Concession      | Arg2-as-denier   | however,           |
| Comparison  | Concession      | Arg2-as-denier   | nevertheless       |
| Comparison  | Contrast        |                  | on the other hand, |
| Comparison  | Contrast        |                  | by contrast,       |
| Expansion   | Conjunction     |                  | and                |
| Expansion   | Conjunction     |                  | moreover           |
| Expansion   | Conjunction     |                  | indeed             |
| Expansion   | Equivalence     |                  | in other words     |
| Expansion   | Exception       | Arg1-as-excpt    | otherwise          |
| Expansion   | Instantiation   | Arg2-as-instance | for example,       |
| Expansion   | Level-of-detail | Arg1-as-detail   | overall,           |
| Expansion   | Level-of-detail | Arg2-as-detail   | in particular,     |
| Expansion   | Substitution    | Arg2-as-subst    | instead            |
| Expansion   | Substitution    | Arg2-as-subst    | rather             |

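
One way to picture restructuring connective labels into coarser classes is a lookup from each connective to its sense, as in the sketch below. It uses only a few rows copied from the table above and assumes that connectives sharing the same Level 1 to Level 3 sense are merged into one class. The card does not spell out the actual 20-to-14 mapping, so this illustrates the idea rather than the authors' procedure; the names `CONNECTIVE_SENSES`, `sense_label`, and `remap` are hypothetical.

```python
# A few rows copied from the table: connective -> (Level 1, Level 2, Level 3).
# Assumption: connectives with an identical sense collapse into one class.
CONNECTIVE_SENSES = {
    "however,": ("Comparison", "Concession", "Arg2-as-denier"),
    "nevertheless": ("Comparison", "Concession", "Arg2-as-denier"),
    "on the other hand,": ("Comparison", "Contrast", ""),
    "by contrast,": ("Comparison", "Contrast", ""),
    "therefore": ("Contingency", "Cause", "Result"),
}


def sense_label(connective):
    """Join the non-empty sense levels of a connective into one label string."""
    levels = CONNECTIVE_SENSES[connective]
    return ".".join(level for level in levels if level)


def remap(connectives):
    """Replace each connective label with its coarser sense label."""
    return [sense_label(c) for c in connectives]
```

With this lookup, `however,` and `nevertheless` both map to `Comparison.Concession.Arg2-as-denier`, so two connective labels become a single relation label.
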
# Training

The model was loaded with `AutoModelForSequenceClassification.from_pretrained` and trained with the following arguments.

```python
from transformers import TrainingArguments

output_dir = "./results"  # placeholder path; the original value is not given in this card

training_args = TrainingArguments(
    output_dir=output_dir,
    save_strategy="epoch",          # save a checkpoint after every epoch
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="epoch",    # named `eval_strategy` in recent transformers releases
    learning_rate=2e-5,
    metric_for_best_model="f1",     # compute_metrics must return an "f1" key
    load_best_model_at_end=True,    # reload the checkpoint with the best eval F1
)
```

# Evaluation

Evaluating on the test split of the dataset, for both the 14-label and the original 20-label classification, gives:

| Model                    | Macro F1 | Accuracy | Precision | Recall |
|--------------------------|----------|----------|-----------|--------|
| 14 labels classification | 0.586    | 0.589    | 0.630     | 0.591  |
| 20 labels classification | 0.478    | 0.488    | 0.536     | 0.488  |
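
For reference, the four figures in the table can be computed from gold and predicted label IDs roughly as follows. This is a minimal pure-Python sketch, not the evaluation script used for the paper. It assumes precision and recall are macro-averaged like F1, which the table states only for F1, and the function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []

    for label in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro-averaging gives every class the same weight regardless of how many test examples it has. A function of this shape, applied to the argmax of the logits, could also serve as the `compute_metrics` callback that supplies the `"f1"` key selected by `metric_for_best_model`.
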