A multi-label text sequence classification model used to determine whether passages describe a misfortune event, a cause of misfortune, and/or an action taken to mitigate or prevent misfortune. 8,293 passages were used for training and split into 5 folds (~6,634 in the train set and ~1,659 in the validation set per fold). However, since previous comparisons between folds showed no difference in accuracy or overfitting, we trained only on the 1st fold to save on computation.
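For reference, the fold split described above can be sketched with scikit-learn's `KFold`; the shuffle seed and the use of integer indices here are assumptions for illustration, not the original training code:

```python
from sklearn.model_selection import KFold

# 8,293 passage indices stand in for the actual training passages.
passage_ids = list(range(8293))

# 5-fold split; as noted above, only the 1st fold was used in practice.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
train_idx, val_idx = next(iter(kf.split(passage_ids)))

print(len(train_idx), len(val_idx))  # -> 6634 1659
```

With 8,293 passages, `KFold` puts the remainder into the earliest folds, which yields the ~6,634 / ~1,659 split quoted above for the 1st fold.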
Parameters:
Transformer: roberta-base
Tokenizer: roberta-base
Head: multi_label_classification
Learning rate: 2e-05
Weight decay: 0.01
Dropout: 0.1
Batch size: 8
Epochs: 15
Metric for best model: F1 micro
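A minimal sketch of how these parameters map onto the Hugging Face `transformers` API; the dataset loading, label list, and `Trainer` wiring are omitted, and `NUM_LABELS`, the output directory, and the metric name are assumptions for illustration:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TrainingArguments)

MODEL_NAME = "roberta-base"
NUM_LABELS = 18  # assumption: set to the number of labels in the annotation scheme

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # per-label sigmoid + BCE loss
    hidden_dropout_prob=0.1,
)

training_args = TrainingArguments(
    output_dir="misfortune-classifier",  # assumption: illustrative path
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    num_train_epochs=15,
    metric_for_best_model="f1_micro",  # assumption: name returned by compute_metrics
)
```

Setting `problem_type="multi_label_classification"` is what makes the head score each label independently rather than choosing a single class.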
Using epoch 15, the model achieves an F1 micro score of 0.664 on 2,074 held-out passages not used for training, an improvement over the identical model built on DistilBERT, which achieved an F1 micro of 0.637. Individual class F1 scores are shown below. Note that some labels have been excluded because they are not relevant to the final use of the model.
- EVENT:
  - Illness: 0.876
  - Accident: 0.458
  - Other: 0.588
- CAUSE:
  - Just Happens: -
  - Material Physical: 0.476
  - Spirits and Gods: 0.728
  - Witchcraft and Sorcery: 0.651
  - Rule Violation Taboo: 0.517
  - Jealous Evil Eye: -
- ACTION:
  - Physical Material: 0.672
  - Technical Specialist: 0.5
  - Divination: 0.406
  - Shaman Medium Healer: 0.582
  - Priest High Religion: 0.375
  - Other: -
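For clarity on the headline metric: micro-averaged F1 pools true positives, false positives, and false negatives across all labels before computing precision and recall, so frequent labels such as Illness weigh more heavily than rare ones. A minimal sketch with illustrative 0/1 label vectors:

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1 over lists of equal-length 0/1 label vectors."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += t and p          # predicted 1, truly 1
            fp += (not t) and p    # predicted 1, truly 0
            fn += t and (not p)    # predicted 0, truly 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Two passages, three labels each (illustrative data).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(round(f1_micro(y_true, y_pred), 3))  # -> 0.667
```

This matches `sklearn.metrics.f1_score(..., average="micro")`; macro averaging would instead average the per-class scores listed above with equal weight.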
The quick demo is no longer available through Hugging Face's hosted Inference API.