| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.90 | 0.80 | 0.85 | 100 |
| 1 | 0.96 | 0.98 | 0.97 | 532 |
| accuracy | | | 0.95 | 632 |
| macro avg | 0.93 | 0.89 | 0.91 | 632 |
| weighted avg | 0.95 | 0.95 | 0.95 | 632 |


# Notes

- Best model test accuracy: 0.9541
- Best epoch: 3 (val F1 0.9840)
- Model: roberta-large, 10 epochs, binary single-label classification
- Train/Dev/Test rows: 3396 / 627 / 632
- Label semantics: 0 = no_relation, 1 = causal (positive)
- Train label distribution: 1 = 0.9167, 0 = 0.0833
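
As a sanity check on the report, the macro and weighted averages can be recomputed from the two per-class rows. The sketch below is illustrative only: the names `per_class`, `macro_avg`, and `weighted_avg` are not from the original run, and the inputs are the rounded two-decimal values shown in the report, so the results are approximate.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
# These are the rounded values as printed, not the raw model outputs.
per_class = {
    0: (0.90, 0.80, 0.85, 100),  # no_relation
    1: (0.96, 0.98, 0.97, 532),  # causal (positive)
}

SUPPORT = 3  # index of the support column in each row
total = sum(row[SUPPORT] for row in per_class.values())


def macro_avg(idx):
    """Unweighted mean of one metric across classes."""
    return sum(row[idx] for row in per_class.values()) / len(per_class)


def weighted_avg(idx):
    """Mean of one metric across classes, weighted by class support."""
    return sum(row[idx] * row[SUPPORT] for row in per_class.values()) / total


macro = [macro_avg(i) for i in range(3)]        # precision, recall, f1
weighted = [weighted_avg(i) for i in range(3)]  # precision, recall, f1

# Support-weighted recall is mathematically the same quantity as accuracy,
# so this gives an estimate of accuracy from the rounded per-class recalls.
accuracy_estimate = weighted[1]

print("total support:", total)
print("macro avg:    ", " ".join(f"{m:.2f}" for m in macro))
print("weighted avg: ", " ".join(f"{w:.2f}" for w in weighted))
print(f"accuracy estimate: {accuracy_estimate:.4f}")
```

The recomputed averages round to the same values as the report's macro and weighted rows. The accuracy estimate comes out near 0.9515 rather than the reported 0.9541; a gap of that size is expected because the per-class recalls were rounded to two decimals before being used here.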