              precision    recall  f1-score   support

           0       0.90      0.80      0.85       100
           1       0.96      0.98      0.97       532

    accuracy                           0.95       632
   macro avg       0.93      0.89      0.91       632
weighted avg       0.95      0.95      0.95       632

# Notes
# Best Model - Test Accuracy: 0.9541
# Best epoch: 3 (val F1 0.9840)
# Model: roberta-large, 10 epochs, binary single-label classification
# Train/Dev/Test rows: 3396 / 627 / 632
# Label semantics: 0 = no_relation, 1 = causal (positive)
# Train label dist: 1=0.9167, 0=0.0833
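You can check the averaged rows of the report against the per-class rows with a few lines of arithmetic. This is a minimal sketch. It takes the rounded two-decimal values printed in the report as input, so its results may differ in the last digit from scikit-learn's averages, which are computed from unrounded per-class scores. The names `per_class`, `macro_avg`, `weighted_avg`, and `majority_baseline` are illustrative and were not part of the original run. The sketch also computes the accuracy of always predicting class 1 (`causal`), which is a reference point for the 0.9541 test accuracy given the class imbalance.

```python
# Per-class rows copied from the test-set report (rounded to 2 decimals).
per_class = {
    0: {"precision": 0.90, "recall": 0.80, "f1": 0.85, "support": 100},  # no_relation
    1: {"precision": 0.96, "recall": 0.98, "f1": 0.97, "support": 532},  # causal
}

total_support = sum(row["support"] for row in per_class.values())


def macro_avg(metric: str) -> float:
    """Unweighted mean of a metric over classes ("macro avg" row)."""
    return sum(row[metric] for row in per_class.values()) / len(per_class)


def weighted_avg(metric: str) -> float:
    """Support-weighted mean of a metric over classes ("weighted avg" row)."""
    return sum(row[metric] * row["support"] for row in per_class.values()) / total_support


# Accuracy of a trivial classifier that always predicts the majority class 1.
majority_baseline = per_class[1]["support"] / total_support

for metric in ("precision", "recall", "f1"):
    print(f"{metric:>9}: macro={macro_avg(metric):.2f} weighted={weighted_avg(metric):.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

The recomputed values match the report: macro 0.93 / 0.89 / 0.91 and weighted 0.95 / 0.95 / 0.95. The majority-class baseline is 532 / 632 ≈ 0.8418, so the model gains roughly 11 accuracy points over always predicting `causal`. Macro F1 weights the minority `no_relation` class equally with `causal`, which makes it the more informative single summary number for this split.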