Based on microsoft/deberta-v3-base, fine-tuned on a synthetic dataset (the original 6 labels were mapped to 3 labels).

Performance on the test set:

              precision    recall  f1-score   support

           0       0.98      0.99      0.98        94
           1       0.96      0.96      0.96        28
           2       1.00      0.98      0.99        66

    accuracy                           0.98       188
   macro avg       0.98      0.98      0.98       188
weighted avg       0.98      0.98      0.98       188

Performance on a similar benchmark:

              precision    recall  f1-score   support

           0       0.13      0.52      0.21        23
           1       0.44      0.15      0.22        75
           2       0.00      0.00      0.00        19

    accuracy                           0.20       117
   macro avg       0.19      0.22      0.14       117
weighted avg       0.31      0.20      0.18       117

The sharp drop from 0.98 to 0.20 accuracy suggests the model overfits the synthetic training distribution and does not generalize well to the benchmark data.
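The tables above are standard scikit-learn-style classification reports. As a minimal sketch of how each row is computed (per-class precision, recall, F1, and support, plus macro and weighted averages), here is a pure-Python illustration on toy data; the labels and predictions are made up for the example and are not outputs of this model:

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each label."""
    out = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = sum(1 for t in y_true if t == c)
        out[c] = (prec, rec, f1, support)
    return out

def averages(metrics):
    """Macro average (unweighted mean over classes) and
    weighted average (mean weighted by class support)."""
    n = sum(m[3] for m in metrics.values())
    macro = tuple(sum(m[i] for m in metrics.values()) / len(metrics)
                  for i in range(3))
    weighted = tuple(sum(m[i] * m[3] for m in metrics.values()) / n
                     for i in range(3))
    return macro, weighted

# Toy example with 3 classes, mirroring the report layout above.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
metrics = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])
macro, weighted = averages(metrics)
```

The same numbers would come out of `sklearn.metrics.classification_report`; the sketch just makes the definitions explicit.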