Based on microsoft/deberta-v3-base, fine-tuned on a synthetic dataset with 6 labels.

Performance on the test set:

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.56 | 0.73 | 0.63 | 26 |
| 1 | 0.70 | 1.00 | 0.82 | 28 |
| 2 | 0.68 | 0.53 | 0.60 | 32 |
| 3 | 0.97 | 1.00 | 0.99 | 33 |
| 4 | 1.00 | 0.97 | 0.98 | 33 |
| 5 | 0.52 | 0.33 | 0.41 | 36 |
| accuracy | | | 0.75 | 188 |
| macro avg | 0.74 | 0.76 | 0.74 | 188 |
| weighted avg | 0.74 | 0.75 | 0.74 | 188 |

Performance on a similar benchmark:

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.22 | 0.83 | 0.34 | 23 |
| 1 | 0.50 | 0.01 | 0.03 | 75 |
| 2 | 0.19 | 0.26 | 0.22 | 19 |
| accuracy | | | 0.21 | 117 |
| macro avg | 0.30 | 0.37 | 0.20 | 117 |
| weighted avg | 0.39 | 0.21 | 0.12 | 117 |
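The `macro avg` and `weighted avg` rows in the reports above follow the standard scikit-learn definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. A minimal sketch reproducing the test-set f1 aggregates from the per-class numbers:

```python
# Per-class f1 and support, copied from the test-set report above.
f1 = [0.63, 0.82, 0.60, 0.99, 0.98, 0.41]
support = [26, 28, 32, 33, 33, 36]

# Macro average: unweighted mean over the 6 classes.
macro_f1 = sum(f1) / len(f1)

# Weighted average: mean weighted by class support (188 samples total).
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.74 0.74
```

Both round to 0.74, matching the report; the gap between the strong classes (3, 4) and the weak ones (0, 5) is visible in the spread of the per-class scores rather than in these aggregates.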