Based on microsoft/deberta-v3-base, fine-tuned on a synthetic dataset with 6 labels.

Performance on the test set:

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.56 | 0.73 | 0.63 | 26 |
| 1 | 0.70 | 1.00 | 0.82 | 28 |
| 2 | 0.68 | 0.53 | 0.60 | 32 |
| 3 | 0.97 | 1.00 | 0.99 | 33 |
| 4 | 1.00 | 0.97 | 0.98 | 33 |
| 5 | 0.52 | 0.33 | 0.41 | 36 |
| accuracy | | | 0.75 | 188 |
| macro avg | 0.74 | 0.76 | 0.74 | 188 |
| weighted avg | 0.74 | 0.75 | 0.74 | 188 |

Performance on a similar benchmark:

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.22 | 0.83 | 0.34 | 23 |
| 1 | 0.50 | 0.01 | 0.03 | 75 |
| 2 | 0.19 | 0.26 | 0.22 | 19 |
| accuracy | | | 0.21 | 117 |
| macro avg | 0.30 | 0.37 | 0.20 | 117 |
| weighted avg | 0.39 | 0.21 | 0.12 | 117 |
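The `macro avg` and `weighted avg` rows in the reports above follow the standard scikit-learn definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. A minimal sketch reproducing the test-set f1 aggregates from the per-class numbers:

```python
# Per-class f1 and support, copied from the test-set report above.
f1 = [0.63, 0.82, 0.60, 0.99, 0.98, 0.41]
support = [26, 28, 32, 33, 33, 36]

# Macro average: unweighted mean over the 6 classes.
macro_f1 = sum(f1) / len(f1)

# Weighted average: mean weighted by class support (188 samples total).
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.74 0.74
```

Both round to 0.74, matching the report; the gap between the strong classes (3, 4) and the weak ones (0, 5) is visible in the spread of the per-class scores rather than in these aggregates.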