Based on microsoft/deberta-v3-base, fine-tuned on a synthetic dataset (the original 6 labels were mapped to 3 labels).

Performance on the test set:

              precision    recall  f1-score   support

           0       0.98      0.99      0.98        94
           1       0.96      0.96      0.96        28
           2       1.00      0.98      0.99        66

    accuracy                           0.98       188
   macro avg       0.98      0.98      0.98       188
weighted avg       0.98      0.98      0.98       188

Performance on a similar benchmark:

              precision    recall  f1-score   support

           0       0.13      0.52      0.21        23
           1       0.44      0.15      0.22        75
           2       0.00      0.00      0.00        19

    accuracy                           0.20       117
   macro avg       0.19      0.22      0.14       117
weighted avg       0.31      0.20      0.18       117

The sharp drop from 0.98 to 0.20 accuracy suggests the model overfits the synthetic training distribution and does not generalize well to the benchmark data.
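The tables above are standard scikit-learn-style classification reports. As a minimal sketch of how each row is computed (per-class precision, recall, F1, and support, plus macro and weighted averages), here is a pure-Python illustration on toy data; the labels and predictions are made up for the example and are not outputs of this model:

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each label."""
    out = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = sum(1 for t in y_true if t == c)
        out[c] = (prec, rec, f1, support)
    return out

def averages(metrics):
    """Macro average (unweighted mean over classes) and
    weighted average (mean weighted by class support)."""
    n = sum(m[3] for m in metrics.values())
    macro = tuple(sum(m[i] for m in metrics.values()) / len(metrics)
                  for i in range(3))
    weighted = tuple(sum(m[i] * m[3] for m in metrics.values()) / n
                     for i in range(3))
    return macro, weighted

# Toy example with 3 classes, mirroring the report layout above.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
metrics = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])
macro, weighted = averages(metrics)
```

The same numbers would come out of `sklearn.metrics.classification_report`; the sketch just makes the definitions explicit.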