AMIS Commodity Classifier Training Report

  • Dataset: faodl/amis-agri-utilization
  • Dataset subset: (none)
  • Dataset revision: ada4a04088a98f8f64bc7485c57d4c7f422c2151
  • Text column: chunk_text
  • Label column: label
  • Transformer: FacebookAI/xlm-roberta-base
  • Generated at: 2026-06-10T20:30:54.345579+00:00
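
To reproduce the numbers in this report, the dataset would need to be loaded at the pinned revision. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed, that the repository is accessible to the caller, and that the splits are named `train`, `validation`, and `test` as in the summary table. The helper `load_split` is a hypothetical name and is not part of the original training code.

```python
# Values copied from the report header.
DATASET_ID = "faodl/amis-agri-utilization"
DATASET_REVISION = "ada4a04088a98f8f64bc7485c57d4c7f422c2151"
TEXT_COLUMN = "chunk_text"
LABEL_COLUMN = "label"


def load_split(split):
    """Load one split at the pinned revision (hypothetical helper).

    The import is kept local so this module can be inspected without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    # No subset (config name) is passed because the report lists none.
    return load_dataset(DATASET_ID, split=split, revision=DATASET_REVISION)


if __name__ == "__main__":
    train = load_split("train")
    print(train.column_names)  # expected to include TEXT_COLUMN and LABEL_COLUMN
```

Pinning `revision` to the commit hash keeps later dataset updates from silently changing the splits.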

Dataset Summary

| Split | Rows | Label 0 | Label 1 | Unique groups | Mean text length |
| --- | --- | --- | --- | --- | --- |
| train | 4877 | 4347 | 530 | 2513 | 696.6 |
| validation | 978 | 899 | 79 | 538 | 690.6 |
| test | 1016 | 904 | 112 | 539 | 690.7 |
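
The label counts above imply a strong class imbalance, which is worth keeping in mind when reading the accuracy columns later in the report. As a small sketch, the positive rate and the accuracy of an always-predict-majority baseline can be derived from the table. The function name `class_balance` is illustrative only.

```python
# Label counts copied from the Dataset Summary table.
SPLITS = {
    "train": {"label_0": 4347, "label_1": 530},
    "validation": {"label_0": 899, "label_1": 79},
    "test": {"label_0": 904, "label_1": 112},
}


def class_balance(counts):
    """Return (rows, positive_rate, majority_baseline_accuracy)."""
    total = counts["label_0"] + counts["label_1"]
    positive_rate = counts["label_1"] / total
    # Accuracy obtained by always predicting the more frequent label.
    majority_accuracy = max(counts.values()) / total
    return total, positive_rate, majority_accuracy


for name, counts in SPLITS.items():
    rows, pos, maj = class_balance(counts)
    print(f"{name}: rows={rows} positive_rate={pos:.3f} majority_baseline={maj:.3f}")
```

Because roughly nine in ten rows carry label 0, a model can reach about 0.89 accuracy on the test split without finding any label 1 rows, so precision, recall, F1, and average precision are the more informative columns.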

Threshold Comparison on Validation Split

Validation metrics document threshold selection and tuning behavior; test metrics remain the primary estimate of out-of-sample performance.

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | --- | --- | --- | --- | --- | --- | --- |
| logistic_tfidf | 0.500 | 0.912 | 0.465 | 0.582 | 0.517 | 0.872 | 0.594 |
| logistic_tfidf | 0.608 | 0.942 | 0.696 | 0.494 | 0.578 | 0.872 | 0.594 |
| xgboost_tfidf | 0.500 | 0.945 | 0.931 | 0.342 | 0.500 | 0.823 | 0.588 |
| xgboost_tfidf | 0.177 | 0.934 | 0.592 | 0.570 | 0.581 | 0.823 | 0.588 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.912 | 0.476 | 0.861 | 0.613 | 0.953 | 0.762 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.957 | 0.703 | 0.810 | 0.753 | 0.953 | 0.762 |
| embedding-svm_sentence_embeddings | 0.500 | 0.955 | 0.807 | 0.582 | 0.676 | 0.952 | 0.754 |
| embedding-svm_sentence_embeddings | 0.310 | 0.957 | 0.713 | 0.785 | 0.747 | 0.952 | 0.754 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.954 | 0.750 | 0.646 | 0.694 | 0.948 | 0.782 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.952 | 0.670 | 0.797 | 0.728 | 0.948 | 0.782 |
| transformer | 0.500 | 0.964 | 0.739 | 0.861 | 0.795 | 0.970 | 0.874 |
| transformer | 0.853 | 0.970 | 0.812 | 0.823 | 0.818 | 0.970 | 0.874 |
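
The report does not state how the non-default thresholds were chosen. One common approach, shown here purely as an assumption, is to sweep candidate thresholds over the validation scores and keep the one with the highest F1. The sketch below uses toy data rather than the report's predictions, treats a score at or above the threshold as a positive prediction, and uses hypothetical function names.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when predicting 1 if score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, scores):
    """Return (threshold, f1) with the highest F1 among the observed scores.

    Starts from the default 0.5 and only moves on a strict improvement.
    """
    best_threshold = 0.5
    best_f1 = f1_at_threshold(y_true, scores, best_threshold)
    for candidate in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Toy validation labels and scores (illustrative only, not from the report).
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.20, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90]

threshold, f1 = tune_threshold(y_val, p_val)
print(f"tuned threshold={threshold:.2f} validation F1={f1:.3f}")
```

A threshold chosen this way is fitted to the validation split, which is why the tuned rows in the table above can look better than the same model does on held-out data.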

Threshold Comparison on Test Split

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | --- | --- | --- | --- | --- | --- | --- |
| logistic_tfidf | 0.500 | 0.926 | 0.691 | 0.598 | 0.641 | 0.899 | 0.726 |
| logistic_tfidf | 0.608 | 0.930 | 0.902 | 0.411 | 0.564 | 0.899 | 0.726 |
| xgboost_tfidf | 0.500 | 0.924 | 1.000 | 0.312 | 0.476 | 0.892 | 0.692 |
| xgboost_tfidf | 0.177 | 0.918 | 0.663 | 0.527 | 0.587 | 0.892 | 0.692 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.891 | 0.503 | 0.884 | 0.641 | 0.955 | 0.710 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.935 | 0.689 | 0.750 | 0.718 | 0.955 | 0.710 |
| embedding-svm_sentence_embeddings | 0.500 | 0.930 | 0.741 | 0.562 | 0.640 | 0.956 | 0.704 |
| embedding-svm_sentence_embeddings | 0.310 | 0.934 | 0.686 | 0.741 | 0.712 | 0.956 | 0.704 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.937 | 0.740 | 0.661 | 0.698 | 0.960 | 0.791 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.929 | 0.639 | 0.821 | 0.719 | 0.960 | 0.791 |
| transformer | 0.500 | 0.939 | 0.689 | 0.812 | 0.746 | 0.968 | 0.794 |
| transformer | 0.853 | 0.947 | 0.754 | 0.768 | 0.761 | 0.968 | 0.794 |

Confusion Matrices on Test Split

Rows are true labels and columns are predicted labels.

logistic_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 45 | 67 |

logistic_tfidf at threshold 0.608

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 899 | 5 |
| RELEVANT | 66 | 46 |

xgboost_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 904 | 0 |
| RELEVANT | 77 | 35 |

xgboost_tfidf at threshold 0.177

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 53 | 59 |

embedding-logistic_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 806 | 98 |
| RELEVANT | 13 | 99 |

embedding-logistic_sentence_embeddings at threshold 0.722

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 28 | 84 |

embedding-svm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 882 | 22 |
| RELEVANT | 49 | 63 |

embedding-svm_sentence_embeddings at threshold 0.310

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 29 | 83 |

embedding-lightgbm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 878 | 26 |
| RELEVANT | 38 | 74 |

embedding-lightgbm_sentence_embeddings at threshold 0.042

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 852 | 52 |
| RELEVANT | 20 | 92 |

transformer at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 863 | 41 |
| RELEVANT | 21 | 91 |

transformer at threshold 0.853

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | --- | --- |
| NOT_RELEVANT | 876 | 28 |
| RELEVANT | 26 | 86 |
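
Each confusion matrix determines the accuracy, precision, recall, and F1 reported for the same model and threshold on the test split. As a worked check, the sketch below recomputes those metrics for the transformer at threshold 0.853, treating RELEVANT as the positive class. The function name `metrics_from_confusion` is illustrative only.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive binary metrics from confusion-matrix cells.

    Rows are true labels and columns are predicted labels, so the
    NOT_RELEVANT row is (tn, fp) and the RELEVANT row is (fn, tp).
    """
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Cells copied from the "transformer at threshold 0.853" matrix.
m = metrics_from_confusion(tn=876, fp=28, fn=26, tp=86)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.947, 'precision': 0.754, 'recall': 0.768, 'f1': 0.761}
```

The rounded values agree with the transformer row at threshold 0.853 in the test-split comparison.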

Validation-Tuned Thresholds

  • logistic_tfidf: threshold 0.608 (validation F1 0.578); test F1 change vs 0.5: -0.077.
  • xgboost_tfidf: threshold 0.177 (validation F1 0.581); test F1 change vs 0.5: +0.111.
  • embedding-logistic_sentence_embeddings: threshold 0.722 (validation F1 0.753); test F1 change vs 0.5: +0.077.
  • embedding-svm_sentence_embeddings: threshold 0.310 (validation F1 0.747); test F1 change vs 0.5: +0.073.
  • embedding-lightgbm_sentence_embeddings: threshold 0.042 (validation F1 0.728); test F1 change vs 0.5: +0.021.
  • transformer: threshold 0.853 (validation F1 0.818); test F1 change vs 0.5: +0.015.
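
At inference time a tuned threshold replaces the default 0.5 cut-off on the model's score. The following is a minimal sketch, assuming each model outputs a probability for the RELEVANT class and that a score at or above the threshold counts as RELEVANT. Neither detail is confirmed by the report, and `predict_label` is a hypothetical helper.

```python
# Validation-tuned thresholds copied from the list above.
TUNED_THRESHOLDS = {
    "logistic_tfidf": 0.608,
    "xgboost_tfidf": 0.177,
    "embedding-logistic_sentence_embeddings": 0.722,
    "embedding-svm_sentence_embeddings": 0.310,
    "embedding-lightgbm_sentence_embeddings": 0.042,
    "transformer": 0.853,
}
DEFAULT_THRESHOLD = 0.5
LABELS = ("NOT_RELEVANT", "RELEVANT")


def predict_label(model_name, relevant_probability, use_tuned=True):
    """Map a RELEVANT-class probability to a label name.

    Assumes ">=" as the decision rule, which the report does not specify.
    """
    threshold = TUNED_THRESHOLDS[model_name] if use_tuned else DEFAULT_THRESHOLD
    return LABELS[int(relevant_probability >= threshold)]


# The same score can flip label depending on the threshold in use.
print(predict_label("transformer", 0.80))                   # NOT_RELEVANT
print(predict_label("transformer", 0.80, use_tuned=False))  # RELEVANT
```

Raising the threshold, as for the transformer, trades recall for precision; lowering it, as for `xgboost_tfidf`, does the opposite.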

Artifacts

  • logistic_tfidf: /content/agri-utilization-classifier/baselines/logistic
  • xgboost_tfidf: /content/agri-utilization-classifier/baselines/xgboost
  • embedding-logistic_sentence_embeddings: /content/agri-utilization-classifier/baselines/embedding-logistic
  • embedding-svm_sentence_embeddings: /content/agri-utilization-classifier/baselines/embedding-svm
  • embedding-lightgbm_sentence_embeddings: /content/agri-utilization-classifier/baselines/embedding-lightgbm
  • transformer: /content/agri-utilization-classifier/transformer