
# AMIS Commodity Classifier Training Report

- Dataset: `faodl/amis-agri-utilization`
- Dataset subset: ``
- Dataset revision: `ada4a04088a98f8f64bc7485c57d4c7f422c2151`
- Text column: `chunk_text`
- Label column: `label`
- Transformer: `FacebookAI/xlm-roberta-base`
- Generated at: `2026-06-10T20:30:54.345579+00:00`

## Dataset Summary

| Split | Rows | Label 0 (NOT_RELEVANT) | Label 1 (RELEVANT) | Unique groups | Mean text length |
| --- | ---: | ---: | ---: | ---: | ---: |
| train | 4877 | 4347 | 530 | 2513 | 696.6 |
| validation | 978 | 899 | 79 | 538 | 690.6 |
| test | 1016 | 904 | 112 | 539 | 690.7 |

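The label counts above imply a strongly imbalanced task, which is worth keeping in mind when reading the accuracy columns later in the report. The following is a minimal sketch of that arithmetic using only the counts from the table; the `positive_share` helper name is illustrative and not part of the training code.

```python
# Label counts copied from the Dataset Summary table: (label 0, label 1).
SPLITS = {
    "train": (4347, 530),
    "validation": (899, 79),
    "test": (904, 112),
}


def positive_share(negatives: int, positives: int) -> float:
    """Fraction of rows in a split that carry label 1."""
    return positives / (negatives + positives)


shares = {name: positive_share(*counts) for name, counts in SPLITS.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of rows are label 1")

# Accuracy of a trivial classifier that always predicts label 0 on the test split.
majority_baseline = 904 / (904 + 112)
print(f"test majority-class accuracy: {majority_baseline:.3f}")
```

Since always predicting label 0 would already score about 0.890 accuracy on the test split, precision, recall, and F1 are likely the more informative columns.
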
## Threshold Comparison on Validation Split

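Each model is reported twice: once at the default threshold of 0.500 and once at its validation-tuned threshold. Validation metrics document how each threshold was selected; test metrics remain the primary estimate of out-of-sample performance.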

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| logistic_tfidf | 0.500 | 0.912 | 0.465 | 0.582 | 0.517 | 0.872 | 0.594 |
| logistic_tfidf | 0.608 | 0.942 | 0.696 | 0.494 | 0.578 | 0.872 | 0.594 |
| xgboost_tfidf | 0.500 | 0.945 | 0.931 | 0.342 | 0.500 | 0.823 | 0.588 |
| xgboost_tfidf | 0.177 | 0.934 | 0.592 | 0.570 | 0.581 | 0.823 | 0.588 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.912 | 0.476 | 0.861 | 0.613 | 0.953 | 0.762 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.957 | 0.703 | 0.810 | 0.753 | 0.953 | 0.762 |
| embedding-svm_sentence_embeddings | 0.500 | 0.955 | 0.807 | 0.582 | 0.676 | 0.952 | 0.754 |
| embedding-svm_sentence_embeddings | 0.310 | 0.957 | 0.713 | 0.785 | 0.747 | 0.952 | 0.754 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.954 | 0.750 | 0.646 | 0.694 | 0.948 | 0.782 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.952 | 0.670 | 0.797 | 0.728 | 0.948 | 0.782 |
| transformer | 0.500 | 0.964 | 0.739 | 0.861 | 0.795 | 0.970 | 0.874 |
| transformer | 0.853 | 0.970 | 0.812 | 0.823 | 0.818 | 0.970 | 0.874 |

## Threshold Comparison on Test Split

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| logistic_tfidf | 0.500 | 0.926 | 0.691 | 0.598 | 0.641 | 0.899 | 0.726 |
| logistic_tfidf | 0.608 | 0.930 | 0.902 | 0.411 | 0.564 | 0.899 | 0.726 |
| xgboost_tfidf | 0.500 | 0.924 | 1.000 | 0.312 | 0.476 | 0.892 | 0.692 |
| xgboost_tfidf | 0.177 | 0.918 | 0.663 | 0.527 | 0.587 | 0.892 | 0.692 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.891 | 0.503 | 0.884 | 0.641 | 0.955 | 0.710 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.935 | 0.689 | 0.750 | 0.718 | 0.955 | 0.710 |
| embedding-svm_sentence_embeddings | 0.500 | 0.930 | 0.741 | 0.562 | 0.640 | 0.956 | 0.704 |
| embedding-svm_sentence_embeddings | 0.310 | 0.934 | 0.686 | 0.741 | 0.712 | 0.956 | 0.704 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.937 | 0.740 | 0.661 | 0.698 | 0.960 | 0.791 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.929 | 0.639 | 0.821 | 0.719 | 0.960 | 0.791 |
| transformer | 0.500 | 0.939 | 0.689 | 0.812 | 0.746 | 0.968 | 0.794 |
| transformer | 0.853 | 0.947 | 0.754 | 0.768 | 0.761 | 0.968 | 0.794 |

## Confusion Matrices on Test Split

Rows are true labels and columns are predicted labels.

### logistic_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 45 | 67 |

### logistic_tfidf at threshold 0.608

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 899 | 5 |
| RELEVANT | 66 | 46 |

### xgboost_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 904 | 0 |
| RELEVANT | 77 | 35 |

### xgboost_tfidf at threshold 0.177

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 53 | 59 |

### embedding-logistic_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 806 | 98 |
| RELEVANT | 13 | 99 |

### embedding-logistic_sentence_embeddings at threshold 0.722

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 28 | 84 |

### embedding-svm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 882 | 22 |
| RELEVANT | 49 | 63 |

### embedding-svm_sentence_embeddings at threshold 0.310

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 29 | 83 |

### embedding-lightgbm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 878 | 26 |
| RELEVANT | 38 | 74 |

### embedding-lightgbm_sentence_embeddings at threshold 0.042

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 852 | 52 |
| RELEVANT | 20 | 92 |

### transformer at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 863 | 41 |
| RELEVANT | 21 | 91 |

### transformer at threshold 0.853

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 876 | 28 |
| RELEVANT | 26 | 86 |

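The confusion matrices above carry enough information to recompute the accuracy, precision, recall, and F1 columns of the test table. The sketch below shows that derivation for the RELEVANT class; the function name `metrics_from_confusion` is illustrative, and the report's own metrics may have been produced by a library rather than by this code.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive accuracy, precision, recall, and F1 for the positive (RELEVANT) class."""
    total = tn + fp + fn + tp
    predicted_pos = tp + fp
    actual_pos = tp + fn
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the "transformer at threshold 0.853" matrix.
transformer_tuned = metrics_from_confusion(tn=876, fp=28, fn=26, tp=86)
print({name: round(value, 3) for name, value in transformer_tuned.items()})
```

For the transformer at threshold 0.853 this reproduces the test-table row: accuracy 0.947, precision 0.754, recall 0.768, F1 0.761.
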

## Validation-Tuned Thresholds

- `logistic_tfidf`: threshold `0.608` (validation F1 `0.578`); test F1 change vs 0.5: `-0.077`.
- `xgboost_tfidf`: threshold `0.177` (validation F1 `0.581`); test F1 change vs 0.5: `+0.111`.
- `embedding-logistic_sentence_embeddings`: threshold `0.722` (validation F1 `0.753`); test F1 change vs 0.5: `+0.077`.
- `embedding-svm_sentence_embeddings`: threshold `0.310` (validation F1 `0.747`); test F1 change vs 0.5: `+0.073`.
- `embedding-lightgbm_sentence_embeddings`: threshold `0.042` (validation F1 `0.728`); test F1 change vs 0.5: `+0.021`.
- `transformer`: threshold `0.853` (validation F1 `0.818`); test F1 change vs 0.5: `+0.015`.

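The report does not state how each tuned threshold was found beyond giving its validation F1. One plausible procedure, assumed here, is to sweep candidate thresholds over the validation scores and keep the one that maximizes F1. The sketch below uses made-up scores and hypothetical helper names (`f1_at`, `best_f1_threshold`), and assumes a row is predicted RELEVANT when its score is greater than or equal to the threshold.

```python
def f1_at(threshold, scores, labels):
    """F1 for the positive class when predicting 1 iff score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_f1_threshold(scores, labels):
    """Try every distinct score as a threshold; return (threshold, f1) with the highest F1."""
    candidates = sorted(set(scores))
    return max(
        ((t, f1_at(t, scores, labels)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy validation data (illustrative only, not taken from the report).
val_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
val_labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

threshold, f1 = best_f1_threshold(val_scores, val_labels)
default_f1 = f1_at(0.5, val_scores, val_labels)
print(f"tuned threshold={threshold:.2f}, validation F1={f1:.3f}")
print(f"validation F1 at 0.5={default_f1:.3f}")
```

Because the threshold is chosen on validation data, the gain does not always carry over to the test split; the `logistic_tfidf` entry above, where test F1 drops by 0.077, is an example.
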
## Artifacts

- `logistic_tfidf`: `/content/agri-utilization-classifier/baselines/logistic`
- `xgboost_tfidf`: `/content/agri-utilization-classifier/baselines/xgboost`
- `embedding-logistic_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-logistic`
- `embedding-svm_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-svm`
- `embedding-lightgbm_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-lightgbm`
- `transformer`: `/content/agri-utilization-classifier/transformer`