Text Classification
Transformers
Joblib
Safetensors
multilingual
binary-classification
amis
agriculture
How to use faodl/agri-utilization-classifier with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="faodl/agri-utilization-classifier")

# Load the model directly; the sequence-classification class keeps the
# classification head, which a bare AutoModel would drop
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "faodl/agri-utilization-classifier", dtype="auto"
)
```
# AMIS Commodity Classifier Training Report

- Dataset: `faodl/amis-agri-utilization`
- Dataset subset: ``
- Dataset revision: `ada4a04088a98f8f64bc7485c57d4c7f422c2151`
- Text column: `chunk_text`
- Label column: `label`
- Transformer: `FacebookAI/xlm-roberta-base`
- Generated at: `2026-06-10T20:30:54.345579+00:00`

## Dataset Summary

| Split | Rows | Label 0 | Label 1 | Unique groups | Mean text length |
| --- | ---: | ---: | ---: | ---: | ---: |
| train | 4877 | 4347 | 530 | 2513 | 696.6 |
| validation | 978 | 899 | 79 | 538 | 690.6 |
| test | 1016 | 904 | 112 | 539 | 690.7 |

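
The label counts above imply a strong class imbalance, which is one reason the later tables report precision, recall, F1, and average precision alongside accuracy. As a small illustration (the row counts are copied from the table; the helper name `positive_rate` is made up for this sketch), the share of positives per split can be computed as follows:

```python
# Row counts copied from the Dataset Summary table: (label 0, label 1).
splits = {
    "train": (4347, 530),
    "validation": (899, 79),
    "test": (904, 112),
}


def positive_rate(negatives: int, positives: int) -> float:
    """Fraction of rows in a split that carry label 1."""
    return positives / (negatives + positives)


rates = {name: positive_rate(*counts) for name, counts in splits.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%} positive")

# Accuracy of a trivial classifier that always predicts label 0 on the test split.
majority_baseline_accuracy = 1 - rates["test"]
print(f"always-negative test accuracy: {majority_baseline_accuracy:.3f}")
```

With roughly one positive in nine or ten rows, an always-negative classifier already reaches about 0.89 accuracy on the test split, so accuracy alone says little about how well a model finds the relevant chunks.
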
## Threshold Comparison on Validation Split

Validation metrics document threshold selection and tuning behavior; test metrics remain the primary estimate of out-of-sample performance.

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| logistic_tfidf | 0.500 | 0.912 | 0.465 | 0.582 | 0.517 | 0.872 | 0.594 |
| logistic_tfidf | 0.608 | 0.942 | 0.696 | 0.494 | 0.578 | 0.872 | 0.594 |
| xgboost_tfidf | 0.500 | 0.945 | 0.931 | 0.342 | 0.500 | 0.823 | 0.588 |
| xgboost_tfidf | 0.177 | 0.934 | 0.592 | 0.570 | 0.581 | 0.823 | 0.588 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.912 | 0.476 | 0.861 | 0.613 | 0.953 | 0.762 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.957 | 0.703 | 0.810 | 0.753 | 0.953 | 0.762 |
| embedding-svm_sentence_embeddings | 0.500 | 0.955 | 0.807 | 0.582 | 0.676 | 0.952 | 0.754 |
| embedding-svm_sentence_embeddings | 0.310 | 0.957 | 0.713 | 0.785 | 0.747 | 0.952 | 0.754 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.954 | 0.750 | 0.646 | 0.694 | 0.948 | 0.782 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.952 | 0.670 | 0.797 | 0.728 | 0.948 | 0.782 |
| transformer | 0.500 | 0.964 | 0.739 | 0.861 | 0.795 | 0.970 | 0.874 |
| transformer | 0.853 | 0.970 | 0.812 | 0.823 | 0.818 | 0.970 | 0.874 |

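
The report does not state how the non-default thresholds were selected. One common approach, assumed here purely for illustration, is to sweep candidate thresholds over the validation scores and keep the one with the highest F1. The sketch below uses invented toy data, hypothetical function names, and a `score >= threshold` convention that may differ from the actual training code:

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 of the positive class when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def tune_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) with the highest F1; ties keep the lowest threshold."""
    best_threshold, best_f1 = 0.5, -1.0
    # Only observed scores can change the predictions, so they are the candidates.
    for candidate in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Toy validation data, invented for illustration only.
val_labels = [0, 0, 1, 0, 1, 1]
val_scores = [0.10, 0.40, 0.35, 0.60, 0.80, 0.90]
best_threshold, best_f1 = tune_threshold(val_labels, val_scores)
print(best_threshold, round(best_f1, 3))
```

Tuning on the validation split and reporting on the test split, as this report does, keeps the test metrics free of selection bias from the threshold search.
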
## Threshold Comparison on Test Split

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| logistic_tfidf | 0.500 | 0.926 | 0.691 | 0.598 | 0.641 | 0.899 | 0.726 |
| logistic_tfidf | 0.608 | 0.930 | 0.902 | 0.411 | 0.564 | 0.899 | 0.726 |
| xgboost_tfidf | 0.500 | 0.924 | 1.000 | 0.312 | 0.476 | 0.892 | 0.692 |
| xgboost_tfidf | 0.177 | 0.918 | 0.663 | 0.527 | 0.587 | 0.892 | 0.692 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.891 | 0.503 | 0.884 | 0.641 | 0.955 | 0.710 |
| embedding-logistic_sentence_embeddings | 0.722 | 0.935 | 0.689 | 0.750 | 0.718 | 0.955 | 0.710 |
| embedding-svm_sentence_embeddings | 0.500 | 0.930 | 0.741 | 0.562 | 0.640 | 0.956 | 0.704 |
| embedding-svm_sentence_embeddings | 0.310 | 0.934 | 0.686 | 0.741 | 0.712 | 0.956 | 0.704 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.937 | 0.740 | 0.661 | 0.698 | 0.960 | 0.791 |
| embedding-lightgbm_sentence_embeddings | 0.042 | 0.929 | 0.639 | 0.821 | 0.719 | 0.960 | 0.791 |
| transformer | 0.500 | 0.939 | 0.689 | 0.812 | 0.746 | 0.968 | 0.794 |
| transformer | 0.853 | 0.947 | 0.754 | 0.768 | 0.761 | 0.968 | 0.794 |

## Confusion Matrices on Test Split

Rows are true labels and columns are predicted labels.

### logistic_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 45 | 67 |

### logistic_tfidf at threshold 0.608

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 899 | 5 |
| RELEVANT | 66 | 46 |

### xgboost_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 904 | 0 |
| RELEVANT | 77 | 35 |

### xgboost_tfidf at threshold 0.177

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 874 | 30 |
| RELEVANT | 53 | 59 |

### embedding-logistic_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 806 | 98 |
| RELEVANT | 13 | 99 |

### embedding-logistic_sentence_embeddings at threshold 0.722

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 28 | 84 |

### embedding-svm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 882 | 22 |
| RELEVANT | 49 | 63 |

### embedding-svm_sentence_embeddings at threshold 0.310

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 866 | 38 |
| RELEVANT | 29 | 83 |

### embedding-lightgbm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 878 | 26 |
| RELEVANT | 38 | 74 |

### embedding-lightgbm_sentence_embeddings at threshold 0.042

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 852 | 52 |
| RELEVANT | 20 | 92 |

### transformer at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 863 | 41 |
| RELEVANT | 21 | 91 |

### transformer at threshold 0.853

| True / Predicted | NOT_RELEVANT | RELEVANT |
| --- | ---: | ---: |
| NOT_RELEVANT | 876 | 28 |
| RELEVANT | 26 | 86 |

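
The accuracy, precision, recall, and F1 values in the test table can be recomputed from these confusion matrices. The sketch below does so for the transformer at threshold 0.853, treating RELEVANT as the positive class; the function name `binary_metrics` is hypothetical and not taken from the training code:

```python
from typing import Dict


def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Standard binary metrics with the second class (RELEVANT) as positive."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Cell values copied from the "transformer at threshold 0.853" matrix:
# row NOT_RELEVANT -> (876, 28), row RELEVANT -> (26, 86).
transformer_tuned = binary_metrics(tn=876, fp=28, fn=26, tp=86)
for name, value in transformer_tuned.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.947
# precision: 0.754
# recall: 0.768
# f1: 0.761
```

These values match the corresponding row of the test-split table, and the same check works for the other eleven matrices.
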
## Validation-Tuned Thresholds

Each threshold below was tuned on the validation split; the test F1 change compares the tuned threshold against the default threshold of 0.500.

- `logistic_tfidf`: threshold `0.608` (validation F1 `0.578`); test F1 change vs. 0.500: `-0.077`.
- `xgboost_tfidf`: threshold `0.177` (validation F1 `0.581`); test F1 change vs. 0.500: `+0.111`.
- `embedding-logistic_sentence_embeddings`: threshold `0.722` (validation F1 `0.753`); test F1 change vs. 0.500: `+0.077`.
- `embedding-svm_sentence_embeddings`: threshold `0.310` (validation F1 `0.747`); test F1 change vs. 0.500: `+0.073`.
- `embedding-lightgbm_sentence_embeddings`: threshold `0.042` (validation F1 `0.728`); test F1 change vs. 0.500: `+0.021`.
- `transformer`: threshold `0.853` (validation F1 `0.818`); test F1 change vs. 0.500: `+0.015`.

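
To use a tuned threshold at inference time, the positive-class score has to be compared against it explicitly, because a plain argmax corresponds to the 0.500 rows. The following is a minimal sketch under stated assumptions: the label names `RELEVANT` and `NOT_RELEVANT` are taken from the confusion matrices and may not match the names in the published model config, and the input mimics the list of `{"label", "score"}` dictionaries that a Transformers text-classification pipeline should return for one text when called with `top_k=None`. The helper `is_relevant` is hypothetical.

```python
from typing import Dict, List, Union

# Validation-tuned threshold reported for the transformer.
TRANSFORMER_THRESHOLD = 0.853

LabelScore = Dict[str, Union[str, float]]


def is_relevant(
    label_scores: List[LabelScore],
    threshold: float = TRANSFORMER_THRESHOLD,
    positive_label: str = "RELEVANT",  # assumed label name, see lead-in
) -> bool:
    """Return True when the positive-class score reaches the threshold."""
    for item in label_scores:
        if item["label"] == positive_label:
            return float(item["score"]) >= threshold
    raise KeyError(f"label {positive_label!r} not found in classifier output")


# Invented scores for one text, shaped like pipeline output with all labels.
example_scores = [
    {"label": "NOT_RELEVANT", "score": 0.30},
    {"label": "RELEVANT", "score": 0.70},
]

print(is_relevant(example_scores))                 # tuned threshold 0.853
print(is_relevant(example_scores, threshold=0.5))  # default threshold
```

In this invented example the text counts as relevant at 0.500 but not at 0.853, which reflects the trade-off in the tables above: the tuned threshold raises precision at some cost in recall.
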
## Artifacts

- `logistic_tfidf`: `/content/agri-utilization-classifier/baselines/logistic`
- `xgboost_tfidf`: `/content/agri-utilization-classifier/baselines/xgboost`
- `embedding-logistic_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-logistic`
- `embedding-svm_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-svm`
- `embedding-lightgbm_sentence_embeddings`: `/content/agri-utilization-classifier/baselines/embedding-lightgbm`
- `transformer`: `/content/agri-utilization-classifier/transformer`