---
language: en
tags:
- crop-yield
- agriculture
- regression
- classification
- xgboost
- tabular
license: mit
datasets:
- fao
pipeline_tag: tabular-regression
---

# CropYQ: Crop Yield & Quality Prediction Models

**Repository:** `BirendraSharma/cropyq`

This repository hosts two trained machine learning models for predicting agricultural crop yield and quality across India, Nepal, and the Netherlands, based on FAO crop production data.

---

## Models Included

| File | Type | Description |
|------|------|-------------|
| `regression_model.pkl` | Scikit-learn Regressor | Predicts crop yield. Trained on a `log1p(kg/ha)` target, so apply `expm1` to the raw prediction to get **kg/ha** |
| `xgboostClassification_model.pkl` | XGBoost Classifier | Predicts crop quality as **Low / Medium / High** |

---

## Input Features

Both models share the same 5-feature input vector:

| Feature | Type | Description |
|---------|------|-------------|
| `Area` | Encoded int | Country (India=0, Nepal=1, Netherlands=2) |
| `Item` | Encoded int | Crop type (38 categories, e.g. Wheat=36, Rice=27) |
| `Crop Group` | Encoded int | Cereal=0, Fruit=1, Oilseed=2, Pulse=3, Root=4, Vegetable=5 |
| `Flag` | Encoded int | FAO data flag: A=0, E=1 |
| `Year` | int | Year offset from 1961 (e.g. 2020 → 59) |

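
The feature table above can be turned into a small encoding helper. The sketch below is illustrative only: the names `encode_features`, `AREA_CODES`, `CROP_GROUP_CODES`, and `FLAG_CODES` are hypothetical and not part of this repository; the code values are the ones documented in this README. It returns a plain list in the column order `Area, Item, Crop Group, Flag, Year`, which can be wrapped in `np.array([...], dtype=np.float32)` before calling `predict`.

```python
# Hypothetical helper; the mappings mirror the codes documented in this README.
AREA_CODES = {"India": 0, "Nepal": 1, "Netherlands (Kingdom of the)": 2}
CROP_GROUP_CODES = {
    "Cereal": 0, "Fruit": 1, "Oilseed": 2,
    "Pulse": 3, "Root": 4, "Vegetable": 5,
}
FLAG_CODES = {"A": 0, "E": 1}
BASE_YEAR = 1961   # Year is stored as an offset from 1961
NUM_ITEMS = 38     # Item codes run from 0 to 37


def encode_features(area, item_code, crop_group, flag, year):
    """Return [Area, Item, Crop Group, Flag, Year] as a list of ints."""
    if not 0 <= item_code < NUM_ITEMS:
        raise ValueError(f"item_code must be in 0..{NUM_ITEMS - 1}, got {item_code}")
    return [
        AREA_CODES[area],
        item_code,
        CROP_GROUP_CODES[crop_group],
        FLAG_CODES[flag],
        year - BASE_YEAR,
    ]


# Wheat (item code 36) in India, Cereal group, Flag A, year 2020
row = encode_features("India", 36, "Cereal", "A", 2020)
print(row)  # [0, 36, 0, 0, 59]
```
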

---

## Supported Areas

- India
- Nepal
- Netherlands (Kingdom of the)

---

## Supported Crops (38 total)

Apples, Bananas, Barley, Beans (dry), Broad beans, Cabbages, Carrots & turnips, Cassava, Cauliflowers & broccoli, Chick peas, Chillies & peppers, Eggplants, Grapes, Groundnuts, Lentils, Linseed, Maize (corn), Mangoes, Millet, Mustard seed, Oats, Onions & shallots, Oranges, Peas (dry), Pigeon peas, Potatoes, Rape/colza seed, Rice, Rye, Sesame seed, Sorghum, Soya beans, Sunflower seed, Sweet potatoes, Tomatoes, Triticale, Wheat, Yams

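
The two item codes documented in this README (Rice=27, Wheat=36) match the zero-based position of each crop in the alphabetical list above, so a lookup table can be sketched as follows. This is an inference, not a confirmed mapping: the models may have been trained on full FAO item names that sort differently, so the other 36 codes should be checked against the original encoder before use. `SUPPORTED_CROPS`, `ITEM_CODES`, and `item_code` are hypothetical names.

```python
# Assumption: item codes equal the zero-based position in the list above.
# Only Rice=27 and Wheat=36 are documented; all other codes are inferred.
SUPPORTED_CROPS = [
    "Apples", "Bananas", "Barley", "Beans (dry)", "Broad beans", "Cabbages",
    "Carrots & turnips", "Cassava", "Cauliflowers & broccoli", "Chick peas",
    "Chillies & peppers", "Eggplants", "Grapes", "Groundnuts", "Lentils",
    "Linseed", "Maize (corn)", "Mangoes", "Millet", "Mustard seed", "Oats",
    "Onions & shallots", "Oranges", "Peas (dry)", "Pigeon peas", "Potatoes",
    "Rape/colza seed", "Rice", "Rye", "Sesame seed", "Sorghum", "Soya beans",
    "Sunflower seed", "Sweet potatoes", "Tomatoes", "Triticale", "Wheat", "Yams",
]

# Map lower-cased crop name -> inferred integer code
ITEM_CODES = {name.lower(): code for code, name in enumerate(SUPPORTED_CROPS)}


def item_code(name):
    """Return the inferred item code for a crop name (case-insensitive)."""
    key = name.strip().lower()
    if key not in ITEM_CODES:
        raise ValueError(f"Unsupported crop: {name!r}")
    return ITEM_CODES[key]


print(item_code("Rice"), item_code("Wheat"))  # 27 36
```
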

---

## Quickstart

### Install dependencies

```bash
pip install huggingface_hub scikit-learn xgboost numpy
```

### Load and use the models

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "BirendraSharma/cropyq"

# Download and load regression model
reg_path = hf_hub_download(repo_id=REPO_ID, filename="regression_model.pkl")
with open(reg_path, "rb") as f:
    reg_model = pickle.load(f)

# Download and load classification model
clf_path = hf_hub_download(repo_id=REPO_ID, filename="xgboostClassification_model.pkl")
with open(clf_path, "rb") as f:
    clf_model = pickle.load(f)

# Example: Wheat in India, Cereal group, Flag A, Year 2020
# area=0 (India), item=36 (Wheat), cropgroup=0 (Cereal), flag=0 (A), year=2020-1961=59
inputs = np.array([[0, 36, 0, 0, 59]], dtype=np.float32)

# Predict yield (kg/ha): the model was trained on a log1p target, so invert with expm1
log_yield = reg_model.predict(inputs)[0]
yield_kgha = np.expm1(log_yield)
print(f"Predicted Yield: {yield_kgha:.2f} kg/ha")

# Predict quality
quality_map = {0: "Low", 1: "Medium", 2: "High"}
quality_pred = clf_model.predict(inputs)[0]
print(f"Predicted Quality: {quality_map[int(quality_pred)]}")
```

---

## Desktop GUI App

A Tkinter-based desktop app provides a point-and-click interface for running predictions.

### Run the app

```bash
# tkinter is part of the Python standard library and is not installed via pip
pip install huggingface_hub scikit-learn xgboost numpy
python crop_yield_app.py
```

The app automatically downloads both model files from this repository on first launch.

**Features:**
- Dropdown selectors for Area, Item, Crop Group, and Flag
- Text entry for Year
- **Predict Yield** button: returns estimated kg/ha
- **Predict Quality** button: returns Low / Medium / High

---

## Encoding Reference

<details>
<summary>Area Encoding</summary>

| Area | Code |
|------|------|
| India | 0 |
| Nepal | 1 |
| Netherlands (Kingdom of the) | 2 |

</details>

<details>
<summary>Crop Group Encoding</summary>

| Crop Group | Code |
|------------|------|
| Cereal | 0 |
| Fruit | 1 |
| Oilseed | 2 |
| Pulse | 3 |
| Root | 4 |
| Vegetable | 5 |

</details>

<details>
<summary>Flag Encoding</summary>

| Flag | Code | Meaning |
|------|------|---------|
| A | 0 | Official figure |
| E | 1 | Estimated value |

</details>

<details>
<summary>Year Encoding</summary>

Year values are offset from 1961:

```
encoded_year = actual_year - 1961
# e.g. 2020 → 59, 1990 → 29, 1961 → 0
```

</details>

---

## Model Details

### Regression Model (`regression_model.pkl`)
- **Task:** Tabular regression
- **Target:** Log-transformed crop yield (`log1p(kg/ha)`), back-transformed with `expm1` at inference
- **Output:** Yield in kg/ha

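
The back-transform works because `expm1` is the exact inverse of `log1p`: if the target is `log(1 + y)`, then `exp(p) - 1` recovers `y`. The round trip below uses a made-up yield value purely for illustration; it is not a model output, and `to_kg_per_ha` is a hypothetical helper name.

```python
import math


def to_kg_per_ha(log_predictions):
    """Invert a log1p-transformed target: expm1(p) = exp(p) - 1."""
    return [math.expm1(p) for p in log_predictions]


# Made-up yield used only to demonstrate the round trip
example_yield = 3500.0
log_target = math.log1p(example_yield)      # what the regressor is trained on
recovered = to_kg_per_ha([log_target])[0]   # what the user sees after inference
print(math.isclose(recovered, example_yield))
```
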
### Classification Model (`xgboostClassification_model.pkl`)
- **Task:** Multi-class tabular classification
- **Framework:** XGBoost
- **Output classes:** Low (0), Medium (1), High (2)

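
When a confidence value is wanted alongside the label, the class indices above can be paired with per-class probabilities. The sketch assumes the pickled object is an `xgboost.XGBClassifier`, in which case `clf_model.predict_proba(inputs)[0]` would return one probability per class; the Quickstart's use of `.predict` is consistent with that but does not confirm it. `label_from_probabilities` is a hypothetical helper, and the probabilities shown are made up.

```python
QUALITY_LABELS = {0: "Low", 1: "Medium", 2: "High"}


def label_from_probabilities(probabilities):
    """Return (label, probability) for the most likely quality class."""
    if len(probabilities) != len(QUALITY_LABELS):
        raise ValueError(f"expected {len(QUALITY_LABELS)} class probabilities")
    # Index of the largest probability is the predicted class
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return QUALITY_LABELS[best], probabilities[best]


# Made-up probabilities for illustration, not real model output
label, confidence = label_from_probabilities([0.1, 0.7, 0.2])
print(label, confidence)  # Medium 0.7
```
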

---

## Repository Structure

```
BirendraSharma/cropyq/
├── regression_model.pkl                  # Sklearn regression model
├── xgboostClassification_model.pkl       # XGBoost classification model
└── README.md                             # This file
```

---

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

---

## Acknowledgements

Data sourced from the [FAO (Food and Agriculture Organization of the United Nations)](https://www.fao.org/faostat/) crop production statistics.