---
language: en
tags:
- crop-yield
- agriculture
- regression
- classification
- xgboost
- tabular
license: mit
datasets:
- fao
pipeline_tag: tabular-regression
---
# 🌾 CropYQ — Crop Yield & Quality Prediction Models
**Repository:** `BirendraSharma/cropyq`

This repository hosts two trained machine learning models for predicting agricultural crop yield and quality across India, Nepal, and the Netherlands, based on FAO crop production data.

---
## 📦 Models Included
| File | Type | Description |
|------|------|-------------|
| `regression_model.pkl` | Scikit-learn Regressor | Predicts crop yield in **kg/ha** (log-transformed target, inverse-transformed on output) |
| `xgboostClassification_model.pkl` | XGBoost Classifier | Predicts crop quality as **Low / Medium / High** |
---
## 🗂️ Input Features
Both models take the same 5-feature input vector, with the features in the order listed below:
| Feature | Type | Description |
|---------|------|-------------|
| `Area` | Encoded int | Country (India=0, Nepal=1, Netherlands=2) |
| `Item` | Encoded int | Crop type (38 categories, e.g. Wheat=36, Rice=27) |
| `Crop Group` | Encoded int | Cereal=0, Fruit=1, Oilseed=2, Pulse=3, Root=4, Vegetable=5 |
| `Flag` | Encoded int | FAO data flag — A=0, E=1 |
| `Year` | int | Year offset from 1961 (e.g. 2020 → 59) |
---
## 🌍 Supported Areas
- India
- Nepal
- Netherlands (Kingdom of the)
---
## 🌱 Supported Crops (38 total)
Apples, Bananas, Barley, Beans (dry), Broad beans, Cabbages, Carrots & turnips, Cassava, Cauliflowers & broccoli, Chick peas, Chillies & peppers, Eggplants, Grapes, Groundnuts, Lentils, Linseed, Maize (corn), Mangoes, Millet, Mustard seed, Oats, Onions & shallots, Oranges, Peas (dry), Pigeon peas, Potatoes, Rape/colza seed, Rice, Rye, Sesame seed, Sorghum, Soya beans, Sunflower seed, Sweet potatoes, Tomatoes, Triticale, Wheat, Yams

---
## 🚀 Quickstart
### Install dependencies
```bash
pip install huggingface_hub scikit-learn xgboost numpy
```
### Load and use the models
```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "BirendraSharma/cropyq"

# Download and load the regression model.
# Note: unpickling executes code stored in the file, so only load pickles you trust.
reg_path = hf_hub_download(repo_id=REPO_ID, filename="regression_model.pkl")
with open(reg_path, "rb") as f:
    reg_model = pickle.load(f)

# Download and load the classification model
clf_path = hf_hub_download(repo_id=REPO_ID, filename="xgboostClassification_model.pkl")
with open(clf_path, "rb") as f:
    clf_model = pickle.load(f)

# Example: Wheat in India, Cereal group, Flag A, Year 2020
# Feature order: [Area, Item, Crop Group, Flag, Year]
# area=0 (India), item=36 (Wheat), cropgroup=0 (Cereal), flag=0 (A), year=2020-1961=59
inputs = np.array([[0, 36, 0, 0, 59]], dtype=np.float32)

# Predict yield (kg/ha); the model was trained on a log1p target, so invert with expm1
log_yield = reg_model.predict(inputs)[0]
yield_kgha = np.expm1(log_yield)
print(f"Predicted Yield: {yield_kgha:.2f} kg/ha")

# Predict quality and map the class index to its label
quality_map = {0: "Low", 1: "Medium", 2: "High"}
quality_pred = clf_model.predict(inputs)[0]
print(f"Predicted Quality: {quality_map[int(quality_pred)]}")
```
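To compare predictions across several years, the single-row example above can be extended to one row per year. The sketch below only builds the input rows; `year_sweep_rows` is a hypothetical helper, not part of this repository, and it assumes the feature order used in the Quickstart. With the models loaded as above, the rows could then be passed on as `np.expm1(reg_model.predict(np.array(rows, dtype=np.float32)))`. Years far outside the range of the training data may give unreliable results.

```python
def year_sweep_rows(area, item, crop_group, flag, start_year, end_year, base_year=1961):
    """Build one feature row per calendar year in [start_year, end_year].

    Hypothetical helper: the row layout [Area, Item, Crop Group, Flag, Year]
    and the 1961 offset follow the Quickstart example.
    """
    if start_year < base_year or end_year < start_year:
        raise ValueError("expected base_year <= start_year <= end_year")
    return [
        [area, item, crop_group, flag, year - base_year]
        for year in range(start_year, end_year + 1)
    ]


# Wheat (36) in India (0), Cereal (0), Flag A (0), for 2018 through 2020
rows = year_sweep_rows(0, 36, 0, 0, 2018, 2020)
print(rows)  # [[0, 36, 0, 0, 57], [0, 36, 0, 0, 58], [0, 36, 0, 0, 59]]
```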
---
## 🖥️ Desktop GUI App
A Tkinter-based desktop app is available that provides a point-and-click interface for running predictions.
### Run the app
```bash
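# tkinter ships with most Python installers and is not installable from PyPI;
# on Debian/Ubuntu it may need: sudo apt-get install python3-tk
pip install huggingface_hub scikit-learn xgboost numpy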
python crop_yield_app.py
```
The app will automatically download both model files from this repository on first launch.

**Features:**
- Dropdown selectors for Area, Item, Crop Group, and Flag
- Text entry for Year
- **Predict Yield** button → returns estimated kg/ha
- **Predict Quality** button → returns Low / Medium / High
---
## 🔢 Encoding Reference
### Area Encoding
| Area | Code |
|------|------|
| India | 0 |
| Nepal | 1 |
| Netherlands (Kingdom of the) | 2 |
### Crop Group Encoding
| Crop Group | Code |
|------------|------|
| Cereal | 0 |
| Fruit | 1 |
| Oilseed | 2 |
| Pulse | 3 |
| Root | 4 |
| Vegetable | 5 |
### Flag Encoding
| Flag | Code | Meaning |
|------|------|---------|
| A | 0 | Official figure |
| E | 1 | Estimated value |
### Year Encoding
Year values are offset from 1961:
```
encoded_year = actual_year - 1961
# e.g. 2020 → 59, 1990 → 29, 1961 → 0
```
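The encodings above can be combined into a small helper that turns human-readable values into the 5-feature vector. This is a minimal sketch, not code from this repository: `encode_features` and the dictionary names are hypothetical. It also assumes that `Item` codes follow the order of the Supported Crops list, which agrees with the two documented examples (Wheat=36, Rice=27) but is not confirmed for the other crops.

```python
AREA_CODES = {"India": 0, "Nepal": 1, "Netherlands (Kingdom of the)": 2}
CROP_GROUP_CODES = {
    "Cereal": 0, "Fruit": 1, "Oilseed": 2, "Pulse": 3, "Root": 4, "Vegetable": 5,
}
FLAG_CODES = {"A": 0, "E": 1}
BASE_YEAR = 1961

# ASSUMPTION: Item codes follow the order of the Supported Crops list.
# Only Wheat=36 and Rice=27 are documented; verify other codes before relying on them.
ITEMS = [
    "Apples", "Bananas", "Barley", "Beans (dry)", "Broad beans", "Cabbages",
    "Carrots & turnips", "Cassava", "Cauliflowers & broccoli", "Chick peas",
    "Chillies & peppers", "Eggplants", "Grapes", "Groundnuts", "Lentils",
    "Linseed", "Maize (corn)", "Mangoes", "Millet", "Mustard seed", "Oats",
    "Onions & shallots", "Oranges", "Peas (dry)", "Pigeon peas", "Potatoes",
    "Rape/colza seed", "Rice", "Rye", "Sesame seed", "Sorghum", "Soya beans",
    "Sunflower seed", "Sweet potatoes", "Tomatoes", "Triticale", "Wheat", "Yams",
]
ITEM_CODES = {name: code for code, name in enumerate(ITEMS)}


def encode_features(area, item, crop_group, flag, year):
    """Return [Area, Item, Crop Group, Flag, Year] as integer codes.

    Raises KeyError for a name that is not in the encoding tables.
    """
    return [
        AREA_CODES[area],
        ITEM_CODES[item],
        CROP_GROUP_CODES[crop_group],
        FLAG_CODES[flag],
        year - BASE_YEAR,
    ]


print(encode_features("India", "Wheat", "Cereal", "A", 2020))  # [0, 36, 0, 0, 59]
```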
---
## 📊 Model Details
### Regression Model (`regression_model.pkl`)
- **Task:** Tabular regression
- **Target:** Log-transformed crop yield (`log1p(kg/ha)`), back-transformed with `expm1` at inference
- **Output:** Yield in kg/ha
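
The target transform can be illustrated with the standard library alone. The sketch below shows that `expm1` undoes `log1p`, which is why the raw model output must be passed through `expm1` to obtain kg/ha. The function names and the example yield value are hypothetical and chosen only for illustration.

```python
import math


def to_log_target(yield_kgha):
    """Forward transform applied to the training target: log(1 + y)."""
    return math.log1p(yield_kgha)


def from_log_target(log_value):
    """Inverse transform applied to the model output: exp(v) - 1."""
    return math.expm1(log_value)


example_yield = 3500.0  # hypothetical yield in kg/ha
log_value = to_log_target(example_yield)
restored = from_log_target(log_value)
print(f"log1p value: {log_value:.4f}, restored yield: {restored:.1f} kg/ha")
```

Skipping the inverse step would leave the prediction on the log scale, where values are small single-digit numbers rather than kg/ha.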
### Classification Model (`xgboostClassification_model.pkl`)
- **Task:** Multi-class tabular classification
- **Framework:** XGBoost
- **Output classes:** Low (0), Medium (1), High (2)
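
If class probabilities are wanted in addition to the predicted label, the sketch below maps a probability vector to a label and its confidence. It assumes the pickled classifier exposes `predict_proba`, as XGBoost's scikit-learn wrapper `XGBClassifier` does; this card does not confirm which object type is stored. `label_from_probabilities` is a hypothetical helper and the example probabilities are made up. With a loaded model, the input would be `clf_model.predict_proba(inputs)[0]`.

```python
QUALITY_LABELS = {0: "Low", 1: "Medium", 2: "High"}


def label_from_probabilities(probabilities):
    """Return (label, probability) for the most likely quality class."""
    if len(probabilities) != len(QUALITY_LABELS):
        raise ValueError("expected one probability per quality class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return QUALITY_LABELS[best], float(probabilities[best])


example_probabilities = [0.1, 0.7, 0.2]  # hypothetical values, not model output
print(label_from_probabilities(example_probabilities))  # ('Medium', 0.7)
```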
---
## 📁 Repository Structure
```
BirendraSharma/cropyq/
├── regression_model.pkl # Sklearn regression model
├── xgboostClassification_model.pkl # XGBoost classification model
└── README.md # This file
```
---
## 📜 License
This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

---
## 🙏 Acknowledgements
Data sourced from the [FAO (Food and Agriculture Organization of the United Nations)](https://www.fao.org/faostat/) crop production statistics.