---
language: en
tags:
  - crop-yield
  - agriculture
  - regression
  - classification
  - xgboost
  - tabular
license: mit
datasets:
  - fao
pipeline_tag: tabular-regression
---

# 🌾 CropYQ — Crop Yield & Quality Prediction Models

**Repository:** `BirendraSharma/cropyq`

This repository hosts two trained machine learning models for predicting agricultural crop yield and quality across India, Nepal, and the Netherlands, based on FAO crop production data.

---

## 📦 Models Included

| File | Type | Description |
|------|------|-------------|
| `regression_model.pkl` | Scikit-learn Regressor | Predicts crop yield. The model outputs `log1p(kg/ha)`, so apply `expm1` to the prediction to get **kg/ha** |
| `xgboostClassification_model.pkl` | XGBoost Classifier | Predicts crop quality as **Low / Medium / High** |

---

## 🗂️ Input Features

Both models take the same 5-feature input vector, with the features in the order listed below:

| Feature | Type | Description |
|---------|------|-------------|
| `Area` | Encoded int | Country (India=0, Nepal=1, Netherlands=2) |
| `Item` | Encoded int | Crop type (38 categories, e.g. Wheat=36, Rice=27) |
| `Crop Group` | Encoded int | Cereal=0, Fruit=1, Oilseed=2, Pulse=3, Root=4, Vegetable=5 |
| `Flag` | Encoded int | FAO data flag (A=0 official figure, E=1 estimated value) |
| `Year` | int | Year offset from 1961 (e.g. 2020 → 59) |

---

## 🌍 Supported Areas

- India
- Nepal
- Netherlands (Kingdom of the)

---

## 🌱 Supported Crops (38 total)

Apples, Bananas, Barley, Beans (dry), Broad beans, Cabbages, Carrots & turnips, Cassava, Cauliflowers & broccoli, Chick peas, Chillies & peppers, Eggplants, Grapes, Groundnuts, Lentils, Linseed, Maize (corn), Mangoes, Millet, Mustard seed, Oats, Onions & shallots, Oranges, Peas (dry), Pigeon peas, Potatoes, Rape/colza seed, Rice, Rye, Sesame seed, Sorghum, Soya beans, Sunflower seed, Sweet potatoes, Tomatoes, Triticale, Wheat, Yams
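
The `Item` code for each crop is not tabulated in this README. The sketch below assumes that a crop's code is its position in the alphabetical list above. That assumption matches the two documented codes (Wheat=36, Rice=27) but is not confirmed for the remaining crops, and the exact label strings used in training may differ from the display names here. `CROPS` and `ITEM_CODES` are illustrative names, not part of the repository.

```python
# Crop names as listed above, in the same (alphabetical) order.
CROPS = [
    "Apples", "Bananas", "Barley", "Beans (dry)", "Broad beans", "Cabbages",
    "Carrots & turnips", "Cassava", "Cauliflowers & broccoli", "Chick peas",
    "Chillies & peppers", "Eggplants", "Grapes", "Groundnuts", "Lentils",
    "Linseed", "Maize (corn)", "Mangoes", "Millet", "Mustard seed", "Oats",
    "Onions & shallots", "Oranges", "Peas (dry)", "Pigeon peas", "Potatoes",
    "Rape/colza seed", "Rice", "Rye", "Sesame seed", "Sorghum", "Soya beans",
    "Sunflower seed", "Sweet potatoes", "Tomatoes", "Triticale", "Wheat", "Yams",
]

# ASSUMPTION: each crop's code is its index in the list above.
ITEM_CODES = {name: code for code, name in enumerate(CROPS)}

print(ITEM_CODES["Wheat"], ITEM_CODES["Rice"])  # prints: 36 27
```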

---

## 🚀 Quickstart

### Install dependencies

```bash
pip install huggingface_hub scikit-learn xgboost numpy
```

### Load and use the models

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "BirendraSharma/cropyq"

# Download and load regression model
reg_path = hf_hub_download(repo_id=REPO_ID, filename="regression_model.pkl")
with open(reg_path, "rb") as f:
    reg_model = pickle.load(f)

# Download and load classification model
clf_path = hf_hub_download(repo_id=REPO_ID, filename="xgboostClassification_model.pkl")
with open(clf_path, "rb") as f:
    clf_model = pickle.load(f)

# Example: Wheat in India, Cereal group, Flag A, Year 2020
# area=0 (India), item=36 (Wheat), cropgroup=0 (Cereal), flag=0 (A), year=2020-1961=59
inputs = np.array([[0, 36, 0, 0, 59]], dtype=np.float32)

# Predict yield (kg/ha) — model was trained on log1p target
log_yield = reg_model.predict(inputs)[0]
yield_kgha = np.expm1(log_yield)
print(f"Predicted Yield: {yield_kgha:.2f} kg/ha")

# Predict quality
quality_map = {0: "Low", 1: "Medium", 2: "High"}
quality_pred = clf_model.predict(inputs)[0]
print(f"Predicted Quality: {quality_map[int(quality_pred)]}")
```
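
To score several rows in one call, the two predictions can be wrapped in a small helper. This is a minimal sketch, assuming both loaded models expose a scikit-learn style `predict` method as in the snippet above. `predict_rows` and `QUALITY_LABELS` are hypothetical names introduced here for illustration.

```python
import math

# Class codes as documented in this README.
QUALITY_LABELS = {0: "Low", 1: "Medium", 2: "High"}


def predict_rows(reg_model, clf_model, rows):
    """Hypothetical helper: return (yield_kgha, quality_label) for each row.

    `rows` is a 2-D array-like of encoded features in the order
    [Area, Item, Crop Group, Flag, Year offset].
    """
    log_yields = reg_model.predict(rows)      # predictions on the log1p scale
    quality_codes = clf_model.predict(rows)   # integer class codes
    return [
        (math.expm1(float(log_y)), QUALITY_LABELS[int(code)])
        for log_y, code in zip(log_yields, quality_codes)
    ]
```

With the models from the Quickstart, a call might look like `predict_rows(reg_model, clf_model, np.array([[0, 36, 0, 0, 59], [1, 27, 0, 0, 59]], dtype=np.float32))`.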

---

## 🖥️ Desktop GUI App

A Tkinter-based desktop app (`crop_yield_app.py`) provides a point-and-click interface for running predictions.

### Run the app

```bash
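# tkinter ships with most Python installers and cannot be installed from pip;
# on Debian/Ubuntu it is provided by the python3-tk system package.
pip install huggingface_hub scikit-learn xgboost numpy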
python crop_yield_app.py
```

The app will automatically download both model files from this repository on first launch.

**Features:**
- Dropdown selectors for Area, Item, Crop Group, and Flag
- Text entry for Year
- **Predict Yield** button → returns estimated kg/ha
- **Predict Quality** button → returns Low / Medium / High

---

## 🔢 Encoding Reference

<details>
<summary>Area Encoding</summary>

| Area | Code |
|------|------|
| India | 0 |
| Nepal | 1 |
| Netherlands (Kingdom of the) | 2 |

</details>

<details>
<summary>Crop Group Encoding</summary>

| Crop Group | Code |
|------------|------|
| Cereal | 0 |
| Fruit | 1 |
| Oilseed | 2 |
| Pulse | 3 |
| Root | 4 |
| Vegetable | 5 |

</details>

<details>
<summary>Flag Encoding</summary>

| Flag | Code | Meaning |
|------|------|---------|
| A | 0 | Official figure |
| E | 1 | Estimated value |

</details>

<details>
<summary>Year Encoding</summary>

Year values are offset from 1961:

```
encoded_year = actual_year - 1961
# e.g. 2020 → 59,  1990 → 29,  1961 → 0
```

</details>
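
The tables above can be combined into a single encoding step. The following is a minimal sketch, not code from the repository: `encode_features` and the constant names are hypothetical, the crop is passed as its integer `Item` code, and rejecting years before 1961 is an assumption based on the year offset starting at 0.

```python
# Encodings copied from the reference tables above.
AREA_CODES = {"India": 0, "Nepal": 1, "Netherlands (Kingdom of the)": 2}
CROP_GROUP_CODES = {
    "Cereal": 0, "Fruit": 1, "Oilseed": 2, "Pulse": 3, "Root": 4, "Vegetable": 5,
}
FLAG_CODES = {"A": 0, "E": 1}
BASE_YEAR = 1961
NUM_ITEMS = 38  # crop types documented in this README


def encode_features(area, item_code, crop_group, flag, year):
    """Hypothetical helper: build one feature row in the documented order
    [Area, Item, Crop Group, Flag, Year offset]."""
    if not 0 <= item_code < NUM_ITEMS:
        raise ValueError(f"item_code must be in 0..{NUM_ITEMS - 1}")
    if year < BASE_YEAR:
        # ASSUMPTION: negative year offsets are outside the training range.
        raise ValueError(f"year must be {BASE_YEAR} or later")
    return [
        AREA_CODES[area],              # KeyError for an unsupported area
        item_code,
        CROP_GROUP_CODES[crop_group],  # KeyError for an unknown crop group
        FLAG_CODES[flag],              # KeyError for an unknown flag
        year - BASE_YEAR,
    ]


row = encode_features("India", 36, "Cereal", "A", 2020)
print(row)  # prints: [0, 36, 0, 0, 59]
```

The returned row can be passed to either model as `np.array([row], dtype=np.float32)`.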

---

## 📊 Model Details

### Regression Model (`regression_model.pkl`)
- **Task:** Tabular regression
- **Target:** Log-transformed crop yield (`log1p(kg/ha)`); the caller back-transforms predictions with `expm1` at inference
- **Output:** Yield in kg/ha
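
The short sketch below illustrates the `log1p` / `expm1` round trip and one consequence of predicting in log space: an additive error on the log scale becomes a multiplicative error in kg/ha. The yield value and the error size are made-up numbers for illustration, not model outputs or measured accuracy.

```python
import math

yield_kgha = 3000.0                   # illustrative value, not a model output
log_target = math.log1p(yield_kgha)   # log(1 + yield): the scale the regressor predicts on
recovered = math.expm1(log_target)    # exp(x) - 1 undoes log1p

# An additive error in log space scales (1 + yield) by exp(error).
log_error = 0.1
shifted = math.expm1(log_target + log_error)
ratio = (1.0 + shifted) / (1.0 + yield_kgha)

print(f"recovered: {recovered:.1f} kg/ha, ratio: {ratio:.3f}")
```

Here `ratio` equals `exp(0.1)`, roughly 1.105, so an error of 0.1 on the log scale corresponds to a yield estimate that is about 10.5% too high.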

### Classification Model (`xgboostClassification_model.pkl`)
- **Task:** Multi-class tabular classification
- **Framework:** XGBoost
- **Output classes:** Low (0), Medium (1), High (2)

---

## 📁 Repository Structure

```
BirendraSharma/cropyq/
├── regression_model.pkl             # Sklearn regression model
├── xgboostClassification_model.pkl  # XGBoost classification model
└── README.md                        # This file
```

---

## 📜 License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

---

## 🙏 Acknowledgements

Data sourced from the [FAO (Food and Agriculture Organization of the United Nations)](https://www.fao.org/faostat/) crop production statistics.