---
language: en
tags:
- crop-yield
- agriculture
- regression
- classification
- xgboost
- tabular
license: mit
datasets:
- fao
pipeline_tag: tabular-regression
---

# 🌾 CropYQ: Crop Yield & Quality Prediction Models

**Repository:** `BirendraSharma/cropyq`

This repository hosts two trained machine learning models for predicting agricultural crop yield and quality across India, Nepal, and the Netherlands, based on FAO crop production data.

---

## 📦 Models Included

| File | Type | Description |
|------|------|-------------|
| `regression_model.pkl` | Scikit-learn Regressor | Predicts crop yield in **kg/ha** (log-transformed target, inverse-transformed on output) |
| `xgboostClassification_model.pkl` | XGBoost Classifier | Predicts crop quality as **Low / Medium / High** |

---

## 🗂️ Input Features

Both models share the same 5-feature input vector:

| Feature | Type | Description |
|---------|------|-------------|
| `Area` | Encoded int | Country (India=0, Nepal=1, Netherlands=2) |
| `Item` | Encoded int | Crop type (38 categories, e.g. Wheat=36, Rice=27) |
| `Crop Group` | Encoded int | Cereal=0, Fruit=1, Oilseed=2, Pulse=3, Root=4, Vegetable=5 |
| `Flag` | Encoded int | FAO data flag: A=0, E=1 |
| `Year` | int | Year offset from 1961 (e.g. 2020 → 59) |

---

## 🌍 Supported Areas

- India
- Nepal
- Netherlands (Kingdom of the)

---

## 🌱 Supported Crops (38 total)

Apples, Bananas, Barley, Beans (dry), Broad beans, Cabbages, Carrots & turnips, Cassava, Cauliflowers & broccoli, Chick peas, Chillies & peppers, Eggplants, Grapes, Groundnuts, Lentils, Linseed, Maize (corn), Mangoes, Millet, Mustard seed, Oats, Onions & shallots, Oranges, Peas (dry), Pigeon peas, Potatoes, Rape/colza seed, Rice, Rye, Sesame seed, Sorghum, Soya beans, Sunflower seed, Sweet potatoes, Tomatoes, Triticale, Wheat, Yams

---

## 🚀 Quickstart

### Install dependencies

```bash
pip install huggingface_hub scikit-learn xgboost numpy
```

### Load and use the models

```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "BirendraSharma/cropyq"

# Download and load regression model
reg_path = hf_hub_download(repo_id=REPO_ID, filename="regression_model.pkl")
with open(reg_path, "rb") as f:
    reg_model = pickle.load(f)

# Download and load classification model
clf_path = hf_hub_download(repo_id=REPO_ID, filename="xgboostClassification_model.pkl")
with open(clf_path, "rb") as f:
    clf_model = pickle.load(f)

# Example: Wheat in India, Cereal group, Flag A, Year 2020
# area=0 (India), item=36 (Wheat), cropgroup=0 (Cereal), flag=0 (A), year=2020-1961=59
inputs = np.array([[0, 36, 0, 0, 59]], dtype=np.float32)

# Predict yield (kg/ha); the model was trained on a log1p target
log_yield = reg_model.predict(inputs)[0]
yield_kgha = np.expm1(log_yield)
print(f"Predicted Yield: {yield_kgha:.2f} kg/ha")

# Predict quality
quality_map = {0: "Low", 1: "Medium", 2: "High"}
quality_pred = clf_model.predict(inputs)[0]
print(f"Predicted Quality: {quality_map[int(quality_pred)]}")
```

---

## 🖥️ Desktop GUI App

A Tkinter-based desktop app is available that provides a point-and-click interface for running predictions.

### Run the app

```bash
pip install huggingface_hub scikit-learn xgboost numpy
python crop_yield_app.py
```

Tkinter is part of the Python standard library and cannot be installed with `pip`. On some Linux distributions it ships as a separate system package (for example, `python3-tk` on Debian/Ubuntu).

The app automatically downloads both model files from this repository on first launch.

**Features:**
- Dropdown selectors for Area, Item, Crop Group, and Flag
- Text entry for Year
- **Predict Yield** button → returns estimated kg/ha
- **Predict Quality** button → returns Low / Medium / High

---

## 🔢 Encoding Reference

### Area Encoding

| Area | Code |
|------|------|
| India | 0 |
| Nepal | 1 |
| Netherlands (Kingdom of the) | 2 |

### Crop Group Encoding

| Crop Group | Code |
|------------|------|
| Cereal | 0 |
| Fruit | 1 |
| Oilseed | 2 |
| Pulse | 3 |
| Root | 4 |
| Vegetable | 5 |

### Flag Encoding

| Flag | Code | Meaning |
|------|------|---------|
| A | 0 | Official figure |
| E | 1 | Estimated value |

### Year Encoding

Year values are offset from 1961:

```
encoded_year = actual_year - 1961
# e.g. 2020 → 59, 1990 → 29, 1961 → 0
```
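
The encodings in this section can be gathered into a small helper that turns human-readable values into the 5-feature row the models expect. The sketch below is illustrative only: `encode_features` and the lookup dictionaries are hypothetical names, not part of this repository. The `Item` codes are an assumption: they are taken from each crop's position in the supported-crops list, which agrees with the two documented codes (Wheat=36, Rice=27) but is not otherwise confirmed here.

```python
# Crop names in the order the model card lists them. ASSUMPTION: a crop's
# position in this list is its Item code (consistent with Wheat=36, Rice=27).
CROPS = [
    "Apples", "Bananas", "Barley", "Beans (dry)", "Broad beans", "Cabbages",
    "Carrots & turnips", "Cassava", "Cauliflowers & broccoli", "Chick peas",
    "Chillies & peppers", "Eggplants", "Grapes", "Groundnuts", "Lentils",
    "Linseed", "Maize (corn)", "Mangoes", "Millet", "Mustard seed", "Oats",
    "Onions & shallots", "Oranges", "Peas (dry)", "Pigeon peas", "Potatoes",
    "Rape/colza seed", "Rice", "Rye", "Sesame seed", "Sorghum", "Soya beans",
    "Sunflower seed", "Sweet potatoes", "Tomatoes", "Triticale", "Wheat",
    "Yams",
]
ITEM_CODES = {name: code for code, name in enumerate(CROPS)}

# Codes copied from the encoding tables in this section.
AREA_CODES = {"India": 0, "Nepal": 1, "Netherlands (Kingdom of the)": 2}
GROUP_CODES = {
    "Cereal": 0, "Fruit": 1, "Oilseed": 2,
    "Pulse": 3, "Root": 4, "Vegetable": 5,
}
FLAG_CODES = {"A": 0, "E": 1}
BASE_YEAR = 1961  # year values are stored as an offset from this year


def encode_features(area, item, crop_group, flag, year):
    """Return [Area, Item, Crop Group, Flag, Year] as integer codes.

    Raises KeyError for an unknown label and ValueError for a year
    earlier than BASE_YEAR.
    """
    if year < BASE_YEAR:
        raise ValueError(f"year must be >= {BASE_YEAR}, got {year}")
    return [
        AREA_CODES[area],
        ITEM_CODES[item],
        GROUP_CODES[crop_group],
        FLAG_CODES[flag],
        year - BASE_YEAR,
    ]


row = encode_features("India", "Wheat", "Cereal", "A", 2020)
print(row)  # [0, 36, 0, 0, 59]
```

The returned list can be wrapped as `np.array([row], dtype=np.float32)` before it is passed to either model's `predict` method.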

---

## 📊 Model Details

### Regression Model (`regression_model.pkl`)

- **Task:** Tabular regression
- **Target:** Log-transformed crop yield (`log1p(kg/ha)`), back-transformed with `expm1` at inference
- **Output:** Yield in kg/ha

### Classification Model (`xgboostClassification_model.pkl`)

- **Task:** Multi-class tabular classification
- **Framework:** XGBoost
- **Output classes:** Low (0), Medium (1), High (2)

---

## 📁 Repository Structure

```
BirendraSharma/cropyq/
├── regression_model.pkl              # Sklearn regression model
├── xgboostClassification_model.pkl   # XGBoost classification model
└── README.md                         # This file
```

---

## 📜 License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).

---

## 🙏 Acknowledgements

Data sourced from the [FAO (Food and Agriculture Organization of the United Nations)](https://www.fao.org/faostat/) crop production statistics.