vohtaski committed on
Commit
0b93f90
·
verified ·
1 Parent(s): 778b176

Upload 4 files

BuildModels_open_source.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README ADDED
@@ -0,0 +1,105 @@
1
+ # Yield Prediction Model - Open Source Demo
2
+
3
+ ## Overview
4
+
5
+ This project demonstrates machine learning models that predict vine yield at harvest time from remote sensing data, weather information, soil properties, and agronomic attributes.
6
+
7
+ The models predict:
8
+ - **TCH (Tons of Grapes per Hectare)**: Vine yield at harvest
9
+
10
+ ## Files
11
+
12
+ - **`training_features_anonymized.csv`** - Dataset for model training (926 rows × 589 columns)
13
+ - **`BuildModels_open_source.ipynb`** - Self-contained notebook for training prediction models
14
+
15
+ ## BuildModels_open_source.ipynb - Quick Start Guide
16
+
17
+ ### What It Does
18
+ This notebook trains machine learning models to predict vine yield (TCH) at harvest time.
19
+
20
+ ### Input
21
+ - **File**: `training_features_anonymized.csv` (926 harvest observations)
22
+ - **Features**: Satellite data, weather, soil properties, and crop characteristics
23
+
24
+ ### Process
25
+ 1. **Load and prepare data**
26
+ - Read CSV file
27
+ - Encode categorical variables (variety, rootstock type)
28
+
29
+ 2. **Train models using Leave-One-Season-Out Cross-Validation**
30
+ - For each season: train on all other seasons, test on the held-out season
31
+ - Algorithm: LightGBM with 31 leaves per tree and 100 trees
32
+ - Remove outliers: keep only observations with TCH between 0.1 and 60 tons/ha
33
+
34
+ 3. **Evaluate performance**
35
+ - Calculate metrics: RMSE, MAE, R², MAPE
36
+ - Generate scatter plots and feature importance charts
37
+
38
+ 4. **Save final models**
39
+ - Train on complete dataset
40
+ - Export as `.joblib` files for future use
41
+
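The process above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the row dictionaries, the `season` and `tch` keys, the toy values, and the mean-of-training-set predictor standing in for LightGBM are all assumptions made for the example.

```python
import math
from collections import defaultdict


def filter_outliers(rows, low=0.1, high=60.0):
    """Keep rows whose TCH lies inside [low, high] tons/ha."""
    return [r for r in rows if low <= r["tch"] <= high]


def leave_one_season_out(rows):
    """Yield (held_out_season, train_rows, test_rows) for every season."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)
    for held_out in sorted(by_season):
        train = [r for season, group in by_season.items()
                 if season != held_out for r in group]
        yield held_out, train, by_season[held_out]


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R2 and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R2 is undefined when all held-out targets are identical.
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        # Division is safe because the outlier filter guarantees TCH >= 0.1.
        "mape": 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Toy data (hypothetical values); the 75.0 row is outside 0.1-60 and is dropped.
rows = [
    {"season": 2019, "tch": 8.0}, {"season": 2019, "tch": 12.0},
    {"season": 2020, "tch": 10.0}, {"season": 2020, "tch": 75.0},
    {"season": 2021, "tch": 9.0}, {"season": 2021, "tch": 11.0},
]

clean_rows = filter_outliers(rows)
results = {}
for season, train, test in leave_one_season_out(clean_rows):
    # Stand-in model: predict the training mean (the README names LightGBM here).
    prediction = sum(r["tch"] for r in train) / len(train)
    y_true = [r["tch"] for r in test]
    results[season] = regression_metrics(y_true, [prediction] * len(y_true))

for season, metrics in results.items():
    print(season, metrics)
```

Swapping the mean predictor for a LightGBM regressor fitted on each fold's training rows would give the setup this README describes. Holding out whole seasons keeps observations from the same year from appearing on both sides of the split.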
42
+ ### Output Files
43
+ - `tch_model.joblib` - Yield prediction model
44
+ - `tch_encoders.joblib` - Label encoders for categorical variables
45
+
46
+ ### Feature Set Used
47
+ **Weather + Soil + Extra**:
48
+ - 5 satellite spectral indices (NDVI, EVI, VARI, NDRE, NDWI) × 42 time steps = 210 features
49
+ - Weather time series: precipitation, temperature, degree days (28 features)
50
+ - Soil properties such as clay, sand, and nitrogen, at 4 depths (25 features)
51
+ - Agronomic: variety, age, cut cycle, day of year (4 features)
52
+ - Extra: rootstock type, spacing, coordinates (5 features)
53
+ - **Total**: ~272 features
54
+
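As a quick arithmetic check, the group sizes listed above add up to the stated total. The dictionary keys below are illustrative labels, not column names from the dataset.

```python
# Feature counts per group, copied from the list above.
feature_groups = {
    "satellite_indices": 5 * 42,  # 5 spectral indices x 42 time steps
    "weather": 28,
    "soil": 25,
    "agronomic": 4,
    "extra": 5,
}

total_features = sum(feature_groups.values())
print(total_features)
```

The sum is 272, which is fewer than the 589 columns in `training_features_anonymized.csv`, so this feature set covers only part of the file's columns.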
55
+
56
+ ## Requirements
57
+
58
+ ```
59
+ numpy>=1.21.0
60
+ pandas>=1.3.0
61
+ scikit-learn>=1.0.0
62
+ lightgbm>=3.3.0
63
+ matplotlib>=3.4.0
64
+ seaborn>=0.11.0
65
+ jupyter>=1.0.0
66
+ joblib>=1.0.0
67
+ ```
68
+
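One way to confirm that an environment meets these minimums is a small standard-library check. This is a sketch only: `parse_version` and `check_requirements` are hypothetical helpers written for this example, and the version parsing is deliberately simple (it compares leading numeric parts and ignores pre-release suffixes).

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "numpy": "1.21.0",
    "pandas": "1.3.0",
    "scikit-learn": "1.0.0",
    "lightgbm": "3.3.0",
    "matplotlib": "3.4.0",
    "seaborn": "0.11.0",
    "jupyter": "1.0.0",
    "joblib": "1.0.0",
}


def parse_version(text):
    """Turn '1.21.0' into (1, 21, 0), padding short versions with zeros."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


report = check_requirements()
for package, status in report.items():
    print(f"{package}: {status}")
```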
69
+ ## Usage
70
+
71
+ ### Installation
72
+
73
+ ```bash
74
+ # Install dependencies
75
+ pip install numpy pandas scikit-learn lightgbm matplotlib seaborn jupyter joblib
76
+ ```
77
+
78
+ ### Running the Notebook
79
+
80
+ ```bash
81
+ # Navigate to the notebook directory
82
+ cd open_source_model/
83
+
84
+ # Launch Jupyter
85
+ jupyter notebook
86
+
87
+ # Open BuildModels_open_source.ipynb and run all cells
88
+ ```
89
+
90
+ ### Using the Trained Models
91
+
92
+ ```python
93
+ import joblib
94
+ import pandas as pd
95
+
96
+ # Load models and encoders
97
+ tch_model = joblib.load('tch_model.joblib')
98
+ tch_encoders = joblib.load('tch_encoders.joblib')
99
+
100
+ # Prepare your data (same feature columns as in training; encode categorical columns with tch_encoders first)
101
+ # X = pd.DataFrame(...) # Your feature data
102
+
103
+ # Make predictions
104
+ tch_predictions = tch_model.predict(X)
105
+ ```
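The snippet above loads `tch_encoders` but does not show how categorical columns are encoded before prediction. The sketch below illustrates what label encoding does, assuming the saved encoders behave like scikit-learn's `LabelEncoder` (the README calls them label encoders but does not describe how they are stored). `SimpleLabelEncoder` is a plain-Python stand-in written for this example, and the variety names are hypothetical.

```python
class SimpleLabelEncoder:
    """Plain-Python stand-in that mimics scikit-learn's LabelEncoder."""

    def fit(self, values):
        # Classes are the sorted unique labels; each gets its index as code.
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Labels not seen during fit cannot be encoded.
        unseen = sorted(set(values) - set(self._index))
        if unseen:
            raise ValueError(f"unseen labels: {unseen}")
        return [self._index[v] for v in values]


# Hypothetical training categories for a 'variety' column.
encoder = SimpleLabelEncoder().fit(["Merlot", "Syrah", "Chardonnay", "Merlot"])
encoded = encoder.transform(["Syrah", "Merlot"])
print(encoder.classes_)
print(encoded)

# A category absent from training raises an error instead of getting a code.
try:
    encoder.transform(["Malbec"])
    unseen_error = None
except ValueError as exc:
    unseen_error = str(exc)
print(unseen_error)
```

Under that assumption, new data containing a variety or rootstock type that was absent from training needs handling before `predict` is called, since the encoder has no integer code for it.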
tch_encoders.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11dff23d7c255d3c41e958ed0e3a94ee244b359ac198046684d569ef81344bf9
3
+ size 3864
tch_model.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f64c13288d8233bf9f23dfc54bae8b75a31729f9c8c242b46a59c0a4e28b64b8
3
+ size 303279