ArabovMK committed
Commit e4321e1 · verified · 1 Parent(s): a34f68c

Update README.md

Files changed (1)
  1. README.md +309 -308
README.md CHANGED
@@ -1,308 +1,309 @@
- ---
- language:
- - en
- - ru
- license: mit
- tags:
- - pectin
- - chemical-engineering
- - machine-learning
- - regression
- - biotechnology
- - food-technology
- - production-optimization
- - ml-in-chemistry
- ---
-
- # Pectin Production Models
-
- **Machine Learning Models for Predicting Pectin Production Parameters from Process Conditions**
-
- This repository contains trained machine learning models for predicting pectin quality parameters based on production process conditions. The models were trained on experimental data from various raw materials and extraction methods.
-
- ## 🎯 Model Overview
-
- ### Performance Summary
-
- | Model | Type | R² Score | MAE | Description |
- |-------|------|----------|-----|-------------|
- | **Best Model** | Gradient Boosting | 0.9427 | 868.44 | **Best overall model for pectin production** |
- | Extra Trees | extra_trees | 0.9135 | 1060.1741 | Extra Trees model for pectin parameter prediction |
- | Gradient Boosting | gradient_boosting | 0.9427 | 868.4403 | Gradient Boosting model - best performance for multi-target regression |
- | K-Neighbors | k-neighbors | 0.8684 | 1287.5126 | Machine learning model for pectin production |
- | Lasso Regression | lasso_regression | 0.3846 | 3702.0325 | Lasso Regression model with L1 regularization |
- | Linear Regression | linear_regression | 0.6965 | 3730.7550 | Linear Regression baseline model |
- | MultiLayer Perceptron | multilayer_perceptron | 0.8046 | 4253.8431 | Machine learning model for pectin production |
- | Random Forest | random_forest | 0.9259 | 978.0065 | Random Forest model for robust pectin quality prediction |
- | Ridge Regression | ridge_regression | 0.5553 | 3665.3101 | Ridge Regression model with L2 regularization |
- | Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
- | XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |
-
-
- ### Best Model Performance
- - **Average R²**: 0.9427
- - **Average MAE**: 868.44
- - **Targets Predicted**: 4 parameters simultaneously
-
- ## 📊 Model Details
-
- ### Target Variables
- - `pectin_yield`: Pectin yield (%)
- - `galacturonic_acid`: Galacturonic acid content (%)
- - `molecular_weight`: Molecular weight (Da)
- - `esterification_degree`: Esterification degree (%)
-
-
- ### Feature Variables
- - `time_min`
- - `temperature_c`
- - `pressure_atm`
- - `ph`
- - `sample_encoded`
- - `method_encoded`
-
- ## 🚀 Quick Start
-
- ### Installation
- ```bash
- pip install transformers huggingface-hub scikit-learn xgboost pandas numpy joblib
- ```
-
- ### Basic Usage
-
-
- ### Using the Best Model
-
- ```python
- from huggingface_hub import hf_hub_download
- import joblib
- import pandas as pd
- import numpy as np
-
- # Download model and supporting files
- model_path = hf_hub_download(
-     repo_id="arabovs-ai-lab/PectinProductionModels",
-     filename="best_model/model.pkl",
-     repo_type="model"
- )
-
- scaler_path = hf_hub_download(
-     repo_id="arabovs-ai-lab/PectinProductionModels",
-     filename="scaler.pkl",
-     repo_type="model"
- )
-
- encoder_path = hf_hub_download(
-     repo_id="arabovs-ai-lab/PectinProductionModels",
-     filename="label_encoder.pkl",
-     repo_type="model"
- )
-
- # Load artifacts
- model = joblib.load(model_path)
- scaler = joblib.load(scaler_path)
- with open(encoder_path, 'rb') as f:
-     label_encoder = pickle.load(f)
-
- # Prepare input data
- input_data = {'sample': 'Айв.', 'time_min': 5, 'temperature_c': 120, 'pressure_atm': 1.0, 'ph': 2.5}
-
- # Create DataFrame
- df = pd.DataFrame([input_data])
-
- # Preprocess: encode sample type
- df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]
-
- # Create method_encoded feature
- df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0
-
- # Select features in correct order
- features = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
- X = df[features]
-
- # Scale features
- X_scaled = scaler.transform(X)
-
- # Make prediction
- predictions = model.predict(X_scaled)
-
- # Create results dictionary
- results = {}
- for i, target in enumerate(['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']):
-     results[target] = predictions[0, i]
-
- print("Prediction results:")
- for target, value in results.items():
-     print(f" {target}: {value:.4f}")
- ```
-
- ### Batch Prediction from File
-
- ```python
- import pandas as pd
- from huggingface_hub import hf_hub_download
- import joblib
- import pickle
-
- class PectinPredictor:
-     def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
-         self.repo_id = repo_id
-         self.model = None
-         self.scaler = None
-         self.label_encoder = None
-         self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
-         self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
-
-     def load_from_hub(self):
-         """Load model and artifacts from Hugging Face Hub."""
-         # Download model
-         model_path = hf_hub_download(
-             repo_id=self.repo_id,
-             filename="best_model/model.pkl",
-             repo_type="model"
-         )
-         self.model = joblib.load(model_path)
-
-         # Download scaler
-         scaler_path = hf_hub_download(
-             repo_id=self.repo_id,
-             filename="scaler.pkl",
-             repo_type="model"
-         )
-         self.scaler = joblib.load(scaler_path)
-
-         # Download label encoder
-         encoder_path = hf_hub_download(
-             repo_id=self.repo_id,
-             filename="label_encoder.pkl",
-             repo_type="model"
-         )
-         with open(encoder_path, 'rb') as f:
-             self.label_encoder = pickle.load(f)
-
-     def predict_batch(self, input_df):
-         """Predict on batch data."""
-         # Preprocessing
-         processed_df = input_df.copy()
-
-         # Encode sample type
-         processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])
-
-         # Create method_encoded
-         processed_df['method_encoded'] = np.where(processed_df['time_min'] <= 15, 1, 0)
-
-         # Select and scale features
-         X = processed_df[self.feature_columns]
-         X_scaled = self.scaler.transform(X)
-
-         # Predict
-         predictions = self.model.predict(X_scaled)
-
-         # Add predictions to results
-         result_df = input_df.copy()
-         for i, target in enumerate(self.target_columns):
-             result_df[f'predicted_{target}'] = predictions[:, i]
-
-         return result_df
-
- # Usage
- predictor = PectinPredictor()
- predictor.load_from_hub()
-
- # Load your data
- # df = pd.read_excel("your_data.xlsx")
- # results = predictor.predict_batch(df)
- ```
-
- ### Comparing Different Models
-
- ```python
- from huggingface_hub import hf_hub_download
- import joblib
-
- def compare_models(input_data, repo_id="arabovs-ai-lab/PectinProductionModels"):
-     """Compare predictions from different models."""
-     models_to_compare = [
-         "best_model/model.pkl",
-         "gradient_boosting/model.pkl",
-         "random_forest/model.pkl",
-         "xgboost/model.pkl"
-     ]
-
-     results = {}
-
-     for model_path in models_to_compare:
-         model_name = model_path.split('/')[0]
-
-         # Download model
-         local_path = hf_hub_download(
-             repo_id=repo_id,
-             filename=model_path,
-             repo_type="model"
-         )
-
-         model = joblib.load(local_path)
-
-         # Make prediction (assuming preprocessed input)
-         # predictions = model.predict(preprocessed_input)
-         # results[model_name] = predictions
-
-     return results
- ```
-
-
- ## 📁 Repository Structure
-
- ```
- arabovs-ai-lab/PectinProductionModels/
- ├── best_model/                 # Best overall model (Gradient Boosting)
- │   ├── model.pkl               # Serialized model file
- │   └── metadata.json           # Model metadata
- ├── random_forest/              # Random Forest model
- ├── gradient_boosting/          # Gradient Boosting model
- ├── xgboost/                    # XGBoost model
- ├── extra_trees/                # Extra Trees model
- ├── linear_regression/          # Linear Regression model
- ├── ridge_regression/           # Ridge Regression model
- ├── lasso_regression/           # Lasso Regression model
- ├── support_vector_regression/  # SVR model
- ├── k_neighbors/                # K-Neighbors model
- ├── multilayer_perceptron/      # MLP model
- ├── scaler.pkl                  # Feature scaler
- ├── label_encoder.pkl           # Label encoder for categories
- ├── model_metadata.json         # Training metadata
- ├── models_metadata.json        # All models metadata
- └── README.md                   # This file
- ```
-
- ## 🧪 Training Information
-
- - **Dataset**: 1000 experimental records
- - **Features**: 6 process parameters
- - **Targets**: 4 quality parameters
- - **Validation**: 80/20 train-test split
- - **Cross-validation**: 5-fold
- - **Best Algorithm**: Gradient Boosting
-
- ## 💡 Key Features
-
- - **Multi-target regression**: Predicts 4 parameters simultaneously
- - **Process optimization**: Helps optimize pectin production conditions
- - **Quality prediction**: Estimates pectin quality from process variables
- - **Multiple algorithms**: 10 different ML algorithms for comparison
-
- ## 📄 License
-
- MIT License
-
- ---
-
- *Last updated: 2025-11-21*
- *Repository: https://huggingface.co/arabovs-ai-lab/PectinProductionModels*
-
- ## 🔗 References
-
- - [Pectin Production Technology](https://en.wikipedia.org/wiki/Pectin)
- - [Scikit-learn](https://scikit-learn.org/)
- - [Hugging Face Hub](https://huggingface.co/docs/hub/)
-
+ ---
+ language:
+ - en
+ - ru
+ license: mit
+ tags:
+ - pectin
+ - chemical-engineering
+ - machine-learning
+ - regression
+ - biotechnology
+ - food-technology
+ - production-optimization
+ - ml-in-chemistry
+ ---
+
+ # Pectin Production Models
+
+ **Machine Learning Models for Predicting Pectin Production Parameters from Process Conditions**
+
+ This repository contains trained machine learning models for predicting pectin quality parameters based on production process conditions. The models were trained on experimental data from various raw materials and extraction methods.
+
+ ## 🎯 Model Overview
+
+ ### Performance Summary
+
+ | Model | Type | R² Score | MAE | Description |
+ |-------|------|----------|-----|-------------|
+ | **Best Model** | Gradient Boosting | 0.9427 | 868.44 | **Best overall model for pectin production** |
+ | Extra Trees | extra_trees | 0.9135 | 1060.1741 | Extra Trees model for pectin parameter prediction |
+ | Gradient Boosting | gradient_boosting | 0.9427 | 868.4403 | Gradient Boosting model - best performance for multi-target regression |
+ | K-Neighbors | k-neighbors | 0.8684 | 1287.5126 | Machine learning model for pectin production |
+ | Lasso Regression | lasso_regression | 0.3846 | 3702.0325 | Lasso Regression model with L1 regularization |
+ | Linear Regression | linear_regression | 0.6965 | 3730.7550 | Linear Regression baseline model |
+ | MultiLayer Perceptron | multilayer_perceptron | 0.8046 | 4253.8431 | Machine learning model for pectin production |
+ | Random Forest | random_forest | 0.9259 | 978.0065 | Random Forest model for robust pectin quality prediction |
+ | Ridge Regression | ridge_regression | 0.5553 | 3665.3101 | Ridge Regression model with L2 regularization |
+ | Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
+ | XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |
+
+
+ ### Best Model Performance
+ - **Average R²**: 0.9427
+ - **Average MAE**: 868.44
+ - **Targets Predicted**: 4 parameters simultaneously
+
+ ## 📊 Model Details
+
+ ### Target Variables
+ - `pectin_yield`: Pectin yield (%)
+ - `galacturonic_acid`: Galacturonic acid content (%)
+ - `molecular_weight`: Molecular weight (Da)
+ - `esterification_degree`: Esterification degree (%)
+
+
+ ### Feature Variables
+ - `time_min`
+ - `temperature_c`
+ - `pressure_atm`
+ - `ph`
+ - `sample_encoded`
+ - `method_encoded`
+
+ ## 🚀 Quick Start
+
+ ### Installation
+ ```bash
+ pip install transformers huggingface-hub scikit-learn xgboost pandas numpy joblib
+ ```
+
+ ### Basic Usage
+
+
+ ### Using the Best Model
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+ import pandas as pd
+ import numpy as np
+ import pickle
+
+ # Download model and supporting files
+ model_path = hf_hub_download(
+     repo_id="arabovs-ai-lab/PectinProductionModels",
+     filename="best_model/model.pkl",
+     repo_type="model"
+ )
+
+ scaler_path = hf_hub_download(
+     repo_id="arabovs-ai-lab/PectinProductionModels",
+     filename="scaler.pkl",
+     repo_type="model"
+ )
+
+ encoder_path = hf_hub_download(
+     repo_id="arabovs-ai-lab/PectinProductionModels",
+     filename="label_encoder.pkl",
+     repo_type="model"
+ )
+
+ # Load artifacts
+ model = joblib.load(model_path)
+ scaler = joblib.load(scaler_path)
+ with open(encoder_path, 'rb') as f:
+     label_encoder = pickle.load(f)
+
+ # Prepare input data
+ input_data = {'sample': 'Айв.', 'time_min': 5, 'temperature_c': 120, 'pressure_atm': 1.0, 'ph': 2.5}
+
+ # Create DataFrame
+ df = pd.DataFrame([input_data])
+
+ # Preprocess: encode sample type
+ df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]
+
+ # Create method_encoded feature
+ df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0
+
+ # Select features in correct order
+ features = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
+ X = df[features]
+
+ # Scale features
+ X_scaled = scaler.transform(X)
+
+ # Make prediction
+ predictions = model.predict(X_scaled)
+
+ # Create results dictionary
+ results = {}
+ for i, target in enumerate(['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']):
+     results[target] = predictions[0, i]
+
+ print("Prediction results:")
+ for target, value in results.items():
+     print(f" {target}: {value:.4f}")
+ ```
+
+ ### Batch Prediction from File
+
+ ```python
+ import pandas as pd
+ from huggingface_hub import hf_hub_download
+ import joblib
+ import pickle
+
+ class PectinPredictor:
+     def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
+         self.repo_id = repo_id
+         self.model = None
+         self.scaler = None
+         self.label_encoder = None
+         self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
+         self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
+
+     def load_from_hub(self):
+         """Load model and artifacts from Hugging Face Hub."""
+         # Download model
+         model_path = hf_hub_download(
+             repo_id=self.repo_id,
+             filename="best_model/model.pkl",
+             repo_type="model"
+         )
+         self.model = joblib.load(model_path)
+
+         # Download scaler
+         scaler_path = hf_hub_download(
+             repo_id=self.repo_id,
+             filename="scaler.pkl",
+             repo_type="model"
+         )
+         self.scaler = joblib.load(scaler_path)
+
+         # Download label encoder
+         encoder_path = hf_hub_download(
+             repo_id=self.repo_id,
+             filename="label_encoder.pkl",
+             repo_type="model"
+         )
+         with open(encoder_path, 'rb') as f:
+             self.label_encoder = pickle.load(f)
+
+     def predict_batch(self, input_df):
+         """Predict on batch data."""
+         # Preprocessing
+         processed_df = input_df.copy()
+
+         # Encode sample type
+         processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])
+
+         # Create method_encoded
+         processed_df['method_encoded'] = np.where(processed_df['time_min'] <= 15, 1, 0)
+
+         # Select and scale features
+         X = processed_df[self.feature_columns]
+         X_scaled = self.scaler.transform(X)
+
+         # Predict
+         predictions = self.model.predict(X_scaled)
+
+         # Add predictions to results
+         result_df = input_df.copy()
+         for i, target in enumerate(self.target_columns):
+             result_df[f'predicted_{target}'] = predictions[:, i]
+
+         return result_df
+
+ # Usage
+ predictor = PectinPredictor()
+ predictor.load_from_hub()
+
+ # Load your data
+ # df = pd.read_excel("your_data.xlsx")
+ # results = predictor.predict_batch(df)
+ ```
+
+ ### Comparing Different Models
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+
+ def compare_models(input_data, repo_id="arabovs-ai-lab/PectinProductionModels"):
+     """Compare predictions from different models."""
+     models_to_compare = [
+         "best_model/model.pkl",
+         "gradient_boosting/model.pkl",
+         "random_forest/model.pkl",
+         "xgboost/model.pkl"
+     ]
+
+     results = {}
+
+     for model_path in models_to_compare:
+         model_name = model_path.split('/')[0]
+
+         # Download model
+         local_path = hf_hub_download(
+             repo_id=repo_id,
+             filename=model_path,
+             repo_type="model"
+         )
+
+         model = joblib.load(local_path)
+
+         # Make prediction (assuming preprocessed input)
+         # predictions = model.predict(preprocessed_input)
+         # results[model_name] = predictions
+
+     return results
+ ```
+
+
+ ## 📁 Repository Structure
+
+ ```
+ arabovs-ai-lab/PectinProductionModels/
+ ├── best_model/                 # Best overall model (Gradient Boosting)
+ │   ├── model.pkl               # Serialized model file
+ │   └── metadata.json           # Model metadata
+ ├── random_forest/              # Random Forest model
+ ├── gradient_boosting/          # Gradient Boosting model
+ ├── xgboost/                    # XGBoost model
+ ├── extra_trees/                # Extra Trees model
+ ├── linear_regression/          # Linear Regression model
+ ├── ridge_regression/           # Ridge Regression model
+ ├── lasso_regression/           # Lasso Regression model
+ ├── support_vector_regression/  # SVR model
+ ├── k_neighbors/                # K-Neighbors model
+ ├── multilayer_perceptron/      # MLP model
+ ├── scaler.pkl                  # Feature scaler
+ ├── label_encoder.pkl           # Label encoder for categories
+ ├── model_metadata.json         # Training metadata
+ ├── models_metadata.json        # All models metadata
+ └── README.md                   # This file
+ ```
+
+ ## 🧪 Training Information
+
+ - **Dataset**: 1000 experimental records
+ - **Features**: 6 process parameters
+ - **Targets**: 4 quality parameters
+ - **Validation**: 80/20 train-test split
+ - **Cross-validation**: 5-fold
+ - **Best Algorithm**: Gradient Boosting
+
+ ## 💡 Key Features
+
+ - **Multi-target regression**: Predicts 4 parameters simultaneously
+ - **Process optimization**: Helps optimize pectin production conditions
+ - **Quality prediction**: Estimates pectin quality from process variables
+ - **Multiple algorithms**: 10 different ML algorithms for comparison
+
+ ## 📄 License
+
+ MIT License
+
+ ---
+
+ *Last updated: 2025-11-21*
+ *Repository: https://huggingface.co/arabovs-ai-lab/PectinProductionModels*
+
+ ## 🔗 References
+
+ - [Pectin Production Technology](https://en.wikipedia.org/wiki/Pectin)
+ - [Scikit-learn](https://scikit-learn.org/)
+ - [Hugging Face Hub](https://huggingface.co/docs/hub/)
+
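
Both README snippets in this diff build the same six-column feature row before scaling: the four raw process values, an integer code for the sample type, and a `method_encoded` flag that is 1 when `time_min <= 15`. The batch variant expresses that flag with `np.where`, which needs `import numpy as np` in scope. Below is a minimal, dependency-free sketch of the same preprocessing step, useful for checking inputs before any model is downloaded. `SAMPLE_CODES` and `build_feature_row` are hypothetical names introduced here; the real class-to-integer mapping lives in `label_encoder.pkl` and is not reproduced, and the second record's values are purely illustrative.

```python
# Hypothetical stand-in for label_encoder.pkl: the actual mapping of sample
# names to integers is stored in the repository and is NOT known here.
SAMPLE_CODES = {"sample_a": 0, "sample_b": 1}

# Column order used by both README snippets.
FEATURE_ORDER = ["time_min", "temperature_c", "pressure_atm", "ph",
                 "sample_encoded", "method_encoded"]


def build_feature_row(record, sample_codes=SAMPLE_CODES):
    """Turn one raw record (a dict) into a feature list in FEATURE_ORDER."""
    if record["sample"] not in sample_codes:
        raise ValueError(f"unknown sample type: {record['sample']!r}")
    row = dict(record)
    row["sample_encoded"] = sample_codes[record["sample"]]
    # Rule taken from the README code: runs of 15 minutes or less get 1.
    row["method_encoded"] = 1 if record["time_min"] <= 15 else 0
    return [row[name] for name in FEATURE_ORDER]


# Illustrative records; only the first mirrors the README's numeric inputs.
records = [
    {"sample": "sample_a", "time_min": 5, "temperature_c": 120,
     "pressure_atm": 1.0, "ph": 2.5},
    {"sample": "sample_b", "time_min": 60, "temperature_c": 85,
     "pressure_atm": 1.0, "ph": 2.0},
]
rows = [build_feature_row(r) for r in records]
print(rows)
```

The resulting list of lists has the shape that `scaler.transform` and `model.predict` expect (one row per record, six columns), so it could be passed to the loaded artifacts in place of the DataFrame slice, assuming the integer codes match those of the trained encoder.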