ArabovMK committed on
Commit
a34f68c
·
verified ·
1 Parent(s): 7d06c3c

Upload README.md with huggingface_hub

  1. README.md +308 -0
README.md ADDED
@@ -0,0 +1,308 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - ru
5
+ license: mit
6
+ tags:
7
+ - pectin
8
+ - chemical-engineering
9
+ - machine-learning
10
+ - regression
11
+ - biotechnology
12
+ - food-technology
13
+ - production-optimization
14
+ - ml-in-chemistry
15
+ ---
16
+
17
+ # Pectin Production Models
18
+
19
+ **Machine Learning Models for Predicting Pectin Production Parameters from Process Conditions**
20
+
21
+ This repository contains trained machine learning models for predicting pectin quality parameters based on production process conditions. The models were trained on experimental data from various raw materials and extraction methods.
22
+
23
+ ## 🎯 Model Overview
24
+
25
+ ### Performance Summary
26
+
27
+ | Model | Type | R² Score | MAE | Description |
28
+ |-------|------|----------|-----|-------------|
29
+ | **Best Model** | Gradient Boosting | 0.9427 | 868.44 | **Best overall model for pectin production** |
30
+ | Extra Trees | extra_trees | 0.9135 | 1060.1741 | Extra Trees model for pectin parameter prediction |
31
+ | Gradient Boosting | gradient_boosting | 0.9427 | 868.4403 | Gradient Boosting model - best performance for multi-target regression |
32
+ | K-Neighbors | k_neighbors | 0.8684 | 1287.5126 | Machine learning model for pectin production |
33
+ | Lasso Regression | lasso_regression | 0.3846 | 3702.0325 | Lasso Regression model with L1 regularization |
34
+ | Linear Regression | linear_regression | 0.6965 | 3730.7550 | Linear Regression baseline model |
35
+ | MultiLayer Perceptron | multilayer_perceptron | 0.8046 | 4253.8431 | Machine learning model for pectin production |
36
+ | Random Forest | random_forest | 0.9259 | 978.0065 | Random Forest model for robust pectin quality prediction |
37
+ | Ridge Regression | ridge_regression | 0.5553 | 3665.3101 | Ridge Regression model with L2 regularization |
38
+ | Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
39
+ | XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |
40
+
41
+
42
+ ### Best Model Performance
43
+ - **Average R²**: 0.9427
44
+ - **Average MAE**: 868.44
45
+ - **Targets Predicted**: 4 parameters simultaneously
46
+
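The README does not state how the single "Average" figures are derived from the four targets. A minimal sketch, assuming an unweighted mean of the per-target scores (an assumption, not confirmed by the source) and using synthetic numbers rather than the real dataset:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

targets = ["pectin_yield", "galacturonic_acid", "molecular_weight", "esterification_degree"]

# Synthetic ground truth (illustrative values only, not taken from the dataset)
y_true = np.array([
    [10.0, 60.0, 100_000.0, 55.0],
    [12.0, 65.0, 120_000.0, 60.0],
    [15.0, 70.0, 150_000.0, 70.0],
    [18.0, 72.0, 90_000.0, 65.0],
])
# Predictions that are off by a constant amount per target
y_pred = y_true + np.array([0.5, -1.0, 2_000.0, 1.5])

# One score per target column
per_target_mae = mean_absolute_error(y_true, y_pred, multioutput="raw_values")
per_target_r2 = r2_score(y_true, y_pred, multioutput="raw_values")

# Assumed aggregation: plain mean over the four targets
average_mae = per_target_mae.mean()
average_r2 = per_target_r2.mean()

for name, mae, r2 in zip(targets, per_target_mae, per_target_r2):
    print(f"{name}: MAE={mae:.2f}, R2={r2:.4f}")
print(f"average: MAE={average_mae:.2f}, R2={average_r2:.4f}")
```

If the averages are computed this way, the average MAE is dominated by `molecular_weight`, which is measured in Da and is orders of magnitude larger than the percentage targets, so per-target values are likely more informative than the single figure.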
47
+ ## 📊 Model Details
48
+
49
+ ### Target Variables
50
+ - `pectin_yield`: Pectin yield (%)
51
+ - `galacturonic_acid`: Galacturonic acid content (%)
52
+ - `molecular_weight`: Molecular weight (Da)
53
+ - `esterification_degree`: Esterification degree (%)
54
+
55
+
56
+ ### Feature Variables
57
+ - `time_min`
58
+ - `temperature_c`
59
+ - `pressure_atm`
60
+ - `ph`
61
+ - `sample_encoded`
62
+ - `method_encoded`
63
+
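The two `*_encoded` columns are derived rather than measured. A minimal sketch of building the feature matrix, mirroring the rule used in the usage code of this README (`method_encoded` is 1 when `time_min <= 15`, otherwise 0). The sample labels and the `build_features` helper are hypothetical; the real classes are stored in `label_encoder.pkl`:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

FEATURES = ["time_min", "temperature_c", "pressure_atm", "ph", "sample_encoded", "method_encoded"]

# Hypothetical sample labels; the real ones come from label_encoder.pkl
encoder = LabelEncoder().fit(["apple", "citrus", "quince"])

def build_features(raw: pd.DataFrame, encoder: LabelEncoder) -> pd.DataFrame:
    """Return the six model features in the order the models expect."""
    out = raw.copy()
    # Map the raw material name to its integer class index
    out["sample_encoded"] = encoder.transform(out["sample"])
    # Short runs (<= 15 min) are flagged 1, longer runs 0
    out["method_encoded"] = (out["time_min"] <= 15).astype(int)
    return out[FEATURES]

raw = pd.DataFrame({
    "sample": ["quince", "apple"],
    "time_min": [5, 60],
    "temperature_c": [120, 85],
    "pressure_atm": [1.0, 1.0],
    "ph": [2.5, 2.0],
})
X = build_features(raw, encoder)
print(X)
```

The resulting frame still has to pass through the fitted scaler before it is given to a model.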
64
+ ## 🚀 Quick Start
65
+
66
+ ### Installation
67
+ ```bash
68
+ pip install huggingface-hub scikit-learn xgboost pandas numpy joblib
69
+ ```
70
+
71
+ ### Basic Usage
72
+
73
+
74
+ ### Using the Best Model
75
+
76
+ ```python
77
+ from huggingface_hub import hf_hub_download
78
+ import joblib
79
+ import pandas as pd
80
+ import pickle
81
+
82
+ # Download model and supporting files
83
+ model_path = hf_hub_download(
84
+     repo_id="arabovs-ai-lab/PectinProductionModels",
85
+     filename="best_model/model.pkl",
86
+     repo_type="model"
87
+ )
88
+
89
+ scaler_path = hf_hub_download(
90
+     repo_id="arabovs-ai-lab/PectinProductionModels",
91
+     filename="scaler.pkl",
92
+     repo_type="model"
93
+ )
94
+
95
+ encoder_path = hf_hub_download(
96
+     repo_id="arabovs-ai-lab/PectinProductionModels",
97
+     filename="label_encoder.pkl",
98
+     repo_type="model"
99
+ )
100
+
101
+ # Load artifacts
102
+ model = joblib.load(model_path)
103
+ scaler = joblib.load(scaler_path)
104
+ with open(encoder_path, 'rb') as f:
105
+     label_encoder = pickle.load(f)
106
+
107
+ # Prepare input data
108
+ input_data = {'sample': 'Айв.', 'time_min': 5, 'temperature_c': 120, 'pressure_atm': 1.0, 'ph': 2.5}
109
+
110
+ # Create DataFrame
111
+ df = pd.DataFrame([input_data])
112
+
113
+ # Preprocess: encode sample type
114
+ df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]
115
+
116
+ # Create method_encoded feature
117
+ df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0
118
+
119
+ # Select features in correct order
120
+ features = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
121
+ X = df[features]
122
+
123
+ # Scale features
124
+ X_scaled = scaler.transform(X)
125
+
126
+ # Make prediction
127
+ predictions = model.predict(X_scaled)
128
+
129
+ # Create results dictionary
130
+ results = {}
131
+ for i, target in enumerate(['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']):
132
+     results[target] = predictions[0, i]
133
+
134
+ print("Prediction results:")
135
+ for target, value in results.items():
136
+     print(f" {target}: {value:.4f}")
137
+ ```
138
+
139
+ ### Batch Prediction from File
140
+
141
+ ```python
142
+ import pandas as pd
143
+ from huggingface_hub import hf_hub_download
144
+ import joblib
145
+ import pickle
146
+
147
+ class PectinPredictor:
148
+     def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
149
+         self.repo_id = repo_id
150
+         self.model = None
151
+         self.scaler = None
152
+         self.label_encoder = None
153
+         self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
154
+         self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
155
+
156
+     def load_from_hub(self):
157
+         """Load model and artifacts from Hugging Face Hub."""
158
+         # Download model
159
+         model_path = hf_hub_download(
160
+             repo_id=self.repo_id,
161
+             filename="best_model/model.pkl",
162
+             repo_type="model"
163
+         )
164
+         self.model = joblib.load(model_path)
165
+
166
+         # Download scaler
167
+         scaler_path = hf_hub_download(
168
+             repo_id=self.repo_id,
169
+             filename="scaler.pkl",
170
+             repo_type="model"
171
+         )
172
+         self.scaler = joblib.load(scaler_path)
173
+
174
+         # Download label encoder
175
+         encoder_path = hf_hub_download(
176
+             repo_id=self.repo_id,
177
+             filename="label_encoder.pkl",
178
+             repo_type="model"
179
+         )
180
+         with open(encoder_path, 'rb') as f:
181
+             self.label_encoder = pickle.load(f)
182
+
183
+     def predict_batch(self, input_df):
184
+         """Predict on batch data."""
185
+         # Preprocessing
186
+         processed_df = input_df.copy()
187
+
188
+         # Encode sample type
189
+         processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])
190
+
191
+         # Create method_encoded (1 if time_min <= 15, else 0)
192
+         processed_df['method_encoded'] = (processed_df['time_min'] <= 15).astype(int)
193
+
194
+         # Select and scale features
195
+         X = processed_df[self.feature_columns]
196
+         X_scaled = self.scaler.transform(X)
197
+
198
+         # Predict
199
+         predictions = self.model.predict(X_scaled)
200
+
201
+         # Add predictions to results
202
+         result_df = input_df.copy()
203
+         for i, target in enumerate(self.target_columns):
204
+             result_df[f'predicted_{target}'] = predictions[:, i]
205
+
206
+         return result_df
207
+
208
+ # Usage
209
+ predictor = PectinPredictor()
210
+ predictor.load_from_hub()
211
+
212
+ # Load your data
213
+ # df = pd.read_excel("your_data.xlsx")
214
+ # results = predictor.predict_batch(df)
215
+ ```
216
+
217
+ ### Comparing Different Models
218
+
219
+ ```python
220
+ from huggingface_hub import hf_hub_download
221
+ import joblib
222
+
223
+ def compare_models(input_data, repo_id="arabovs-ai-lab/PectinProductionModels"):
224
+     """Compare predictions from different models on preprocessed (encoded and scaled) input."""
225
+     models_to_compare = [
226
+         "best_model/model.pkl",
227
+         "gradient_boosting/model.pkl",
228
+         "random_forest/model.pkl",
229
+         "xgboost/model.pkl"
230
+     ]
231
+
232
+     results = {}
233
+
234
+     for model_path in models_to_compare:
235
+         model_name = model_path.split('/')[0]
236
+
237
+         # Download model
238
+         local_path = hf_hub_download(
239
+             repo_id=repo_id,
240
+             filename=model_path,
241
+             repo_type="model"
242
+         )
243
+
244
+         model = joblib.load(local_path)
245
+
246
+         # Make prediction; input_data must already be encoded and scaled
247
+         predictions = model.predict(input_data)
248
+         results[model_name] = predictions
249
+
250
+     return results
251
+ ```
252
+
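Once each model has produced a prediction for the same input row, the outputs are easier to compare side by side as a table. A minimal sketch, using two locally fitted scikit-learn models on random data as stand-ins for the downloaded ones; the `tabulate_predictions` helper and the training data are hypothetical and only illustrate the layout:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

TARGETS = ["pectin_yield", "galacturonic_acid", "molecular_weight", "esterification_degree"]

# Random stand-in data: 6 features, 4 targets (not the real dataset)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 6))
y_train = rng.normal(size=(40, 4))

# Stand-ins for models loaded from the Hub
models = {
    "linear_regression": LinearRegression().fit(X_train, y_train),
    "random_forest": RandomForestRegressor(n_estimators=10, random_state=0).fit(X_train, y_train),
}

def tabulate_predictions(models, X_row):
    """Return one row per model and one column per target for a single input row."""
    rows = {name: model.predict(X_row)[0] for name, model in models.items()}
    return pd.DataFrame.from_dict(rows, orient="index", columns=TARGETS)

table = tabulate_predictions(models, X_train[:1])
print(table.round(3))
```

With real models, `X_row` would be the scaled feature row produced by the preprocessing steps shown earlier in this README.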
253
+
254
+ ## 📁 Repository Structure
255
+
256
+ ```
257
+ arabovs-ai-lab/PectinProductionModels/
258
+ ├── best_model/ # Best overall model (Gradient Boosting)
259
+ │   ├── model.pkl # Serialized model file
260
+ │   └── metadata.json # Model metadata
261
+ ├── random_forest/ # Random Forest model
262
+ ├── gradient_boosting/ # Gradient Boosting model
263
+ ├── xgboost/ # XGBoost model
264
+ ├── extra_trees/ # Extra Trees model
265
+ ├── linear_regression/ # Linear Regression model
266
+ ├── ridge_regression/ # Ridge Regression model
267
+ ├── lasso_regression/ # Lasso Regression model
268
+ ├── support_vector_regression/ # SVR model
269
+ ├── k_neighbors/ # K-Neighbors model
270
+ ├── multilayer_perceptron/ # MLP model
271
+ ├── scaler.pkl # Feature scaler
272
+ ├── label_encoder.pkl # Label encoder for categories
273
+ ├── model_metadata.json # Training metadata
274
+ ├── models_metadata.json # All models metadata
275
+ └── README.md # This file
276
+ ```
277
+
278
+ ## 🧪 Training Information
279
+
280
+ - **Dataset**: 1000 experimental records
281
+ - **Features**: 6 process parameters
282
+ - **Targets**: 4 quality parameters
283
+ - **Validation**: 80/20 train-test split
284
+ - **Cross-validation**: 5-fold
285
+ - **Best Algorithm**: Gradient Boosting
286
+
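The evaluation protocol listed above can be sketched as follows. This is an illustration on synthetic data, not the original training script: wrapping `GradientBoostingRegressor` in `MultiOutputRegressor`, the hyperparameters, and running the 5-fold cross-validation on the training portion only are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in data: 6 features, 4 targets (not the real dataset)
rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 6))
W = rng.normal(size=(6, 4))
y = X @ W + 0.05 * rng.normal(size=(200, 4))

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# GradientBoostingRegressor handles a single target, so one regressor is fitted per target
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=42))

# 5-fold cross-validation on the training portion; R2 is averaged over the targets
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit and held-out evaluation
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"CV R2 per fold: {np.round(cv_scores, 4)}")
print(f"Test R2: {test_r2:.4f}")
```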
287
+ ## πŸ’‘ Key Features
288
+
289
+ - **Multi-target regression**: Predicts 4 parameters simultaneously
290
+ - **Process optimization**: Helps optimize pectin production conditions
291
+ - **Quality prediction**: Estimates pectin quality from process variables
292
+ - **Multiple algorithms**: 10 different ML algorithms for comparison
293
+
294
+ ## 📄 License
295
+
296
+ MIT License
297
+
298
+ ---
299
+
300
+ *Last updated: 2025-11-21*
301
+ *Repository: https://huggingface.co/arabovs-ai-lab/PectinProductionModels*
302
+
303
+ ## 🔗 References
304
+
305
+ - [Pectin Production Technology](https://en.wikipedia.org/wiki/Pectin)
306
+ - [Scikit-learn](https://scikit-learn.org/)
307
+ - [Hugging Face Hub](https://huggingface.co/docs/hub/)
308
+