Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,308 @@
---
language:
- en
- ru
license: mit
tags:
- pectin
- chemical-engineering
- machine-learning
- regression
- biotechnology
- food-technology
- production-optimization
- ml-in-chemistry
---

# Pectin Production Models

**Machine Learning Models for Predicting Pectin Production Parameters from Process Conditions**

This repository contains trained machine learning models for predicting pectin quality parameters based on production process conditions. The models were trained on experimental data from various raw materials and extraction methods.

## 🎯 Model Overview

### Performance Summary

| Model | Type | R² Score | MAE | Description |
|-------|------|----------|-----|-------------|
| **Best Model** | Gradient Boosting | 0.9427 | 868.44 | **Best overall model for pectin production** |
| Extra Trees | extra_trees | 0.9135 | 1060.1741 | Extra Trees model for pectin parameter prediction |
| Gradient Boosting | gradient_boosting | 0.9427 | 868.4403 | Gradient Boosting model - best performance for multi-target regression |
| K-Neighbors | k-neighbors | 0.8684 | 1287.5126 | Machine learning model for pectin production |
| Lasso Regression | lasso_regression | 0.3846 | 3702.0325 | Lasso Regression model with L1 regularization |
| Linear Regression | linear_regression | 0.6965 | 3730.7550 | Linear Regression baseline model |
| MultiLayer Perceptron | multilayer_perceptron | 0.8046 | 4253.8431 | Machine learning model for pectin production |
| Random Forest | random_forest | 0.9259 | 978.0065 | Random Forest model for robust pectin quality prediction |
| Ridge Regression | ridge_regression | 0.5553 | 3665.3101 | Ridge Regression model with L2 regularization |
| Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
| XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |


### Best Model Performance
- **Average R²**: 0.9427
- **Average MAE**: 868.44
- **Targets Predicted**: 4 parameters simultaneously

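The averaged scores above combine four targets with very different scales. The sketch below shows one way such per-target and averaged metrics can be computed with NumPy. It assumes an unweighted mean over targets, which this README does not confirm, and the arrays are made-up values for illustration, not rows from the dataset. The helper name `per_target_metrics` is hypothetical.

```python
import numpy as np

TARGETS = ["pectin_yield", "galacturonic_acid", "molecular_weight", "esterification_degree"]

def per_target_metrics(y_true, y_pred):
    """Return (r2, mae) arrays with one entry per target column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Mean absolute error, computed column by column
    mae = np.mean(np.abs(y_true - y_pred), axis=0)
    # Coefficient of determination: 1 - SS_res / SS_tot, column by column
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae

# Made-up values for illustration only (columns follow TARGETS)
y_true = np.array([
    [10.0, 60.0, 50000.0, 70.0],
    [12.0, 65.0, 60000.0, 72.0],
    [14.0, 70.0, 70000.0, 74.0],
])
y_pred = np.array([
    [11.0, 61.0, 52000.0, 70.0],
    [12.0, 64.0, 59000.0, 73.0],
    [13.0, 70.0, 71000.0, 74.0],
])

r2, mae = per_target_metrics(y_true, y_pred)
for name, r, m in zip(TARGETS, r2, mae):
    print(f"{name}: R2={r:.3f}, MAE={m:.2f}")
print(f"average R2={r2.mean():.3f}, average MAE={mae.mean():.2f}")
```

Under that assumption, the averaged MAE is dominated by `molecular_weight`, which is measured in Da, so per-target values are usually more informative than the single averaged figure.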
## Model Details

### Target Variables
- `pectin_yield`: Pectin yield (%)
- `galacturonic_acid`: Galacturonic acid content (%)
- `molecular_weight`: Molecular weight (Da)
- `esterification_degree`: Esterification degree (%)


### Feature Variables
- `time_min`: Time (min)
- `temperature_c`: Temperature (°C)
- `pressure_atm`: Pressure (atm)
- `ph`: pH
- `sample_encoded`: Sample (raw material) label, encoded with `label_encoder.pkl`
- `method_encoded`: Method flag, set to 1 when `time_min <= 15` and 0 otherwise

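The feature list above can be assembled from raw process records as in the following sketch. It uses a plain dictionary as a stand-in for the repository's label encoder so that it runs offline. The helper `build_features`, the `SAMPLE_CODES` mapping, and the sample labels `apple` and `citrus` are hypothetical and are not taken from the training data.

```python
import numpy as np
import pandas as pd

FEATURES = ["time_min", "temperature_c", "pressure_atm", "ph", "sample_encoded", "method_encoded"]
RAW_COLUMNS = ["sample", "time_min", "temperature_c", "pressure_atm", "ph"]

def build_features(raw, sample_codes):
    """Turn raw process records into the six model features, in training order."""
    missing = [c for c in RAW_COLUMNS if c not in raw.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    out = raw.copy()
    # Stand-in for label_encoder.transform: map each sample label to an integer code
    out["sample_encoded"] = out["sample"].map(sample_codes)
    if out["sample_encoded"].isna().any():
        unknown = sorted(out.loc[out["sample_encoded"].isna(), "sample"].unique())
        raise ValueError(f"unknown sample labels: {unknown}")
    out["sample_encoded"] = out["sample_encoded"].astype(int)
    # Same rule as in the usage examples: 1 when time_min <= 15, else 0
    out["method_encoded"] = np.where(out["time_min"] <= 15, 1, 0)
    return out[FEATURES]

# Hypothetical labels and codes, for illustration only
SAMPLE_CODES = {"apple": 0, "citrus": 1}
raw = pd.DataFrame({
    "sample": ["apple", "citrus"],
    "time_min": [5, 60],
    "temperature_c": [120, 85],
    "pressure_atm": [1.0, 1.0],
    "ph": [2.5, 2.0],
})
X = build_features(raw, SAMPLE_CODES)
print(X)
```

Checking for unknown labels before encoding gives a clearer error than letting the encoder fail on a label it has never seen.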
## Quick Start

### Installation
```bash
pip install transformers huggingface-hub scikit-learn xgboost pandas numpy joblib
```

### Basic Usage


### Using the Best Model

```python
from huggingface_hub import hf_hub_download
import joblib
import pickle
import pandas as pd
import numpy as np

# Download model and supporting files
model_path = hf_hub_download(
    repo_id="arabovs-ai-lab/PectinProductionModels",
    filename="best_model/model.pkl",
    repo_type="model"
)

scaler_path = hf_hub_download(
    repo_id="arabovs-ai-lab/PectinProductionModels",
    filename="scaler.pkl",
    repo_type="model"
)

encoder_path = hf_hub_download(
    repo_id="arabovs-ai-lab/PectinProductionModels",
    filename="label_encoder.pkl",
    repo_type="model"
)

# Load artifacts
model = joblib.load(model_path)
scaler = joblib.load(scaler_path)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)

# Prepare input data ('sample' must be a label the encoder was fitted on)
input_data = {'sample': 'Айв.', 'time_min': 5, 'temperature_c': 120, 'pressure_atm': 1.0, 'ph': 2.5}

# Create DataFrame
df = pd.DataFrame([input_data])

# Preprocess: encode sample type
df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]

# Create method_encoded feature
df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0

# Select features in correct order
features = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
X = df[features]

# Scale features
X_scaled = scaler.transform(X)

# Make prediction (one row, one column per target)
predictions = model.predict(X_scaled)

# Create results dictionary
results = {}
for i, target in enumerate(['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']):
    results[target] = predictions[0, i]

print("Prediction results:")
for target, value in results.items():
    print(f"  {target}: {value:.4f}")
```

### Batch Prediction from File

```python
import numpy as np
import pandas as pd
from huggingface_hub import hf_hub_download
import joblib
import pickle

class PectinPredictor:
    def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
        self.repo_id = repo_id
        self.model = None
        self.scaler = None
        self.label_encoder = None
        self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
        self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']

    def load_from_hub(self):
        """Load model and artifacts from Hugging Face Hub."""
        # Download model
        model_path = hf_hub_download(
            repo_id=self.repo_id,
            filename="best_model/model.pkl",
            repo_type="model"
        )
        self.model = joblib.load(model_path)

        # Download scaler
        scaler_path = hf_hub_download(
            repo_id=self.repo_id,
            filename="scaler.pkl",
            repo_type="model"
        )
        self.scaler = joblib.load(scaler_path)

        # Download label encoder
        encoder_path = hf_hub_download(
            repo_id=self.repo_id,
            filename="label_encoder.pkl",
            repo_type="model"
        )
        with open(encoder_path, 'rb') as f:
            self.label_encoder = pickle.load(f)

    def predict_batch(self, input_df):
        """Predict on batch data."""
        # Preprocessing
        processed_df = input_df.copy()

        # Encode sample type
        processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])

        # Create method_encoded
        processed_df['method_encoded'] = np.where(processed_df['time_min'] <= 15, 1, 0)

        # Select and scale features
        X = processed_df[self.feature_columns]
        X_scaled = self.scaler.transform(X)

        # Predict
        predictions = self.model.predict(X_scaled)

        # Add predictions to results
        result_df = input_df.copy()
        for i, target in enumerate(self.target_columns):
            result_df[f'predicted_{target}'] = predictions[:, i]

        return result_df

# Usage
predictor = PectinPredictor()
predictor.load_from_hub()

# Load your data
# df = pd.read_excel("your_data.xlsx")
# results = predictor.predict_batch(df)
```

### Comparing Different Models

```python
from huggingface_hub import hf_hub_download
import joblib

def compare_models(input_data, repo_id="arabovs-ai-lab/PectinProductionModels"):
    """Compare predictions from different models.

    `input_data` must already be preprocessed (encoded, ordered and scaled
    as in "Using the Best Model").
    """
    models_to_compare = [
        "best_model/model.pkl",
        "gradient_boosting/model.pkl",
        "random_forest/model.pkl",
        "xgboost/model.pkl"
    ]

    results = {}

    for model_path in models_to_compare:
        # Use the directory name as the model name
        model_name = model_path.split('/')[0]

        # Download model
        local_path = hf_hub_download(
            repo_id=repo_id,
            filename=model_path,
            repo_type="model"
        )

        model = joblib.load(local_path)

        # Make prediction (assuming preprocessed input)
        results[model_name] = model.predict(input_data)

    return results
```


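Once each model has produced its predictions, they are easier to read side by side in a table. The sketch below assumes a dictionary that maps model names to arrays of shape `(n_rows, 4)`, as `model.predict` would return for the four targets. The helper `tabulate_predictions` is hypothetical and the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

TARGETS = ["pectin_yield", "galacturonic_acid", "molecular_weight", "esterification_degree"]

def tabulate_predictions(results, row=0):
    """Return a DataFrame with one row per model and one column per target."""
    data = {name: np.asarray(pred)[row] for name, pred in results.items()}
    return pd.DataFrame.from_dict(data, orient="index", columns=TARGETS)

# Made-up prediction arrays standing in for real model outputs
results = {
    "gradient_boosting": np.array([[12.1, 64.0, 61000.0, 71.5]]),
    "random_forest": np.array([[11.8, 65.0, 60500.0, 72.0]]),
    "xgboost": np.array([[12.4, 63.5, 62000.0, 71.0]]),
}

table = tabulate_predictions(results)
# Range of predictions across models, per target
spread = table.max() - table.min()
print(table)
print(spread)
```

A large spread on one target suggests the models disagree for those process conditions, which may be worth checking against experimental data.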
## Repository Structure

```
arabovs-ai-lab/PectinProductionModels/
├── best_model/                  # Best overall model (Gradient Boosting)
│   ├── model.pkl                # Serialized model file
│   └── metadata.json            # Model metadata
├── random_forest/               # Random Forest model
├── gradient_boosting/           # Gradient Boosting model
├── xgboost/                     # XGBoost model
├── extra_trees/                 # Extra Trees model
├── linear_regression/           # Linear Regression model
├── ridge_regression/            # Ridge Regression model
├── lasso_regression/            # Lasso Regression model
├── support_vector_regression/   # SVR model
├── k_neighbors/                 # K-Neighbors model
├── multilayer_perceptron/       # MLP model
├── scaler.pkl                   # Feature scaler
├── label_encoder.pkl            # Label encoder for categories
├── model_metadata.json          # Training metadata
├── models_metadata.json         # All models metadata
└── README.md                    # This file
```

## 🧪 Training Information

- **Dataset**: 1000 experimental records
- **Features**: 6 process parameters
- **Targets**: 4 quality parameters
- **Validation**: 80/20 train-test split
- **Cross-validation**: 5-fold
- **Best Algorithm**: Gradient Boosting

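The setup listed above (80/20 split, 5-fold cross-validation, Gradient Boosting on four targets) could look roughly like the following sketch. It is not the training script behind these models. It runs on synthetic data, the hyperparameters and random seeds are arbitrary, and it assumes a `MultiOutputRegressor` wrapper because scikit-learn's `GradientBoostingRegressor` fits a single target at a time.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 6 features, 4 targets (not the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
W = rng.normal(size=(6, 4))
y = X @ W + 0.1 * rng.normal(size=(200, 4))

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale features, then fit one gradient boosting regressor per target
model = make_pipeline(
    StandardScaler(),
    MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=42)),
)

# 5-fold cross-validation on the training portion (R2 averaged over targets)
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit and held-out evaluation
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"CV R2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Test R2: {test_r2:.3f}")
```

Putting the scaler inside the pipeline keeps it from seeing the validation folds during cross-validation.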
## 💡 Key Features

- **Multi-target regression**: Predicts 4 parameters simultaneously
- **Process optimization**: Helps optimize pectin production conditions
- **Quality prediction**: Estimates pectin quality from process variables
- **Multiple algorithms**: 10 different ML algorithms for comparison

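One simple way to use a predictive model for process optimization is to scan a grid of candidate conditions and keep the combination with the highest predicted value. The sketch below illustrates the idea with a toy response function in place of a trained model. `grid_search_conditions` and `toy_yield` are hypothetical names, and the response surface is invented for the example.

```python
import itertools
import numpy as np

def grid_search_conditions(predict_yield, grid):
    """Evaluate every combination in `grid` and return the best one.

    `predict_yield` is any callable that maps a dict of conditions to a number.
    """
    names = list(grid)
    best_conditions, best_value = None, -np.inf
    for values in itertools.product(*(grid[n] for n in names)):
        conditions = dict(zip(names, values))
        value = predict_yield(conditions)
        if value > best_value:
            best_conditions, best_value = conditions, value
    return best_conditions, best_value

def toy_yield(c):
    # Invented response surface peaking at 100 C and pH 2.0; not a fitted model
    return 20.0 - 0.01 * (c["temperature_c"] - 100.0) ** 2 - 5.0 * (c["ph"] - 2.0) ** 2

grid = {
    "temperature_c": [80, 90, 100, 110, 120],
    "ph": [1.5, 2.0, 2.5, 3.0],
}
best, value = grid_search_conditions(toy_yield, grid)
print(best, value)
```

With a real model, `predict_yield` would preprocess the conditions, call `model.predict`, and return the `pectin_yield` column. Candidate values are best kept inside the ranges covered by the training data, since tree-based models do not extrapolate beyond them.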
## License

MIT License

---

*Last updated: 2025-11-21*
*Repository: https://huggingface.co/arabovs-ai-lab/PectinProductionModels*

## References

- [Pectin (Wikipedia)](https://en.wikipedia.org/wiki/Pectin)
- [Scikit-learn](https://scikit-learn.org/)
- [Hugging Face Hub](https://huggingface.co/docs/hub/)