arabovs-ai-lab/PectinProductionModels

@@ -38,7 +38,6 @@ This repository contains trained machine learning models for predicting pectin q
 | Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
 | XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |
 ### Best Model Performance
 - **Average R²**: 0.9427
 - **Average MAE**: 868.44
@@ -47,32 +46,53 @@ This repository contains trained machine learning models for predicting pectin q
 ## 📊 Model Details
 ### Target Variables
-  - `pectin_yield`: Pectin yield (%)
-  - `galacturonic_acid`: Galacturonic acid content (%)
-  - `molecular_weight`: Molecular weight (Da)
-  - `esterification_degree`: Esterification degree (%)
 ### Feature Variables
-  - `time_min`
-  - `temperature_c`
-  - `pressure_atm`
-  - `ph`
-  - `sample_encoded`
-  - `method_encoded`
 ## 🚀 Quick Start
 ### Installation
 ```bash
-pip install transformers huggingface-hub scikit-learn xgboost pandas numpy joblib
 ```
 ### Basic Usage
-### Using the Best Model
 ```python
 from huggingface_hub import hf_hub_download
 import joblib
@@ -80,6 +100,9 @@ import pandas as pd
 import numpy as np
 import pickle
 # Download model and supporting files
 model_path = hf_hub_download(
     repo_id="arabovs-ai-lab/PectinProductionModels",
@@ -105,8 +128,14 @@ scaler = joblib.load(scaler_path)
 with open(encoder_path, 'rb') as f:
     label_encoder = pickle.load(f)
-# Prepare input data
-input_data = {'sample': 'Айв.', 'time_min': 5, 'temperature_c': 120, 'pressure_atm': 1.0, 'ph': 2.5}
 # Create DataFrame
 df = pd.DataFrame([input_data])
@@ -114,7 +143,7 @@ df = pd.DataFrame([input_data])
 # Preprocess: encode sample type
 df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]
-# Create method_encoded feature
 df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0
 # Select features in correct order
@@ -129,7 +158,8 @@ predictions = model.predict(X_scaled)
 # Create results dictionary
 results = {}
-for i, target in enumerate(['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']):
     results[target] = predictions[0, i]
 print("Prediction results:")
@@ -137,34 +167,124 @@ for target, value in results.items():
     print(f"  {target}: {value:.4f}")
 ```
-### Batch Prediction from File
 ```python
 import pandas as pd
 from huggingface_hub import hf_hub_download
 import joblib
 import pickle
 class PectinPredictor:
     def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
         self.repo_id = repo_id
         self.model = None
         self.scaler = None
         self.label_encoder = None
         self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
         self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
-    def load_from_hub(self):
-        """Load model and artifacts from Hugging Face Hub."""
-        # Download model
         model_path = hf_hub_download(
             repo_id=self.repo_id,
-            filename="best_model/model.pkl",
             repo_type="model"
         )
-        self.model = joblib.load(model_path)
-        # Download scaler
         scaler_path = hf_hub_download(
             repo_id=self.repo_id,
             filename="scaler.pkl",
@@ -172,7 +292,7 @@ class PectinPredictor:
         )
         self.scaler = joblib.load(scaler_path)
-        # Download label encoder
         encoder_path = hf_hub_download(
             repo_id=self.repo_id,
             filename="label_encoder.pkl",
@@ -180,77 +300,280 @@ class PectinPredictor:
         )
         with open(encoder_path, 'rb') as f:
             self.label_encoder = pickle.load(f)
-    def predict_batch(self, input_df):
-        """Predict on batch data."""
-        # Preprocessing
         processed_df = input_df.copy()
-        # Encode sample type
         processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])
-        # Create method_encoded
         processed_df['method_encoded'] = np.where(processed_df['time_min'] <= 15, 1, 0)
-        # Select and scale features
         X = processed_df[self.feature_columns]
         X_scaled = self.scaler.transform(X)
-        # Predict
         predictions = self.model.predict(X_scaled)
-        # Add predictions to results
         result_df = input_df.copy()
         for i, target in enumerate(self.target_columns):
             result_df[f'predicted_{target}'] = predictions[:, i]
         return result_df
-# Usage
-predictor = PectinPredictor()
-predictor.load_from_hub()
-# Load your data
-# df = pd.read_excel("your_data.xlsx")
-# results = predictor.predict_batch(df)
-```
-### Comparing Different Models
-```python
-from huggingface_hub import hf_hub_download
-import joblib
-def compare_models(input_data, repo_id="arabovs-ai-lab/PectinProductionModels"):
-    """Compare predictions from different models."""
-    models_to_compare = [
-        "best_model/model.pkl",
-        "gradient_boosting/model.pkl",
-        "random_forest/model.pkl",
-        "xgboost/model.pkl"
-    ]
-    results = {}
-    for model_path in models_to_compare:
-        model_name = model_path.split('/')[0]
-        # Download model
-        local_path = hf_hub_download(
-            repo_id=repo_id,
-            filename=model_path,
-            repo_type="model"
         )
-        model = joblib.load(local_path)
-        # Make prediction (assuming preprocessed input)
-        # predictions = model.predict(preprocessed_input)
-        # results[model_name] = predictions
-    return results
-```
 ## 📁 Repository Structure
@@ -270,7 +593,7 @@ arabovs-ai-lab/PectinProductionModels/
 ├── k_neighbors/                 # K-Neighbors model
 ├── multilayer_perceptron/       # MLP model
 ├── scaler.pkl                   # Feature scaler
-├── label_encoder.pkl            # Label encoder for categories
 ├── model_metadata.json          # Training metadata
 ├── models_metadata.json         # All models metadata
 └── README.md                    # This file
@@ -279,7 +602,7 @@ arabovs-ai-lab/PectinProductionModels/
 ## 🧪 Training Information
 - **Dataset**: 1000 experimental records
-- **Features**: 6 process parameters
 - **Targets**: 4 quality parameters
 - **Validation**: 80/20 train-test split
 - **Cross-validation**: 5-fold
@@ -287,10 +610,40 @@ arabovs-ai-lab/PectinProductionModels/
 ## 💡 Key Features
-- **Multi-target regression**: Predicts 4 parameters simultaneously
 - **Process optimization**: Helps optimize pectin production conditions
 - **Quality prediction**: Estimates pectin quality from process variables
 - **Multiple algorithms**: 10 different ML algorithms for comparison
 ## 📄 License
@@ -306,4 +659,4 @@ MIT License
 - [Pectin Production Technology](https://en.wikipedia.org/wiki/Pectin)
 - [Scikit-learn](https://scikit-learn.org/)
 - [Hugging Face Hub](https://huggingface.co/docs/hub/)

 | Support Vector Regression | support_vector_regression | 0.4832 | 6612.2360 | Machine learning model for pectin production |
 | XGBoost | xgboost | 0.9203 | 1074.2310 | XGBoost model with excellent performance on tabular data |
 ### Best Model Performance
 - **Average R²**: 0.9427
 - **Average MAE**: 868.44
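The README does not say how the two averages are formed. The sketch below assumes they are unweighted means of the per-target scores; the numbers in it are made up for illustration and are not results from this repository. If that assumption holds, an average MAE in the hundreds is likely dominated by `molecular_weight`, which is on the order of 10^5 Da in the sample data, while the percentage targets contribute errors near 1.

```python
from statistics import mean

def mae(actual, predicted):
    """Mean absolute error for one target."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def r2(actual, predicted):
    """Coefficient of determination for one target."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative values, NOT results from this repository.
actual = {
    "pectin_yield": [25.0, 15.0, 20.0],
    "molecular_weight": [100000.0, 130000.0, 110000.0],
}
predicted = {
    "pectin_yield": [24.0, 16.0, 20.0],
    "molecular_weight": [101000.0, 128000.0, 110000.0],
}

per_target_mae = {t: mae(actual[t], predicted[t]) for t in actual}
per_target_r2 = {t: r2(actual[t], predicted[t]) for t in actual}

# Assumption: "average" means an unweighted mean over targets.
average_mae = mean(per_target_mae.values())
average_r2 = mean(per_target_r2.values())
```

Reporting MAE per target, or scaling each target before averaging, would make the figure easier to interpret.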
 ## 📊 Model Details
 ### Target Variables
+- `pectin_yield`: Pectin yield, % (data column `ПВ, %`)
+- `galacturonic_acid`: Galacturonic acid content, % (data column `ГК, %`)
+- `molecular_weight`: Molecular weight, Da (data column `Mw, Д`)
+- `esterification_degree`: Degree of esterification, % (data column `СЭ, %`)
 ### Feature Variables
+- `time_min`: Extraction time, minutes (data column `t, мин`)
+- `temperature_c`: Temperature, °C (data column `T, °C`)
+- `pressure_atm`: Pressure, atm (data column `P, атм`)
+- `ph`: Acidity, pH (data column `pH`)
+- `sample_encoded`: Raw material type, label-encoded from the sample code
+- `method_encoded`: Extraction method (1 = fast, ≤15 min; 0 = slow, >15 min)
+**Note**: The solid-to-liquid ratio (Т:Ж) was excluded from model training because it was constant at 1:20 across all experiments and therefore carried no predictive information.
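The two preprocessing rules described above can be sketched in plain Python. This is a minimal illustration, not the training pipeline itself: the helper names, the toy records, and the column name `solid_liquid_ratio` (standing in for Т:Ж) are all hypothetical.

```python
def encode_method(time_min):
    """Return 1 for fast extraction (<= 15 min), 0 for slow (> 15 min)."""
    return 1 if time_min <= 15 else 0

def constant_columns(rows):
    """Return the names of columns that hold a single value in every row."""
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]

# Hypothetical records; "solid_liquid_ratio" is an illustrative name for Т:Ж.
rows = [
    {"time_min": 7, "temperature_c": 120, "solid_liquid_ratio": "1:20"},
    {"time_min": 5, "temperature_c": 130, "solid_liquid_ratio": "1:20"},
    {"time_min": 60, "temperature_c": 85, "solid_liquid_ratio": "1:20"},
]

dropped = constant_columns(rows)  # columns with no variation carry no signal
methods = [encode_method(row["time_min"]) for row in rows]
```

A column with a single value has zero variance, so no model can learn anything from it; dropping it also keeps the inference inputs down to the six features listed above.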
+## 📋 Experimental Data Examples
+### Sample Experimental Data
+| Exp | Sample | t, мин | T, °C | P, атм | pH | ПВ, % | ГК, % | Mw, Д | СЭ, % |
+|-----|--------|--------|-------|--------|-----|-------|-------|-------|-------|
+| 1 | ЯП(М) | 7 | 120 | 2.08 | 2.0 | 25.864 | 52.706 | 103773.64 | 71.17 |
+| 2 | ЯП(М) | 7 | 120 | 1.74 | 2.08 | 24.83 | 51.645 | 103098.49 | 70.015 |
+| 3 | Абр. | 5 | 130 | 2.09 | 1.74 | 14.755 | 67.55 | 127235.35 | 82.813 |
+| 4 | ЯП(М) | 7 | 120 | 2.05 | 2.0 | 26.353 | 53.804 | 105994.85 | 65.415 |
+### Raw Material Types
+| Code | Full Name | Type |
+|------|-----------|------|
+| Абр. | Абрикосовый (Apricot) | Fruit |
+| Рв. | Ревень (Rhubarb) | Vegetable |
+| Айв. | Айвы (Quince) | Fruit |
+| Ткв. | Тыквенный (Pumpkin) | Vegetable |
+| КрП | Корзинка подсолнечника (Sunflower head) | Plant |
+| ЯП(Ф) | Яблочный пектин Файзобод (Apple Faizobod) | Fruit |
+| ЯП(М) | Яблочный пектин Муминобод (Apple Muminobod) | Fruit |
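scikit-learn's `LabelEncoder.transform` raises a `ValueError` for labels it did not see during fitting, so it can help to check the sample code before calling the model. The sketch below assumes the published encoder was fitted on exactly the seven codes in the table, which the README implies but does not state; `RAW_MATERIALS` and `check_sample_code` are hypothetical names.

```python
# Codes copied from the Raw Material Types table; English names only.
RAW_MATERIALS = {
    "Абр.": "Apricot",
    "Рв.": "Rhubarb",
    "Айв.": "Quince",
    "Ткв.": "Pumpkin",
    "КрП": "Sunflower head",
    "ЯП(Ф)": "Apple (Faizobod)",
    "ЯП(М)": "Apple (Muminobod)",
}

def check_sample_code(code):
    """Return the stripped code, or raise ValueError if it is not in the table."""
    cleaned = code.strip()
    if cleaned not in RAW_MATERIALS:
        known = ", ".join(RAW_MATERIALS)
        raise ValueError(f"Unknown sample code {cleaned!r}; expected one of: {known}")
    return cleaned

sample = check_sample_code(" Айв. ")  # surrounding whitespace is tolerated
```

Note that the codes are case- and punctuation-sensitive: `Айв.` includes the trailing period.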
 ## 🚀 Quick Start
 ### Installation
 ```bash
+pip install transformers huggingface-hub scikit-learn xgboost pandas numpy joblib tabulate openpyxl
 ```
 ### Basic Usage
 ```python
 from huggingface_hub import hf_hub_download
 import joblib
 import numpy as np
 import pickle
+import warnings
+warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
 # Download model and supporting files
 model_path = hf_hub_download(
     repo_id="arabovs-ai-lab/PectinProductionModels",
 with open(encoder_path, 'rb') as f:
     label_encoder = pickle.load(f)
+# Prepare input data (Т:Ж parameter is not required as it was constant)
+input_data = {
+    'sample': 'Айв.',
+    'time_min': 5,
+    'temperature_c': 120,
+    'pressure_atm': 1.0,
+    'ph': 2.5
+}
 # Create DataFrame
 df = pd.DataFrame([input_data])
 # Preprocess: encode sample type
 df['sample_encoded'] = label_encoder.transform([input_data['sample']])[0]
+# Create method_encoded feature based on extraction time
 df['method_encoded'] = 1 if input_data['time_min'] <= 15 else 0
 # Select features in correct order
 # Create results dictionary
 results = {}
+target_names = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
+for i, target in enumerate(target_names):
     results[target] = predictions[0, i]
 print("Prediction results:")
     print(f"  {target}: {value:.4f}")
 ```
+## 🔬 Advanced Model Comparison System
+For a comprehensive comparison of all available models, use the `PectinPredictor` class:
 ```python
 import pandas as pd
+import numpy as np
 from huggingface_hub import hf_hub_download
 import joblib
 import pickle
+import warnings
+from sklearn.exceptions import InconsistentVersionWarning
+from tabulate import tabulate
+# Suppress sklearn version-mismatch warnings raised when unpickling models
+warnings.filterwarnings("ignore", category=InconsistentVersionWarning)
+warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
+warnings.filterwarnings("ignore", category=UserWarning, module="xgboost")
 class PectinPredictor:
+    """
+    A machine learning model for predicting pectin production parameters
+    from experimental conditions using pre-trained models from Hugging Face Hub.
+    """
+    # Available models with descriptions and metadata
+    AVAILABLE_MODELS = {
+        "best_model": {
+            "subfolder": "best_model",
+            "description": "🎯 Best overall model (Gradient Boosting) - optimal performance",
+            "color": "#FF6B6B"
+        },
+        "gradient_boosting": {
+            "subfolder": "gradient_boosting",
+            "description": "📈 Gradient Boosting - best for multi-task regression",
+            "color": "#4ECDC4"
+        },
+        "random_forest": {
+            "subfolder": "random_forest",
+            "description": "🌲 Random Forest - reliable and stable",
+            "color": "#45B7D1"
+        },
+        "xgboost": {
+            "subfolder": "xgboost",
+            "description": "⚡ XGBoost - high performance on tabular data",
+            "color": "#96CEB4"
+        },
+        "linear_regression": {
+            "subfolder": "linear_regression",
+            "description": "📊 Linear Regression - basic linear model",
+            "color": "#FECA57"
+        },
+        "extra_trees": {
+            "subfolder": "extra_trees",
+            "description": "🌳 Extra Trees - extremely randomized trees",
+            "color": "#FF9FF3"
+        },
+        "k_neighbors": {
+            "subfolder": "k_neighbors",
+            "description": "📏 K-Neighbors - nearest neighbors method",
+            "color": "#54A0FF"
+        },
+        "lasso_regression": {
+            "subfolder": "lasso_regression",
+            "description": "🎯 Lasso Regression - L1 regularization",
+            "color": "#5F27CD"
+        },
+        "multilayer_perceptron": {
+            "subfolder": "multilayer_perceptron",
+            "description": "🧠 Neural Network MLP - multilayer perceptron",
+            "color": "#00D2D3"
+        },
+        "ridge_regression": {
+            "subfolder": "ridge_regression",
+            "description": "🏔️ Ridge Regression - L2 regularization",
+            "color": "#FF9F43"
+        },
+        "support_vector_regression": {
+            "subfolder": "support_vector_regression",
+            "description": "🔗 Support Vector Regression - support vector method",
+            "color": "#A3CB38"
+        }
+    }
     def __init__(self, repo_id="arabovs-ai-lab/PectinProductionModels"):
+        """Initialize the predictor with model repository ID."""
         self.repo_id = repo_id
         self.model = None
         self.scaler = None
         self.label_encoder = None
+        # Model input features (after preprocessing)
         self.feature_columns = ['time_min', 'temperature_c', 'pressure_atm', 'ph', 'sample_encoded', 'method_encoded']
+        # Model output targets (pectin characteristics)
         self.target_columns = ['pectin_yield', 'galacturonic_acid', 'molecular_weight', 'esterification_degree']
+    def load_from_hub(self, model_type="best_model"):
+        """
+        Load model, scaler, and label encoder from Hugging Face Hub repository.
+        Args:
+            model_type: Key from AVAILABLE_MODELS to load specific model
+        """
+        if model_type not in self.AVAILABLE_MODELS:
+            raise ValueError(f"Model type '{model_type}' not found. Available: {list(self.AVAILABLE_MODELS.keys())}")
+        model_info = self.AVAILABLE_MODELS[model_type]
+        # Download and load the specified model
         model_path = hf_hub_download(
             repo_id=self.repo_id,
+            filename=f"{model_info['subfolder']}/model.pkl",
             repo_type="model"
         )
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", category=UserWarning)
+            self.model = joblib.load(model_path)
+        # Download and load the feature scaler for data normalization
         scaler_path = hf_hub_download(
             repo_id=self.repo_id,
             filename="scaler.pkl",
         )
         self.scaler = joblib.load(scaler_path)
+        # Download and load the label encoder for sample type conversion
         encoder_path = hf_hub_download(
             repo_id=self.repo_id,
             filename="label_encoder.pkl",
         )
         with open(encoder_path, 'rb') as f:
             self.label_encoder = pickle.load(f)
+    def prepare_dataframe(self, df):
+        """
+        Rename DataFrame columns from Russian to English to match model expectations.
+        """
+        column_mapping = {
+            'Образец \nпектина': 'sample',
+            't, мин': 'time_min',
+            'T, °C': 'temperature_c',
+            'P, атм': 'pressure_atm',
+            'pH': 'ph'
+        }
+        return df.rename(columns=column_mapping)
+    def preprocess_input(self, input_df):
+        """
+        Preprocess input data for model prediction.
+        Applies feature engineering, encoding, and scaling.
+        """
         processed_df = input_df.copy()
+        # Convert sample names to numeric codes using trained label encoder
         processed_df['sample_encoded'] = self.label_encoder.transform(processed_df['sample'])
+        # Create binary feature indicating extraction method based on time
         processed_df['method_encoded'] = np.where(processed_df['time_min'] <= 15, 1, 0)
+        # Select features in correct order and apply scaling
         X = processed_df[self.feature_columns]
         X_scaled = self.scaler.transform(X)
+        return X_scaled
+    def predict_batch(self, input_df, model_type="best_model"):
+        """
+        Generate predictions for multiple experimental conditions.
+        Args:
+            input_df: DataFrame containing experimental parameters
+            model_type: Which model to use for prediction
+        Returns:
+            Original DataFrame augmented with prediction columns
+        """
+        # Load specified model if not already loaded or different from current
+        if self.model is None or model_type != getattr(self, '_current_model', None):
+            self.load_from_hub(model_type)
+            self._current_model = model_type
+        # Preprocess input data
+        X_scaled = self.preprocess_input(input_df)
+        # Generate predictions using the trained model
         predictions = self.model.predict(X_scaled)
+        # Combine original data with predictions
         result_df = input_df.copy()
         for i, target in enumerate(self.target_columns):
             result_df[f'predicted_{target}'] = predictions[:, i]
         return result_df
+    def compare_all_models(self, input_data):
+        """
+        Compare predictions from ALL available machine learning models.
+        Args:
+            input_data: DataFrame or dictionary with input features
+        Returns:
+            DataFrame with predictions from each model for easy comparison
+        """
+        # Convert single input to DataFrame if needed
+        if isinstance(input_data, dict):
+            input_df = pd.DataFrame([input_data])
+        else:
+            input_df = input_data.copy()
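+        # Make sure the scaler and label encoder are loaded before preprocessing
+        if self.scaler is None or self.label_encoder is None:
+            self.load_from_hub()
+            self._current_model = "best_model"
+        # Preprocess input data once for all models
+        X_scaled = self.preprocess_input(input_df)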
+        comparison_results = []
+        for model_name, model_info in self.AVAILABLE_MODELS.items():
+            try:
+                # Download and load model
+                model_path = hf_hub_download(
+                    repo_id=self.repo_id,
+                    filename=f"{model_info['subfolder']}/model.pkl",
+                    repo_type="model"
+                )
+                # Load model with suppressed warnings
+                with warnings.catch_warnings():
+                    warnings.filterwarnings("ignore", category=UserWarning)
+                    model = joblib.load(model_path)
+                # Generate predictions
+                predictions = model.predict(X_scaled)
+                # Extract predictions for this sample
+                result = {
+                    'model': model_name,
+                    'description': model_info['description']
+                }
+                for i, target in enumerate(self.target_columns):
+                    if len(predictions.shape) > 1:
+                        result[target] = predictions[0, i]
+                    else:
+                        result[target] = predictions[i]
+                comparison_results.append(result)
+            except Exception as e:
+                print(f"⚠️  Could not load model {model_name}: {e}")
+                continue
+        return pd.DataFrame(comparison_results)
+    def create_comparison_tables(self, comparison_df):
+        """
+        Create formatted comparison tables for easy analysis.
+        Args:
+            comparison_df: DataFrame from compare_all_models()
+        Returns:
+            Dictionary with different formatted tables
+        """
+        tables = {}
+        # Table 1: Detailed comparison with all metrics
+        detailed_table = comparison_df.copy()
+        detailed_table = detailed_table.round(4)
+        tables['detailed'] = tabulate(
+            detailed_table,
+            headers='keys',
+            tablefmt='grid',
+            showindex=False
         )
+        # Table 2: Summary statistics
+        summary_data = []
+        for target in self.target_columns:
+            values = comparison_df[target]
+            summary_data.append({
+                'Target': target,
+                'Mean': values.mean(),
+                'Std': values.std(),
+                'Min': values.min(),
+                'Max': values.max(),
+                'Range': values.max() - values.min()
+            })
+        summary_df = pd.DataFrame(summary_data).round(4)
+        tables['summary'] = tabulate(
+            summary_df,
+            headers='keys',
+            tablefmt='grid',
+            showindex=False
+        )
+        # Table 3: Ranked by pectin yield (most important metric)
+        ranked_df = comparison_df.sort_values('pectin_yield', ascending=False).round(4)
+        tables['ranked'] = tabulate(
+            ranked_df,
+            headers='keys',
+            tablefmt='grid',
+            showindex=False
+        )
+        return tables
+    def calculate_prediction_metrics(self, df_with_predictions):
+        """
+        Calculate basic metrics to evaluate prediction quality against actual values.
+        """
+        metrics = {}
+        for target in self.target_columns:
+            actual_col = None
+            # Find the actual value column
+            if target == 'pectin_yield':
+                actual_col = 'ПВ, %'
+            elif target == 'galacturonic_acid':
+                actual_col = 'ГК, %'
+            elif target == 'molecular_weight':
+                actual_col = 'Mw, Д'
+            elif target == 'esterification_degree':
+                actual_col = 'СЭ, %'
+            if actual_col and actual_col in df_with_predictions.columns:
+                actual = df_with_predictions[actual_col]
+                predicted = df_with_predictions[f'predicted_{target}']
+                # Calculate metrics
+                rmse = np.sqrt(np.mean((actual - predicted) ** 2))
+                mae = np.mean(np.abs(actual - predicted))
+                metrics[target] = {
+                    'RMSE': rmse,
+                    'MAE': mae,
+                    'correlation': np.corrcoef(actual, predicted)[0, 1]
+                }
+        return metrics
+# Example usage
+if __name__ == "__main__":
+    # Initialize predictor
+    predictor = PectinPredictor()
+    # Load experimental data
+    df = pd.read_excel("/content/ShortExperiments_DataSet.xlsx")
+    df_renamed = predictor.prepare_dataframe(df)
+    print("🔬 PECTIN PRODUCTION MODEL COMPARISON SYSTEM")
+    print("=" * 60)
+    # 1. Batch prediction with best model
+    print("\n1. BATCH PREDICTIONS WITH BEST MODEL:")
+    print("-" * 40)
+    results = predictor.predict_batch(df_renamed, model_type="best_model")
+    print(f"✅ Processed {len(results)} experiments")
+    # Calculate prediction quality metrics
+    metrics = predictor.calculate_prediction_metrics(results)
+    print("\n📊 PREDICTION QUALITY METRICS:")
+    for target, metric in metrics.items():
+        print(f"   {target}:")
+        print(f"     RMSE: {metric['RMSE']:.4f}")
+        print(f"     MAE: {metric['MAE']:.4f}")
+        print(f"     Correlation: {metric['correlation']:.4f}")
+    # 2. Compare all models for a single experiment
+    print("\n2. COMPARING ALL MODELS FOR SINGLE EXPERIMENT:")
+    print("-" * 50)
+    single_experiment = {
+        'sample': 'ЯП(М)',
+        'time_min': 7,
+        'temperature_c': 120,
+        'pressure_atm': 2.08,
+        'ph': 2.0
+    }
+    print(f"🔍 Input parameters: {single_experiment}")
+    # Compare all models
+    comparison_df = predictor.compare_all_models(single_experiment)
+    # Create and display comparison tables
+    tables = predictor.create_comparison_tables(comparison_df)
+    print("\n📋 DETAILED MODEL COMPARISON:")
+    print(tables['detailed'])
+    print("\n📈 PREDICTION SUMMARY STATISTICS:")
+    print(tables['summary'])
+    print("\n🏆 MODELS RANKED BY PECTIN YIELD:")
+    print(tables['ranked'])
+    # 3. Show available models
+    print("\n3. AVAILABLE MODELS:")
+    print("-" * 20)
+    for model_name, info in predictor.AVAILABLE_MODELS.items():
+        print(f"   • {model_name}: {info['description']}")
+    print(f"\n🎯 Total models available: {len(predictor.AVAILABLE_MODELS)}")
+    print(f"✅ Successfully loaded: {len(comparison_df)}")
+```
 ## 📁 Repository Structure
 ├── k_neighbors/                 # K-Neighbors model
 ├── multilayer_perceptron/       # MLP model
 ├── scaler.pkl                   # Feature scaler
+├── label_encoder.pkl            # Label encoder for sample types
 ├── model_metadata.json          # Training metadata
 ├── models_metadata.json         # All models metadata
 └── README.md                    # This file
 ## 🧪 Training Information
 - **Dataset**: 1000 experimental records
+- **Features**: 6 process parameters (the constant solid-to-liquid ratio, Т:Ж, is excluded)
 - **Targets**: 4 quality parameters
 - **Validation**: 80/20 train-test split
 - **Cross-validation**: 5-fold
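For 1000 records, an 80/20 split with 5-fold cross-validation works out as in the sketch below. It is an illustration in plain Python rather than the actual training script: the seed, the shuffling, and the choice to run cross-validation on the training portion only are assumptions, since the README does not specify them.

```python
import random

def train_test_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = int(round(n_records * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (training, validation) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

train_idx, test_idx = train_test_indices(1000)
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold_indices(train_idx)]
```

Under these assumptions, 800 records are used for training and 200 are held out, and each fold trains on 640 records and validates on 160.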
 ## 💡 Key Features
+- **Multi-target regression**: Predicts 4 pectin quality parameters simultaneously
 - **Process optimization**: Helps optimize pectin production conditions
 - **Quality prediction**: Estimates pectin quality from process variables
 - **Multiple algorithms**: 10 different ML algorithms for comparison
+- **Industrial focus**: Specifically designed for pectin production technology
+## ⚠️ Important Notes
+### Data Requirements:
+- **Supported samples**: 7 types as listed above
+- **Parameter ranges**:
+  - Time: 5-180 minutes
+  - Temperature: 60-160°C
+  - Pressure: 1.0-5.0 atm
+  - pH: 1.5-4.0
+### Limitations:
+- The models were trained only on the raw materials listed above
+- Accuracy may decrease outside the trained parameter ranges
+- New raw material types require retraining
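Because accuracy may degrade outside the documented ranges, a simple pre-check can flag inputs that would force the model to extrapolate. The sketch below is one possible approach; `PARAMETER_RANGES` and `out_of_range` are hypothetical names, and treating the bounds as inclusive is an assumption.

```python
# Ranges copied from the Data Requirements list; bounds assumed inclusive.
PARAMETER_RANGES = {
    "time_min": (5, 180),
    "temperature_c": (60, 160),
    "pressure_atm": (1.0, 5.0),
    "ph": (1.5, 4.0),
}

def out_of_range(params):
    """Return {name: (value, low, high)} for each value outside its range."""
    problems = {}
    for name, (low, high) in PARAMETER_RANGES.items():
        value = params[name]
        if not low <= value <= high:
            problems[name] = (value, low, high)
    return problems

inside = out_of_range(
    {"time_min": 5, "temperature_c": 120, "pressure_atm": 1.0, "ph": 2.5}
)
outside = out_of_range(
    {"time_min": 240, "temperature_c": 120, "pressure_atm": 0.5, "ph": 2.5}
)
```

Returning the offending values instead of raising lets the caller decide whether to warn, clip, or reject the input.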
+## 📜 Citation
+If you use this model in your research, please cite it as:
+```bibtex
+@misc{PectinProductionModels2025,
+  title = {Pectin Production Models: Machine Learning for Predicting Pectin Quality Parameters},
+  author = {Arabovs AI Lab},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/arabovs-ai-lab/PectinProductionModels}
+}
+```
 ## 📄 License
 - [Pectin Production Technology](https://en.wikipedia.org/wiki/Pectin)
 - [Scikit-learn](https://scikit-learn.org/)
 - [Hugging Face Hub](https://huggingface.co/docs/hub/)
+```