# Scripts

This directory contains executable scripts for training, testing, and other tasks related to model development and evaluation.

Contents

train_regression_model.py

Trains supervised regression models with scikit-learn. The script handles data loading, preprocessing, optional log transformation of the target, hyperparameter tuning, model evaluation, and saving the resulting models, metrics, and visualizations.

Features

  • Supports various regression models defined in models/supervised/regression.
  • Performs hyperparameter tuning using grid search cross-validation.
  • Saves trained models and evaluation metrics.
  • Generates visualizations when --visualize is set.
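
The grid search step listed above can be sketched as follows. This is a minimal illustration, not the script's actual code: the synthetic dataset, the `Ridge` estimator, the `alpha` grid, and the scoring string are all assumptions chosen for the example, while the 0.2 test split, seed 42, and 5 folds mirror the documented defaults.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for the CSV normally loaded via --data_path.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Mirrors the documented defaults: test_size=0.2, random_state=42.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical grid; the real grids are defined by the model modules.
param_grid = {"alpha": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    estimator=Ridge(),
    param_grid=param_grid,
    cv=5,  # mirrors the documented cv_folds default
    scoring="neg_root_mean_squared_error",  # assumed metric for illustration
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training split by default.
best_model = search.best_estimator_
test_r2 = best_model.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Scoring on the held-out split after the search keeps the test data out of the tuning loop.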

Usage

python train_regression_model.py --model_module MODEL_MODULE \
    --data_path DATA_PATH/DATA_NAME.csv \
    --target_variable TARGET_VARIABLE [OPTIONS]

  • Required Arguments:
      • --model_module: Name of the regression model module to import (e.g., linear_regression).
      • --data_path: Path to the dataset CSV file, including the file name (e.g., data/house_prices/train.csv).
      • --target_variable: Name of the target variable.
  • Optional Arguments:
      • --test_size: Proportion of the dataset to include in the test split (default: 0.2).
      • --random_state: Random seed for reproducibility (default: 42).
      • --log_transform: Apply a log transformation to the target variable (regression only).
      • --cv_folds: Number of cross-validation folds (default: 5).
      • --scoring_metric: Scoring metric for model evaluation.
      • --model_path: Path to save the trained model.
      • --results_path: Path to save results and metrics.
      • --visualize: Generate and save visualizations.
      • --drop_columns: Comma-separated column names to drop from the dataset.
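
For orientation, the argument list above could be declared with `argparse` roughly as shown here. This is a sketch under assumptions: the documented defaults (0.2, 42, 5) come from the list, but the `None` and empty-string defaults, the `store_true` flags, and the helper `parse_drop_columns` are hypothetical and may differ from the real script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(description="Train a regression model.")
    # Required arguments.
    parser.add_argument("--model_module", required=True)
    parser.add_argument("--data_path", required=True)
    parser.add_argument("--target_variable", required=True)
    # Optional arguments with the documented defaults.
    parser.add_argument("--test_size", type=float, default=0.2)
    parser.add_argument("--random_state", type=int, default=42)
    parser.add_argument("--cv_folds", type=int, default=5)
    # Defaults below are not stated in the README; None/"" are assumptions.
    parser.add_argument("--scoring_metric", default=None)
    parser.add_argument("--model_path", default=None)
    parser.add_argument("--results_path", default=None)
    parser.add_argument("--drop_columns", default="")
    # Boolean switches, used without a value in the usage example.
    parser.add_argument("--log_transform", action="store_true")
    parser.add_argument("--visualize", action="store_true")
    return parser


def parse_drop_columns(raw: str) -> list[str]:
    """Split a comma-separated string into clean column names (hypothetical helper)."""
    return [name.strip() for name in raw.split(",") if name.strip()]


args = build_parser().parse_args(
    [
        "--model_module", "linear_regression",
        "--data_path", "data/house_prices/train.csv",
        "--target_variable", "SalePrice",
        "--drop_columns", "Id",
        "--log_transform",
    ]
)
print(args.test_size, args.log_transform, parse_drop_columns(args.drop_columns))
```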

Usage Example

python train_regression_model.py --model_module linear_regression \
    --data_path data/house_prices/train.csv \
    --target_variable SalePrice --drop_columns Id \
    --log_transform --visualize
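
The example above enables --log_transform, which is a common choice for skewed targets such as sale prices. The README does not say which transform the script applies, so the sketch below assumes `log1p` with `expm1` as its inverse; the function names are hypothetical, and the real script likely works on NumPy arrays or pandas Series instead of plain lists.

```python
import math


def log_transform(values):
    """Compress a skewed target with log(1 + y); assumes y > -1."""
    return [math.log1p(v) for v in values]


def inverse_log_transform(values):
    """Map predictions back to the original scale with exp(y) - 1."""
    return [math.expm1(v) for v in values]


prices = [100000.0, 250000.0, 755000.0]
logged = log_transform(prices)           # values the model would be trained on
restored = inverse_log_transform(logged)  # predictions mapped back to prices

for original, back in zip(prices, restored):
    print(original, round(back, 2))
```

Metrics computed on the transformed scale are not comparable with metrics on the original scale, so predictions are usually inverted before evaluation.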

train_classification_model.py

Trains supervised classification models with scikit-learn. The script handles data loading, preprocessing, hyperparameter tuning (via grid search CV), model evaluation with classification metrics, and saving the resulting models, metrics, and visualizations.

Features

  • Supports various classification models defined in models/supervised/classification.
  • Performs hyperparameter tuning using grid search cross-validation (via classification_hyperparameter_tuning).
  • Saves trained models and evaluation metrics (accuracy, precision, recall, F1).
  • Generates a metrics bar chart and a confusion matrix plot when --visualize is set.
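
The four metrics and the confusion matrix named above can be computed with `sklearn.metrics` as in this small sketch. The labels are made up for illustration, and the binary averaging used here is an assumption; the script may use a different averaging mode for multiclass targets.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels for illustration only; 1 is treated as the positive class.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

A dictionary like `metrics` maps directly onto a bar chart, and `cm` is the input a confusion matrix plot needs.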

Usage

python train_classification_model.py --model_module MODEL_MODULE \
    --data_path DATA_PATH/DATA_NAME.csv \
    --target_variable TARGET_VARIABLE [OPTIONS]

  • Required Arguments:
      • --model_module: Name of the classification model module to import (e.g., logistic_regression).
      • --data_path: Path to the dataset CSV file, including the file name (e.g., data/adult_income/train.csv).
      • --target_variable: Name of the target variable (categorical).
  • Optional Arguments:
      • --test_size: Proportion of the dataset to include in the test split (default: 0.2).
      • --random_state: Random seed for reproducibility (default: 42).
      • --cv_folds: Number of cross-validation folds (default: 5).
      • --scoring_metric: Scoring metric for model evaluation (e.g., accuracy, f1, roc_auc).
      • --model_path: Path to save the trained model.
      • --results_path: Path to save results and metrics.
      • --visualize: Generate and save visualizations.
      • --drop_columns: Comma-separated column names to drop from the dataset.
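
Since model_module is described as the name of a module to import, one plausible way to resolve it is `importlib.import_module`. The sketch below is an assumption about the mechanism, not the script's confirmed code: the helper `load_model_module` is hypothetical, and the default package path is derived from the models/supervised/classification directory mentioned above.

```python
import importlib
from types import ModuleType


def load_model_module(
    model_module: str,
    package: str = "models.supervised.classification",
) -> ModuleType:
    """Import `<package>.<model_module>` by name (hypothetical helper).

    Raises ModuleNotFoundError if the module does not exist.
    """
    return importlib.import_module(f"{package}.{model_module}")


# Demonstrated with a standard-library package so the snippet runs anywhere;
# inside the repository the call would be load_model_module("logistic_regression").
demo = load_model_module("decoder", package="json")
print(demo.__name__)
```

Resolving modules by name keeps the training script unchanged when a new model file is added to the package.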

Usage Example

python train_classification_model.py --model_module logistic_regression \
    --data_path data/adult_income/train.csv \
    --target_variable income_bracket \
    --scoring_metric accuracy --visualize