Anomaly (Outlier) Detection Models

This directory hosts scripts that define anomaly detection estimators (e.g., Isolation Forest, One-Class SVM) for use with train_anomaly_detection.py. Each file specifies a scikit-learn–compatible outlier detector and, if applicable, a parameter grid.

Key Points:

  • Estimator: Must support .fit(X) and .predict(X), or a similar interface. scikit-learn detectors typically return +1 for inliers and −1 for outliers; we unify these to 0 (normal) and 1 (outlier).
  • Parameter Grid: You can define hyperparameters (like n_estimators, contamination) for an optional hyperparameter search.
  • Default Approach: The setup is unsupervised, so no labeled anomalies are required. The script produces a predictions CSV with 0 = normal, 1 = outlier.
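
As a rough illustration of the points above, a model file might look something like the sketch below. The module-level names estimator and param_grid, the helper to_outlier_flags, and the specific hyperparameter values are assumptions made for this example; the actual scripts in this directory may be organized differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical module-level names; the real model files may use different ones.
estimator = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)

# Illustrative grid only; the values are not taken from the repository.
param_grid = {
    "n_estimators": [100, 200],
    "contamination": [0.01, 0.05, 0.1],
}


def to_outlier_flags(raw_predictions):
    """Map scikit-learn's +1 (inlier) / -1 (outlier) to 0 (normal) / 1 (outlier)."""
    return np.where(np.asarray(raw_predictions) == -1, 1, 0)


# Synthetic data: mostly standard normal, with a few points shifted far away.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 8.0

flags = to_outlier_flags(estimator.fit(X).predict(X))
print("flagged as outliers:", int(flags.sum()), "of", len(flags))
```

Keeping the +1 / −1 to 0 / 1 mapping in one small helper means every detector's output ends up in the same convention regardless of which estimator produced it.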

Note: The main script train_anomaly_detection.py handles data loading, label encoding, dropping/selecting columns, the .fit(X), .predict(X) steps, saving the outlier predictions, and (optionally) a 2D plot with outliers in red.
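
The flow described in this note can be sketched roughly as follows. This is not the code of train_anomaly_detection.py; the function detect_outliers, the toy DataFrame, and the choice of LabelEncoder for non-numeric columns are assumptions used only to show the sequence of steps (drop columns, encode, fit, predict, write a 0 / 1 column).

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import LabelEncoder


def detect_outliers(df, drop_columns=(), random_state=42):
    """Return a copy of df with a 0/1 OutlierPrediction column appended."""
    # Drop identifier / label columns that should not influence the detector.
    X = df.drop(columns=list(drop_columns)).copy()

    # Label-encode every non-numeric column so the estimator sees numbers only.
    for col in X.select_dtypes(exclude="number").columns:
        X[col] = LabelEncoder().fit_transform(X[col].astype(str))

    model = IsolationForest(random_state=random_state)
    raw = model.fit(X).predict(X)  # +1 = inlier, -1 = outlier

    result = df.copy()
    result["OutlierPrediction"] = (raw == -1).astype(int)  # 0 = normal, 1 = outlier
    return result


# Toy data standing in for a real CSV loaded with pd.read_csv(...).
df = pd.DataFrame({
    "id": range(8),
    "size": [1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 1.05, 25.0],
    "kind": ["a", "b", "a", "b", "a", "b", "a", "b"],
})
predictions = detect_outliers(df, drop_columns=["id"])
csv_text = predictions.to_csv(index=False)  # pass a path instead to write a file
print(csv_text)
```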

Available Anomaly Detection Models

Usage

For example, to detect outliers with an Isolation Forest:

python scripts/train_anomaly_detection.py \
  --model_module isolation_forest \
  --data_path data/breast_cancer/data.csv \
  --drop_columns "id,diagnosis" \
  --visualize

This:

  1. Loads isolation_forest.py and sets up IsolationForest(...).
  2. Fits the model to the data, saves it, then calls predict(...).
  3. Saves predictions.csv with an OutlierPrediction column.
  4. If --visualize is set, draws a 2D PCA scatter plot with outliers colored red.
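
For readers curious about the visualization step, here is a minimal sketch of a 2D PCA scatter with outliers in red. It assumes matplotlib is installed and uses synthetic data; the output filename outliers_pca.png and the plot styling are made up for this example and may not match what the script produces.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic data: 300 points in 6 dimensions, 10 of them shifted away.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X[:10] += 6.0

# 0 = normal, 1 = outlier, matching the convention used in this README.
raw = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
flags = (raw == -1).astype(int)

# Project to two principal components for plotting only.
coords = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(coords[flags == 0, 0], coords[flags == 0, 1],
           c="tab:blue", s=12, label="normal (0)")
ax.scatter(coords[flags == 1, 0], coords[flags == 1, 1],
           c="red", s=20, label="outlier (1)")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()
fig.savefig("outliers_pca.png", dpi=150)  # hypothetical output name
plt.close(fig)
```

Note that the detector is fitted on the full feature set here and PCA is used only to draw the picture, so points that look close in the 2D plot may still be far apart in the original space.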