# Anomaly (Outlier) Detection Models

This directory hosts scripts that define **anomaly detection** estimators (e.g., Isolation Forest, One-Class SVM) for use with `train_anomaly_detection.py`. Each file specifies a scikit-learn–compatible outlier detector and, if applicable, a parameter grid.

**Key Points**:
- **Estimator**: Must support `.fit(X)` and `.predict(X)`, or an equivalent such as `.fit_predict(X)`. scikit-learn detectors typically return +1 for inliers and −1 for outliers; we convert these to 0 (normal) and 1 (outlier).
- **Parameter Grid**: You can define hyperparameters (such as `n_estimators` or `contamination`) for an optional search.
- **Default Approach**: Detection is unsupervised, so no labeled anomalies are required. The script produces a predictions CSV in which 0 = normal and 1 = outlier.
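
The label convention above can be illustrated with a minimal, standard-library-only sketch. The helper name `to_outlier_flags` and the sample labels are hypothetical, and this is not the code used by `train_anomaly_detection.py`; only the `OutlierPrediction` column name comes from this README.

```python
import csv
import io


def to_outlier_flags(raw_predictions):
    """Map scikit-learn style labels (+1 inlier, -1 outlier) to 0 / 1."""
    return [1 if label == -1 else 0 for label in raw_predictions]


# Hypothetical raw detector output: +1 = inlier, -1 = outlier.
raw = [1, 1, -1, 1, -1]
flags = to_outlier_flags(raw)

# Write the flags as a one-column CSV (in memory here, for illustration only).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["OutlierPrediction"])
writer.writerows([flag] for flag in flags)
print(buffer.getvalue())
```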

**Note**: The main script `train_anomaly_detection.py` handles data loading, label encoding, dropping or selecting columns, the `.fit(X)` and `.predict(X)` steps, saving the outlier predictions, and (optionally) a 2D plot with outliers shown in red.

## Available Anomaly Detection Models

- [Isolation Forest](isolation_forest.py)
- [One-Class SVM](one_class_svm.py)
- [Local Outlier Factor (LOF)](local_outlier_factor.py)
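
As a rough idea of what one of these model files could contain, here is a hedged sketch of an Isolation Forest module. The variable names `estimator` and `param_grid`, and the specific hyperparameter values, are assumptions made for illustration; check the actual files for the names the main script expects.

```python
from sklearn.ensemble import IsolationForest

# Assumed module-level names; the real files may use different ones.
estimator = IsolationForest(
    n_estimators=100,
    contamination="auto",
    random_state=42,
)

# Illustrative grid only: the keys are real IsolationForest parameters,
# the candidate values are arbitrary examples.
param_grid = {
    "n_estimators": [100, 200],
    "contamination": ["auto", 0.05, 0.1],
}
```

Every key in the grid should be a valid constructor parameter of the estimator, which you can check against `estimator.get_params()`.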

### Usage

For example, to detect outliers with an Isolation Forest:

```bash
python scripts/train_anomaly_detection.py \
  --model_module isolation_forest \
  --data_path data/breast_cancer/data.csv \
  --drop_columns "id,diagnosis" \
  --visualize
```

This:
1. Loads `isolation_forest.py` and sets up `IsolationForest(...)`.
2. Fits the model to the data, saves the fitted model, then calls `predict(...)`.
3. Saves a `predictions.csv` with an `OutlierPrediction` column.
4. If `--visualize` is passed, draws a 2D PCA scatter plot with outliers colored red.
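
The steps above can be approximated in a few lines of scikit-learn. This is a sketch under stated assumptions, not the script's actual code: it uses synthetic data instead of the CSV, picks arbitrary hyperparameters, and stops at computing the 2D coordinates and colors rather than saving files or drawing the plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the loaded feature matrix (assumption: 200 rows, 5 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:5] += 6.0  # shift a few rows far from the bulk so they look anomalous

# Fit the detector, then predict: +1 = inlier, -1 = outlier.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
model.fit(X)
raw = model.predict(X)

# Convert to the 0 = normal / 1 = outlier convention used in predictions.csv.
flags = np.where(raw == -1, 1, 0)

# Project to 2D with PCA; these coordinates are what a scatter plot would show.
coords = PCA(n_components=2).fit_transform(X)
colors = np.where(flags == 1, "red", "blue")  # "blue" for normal points is an assumption

print(f"{flags.sum()} of {len(flags)} rows flagged as outliers")
```

The `coords` and `colors` arrays could then be passed to a plotting library, for example as the point positions and color argument of a scatter plot.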