mboukabous's picture
first commit
4c91838
"""
isolation_forest.py
This module defines an Isolation Forest model for anomaly detection.
Isolation Forest is an efficient and effective algorithm for identifying
outliers in high-dimensional datasets.
Key Features:
- Utilizes a tree-based approach to isolate anomalies.
- Efficient for both large datasets and high-dimensional spaces.
- Automatically determines the expected proportion of anomalies.
Parameters:
- n_estimators (int): Number of base estimators in the ensemble.
- Default: 100.
- contamination (str or float): Expected proportion of outliers in the data.
- Default: 'auto' (automatically inferred based on dataset size).
- max_samples (int or float): Number of samples to draw for training each estimator.
- Default: 'auto' (uses min(256, number of samples)).
Default Configuration:
- n_estimators=100: Adequate for most datasets.
- contamination='auto': Automatically estimates the proportion of outliers.
"""
from sklearn.ensemble import IsolationForest
# Define the Isolation Forest estimator
estimator = IsolationForest(
n_estimators=100, # Default number of trees
contamination='auto', # Automatically estimates the contamination proportion
random_state=42 # Ensures reproducibility
)