🚢 AIS Maritime Anomaly Detection Models

A comprehensive suite of unsupervised machine learning models for detecting anomalous vessel behavior in AIS (Automatic Identification System) data, specifically designed for oil spill detection and maritime safety applications.

🎯 Model Overview

This repository contains 5 trained unsupervised anomaly detection models optimized for maritime AIS data analysis:

| Model | Type | Accuracy | Anomaly Rate | Best For | Size |
|---|---|---|---|---|---|
| IsolationForest 🌟 | Ensemble | 100% contamination match | 10.0% | Recommended - Best overall | 1.1MB |
| LocalOutlierFactor | Density-based | 100% contamination match | 10.0% | Local anomaly detection | 104MB |
| OneClassSVM | SVM-based | 100% contamination match | 10.0% | Non-linear patterns | 3.9MB |
| EllipticEnvelope | Statistical | 100% contamination match | 10.0% | Gaussian distributed data | 3.6MB |
| DBSCAN | Clustering | N/A (no contamination parameter) | 2.1% | Cluster-based anomalies | 39MB |
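DBSCAN is the one model in the table whose anomaly rate is not set in advance: points it cannot attach to any cluster receive the label -1, and the share of such points is the anomaly rate. The following is a minimal sketch on synthetic data. The `eps` and `min_samples` values are illustrative assumptions, not the settings used for the published DBSCAN model.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic stand-in for scaled AIS features:
# one dense cluster plus three isolated points far away from it.
dense = rng.normal(0.0, 0.3, size=(200, 2))
far = np.array([[5.0, 5.0], [-6.0, 4.0], [7.0, -5.0]])
X = np.vstack([dense, far])

# eps and min_samples are assumed values chosen for this toy data only
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

is_anomaly = labels == -1          # DBSCAN labels noise points as -1
anomaly_rate = is_anomaly.mean()   # a result of eps/min_samples, not an input
print(f"Anomaly rate: {anomaly_rate:.1%}")
```

Because the rate depends on `eps` and `min_samples`, it will generally differ from the 10% used by the contamination-based models.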

🛢️ Oil Spill Detection Pipeline

These models are designed as Step 2A in a comprehensive oil spill detection system:

AIS Stream → Anomaly Detection → Sentinel-1 SAR Analysis → Oil Spill Alert

Key Anomaly Patterns Detected:

  • Stationary vessels during potential oil transfer operations
  • Unusual speed patterns (too fast/slow for vessel type)
  • Deep-draught vessels in unexpected locations
  • Course/heading inconsistencies indicating suspicious navigation
  • Loitering behavior in sensitive maritime areas

📊 Performance Metrics

IsolationForest (Recommended Model):

  • Contamination Accuracy: 100.00% (the flagged fraction matches the configured 10% contamination)
  • Score Separation: 0.154 (excellent discrimination)
  • Silhouette Score: 0.415 (good clustering quality)
  • Statistical Significance: 6/6 features significant
  • Processing Speed: ~30 seconds for 358K records

Feature Discrimination Power:

  • Speed (SOG): 51.6% difference between normal/anomalous
  • Draught: 31.2% difference (deep vessels suspicious)
  • Width: 9.5% difference
  • Course Difference: High effect size (0.927)
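The README does not state how the percentage differences or the effect size were computed. One common reading, shown here purely as an assumption, is the relative difference between group means and Cohen's d with a pooled standard deviation. The speed values in the sketch are made up for illustration.

```python
import numpy as np

def percent_difference(normal, anomalous):
    """Difference of the anomalous mean from the normal mean, in percent of the normal mean."""
    return 100.0 * (np.mean(anomalous) - np.mean(normal)) / np.mean(normal)

def cohens_d(normal, anomalous):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(normal), len(anomalous)
    pooled_var = (
        (n1 - 1) * np.var(normal, ddof=1) + (n2 - 1) * np.var(anomalous, ddof=1)
    ) / (n1 + n2 - 2)
    return (np.mean(anomalous) - np.mean(normal)) / np.sqrt(pooled_var)

# Made-up speed-over-ground samples (knots) for the two groups
normal_sog = np.array([10.0, 12.0, 11.0, 9.0, 13.0])
anomalous_sog = np.array([4.0, 6.0, 5.0, 7.0, 3.0])

print(f"Percent difference: {percent_difference(normal_sog, anomalous_sog):.1f}%")
print(f"Cohen's d: {cohens_d(normal_sog, anomalous_sog):.3f}")
```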

🚀 Quick Start

Installation

pip install numpy pandas scikit-learn joblib

Usage Example

import joblib
import numpy as np

# Load the recommended model
model_data = joblib.load('isolationforest_model.joblib')
model = model_data['model']
scaler = model_data['scaler']

# Prepare your AIS data as a 2-D array of shape (n_samples, 12),
# one column per feature:
# ['sog', 'cog', 'heading', 'width', 'length', 'draught',
#  'navigationalstatus_encoded', 'shiptype_encoded',
#  'speed_category', 'size_category', 'course_diff', 'aspect_ratio']
# `your_ais_features` below is a placeholder for that array.

# Scale the features, then score and classify each row
features_scaled = scaler.transform(your_ais_features)
anomaly_scores = model.decision_function(features_scaled)
predictions = model.predict(features_scaled)

# Anomalies have prediction == -1 and anomaly_score < 0
anomalous_vessels = features_scaled[predictions == -1]

🌍 Environmental Impact

This system contributes to:

  • Marine pollution prevention through early oil spill detection
  • Maritime safety via suspicious vessel identification
  • Environmental protection of sensitive marine areas
  • Regulatory compliance for maritime authorities

📁 Model Files

Available in both formats for maximum compatibility:

Pickle Format (.pkl) - Recommended for Hugging Face:

  • isolationforest_model.pkl (1.1MB) ⭐
  • localoutlierfactor_model.pkl (104MB)
  • oneclasssvm_model.pkl (3.9MB)
  • ellipticenvelope_model.pkl (3.6MB)
  • dbscan_model.pkl (39MB)

Joblib Format (.joblib) - For sklearn compatibility:

  • All models also available as .joblib files
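Since each model ships in two formats, a small loader can pick the right library from the file extension. This is a sketch only: `load_bundle` is a hypothetical helper, and it assumes the `.pkl` files hold the same `{'model', 'scaler'}` dictionary as the `.joblib` file in the usage example, which the README does not confirm. The demo round-trips a throwaway bundle rather than one of the published files.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import IsolationForest

def load_bundle(path):
    """Load a saved model file, choosing the loader from the file extension."""
    path = Path(path)
    if path.suffix == ".joblib":
        return joblib.load(path)
    if path.suffix == ".pkl":
        with path.open("rb") as fh:
            return pickle.load(fh)
    raise ValueError(f"Unsupported model file: {path.name}")

# Throwaway bundle for the demo; the dictionary layout is an assumption
demo_bundle = {
    "model": IsolationForest(random_state=0).fit([[0.0], [1.0], [2.0], [3.0]]),
    "scaler": None,
}

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "demo_model.pkl"
    joblib_path = Path(tmp) / "demo_model.joblib"
    with pkl_path.open("wb") as fh:
        pickle.dump(demo_bundle, fh)
    joblib.dump(demo_bundle, joblib_path)

    from_pkl = load_bundle(pkl_path)
    from_joblib = load_bundle(joblib_path)

print(sorted(from_pkl), sorted(from_joblib))
```

Both formats execute code on load, so only open model files from a source you trust, and load them with a scikit-learn version compatible with the one used for training.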

Features

Data Processing

  • Handles missing values in AIS data
  • Encodes categorical variables (navigational status, ship type)
  • Creates derived features:
    • Speed categories (stationary, slow, normal, fast)
    • Vessel size categories
    • Course difference (COG vs Heading)
    • Aspect ratio (length/width)
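The derived features listed above could be computed along these lines. The function name `add_derived_features` and every bin edge are assumptions made for illustration; the thresholds used by the actual preprocessing are not given in this README.

```python
import numpy as np
import pandas as pd

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four derived columns added."""
    out = df.copy()

    # Speed bins in knots (hypothetical cut points): stationary, slow, normal, fast
    out["speed_category"] = pd.cut(
        out["sog"], bins=[-np.inf, 0.5, 5.0, 15.0, np.inf], labels=False
    )
    # Size bins by length in meters (hypothetical cut points)
    out["size_category"] = pd.cut(
        out["length"], bins=[-np.inf, 50.0, 150.0, np.inf], labels=False
    )
    # Smallest angle between COG and heading, so 350 vs 10 degrees gives 20, not 340
    diff = (out["cog"] - out["heading"]).abs() % 360
    out["course_diff"] = np.minimum(diff, 360 - diff)
    # Length/width ratio; a zero width becomes NaN instead of dividing by zero
    out["aspect_ratio"] = out["length"] / out["width"].replace(0, np.nan)
    return out

# Made-up rows for illustration
sample = pd.DataFrame({
    "sog": [0.0, 3.0, 12.0, 25.0],
    "cog": [350, 10, 180, 90],
    "heading": [10, 350, 180, 270],
    "length": [30, 100, 200, 120],
    "width": [6, 20, 0, 20],
})
features = add_derived_features(sample)
print(features[["speed_category", "size_category", "course_diff", "aspect_ratio"]])
```

Wrapping the course difference at 360 degrees matters because a naive subtraction would treat headings on either side of north as nearly opposite.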

Anomaly Detection

  • Uses the Isolation Forest algorithm (unsupervised learning)
  • Configurable contamination parameter (default: 10% expected anomalies)
  • Provides anomaly scores and binary predictions
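A minimal sketch of this step with scikit-learn, using random numbers as a stand-in for the 12 AIS features. It also shows why the flagged share on the training data tracks the contamination setting: scikit-learn places the decision threshold at the matching percentile of the training scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 12))  # synthetic stand-in for the 12 AIS features

# Scale first, as the usage example does with the saved scaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# contamination=0.1 sets the score threshold so that about 10% of training rows fall below it
model = IsolationForest(contamination=0.1, random_state=42)
model.fit(X_scaled)

scores = model.decision_function(X_scaled)   # continuous score, negative means anomalous
predictions = model.predict(X_scaled)        # -1 for anomalies, 1 for normal rows
anomaly_rate = (predictions == -1).mean()
print(f"Flagged {anomaly_rate:.1%} of training rows")
```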

Analysis & Visualization

  • Statistical comparison between normal and anomalous vessels
  • Ship type analysis
  • Multiple visualization plots:
    • Anomaly score distributions
    • Speed vs vessel length scatter plots
    • Ship type anomaly rates
    • Course vs heading patterns
    • Vessel dimensions analysis

Installation

  1. Make sure you have Python 3.7+ installed
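  2. Change to the project directory (the path below is the author's local checkout; substitute your own) and install the required packages: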
cd /Users/lakshmikotaru/Documents/ais_isolation_forest
pip install -r requirements.txt

Usage

Training the Model

Run the main script to train the Isolation Forest model:

python ais_anomaly_detection.py

This will:

  • Load the AIS data from /Users/lakshmikotaru/Downloads/ais_data.csv (the author's local path; point the script at your own CSV file)
  • Preprocess the data and create features
  • Train the Isolation Forest model
  • Generate analysis and visualizations
  • Save the trained model and results

Using the Trained Model

Use the prediction script to detect anomalies in new data:

# Use with new data file
python predict_anomalies.py path/to/new_ais_data.csv

# Or run without arguments to use original data as example
python predict_anomalies.py

Model Configuration

You can adjust the model parameters in ais_anomaly_detection.py:

detector = AISAnomalyDetector(
    contamination=0.1,    # Expected fraction of anomalies (10%)
    random_state=42       # For reproducible results
)

Data Format

The AIS data should be a CSV file with the following columns:

  • mmsi: Maritime Mobile Service Identity
  • navigationalstatus: Current navigation status
  • sog: Speed Over Ground (knots)
  • cog: Course Over Ground (degrees)
  • heading: Vessel heading (degrees)
  • shiptype: Type of vessel
  • width: Vessel width (meters)
  • length: Vessel length (meters)
  • draught: Vessel draught (meters)
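Before training or prediction it can help to confirm that a CSV actually has these columns. A small check along these lines would do; `missing_columns` is a hypothetical helper and the sample row is made up.

```python
import io

import pandas as pd

# Column names taken from the list above
REQUIRED_COLUMNS = [
    "mmsi", "navigationalstatus", "sog", "cog", "heading",
    "shiptype", "width", "length", "draught",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

# Made-up single-row CSV standing in for a real AIS export
sample_csv = io.StringIO(
    "mmsi,navigationalstatus,sog,cog,heading,shiptype,width,length,draught\n"
    "219000001,Under way using engine,12.3,181.0,180,Cargo,20,120,6.5\n"
)
df = pd.read_csv(sample_csv)

missing = missing_columns(df)
if missing:
    raise ValueError(f"AIS file is missing columns: {missing}")
print("All required columns present")
```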

Output Files

ais_isolation_forest_model.joblib

The trained model file that can be loaded for future predictions.

detected_anomalies.csv

Detailed information about all detected anomalies, including:

  • Original vessel data
  • Anomaly scores
  • Binary anomaly flags

anomaly_analysis_plots.png

Comprehensive visualization showing:

  • Anomaly score distributions
  • Feature comparisons between normal and anomalous vessels
  • Ship type analysis
  • Various scatter plots and distributions

Interpretation

Anomaly Scores

  • Lower (more negative) scores indicate higher anomaly likelihood
  • Scores are relative to the training data distribution
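To make the score convention concrete, the sketch below fits a forest on synthetic data and ranks a few new points from most to least anomalous. In scikit-learn, `decision_function` is `score_samples` shifted by the fitted `offset_`, which is why scores are only meaningful relative to the data the model was trained on. The data here is random and purely illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))  # synthetic training data

# Three points near the bulk of the training data and one far outside it
X_new = np.array([
    [0.0, 0.0, 0.0],
    [0.5, -0.5, 0.2],
    [-0.3, 0.4, 0.1],
    [8.0, 8.0, 8.0],
])

model = IsolationForest(random_state=0).fit(X_train)
scores = model.decision_function(X_new)

# Sort ascending: the lowest score is the most anomalous row
ranking = np.argsort(scores)
for idx in ranking:
    print(f"row {idx}: score {scores[idx]:+.3f}")
```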

Common Anomaly Types

The model may detect:

  • Vessels with unusual speed patterns
  • Ships with inconsistent course/heading relationships
  • Vessels with atypical dimensions for their type
  • Unusual combinations of vessel characteristics

Example Output

============================================================
AIS DATA ANOMALY DETECTION USING ISOLATION FOREST
============================================================
Loading data from /Users/lakshmikotaru/Downloads/ais_data.csv...
Loaded 358351 records with 9 columns

Training Isolation Forest model...
Model training completed!
Number of anomalies detected: 35835 out of 358351 samples
Anomaly rate: 10.00%

ANOMALY ANALYSIS SUMMARY
============================================================
Normal samples: 322516 (90.0%)
Anomalous samples: 35835 (10.0%)

Customization

Adding New Features

To add new derived features, modify the preprocess_data method in the AISAnomalyDetector class.

Changing Model Parameters

Adjust the IsolationForest parameters in the __init__ method:

  • n_estimators: Number of trees in the forest
  • contamination: Expected proportion of anomalies
  • max_samples: Number of samples to draw for each tree
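As a rough illustration of these three parameters, the sketch below trains the same forest with different `contamination` values on synthetic data and counts the flagged rows. The values of `n_estimators` and `max_samples` are examples, not the settings used for the published model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 4))  # synthetic stand-in for scaled AIS features

flagged = {}
for contamination in (0.01, 0.05, 0.10):
    model = IsolationForest(
        n_estimators=200,             # number of trees (example value)
        contamination=contamination,  # expected proportion of anomalies
        max_samples=256,              # rows drawn to build each tree (example value)
        random_state=42,
    )
    predictions = model.fit_predict(X)
    flagged[contamination] = int((predictions == -1).sum())

for contamination, count in flagged.items():
    print(f"contamination={contamination:.2f} -> {count} of {len(X)} rows flagged")
```

With a fixed `random_state` the trees are identical across runs, so only the threshold moves: a higher contamination flags more of the same ranked rows.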

Visualization

Modify the visualize_results method to add new plots or change existing ones.

Notes

  • The model is unsupervised, so it learns patterns without labeled anomalies
  • Results should be validated by domain experts
  • The contamination parameter significantly affects the number of detected anomalies
  • Missing values are handled automatically during preprocessing

Troubleshooting

  1. Import errors: Make sure all requirements are installed
  2. File not found: Check that the AIS data file path is correct
  3. Memory issues: For very large datasets, consider processing in chunks
  4. Plotting issues: Ensure matplotlib backend is properly configured
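For the memory issue, scoring with an already fitted model can be done chunk by chunk using the `chunksize` option of `pandas.read_csv`. The sketch below uses a shortened, made-up feature list and an in-memory CSV; it covers prediction only, since fitting an Isolation Forest still needs its training sample in memory.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

FEATURES = ["sog", "cog", "draught"]  # shortened feature list for illustration

# Fit a small model on synthetic data so the example is self-contained
rng = np.random.default_rng(1)
train = pd.DataFrame(rng.normal(10, 2, size=(300, 3)), columns=FEATURES)
model = IsolationForest(random_state=1).fit(train[FEATURES].to_numpy())

# In-memory CSV standing in for a large AIS file on disk
csv_text = train.head(25).to_csv(index=False)

chunk_flags = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    # Only one chunk of rows is held in memory at a time
    chunk_flags.append(model.predict(chunk[FEATURES].to_numpy()))

flags = np.concatenate(chunk_flags)
print(f"Scored {len(flags)} rows, {(flags == -1).sum()} flagged")
```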

Contact

Generated for AIS Anomaly Detection Project - 2025-01-11
