Network Traffic Anomaly Detector

A real-time unsupervised anomaly detection system for network traffic using Isolation Forest. The model learns a baseline of "normal" network behavior and flags statistical outliers (such as ping floods, port scans, DDoS attacks, and data exfiltration) without requiring labeled attack data.

Model Details

Property        Value
Algorithm       Isolation Forest (scikit-learn)
Type            Unsupervised Anomaly Detection
Input           13 window-aggregated network traffic features
Output          Normal (1) or Outlier (-1) + decision score
Window Size     5 seconds
Training Data   1.45M packets across 399 time windows of normal traffic
Contamination   0.05 (5% expected outliers)

Features Used

The model operates on 13 features extracted from 5-second traffic windows:

Feature              Description
packet_count         Total packets in the window
avg_packet_size      Mean packet size (bytes)
std_packet_size      Standard deviation of packet sizes (bytes)
max_packet_size      Largest packet in the window (bytes)
min_packet_size      Smallest packet in the window (bytes)
total_bytes          Total bytes transferred
unique_src_ips       Number of unique source IPs
unique_dst_ips       Number of unique destination IPs
icmp_ratio           Proportion of ICMP packets
tcp_ratio            Proportion of TCP packets
udp_ratio            Proportion of UDP packets
packets_per_second   Packet rate
bytes_per_second     Throughput rate
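
The feature definitions above can be sketched in plain Python. This is an illustrative sketch only, not the project's feature_extraction.py: the function name extract_window_features, the packet tuple layout, and the use of the population standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW_SECONDS = 5          # window size stated in Model Details
ICMP, TCP, UDP = 1, 6, 17   # IANA protocol numbers from the IP header


def extract_window_features(packets, window=WINDOW_SECONDS):
    """Group packets into fixed windows and compute the 13 features.

    `packets` is an iterable of (timestamp, src_ip, dst_ip, length, proto)
    tuples. This layout is an assumption made for illustration.
    """
    buckets = defaultdict(list)
    for pkt in packets:
        buckets[int(pkt[0] // window)].append(pkt)

    rows = []
    for _, pkts in sorted(buckets.items()):
        sizes = [p[3] for p in pkts]
        protos = [p[4] for p in pkts]
        n = len(pkts)
        total = sum(sizes)
        rows.append({
            "packet_count": n,
            "avg_packet_size": mean(sizes),
            # Population std dev (assumption); 0.0 for a one-packet window
            "std_packet_size": pstdev(sizes),
            "max_packet_size": max(sizes),
            "min_packet_size": min(sizes),
            "total_bytes": total,
            "unique_src_ips": len({p[1] for p in pkts}),
            "unique_dst_ips": len({p[2] for p in pkts}),
            "icmp_ratio": protos.count(ICMP) / n,
            "tcp_ratio": protos.count(TCP) / n,
            "udp_ratio": protos.count(UDP) / n,
            "packets_per_second": n / window,
            "bytes_per_second": total / window,
        })
    return rows
```

Windows with no packets produce no row in this sketch; whether the real extractor emits empty windows is not stated here.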

How It Works

Raw Packets (PCAP/CSV) → Feature Extraction (5s windows) → StandardScaler → Isolation Forest → Normal / Outlier

Pipeline

  1. Capture: Record network traffic as PCAP using tcpdump or Wireshark
  2. Extract: feature_extraction.py aggregates raw packets into 5-second windows with 13 statistical features
  3. Train: train_anomaly_detector.py fits an Isolation Forest on the extracted features (assumes mostly normal traffic)
  4. Detect: realtime_detector.py sniffs live traffic and classifies each 5-second window in real time
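
The Train and Detect steps can be sketched with scikit-learn. This is a minimal sketch, not the contents of train_anomaly_detector.py: the training matrix is synthetic (random jitter around the example window from the Quick Start), the "flood" window is made up, and random_state is an assumption added for repeatability. Only the contamination value of 0.05 and the window count of 399 come from this document.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for real training windows (NOT the author's data):
# 399 windows jittered around one plausible "normal" window.
baseline = np.array([500, 450.0, 300.0, 1470, 66, 225000, 3, 3,
                     0.0, 0.05, 0.95, 100.0, 45000.0])
spread = 0.1 * np.abs(baseline) + 1e-3   # small floor so icmp_ratio varies too
X_train = rng.normal(loc=baseline, scale=spread, size=(399, 13))

# Scale first, then fit the forest on the scaled features
scaler = StandardScaler().fit(X_train)
model = IsolationForest(contamination=0.05, random_state=0)
model.fit(scaler.transform(X_train))

# A made-up ICMP-heavy, high-rate window for comparison
flood = baseline.copy()
flood[[0, 8, 11]] = [20000, 0.9, 4000.0]  # packet_count, icmp_ratio, packets_per_second

X_new = scaler.transform(np.vstack([baseline, flood]))
scores = model.decision_function(X_new)   # lower = more anomalous
labels = model.predict(X_new)             # 1 = Normal, -1 = Outlier

# Share of training windows the fitted threshold marks as outliers
train_outlier_rate = float((model.predict(scaler.transform(X_train)) == -1).mean())
```

With contamination=0.05 the decision threshold is placed so that roughly 5% of the training windows fall on the outlier side, which is why the training capture should be mostly normal traffic.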

Quick Start

Installation

pip install -r requirements.txt

Option 1: Use the Pre-Trained Model

import joblib
import numpy as np

model = joblib.load("models/isolation_forest_model.pkl")
scaler = joblib.load("models/scaler.pkl")
feature_cols = joblib.load("models/feature_columns.pkl")

# Example: single window of traffic features
features = np.array([[500, 450.0, 300.0, 1470, 66, 225000, 3, 3, 0.0, 0.05, 0.95, 100.0, 45000.0]])
scaled = scaler.transform(features)
prediction = model.predict(scaled)       # 1 = Normal, -1 = Outlier
score = model.decision_function(scaled)  # More negative = more anomalous
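
When features arrive as named values rather than a positional array, it helps to order them explicitly before scaling. The helpers below are a hypothetical sketch (to_feature_row and describe are not part of this repository); they assume feature_columns.pkl holds a list of column names in the order the scaler was fitted on.

```python
def to_feature_row(window, feature_cols):
    """Order a {feature_name: value} dict the way the scaler expects.

    Returns a 1 x N nested list suitable for scaler.transform().
    """
    missing = [c for c in feature_cols if c not in window]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [[float(window[c]) for c in feature_cols]]


def describe(prediction, score):
    """Render a prediction (1 or -1) and decision score as a short string."""
    label = "Normal" if prediction == 1 else "Outlier"
    return f"{label} (score={score:+.3f})"
```

Usage would look like `row = to_feature_row(window, feature_cols)` followed by `scaler.transform(row)`, which avoids silently mis-ordering columns.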

Option 2: Real-Time Detection

# Requires root for packet capture
sudo python realtime_detector.py

Option 3: Train on Your Own Network

# 1. Capture traffic (replace en0 with your interface)
sudo tcpdump -i en0 -w my_traffic.pcap

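# 2. Convert PCAP to CSV (using tshark)
# Note: tshark separates fields with tabs by default; add -E separator=, if feature_extraction.py expects commas
tshark -r my_traffic.pcap -T fields -e frame.time_epoch -e ip.src -e ip.dst -e frame.len -e ip.proto > my_traffic.csv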

# 3. Extract features
python feature_extraction.py

# 4. Train your own model
python train_anomaly_detector.py

# 5. Run detector
sudo python realtime_detector.py
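
Before feature extraction it can be useful to sanity-check the tshark export. The sketch below reads the five fields requested in step 2 using only the standard library. The function name read_tshark_fields is hypothetical, and the handling is an assumption about typical tshark output: fields are tab-separated by default, non-IP frames leave the IP columns empty, and packets carrying more than one IP header (for example ICMP errors) can yield comma-joined values.

```python
import csv
import io


def read_tshark_fields(handle, delimiter="\t"):
    """Parse `tshark -T fields` rows into (ts, src, dst, length, proto) tuples."""
    packets = []
    for row in csv.reader(handle, delimiter=delimiter):
        # Skip malformed rows and non-IP frames (empty ip.src / ip.proto)
        if len(row) != 5 or not row[1] or not row[4]:
            continue
        ts, src, dst, length, proto = row
        packets.append((
            float(ts),
            src.split(",")[0],         # keep the outermost header's value
            dst.split(",")[0],
            int(length),
            int(proto.split(",")[0]),  # 1 = ICMP, 6 = TCP, 17 = UDP
        ))
    return packets


# Inline sample standing in for my_traffic.csv
sample = io.StringIO(
    "1700000000.5\t10.0.0.1\t10.0.0.2\t98\t1\n"
    "1700000001.0\t\t\t60\t\n"
    "1700000002.0\t10.0.0.2,10.0.0.9\t10.0.0.1,10.0.0.2\t126\t1,17\n"
)
packets = read_tshark_fields(sample)
```

For a real file, pass `open("my_traffic.csv", newline="")` instead of the inline sample.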

What It Detects

Anomaly Type            Detection Signal                                   Tested
Ping flood (external)   High ICMP ratio + high packet count                Yes
DDoS / volume flood     Extreme packets_per_second and bytes_per_second    Yes
Port / network scan     Spike in unique_src_ips or unique_dst_ips          Expected
Data exfiltration       Unusual bytes_per_second to few IPs                Expected
DNS tunneling           High UDP ratio + abnormal packet patterns          Expected

Limitations

  • Unsupervised only: detects statistical deviations, not specific attack types
  • Volume-based: low-volume attacks (e.g., slow port scans, single malicious connections) may not trigger
  • No payload inspection: application-layer attacks (SQL injection, XSS) are invisible
  • Baseline-dependent: the model reflects the network it was trained on; retrain for your own environment for best results
  • Local LAN floods may not trigger if traffic volume resembles normal streaming

Files

models/
  isolation_forest_model.pkl   # Trained Isolation Forest model
  scaler.pkl                   # StandardScaler for feature normalization
  feature_columns.pkl          # Feature column names and order
src/
  feature_extraction.py        # Extract features from packet CSV
  train_anomaly_detector.py    # Train anomaly detection models
  realtime_detector.py         # Real-time packet sniffing and detection
requirements.txt               # Python dependencies

Training Details

  • Training data: 1.45M packets from diverse normal traffic (browsing, streaming, idle, downloads, mixed protocols)
  • Windows: 399 five-second windows
  • Models trained: Isolation Forest (primary), One-Class SVM, Local Outlier Factor
  • Validation: Real-time testing confirmed detection of external ping floods (scores -0.05 to -0.17) with minimal false positives on normal traffic
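
A side-by-side fit of the three model families named above might look like the sketch below. It is not the author's training script: the data is random, and every hyperparameter other than contamination=0.05 (nu, gamma, novelty, random_state) is an assumption chosen to make the models comparable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Random stand-in data: 399 training windows, 50 new windows, 13 features
scaler = StandardScaler().fit(rng.normal(size=(399, 13)))
X_train = scaler.transform(rng.normal(size=(399, 13)))
X_new = scaler.transform(rng.normal(size=(50, 13)))

models = {
    "isolation_forest": IsolationForest(contamination=0.05, random_state=42),
    # nu plays a role similar to contamination for One-Class SVM
    "one_class_svm": OneClassSVM(nu=0.05, gamma="scale"),
    # novelty=True is required to call predict() on unseen windows
    "local_outlier_factor": LocalOutlierFactor(contamination=0.05, novelty=True),
}

predictions = {}
for name, estimator in models.items():
    estimator.fit(X_train)
    predictions[name] = estimator.predict(X_new)   # 1 = Normal, -1 = Outlier

# Fraction of new windows each model flags as outliers
outlier_rates = {name: float((p == -1).mean()) for name, p in predictions.items()}
```

All three estimators share the same predict convention (1 for inliers, -1 for outliers), so the downstream detection code does not need to change when swapping models.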

Citation

@misc{rajuamburu-network-anomaly-detector,
  author = {rajuamburu},
  title = {Network Traffic Anomaly Detector},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/rajuamburu/network-anomaly-detector}
}

License

MIT
