Network Traffic Anomaly Detector

A real-time, unsupervised anomaly detection system for network traffic, built on Isolation Forest. The model learns a baseline of "normal" network behavior and flags traffic windows that deviate statistically from it, without requiring labeled attack data. Such deviations can indicate ping floods, port scans, DDoS attacks, or data exfiltration.

Model Details

Property	Value
Algorithm	Isolation Forest (scikit-learn)
Type	Unsupervised Anomaly Detection
Input	13 window-aggregated network traffic features
Output	Normal (1) or Outlier (-1), plus a decision score (negative = outlier)
Window Size	5 seconds
Training Data	1.45M packets across 399 time windows of normal traffic
Contamination	0.05 (threshold set so that about 5% of training windows score as outliers)

Features Used

The model operates on 13 features extracted from 5-second traffic windows:

Feature	Description
`packet_count`	Total packets in the window
`avg_packet_size`	Mean packet size (bytes)
`std_packet_size`	Standard deviation of packet sizes
`max_packet_size`	Largest packet in the window
`min_packet_size`	Smallest packet in the window
`total_bytes`	Total bytes transferred
`unique_src_ips`	Number of unique source IPs
`unique_dst_ips`	Number of unique destination IPs
`icmp_ratio`	Proportion of ICMP packets
`tcp_ratio`	Proportion of TCP packets
`udp_ratio`	Proportion of UDP packets
`packets_per_second`	Packet rate
`bytes_per_second`	Throughput rate
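
As a rough illustration of how these 13 features could be derived, the sketch below aggregates a packet table into 5-second windows with pandas. The column names (`timestamp`, `src`, `dst`, `length`, `proto`) and the helper `extract_window_features` are assumptions made for this example and are not taken from feature_extraction.py. The protocol numbers 1, 6 and 17 are the standard IP protocol numbers for ICMP, TCP and UDP. Using the population standard deviation is also a choice of this sketch.

```python
import pandas as pd

WINDOW_SECONDS = 5
ICMP, TCP, UDP = 1, 6, 17  # standard IP protocol numbers


def extract_window_features(packets: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-packet rows into one feature row per 5-second window.

    Assumed (hypothetical) columns: timestamp (epoch seconds), src, dst,
    length (bytes), proto (IP protocol number).
    """
    # Assign each packet to the window containing its timestamp.
    window_id = (packets["timestamp"] // WINDOW_SECONDS).astype(int)
    rows = []
    for wid, win in packets.groupby(window_id):
        sizes = win["length"]
        rows.append({
            "window_start": int(wid) * WINDOW_SECONDS,
            "packet_count": len(win),
            "avg_packet_size": sizes.mean(),
            # ddof=0 so a single-packet window yields 0.0 instead of NaN
            "std_packet_size": sizes.std(ddof=0),
            "max_packet_size": sizes.max(),
            "min_packet_size": sizes.min(),
            "total_bytes": int(sizes.sum()),
            "unique_src_ips": win["src"].nunique(),
            "unique_dst_ips": win["dst"].nunique(),
            "icmp_ratio": (win["proto"] == ICMP).mean(),
            "tcp_ratio": (win["proto"] == TCP).mean(),
            "udp_ratio": (win["proto"] == UDP).mean(),
            "packets_per_second": len(win) / WINDOW_SECONDS,
            "bytes_per_second": sizes.sum() / WINDOW_SECONDS,
        })
    return pd.DataFrame(rows).set_index("window_start")


# Five made-up packets spanning two windows.
packets = pd.DataFrame({
    "timestamp": [0.1, 1.2, 4.9, 5.0, 7.5],
    "src": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.3"],
    "dst": ["10.0.0.9"] * 5,
    "length": [100, 200, 300, 60, 60],
    "proto": [TCP, UDP, UDP, ICMP, ICMP],
})
features = extract_window_features(packets)
print(features.T)
```

Windows that contain no packets produce no row in this sketch; a real-time detector may want to emit an all-zero row for them instead.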

How It Works

Raw Packets (PCAP/CSV) → Feature Extraction (5s windows) → StandardScaler → Isolation Forest → Normal / Outlier

Pipeline

Capture — Record network traffic as PCAP using tcpdump or Wireshark
Extract — feature_extraction.py aggregates raw packets into 5-second windows with 13 statistical features
Train — train_anomaly_detector.py fits an Isolation Forest on the extracted features (assumes mostly normal traffic)
Detect — realtime_detector.py sniffs live traffic and classifies each 5-second window in real time
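
The Train step can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic stand-in data, not the contents of train_anomaly_detector.py: only the algorithm, the scaler, the contamination value of 0.05 and the 399 x 13 shape come from this card, while the random data, `random_state` and the remaining default hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted features: 399 windows x 13 features.
X_train = rng.normal(loc=100.0, scale=10.0, size=(399, 13))

# Standardize, then fit the forest on the scaled baseline.
scaler = StandardScaler().fit(X_train)
model = IsolationForest(contamination=0.05, random_state=0)
model.fit(scaler.transform(X_train))

# contamination=0.05 places the threshold so that roughly 5% of the
# training windows themselves fall on the outlier side.
train_pred = model.predict(scaler.transform(X_train))
flagged = int((train_pred == -1).sum())
print(f"{flagged} of {len(X_train)} training windows flagged")

# A window far outside the baseline in every feature (hypothetical values).
X_outlier = np.full((1, 13), 300.0)
outlier_pred = model.predict(scaler.transform(X_outlier))[0]
print("extreme window prediction:", outlier_pred)
```

Because the threshold is derived from the training data, a baseline that already contains attack traffic shifts the boundary, which is why the Train step assumes mostly normal traffic.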

Quick Start

Installation

pip install -r requirements.txt

Option 1: Use the Pre-Trained Model

import joblib
import numpy as np

model = joblib.load("models/isolation_forest_model.pkl")
scaler = joblib.load("models/scaler.pkl")
feature_cols = joblib.load("models/feature_columns.pkl")

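# Example: single window of traffic features.
# Values must follow the column order stored in feature_cols.
features = np.array([[500, 450.0, 300.0, 1470, 66, 225000, 3, 3, 0.0, 0.05, 0.95, 100.0, 45000.0]])
assert features.shape[1] == len(feature_cols)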
scaled = scaler.transform(features)
prediction = model.predict(scaled)       # 1 = Normal, -1 = Outlier
score = model.decision_function(scaled)  # More negative = more anomalous
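
To make the feature order explicit, a small wrapper can take a window as a dictionary and arrange the values by name before scoring. The helper `classify_window` below is a hypothetical convenience, not part of this repository. It trains a stand-in model on synthetic data so that the sketch runs without the .pkl files, and it assumes that `feature_cols` is a list of the 13 feature names.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

FEATURE_COLS = [
    "packet_count", "avg_packet_size", "std_packet_size", "max_packet_size",
    "min_packet_size", "total_bytes", "unique_src_ips", "unique_dst_ips",
    "icmp_ratio", "tcp_ratio", "udp_ratio", "packets_per_second",
    "bytes_per_second",
]


def classify_window(model, scaler, feature_cols, window):
    """Return (label, score) for one window given as a {feature: value} dict."""
    # Arrange values in training order; a missing feature raises KeyError.
    row = np.array([[float(window[name]) for name in feature_cols]])
    score = float(model.decision_function(scaler.transform(row))[0])
    # predict() returns -1 exactly when decision_function() is negative.
    label = "Outlier" if score < 0 else "Normal"
    return label, score


# Stand-in objects so the sketch runs on its own; in practice pass the
# joblib-loaded model, scaler and feature_cols instead.
example = dict(zip(FEATURE_COLS, [500, 450.0, 300.0, 1470, 66, 225000,
                                  3, 3, 0.0, 0.05, 0.95, 100.0, 45000.0]))
center = np.array(list(example.values()), dtype=float)
rng = np.random.default_rng(0)
baseline = rng.normal(loc=center, scale=0.1 * np.abs(center) + 1.0, size=(300, 13))
scaler = StandardScaler().fit(baseline)
model = IsolationForest(contamination=0.05, random_state=0)
model.fit(scaler.transform(baseline))

label, score = classify_window(model, scaler, FEATURE_COLS, example)
print(label, round(score, 3))
```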

Option 2: Real-Time Detection

# Requires root for packet capture
sudo python realtime_detector.py

Option 3: Train on Your Own Network

# 1. Capture traffic (replace en0 with your interface)
sudo tcpdump -i en0 -w my_traffic.pcap

# 2. Convert PCAP to CSV (using tshark; -T fields output is tab-separated unless -E separator=, is given)
tshark -r my_traffic.pcap -T fields -e frame.time_epoch -e ip.src -e ip.dst -e frame.len -e ip.proto > my_traffic.csv

# 3. Extract features
python feature_extraction.py

# 4. Train your own model
python train_anomaly_detector.py

# 5. Run detector
sudo python realtime_detector.py
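
When working with the tshark export from step 2, a few details of the `-T fields` format matter: by default it has no header row, fields are separated by tabs, non-IP frames such as ARP leave the IP fields empty, and a packet with more than one IP layer (for example an ICMP error carrying the original header) lists all occurrences joined by commas. The loader below is a sketch of one way to handle this with pandas. The function name, the column names and the inline sample rows are assumptions for illustration, and feature_extraction.py may parse the file differently.

```python
import io

import pandas as pd

COLUMNS = ["timestamp", "src", "dst", "length", "proto"]  # hypothetical names

# Made-up sample in tshark's default fields format (tab-separated, no header).
sample = (
    "1700000000.000001\t192.168.1.10\t8.8.8.8\t74\t17\n"
    "1700000000.120000\t\t\t42\t\n"  # non-IP frame: IP fields are empty
    "1700000001.500000\t192.168.1.1,192.168.1.10\t192.168.1.10,8.8.8.8\t102\t1,17\n"
)


def load_tshark_fields(source) -> pd.DataFrame:
    """Read tshark -T fields output into a typed packet table."""
    df = pd.read_csv(source, sep="\t", names=COLUMNS, header=None, dtype=str)
    # Drop frames without an IP layer.
    df = df.dropna(subset=["src", "dst", "proto"]).copy()
    # Keep only the first (outermost) occurrence of multi-valued fields.
    for col in ("src", "dst", "proto"):
        df[col] = df[col].str.split(",").str[0]
    df["timestamp"] = df["timestamp"].astype(float)
    df["length"] = df["length"].astype(int)
    df["proto"] = df["proto"].astype(int)
    return df.reset_index(drop=True)


packets = load_tshark_fields(io.StringIO(sample))
print(packets)
```

Pass a file path such as "my_traffic.csv" instead of the `io.StringIO` object to load a real export.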

What It Detects

Anomaly Type	Detection Signal	Tested
Ping flood (external)	High ICMP ratio + high packet count	Yes
DDoS / volume flood	Extreme packets_per_second and bytes_per_second	Yes
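Port / network scan	Spike in unique_dst_ips (host sweep) or unique_src_ips (many scanning sources); a port scan against a single host changes neither count and shows up only through packet_count and packet-size features, since no port feature is used	Expected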
Data exfiltration	Unusual bytes_per_second to few IPs	Expected
DNS tunneling	High UDP ratio + abnormal packet patterns	Expected
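
The detection signals above are combinations of features, and Isolation Forest itself only returns one score per window. One hedged way to see which features moved furthest from the baseline is to inspect the StandardScaler output, which expresses each feature as a distance from the training mean in units of training standard deviation. This is a diagnostic heuristic and not how the model computes its score. The helper `top_deviations`, the baseline means and spreads, and the flood-like window are all made up for this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

FEATURE_COLS = [
    "packet_count", "avg_packet_size", "std_packet_size", "max_packet_size",
    "min_packet_size", "total_bytes", "unique_src_ips", "unique_dst_ips",
    "icmp_ratio", "tcp_ratio", "udp_ratio", "packets_per_second",
    "bytes_per_second",
]


def top_deviations(scaler, feature_cols, window_row, k=3):
    """Return the k features with the largest absolute z-score for one window."""
    z = scaler.transform(np.asarray(window_row, dtype=float).reshape(1, -1))[0]
    order = np.argsort(-np.abs(z))[:k]
    return [(feature_cols[i], float(z[i])) for i in order]


# Synthetic baseline (hypothetical means and spreads, not the real training data).
means = np.array([500, 450, 300, 1470, 66, 225000, 3, 3, 0.01, 0.60, 0.39, 100, 45000])
spreads = np.array([100, 50, 40, 200, 5, 40000, 1, 1, 0.01, 0.10, 0.10, 20, 8000])
rng = np.random.default_rng(1)
scaler = StandardScaler().fit(rng.normal(means, spreads, size=(399, 13)))

# A made-up ping-flood-like window: many small ICMP packets in 5 seconds.
flood = [5000, 98, 10, 98, 98, 490000, 2, 2, 0.95, 0.03, 0.02, 1000, 98000]
top = top_deviations(scaler, FEATURE_COLS, flood)
for name, z in top:
    print(f"{name:20s} z = {z:+.1f}")
```

Logging the top deviations next to each alert can make it easier to tell a ping flood (ICMP ratio and packet rate) from, say, a large download (byte rate only).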

Limitations

Unsupervised only — detects statistical deviations, not specific attack types
Volume-based — low-volume attacks (e.g., slow port scans, single malicious connections) may not trigger
No payload inspection — application-layer attacks (SQL injection, XSS) are invisible
Baseline-dependent — the model reflects the network it was trained on; retrain for your own environment for best results
Local LAN floods may not trigger if traffic volume resembles normal streaming

Files

models/
  isolation_forest_model.pkl   # Trained Isolation Forest model
  scaler.pkl                   # StandardScaler for feature normalization
  feature_columns.pkl          # Feature column names and order
src/
  feature_extraction.py        # Extract features from packet CSV
  train_anomaly_detector.py    # Train anomaly detection models
  realtime_detector.py         # Real-time packet sniffing and detection
requirements.txt               # Python dependencies

Training Details

Training data: 1.45M packets from diverse normal traffic (browsing, streaming, idle, downloads, mixed protocols)
Windows: 399 five-second windows
Models trained: Isolation Forest (primary), One-Class SVM, Local Outlier Factor
Validation: Real-time testing confirmed detection of external ping floods (scores -0.05 to -0.17) with minimal false positives on normal traffic
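
A comparison of the three model families listed above could look like the following sketch. It uses synthetic data and assumed hyperparameters (`nu=0.05`, `gamma="scale"`, `random_state`), so it illustrates the evaluation pattern only and says nothing about the results obtained on the real training set. Local Outlier Factor needs `novelty=True` to score unseen windows, and in that mode it should be evaluated on held-out data, not on its own training set.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X_train = rng.normal(size=(399, 13))           # synthetic "normal" windows
X_holdout = rng.normal(size=(100, 13))         # unseen normal windows
X_flood = rng.normal(loc=8.0, size=(10, 13))   # synthetic extreme windows

scaler = StandardScaler().fit(X_train)
Xt, Xh, Xf = (scaler.transform(X) for X in (X_train, X_holdout, X_flood))

models = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=0),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma="scale"),
    "Local Outlier Factor": LocalOutlierFactor(novelty=True, contamination=0.05),
}

results = {}
for name, model in models.items():
    model.fit(Xt)
    detected = float((model.predict(Xf) == -1).mean())      # share of floods flagged
    false_alarms = float((model.predict(Xh) == -1).mean())  # share of normal flagged
    results[name] = (detected, false_alarms)
    print(f"{name:22s} detected={detected:.2f} false_alarms={false_alarms:.2f}")
```

Measuring false alarms on held-out normal windows gives a more honest estimate than reusing the training windows, where the contamination setting fixes the flagged share in advance.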

Citation

@misc{rajuamburu-network-anomaly-detector,
  author = {rajuamburu},
  title = {Network Traffic Anomaly Detector},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/rajuamburu/network-anomaly-detector}
}

License

MIT
