Network Traffic Anomaly Detector
A real-time unsupervised anomaly detection system for network traffic using Isolation Forest. The model learns a baseline of "normal" network behavior and flags statistical outliers (such as ping floods, port scans, DDoS attacks, and data exfiltration) without requiring labeled attack data.
Model Details
| Property | Value |
|---|---|
| Algorithm | Isolation Forest (scikit-learn) |
| Type | Unsupervised Anomaly Detection |
| Input | 13 window-aggregated network traffic features |
| Output | Normal (1) or Outlier (-1) + decision score |
| Window Size | 5 seconds |
| Training Data | 1.45M packets across 399 time windows of normal traffic |
| Contamination | 0.05 (5% expected outliers) |
Features Used
The model operates on 13 features extracted from 5-second traffic windows:
| Feature | Description |
|---|---|
| packet_count | Total packets in the window |
| avg_packet_size | Mean packet size (bytes) |
| std_packet_size | Standard deviation of packet sizes |
| max_packet_size | Largest packet in the window |
| min_packet_size | Smallest packet in the window |
| total_bytes | Total bytes transferred |
| unique_src_ips | Number of unique source IPs |
| unique_dst_ips | Number of unique destination IPs |
| icmp_ratio | Proportion of ICMP packets |
| tcp_ratio | Proportion of TCP packets |
| udp_ratio | Proportion of UDP packets |
| packets_per_second | Packet rate |
| bytes_per_second | Throughput rate |
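The following is a minimal sketch of how the 13 features could be computed for one window. It is not the code in feature_extraction.py. The packet tuple layout and the helper name `window_features` are assumptions made for illustration, and the use of the population standard deviation is also an assumption (the project may use the sample standard deviation instead). Protocol numbers 1, 6, and 17 are the IANA values for ICMP, TCP, and UDP.

```python
from statistics import mean, pstdev

WINDOW_SECONDS = 5
PROTO_ICMP, PROTO_TCP, PROTO_UDP = 1, 6, 17  # IANA protocol numbers


def window_features(packets, window_seconds=WINDOW_SECONDS):
    """Compute the 13 window features (hypothetical helper).

    packets: list of (src_ip, dst_ip, length_bytes, ip_proto) tuples.
    This tuple layout is an assumption, not the project's format.
    """
    if not packets:
        raise ValueError("window contains no packets")

    sizes = [p[2] for p in packets]
    n = len(packets)
    total = sum(sizes)

    def ratio(proto):
        # Share of packets in the window that use the given protocol
        return sum(1 for p in packets if p[3] == proto) / n

    return {
        "packet_count": n,
        "avg_packet_size": mean(sizes),
        "std_packet_size": pstdev(sizes),  # population std (assumption)
        "max_packet_size": max(sizes),
        "min_packet_size": min(sizes),
        "total_bytes": total,
        "unique_src_ips": len({p[0] for p in packets}),
        "unique_dst_ips": len({p[1] for p in packets}),
        "icmp_ratio": ratio(PROTO_ICMP),
        "tcp_ratio": ratio(PROTO_TCP),
        "udp_ratio": ratio(PROTO_UDP),
        "packets_per_second": n / window_seconds,
        "bytes_per_second": total / window_seconds,
    }


# Four made-up packets standing in for one 5-second window
example = [
    ("10.0.0.2", "10.0.0.1", 100, PROTO_TCP),
    ("10.0.0.2", "10.0.0.1", 300, PROTO_TCP),
    ("10.0.0.3", "10.0.0.1", 200, PROTO_UDP),
    ("10.0.0.4", "10.0.0.1", 200, PROTO_ICMP),
]
features = window_features(example)
print(features)
```

Both rate features are the window totals divided by the window length, so a window with 500 packets and 225000 bytes gives 100 packets per second and 45000 bytes per second.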
How It Works
Raw Packets (PCAP/CSV) → Feature Extraction (5s windows) → StandardScaler → Isolation Forest → Normal / Outlier
Pipeline
- Capture: Record network traffic as PCAP using tcpdump or Wireshark
- Extract: feature_extraction.py aggregates raw packets into 5-second windows with 13 statistical features
- Train: train_anomaly_detector.py fits an Isolation Forest on the extracted features (assumes mostly normal traffic)
- Detect: realtime_detector.py sniffs live traffic and classifies each 5-second window in real time
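The scale, fit, and classify steps can be sketched as follows. This is an illustration on synthetic data, not the contents of train_anomaly_detector.py. The random baseline, the seed, and the variable names are assumptions. Only the 13-feature shape, the 399 windows, and the contamination value of 0.05 come from this model card.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a baseline: 399 windows x 13 features
X_train = rng.normal(loc=100.0, scale=10.0, size=(399, 13))

# Fit the scaler on the baseline only, then train on the scaled features
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
model = IsolationForest(contamination=0.05, random_state=0)
model.fit(X_train_scaled)

# One baseline window plus one window far outside the baseline
X_new = np.vstack([X_train[:1], np.full((1, 13), 500.0)])
X_new_scaled = scaler.transform(X_new)

labels = model.predict(X_new_scaled)            # 1 = normal, -1 = outlier
scores = model.decision_function(X_new_scaled)  # negative = anomalous

# With contamination=0.05, roughly 5% of the training windows are flagged
train_outlier_fraction = float((model.predict(X_train_scaled) == -1).mean())
print(labels, scores, train_outlier_fraction)
```

Because the contamination setting fixes the decision threshold at roughly the 5th percentile of the training scores, about 5% of the baseline windows are labeled outliers even when the baseline contains no attacks.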
Quick Start
Installation
pip install -r requirements.txt
Option 1: Use the Pre-Trained Model
import joblib
import numpy as np
model = joblib.load("models/isolation_forest_model.pkl")
scaler = joblib.load("models/scaler.pkl")
feature_cols = joblib.load("models/feature_columns.pkl")
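# Example: a single 5-second window of traffic features.
# Values are listed in the order of the "Features Used" table; they must match
# the column order stored in feature_cols.
features = np.array([[500, 450.0, 300.0, 1470, 66, 225000, 3, 3, 0.0, 0.05, 0.95, 100.0, 45000.0]])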
scaled = scaler.transform(features)
prediction = model.predict(scaled) # 1 = Normal, -1 = Outlier
score = model.decision_function(scaled) # More negative = more anomalous
Option 2: Real-Time Detection
# Requires root for packet capture
sudo python realtime_detector.py
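One way to turn a live packet stream into 5-second windows is sketched below. It does not reproduce realtime_detector.py. The class name `WindowAggregator` and its interface are hypothetical, and the packet payloads are placeholder strings.

```python
class WindowAggregator:
    """Group timestamped packets into fixed-length windows (hypothetical helper)."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self.window_start = None  # set when the first packet arrives
        self.buffer = []

    def add(self, timestamp, packet):
        """Add one packet; return the finished window's packets, or None."""
        if self.window_start is None:
            self.window_start = timestamp

        finished = None
        elapsed = timestamp - self.window_start
        if elapsed >= self.window_seconds:
            # The packet belongs to a later window: hand back the current one
            finished = self.buffer
            self.buffer = []
            # Skip over any empty windows so the start stays aligned
            skipped = elapsed // self.window_seconds
            self.window_start += skipped * self.window_seconds

        self.buffer.append(packet)
        return finished


agg = WindowAggregator(window_seconds=5.0)
stream = [(0.0, "a"), (1.5, "b"), (4.9, "c"), (5.0, "d"), (9.0, "e"), (17.0, "f")]

completed = []
for ts, pkt in stream:
    window = agg.add(ts, pkt)
    if window is not None:
        completed.append(window)  # a real detector would extract features here

print(completed)
```

In this sketch a window closes only when a later packet arrives, so a detector built this way would stay silent on an idle link. A timer-based flush is an alternative.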
Option 3: Train on Your Own Network
# 1. Capture traffic (replace en0 with your interface)
sudo tcpdump -i en0 -w my_traffic.pcap
# 2. Convert PCAP to CSV (using tshark)
tshark -r my_traffic.pcap -T fields -e frame.time_epoch -e ip.src -e ip.dst -e frame.len -e ip.proto > my_traffic.csv
# 3. Extract features
python feature_extraction.py
# 4. Train your own model
python train_anomaly_detector.py
# 5. Run detector
sudo python realtime_detector.py
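The tshark command in step 2 writes tab-separated fields by default, in the order given by the -e options. The sketch below shows how such rows could be parsed. It is not the project's loader. The function name `parse_tshark_fields` and the sample rows are made up for illustration.

```python
import csv
import io

# Field order from the tshark command in step 2
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "frame.len", "ip.proto"]


def parse_tshark_fields(lines):
    """Yield (timestamp, src_ip, dst_ip, length, proto) tuples (hypothetical helper)."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != len(FIELDS):
            continue  # malformed row
        ts, src, dst, length, proto = row
        if not src or not proto:
            continue  # non-IP frame (e.g. ARP): the ip.* fields are empty
        # Frames with nested IP headers list several comma-separated values;
        # the first one is normally the outer header.
        yield (
            float(ts),
            src.split(",")[0],
            dst.split(",")[0],
            int(length),
            int(proto.split(",")[0]),
        )


# Made-up rows: a TCP packet, a non-IP frame, and an ICMP error with a nested header
sample = io.StringIO(
    "1700000000.10\t10.0.0.2\t10.0.0.1\t74\t6\n"
    "1700000000.20\t\t\t42\t\n"
    "1700000000.30\t10.0.0.1\t10.0.0.2\t98\t1,17\n"
)
packets = list(parse_tshark_fields(sample))
print(packets)
```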
What It Detects
| Anomaly Type | Detection Signal | Tested |
|---|---|---|
| Ping flood (external) | High ICMP ratio + high packet count | Yes |
| DDoS / volume flood | Extreme packets_per_second and bytes_per_second | Yes |
| Port / network scan | Spike in unique_src_ips or unique_dst_ips | No (expected to trigger) |
| Data exfiltration | Unusual bytes_per_second to few IPs | No (expected to trigger) |
| DNS tunneling | High UDP ratio + abnormal packet patterns | No (expected to trigger) |
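The model itself reports only a label and a score. To relate a flagged window to the signals in the table above, one option is to rank features by their distance from the baseline. The sketch below is not part of the released code. The helper `top_deviations` and all baseline numbers are made up for illustration. With a fitted StandardScaler, the per-feature baseline mean and standard deviation are available as its mean_ and scale_ attributes.

```python
def top_deviations(window, baseline_mean, baseline_std, k=3):
    """Return the k features furthest from the baseline as (name, z_score) pairs."""
    z_scores = {}
    for name, value in window.items():
        std = baseline_std[name]
        # Guard against features that were constant in the baseline
        z_scores[name] = 0.0 if std == 0 else (value - baseline_mean[name]) / std
    ranked = sorted(z_scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up baseline statistics and a made-up window resembling a ping flood
baseline_mean = {"packet_count": 500.0, "icmp_ratio": 0.01, "udp_ratio": 0.30, "unique_dst_ips": 5.0}
baseline_std = {"packet_count": 100.0, "icmp_ratio": 0.01, "udp_ratio": 0.10, "unique_dst_ips": 2.0}
window = {"packet_count": 3000.0, "icmp_ratio": 0.96, "udp_ratio": 0.02, "unique_dst_ips": 4.0}

report = top_deviations(window, baseline_mean, baseline_std, k=2)
for name, z in report:
    print(f"{name}: {z:+.1f} standard deviations from baseline")
```

In this made-up window the two largest deviations are icmp_ratio and packet_count, which matches the ping flood row. The ranking is a heuristic aid for a human reader, not a classifier.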
Limitations
- Unsupervised only: detects statistical deviations, not specific attack types
- Volume-based: low-volume attacks (e.g., slow port scans, single malicious connections) may not trigger
- No payload inspection: application-layer attacks (SQL injection, XSS) are invisible
- Baseline-dependent: the model reflects the network it was trained on; retrain for your own environment for best results
- Local LAN floods may not trigger if traffic volume resembles normal streaming
Files
models/
isolation_forest_model.pkl # Trained Isolation Forest model
scaler.pkl # StandardScaler for feature normalization
feature_columns.pkl # Feature column names and order
src/
feature_extraction.py # Extract features from packet CSV
train_anomaly_detector.py # Train anomaly detection models
realtime_detector.py # Real-time packet sniffing and detection
requirements.txt # Python dependencies
Training Details
- Training data: 1.45M packets from diverse normal traffic (browsing, streaming, idle, downloads, mixed protocols)
- Windows: 399 five-second windows
- Models trained: Isolation Forest (primary), One-Class SVM, Local Outlier Factor
- Validation: Real-time testing confirmed detection of external ping floods (scores -0.05 to -0.17) with minimal false positives on normal traffic
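The three algorithms listed above share scikit-learn's fit and predict interface, so they can be compared side by side. The sketch below uses synthetic data. The hyperparameters (nu=0.05, gamma="scale", and the seed) are assumptions and not the settings used for the released model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X_train = rng.normal(size=(399, 13))       # synthetic "normal" windows
X_test = np.vstack([
    rng.normal(size=(20, 13)),             # 20 normal-looking windows
    np.full((1, 13), 8.0),                 # 1 extreme window
])

scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=42),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma="scale"),
    # novelty=True lets LOF label unseen data through predict()
    "Local Outlier Factor": LocalOutlierFactor(novelty=True, contamination=0.05),
}

flagged = {}
extreme_flagged = {}
for name, model in models.items():
    model.fit(X_train_s)
    labels = model.predict(X_test_s)       # 1 = normal, -1 = outlier
    flagged[name] = int((labels == -1).sum())
    extreme_flagged[name] = bool(labels[-1] == -1)
    print(f"{name}: flagged {flagged[name]} of {len(labels)} test windows")
```

On real traffic the three models can disagree on borderline windows, so a comparison like this is more informative on held-out captures than on synthetic data.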
Citation
@misc{rajuamburu-network-anomaly-detector,
author = {rajuamburu},
title = {Network Traffic Anomaly Detector},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/rajuamburu/network-anomaly-detector}
}
License
MIT