Isolation Forest — eBPF Kubernetes Attack Detection

Pre-trained Isolation Forest models accompanying a paper that is still in progress.

Training data: jniecko/ebpf-k8s-attack-detection

Models

| File | Traffic | Aggregation | Bundle keys | ROC-AUC |
|---|---|---|---|---|
| iforest_flat02_global.pkl | Flat (100 users) | Global (cluster-wide) | model, FEAT | 0.881 |
| iforest_flat02.pkl | Flat (100 users) | Per-pod | m1, m2, FEAT_M1, FEAT_M2 | M1: 0.763 / M2: 0.785 |
| iforest_run01_global.pkl | Seasonal (20–200 users) | Global (cluster-wide) | model, FEAT | 0.720 |
| iforest_run01.pkl | Seasonal (20–200 users) | Per-pod | m1, m2, FEAT_M1, FEAT_M2 | M1: 0.825 / M2: 0.849 |

Each per-pod bundle holds two model variants in one file: M1 uses syscall features only; M2 uses syscall plus Locust load features (req/s, p95 latency).

Environment

scikit-learn >= 1.4
python >= 3.10

Usage

Global model (iforest_flat02_global.pkl, iforest_run01_global.pkl)

import pickle

with open("iforest_flat02_global.pkl", "rb") as f:
    bundle = pickle.load(f)

model = bundle["model"]        # IsolationForest instance
feature_cols = bundle["FEAT"]  # list of column names
mean = bundle["global_mean"]   # pd.Series — z-score normalisation
std  = bundle["global_std"]

# X: pd.DataFrame with one row per window, containing every column in feature_cols
X_norm = (X[feature_cols] - mean) / std
scores = model.decision_function(X_norm)
# Lower (more negative) score = more anomalous
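
The decision scores can be turned into binary flags. The following is a minimal sketch that uses a synthetic stand-in for the bundle so it runs without the released files; the feature names `sys_read` and `sys_write` and the random training data are hypothetical. With the real bundle, only the last three lines apply.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical stand-in for a loaded global bundle; the real feature names differ.
feature_cols = ["sys_read", "sys_write"]
train = pd.DataFrame(rng.normal(100.0, 10.0, size=(500, 2)), columns=feature_cols)
mean, std = train.mean(), train.std()
model = IsolationForest(n_estimators=100, random_state=42).fit((train - mean) / std)

# Two windows: one near the training mean, one far outside the training range.
X = pd.DataFrame([[100.0, 100.0], [400.0, 5.0]], columns=feature_cols)
X_norm = (X[feature_cols] - mean) / std

scores = model.decision_function(X_norm)  # negative values fall on the anomalous side
is_anomaly = scores < 0                   # same cut-off that model.predict() applies
labels = model.predict(X_norm)            # -1 = anomaly, 1 = normal
```

The zero cut-off depends on the `contamination` value the model was fitted with, so a threshold chosen on validation data may suit a given deployment better.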

Per-pod bundle (iforest_flat02.pkl, iforest_run01.pkl)

import pickle

with open("iforest_run01.pkl", "rb") as f:
    bundle = pickle.load(f)

# M1 — syscall features only
model_m1 = bundle["m1"]        # IsolationForest
feat_m1  = bundle["FEAT_M1"]   # list of column names

# M2 — syscall + Locust load features (req/s, p95 latency)
model_m2 = bundle["m2"]
feat_m2  = bundle["FEAT_M2"]

# Per-pod z-score normalisation stats (indexed by pod name)
pod_mean = bundle["per_pod_mean"]  # dict[pod_name -> pd.Series]
pod_std  = bundle["per_pod_std"]

# Normalise a single pod's window DataFrame
pod = "frontend-abc123"
X_norm = (X[feat_m1] - pod_mean[pod]) / pod_std[pod]

scores_m1 = model_m1.decision_function(X_norm)
# For unknown pods, fall back to global stats:
# X_norm = (X[feat_m1] - bundle["global_mean"]) / bundle["global_std"]
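
The per-pod normalisation with its global fallback can be wrapped in a small helper. This is a sketch that assumes the bundle keys shown above; the function name `normalise_window`, the `eps` guard against a zero standard deviation, the feature names, and the toy statistics are all hypothetical additions.

```python
import pandas as pd

def normalise_window(X, feats, pod, pod_mean, pod_std, global_mean, global_std, eps=1e-9):
    """Z-score X[feats] with the pod's stats, or the global stats for an unseen pod."""
    if pod in pod_mean:
        mean, std = pod_mean[pod], pod_std[pod]
    else:
        mean, std = global_mean, global_std
    # Restrict the stats to the requested features and avoid dividing by a zero std.
    mean = mean[feats]
    std = std[feats].where(std[feats] > eps, 1.0)
    return (X[feats] - mean) / std

# Toy statistics standing in for bundle["per_pod_mean"], bundle["global_mean"], etc.
feats = ["sys_read", "sys_write"]  # hypothetical feature names
pod_mean = {"frontend-abc123": pd.Series({"sys_read": 10.0, "sys_write": 4.0})}
pod_std = {"frontend-abc123": pd.Series({"sys_read": 2.0, "sys_write": 0.0})}
global_mean = pd.Series({"sys_read": 8.0, "sys_write": 6.0})
global_std = pd.Series({"sys_read": 4.0, "sys_write": 2.0})

X = pd.DataFrame({"sys_read": [14.0], "sys_write": [4.0]})
known = normalise_window(X, feats, "frontend-abc123", pod_mean, pod_std, global_mean, global_std)
unknown = normalise_window(X, feats, "new-pod-xyz", pod_mean, pod_std, global_mean, global_std)
```

The returned frame can be passed straight to `model_m1.decision_function` (or `model_m2` with `FEAT_M2`).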

Security note: Only load .pkl files from trusted sources. Pickle deserialization can execute arbitrary code.
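
One way to act on this is to compare a file's SHA-256 digest with a value obtained through a trusted channel before unpickling. The sketch below uses only the standard library. The helper name `load_verified` is hypothetical, and the demo pickles a harmless dict to a temporary file instead of using a real model.

```python
import hashlib
import os
import pickle
import tempfile

def load_verified(path, expected_sha256):
    """Unpickle `path` only if its SHA-256 digest matches `expected_sha256`."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: {digest}")
    return pickle.loads(data)

# Demo with a harmless stand-in file.
payload = pickle.dumps({"FEAT": ["a", "b"]})
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    tmp.write(payload)
trusted_digest = hashlib.sha256(payload).hexdigest()

bundle = load_verified(tmp.name, trusted_digest)
os.remove(tmp.name)
```

A matching digest only confirms that the file is the one that was published; it does not make a file from an untrusted publisher safe to load.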

Model Parameters

IsolationForest(
    n_estimators=100,
    contamination=0.40,   # global variant
    # contamination=0.042  # per-pod variant
    random_state=42
)

Trained on cycles 1–2 (no attacks), evaluated on cycles 3–5 (temporal split).
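
The temporal split and ROC-AUC evaluation can be sketched as follows. This is an illustration on synthetic data, not the authors' training script: the column names `cycle`, `is_attack`, `f0`, `f1` and the generated values are hypothetical, and `contamination` is left at the scikit-learn default, which does not affect ROC-AUC because it only shifts the decision threshold.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
feats = ["f0", "f1"]  # hypothetical feature columns

# Synthetic windows: cycles 1-2 are benign, cycles 3-5 contain some attack windows.
frames = []
for cycle in range(1, 6):
    df = pd.DataFrame(rng.normal(0.0, 1.0, size=(200, 2)), columns=feats)
    df["cycle"] = cycle
    df["is_attack"] = 0
    if cycle >= 3:
        attack_rows = df.index[:20]
        df.loc[attack_rows, feats] += 6.0  # shift attack windows away from the benign cloud
        df.loc[attack_rows, "is_attack"] = 1
    frames.append(df)
data = pd.concat(frames, ignore_index=True)

train = data[data["cycle"] <= 2]  # attack-free training cycles
test = data[data["cycle"] >= 3]   # later cycles held out for evaluation

mean, std = train[feats].mean(), train[feats].std()
model = IsolationForest(n_estimators=100, random_state=42)
model.fit((train[feats] - mean) / std)

# decision_function is lower for anomalies, so negate it to score the attack class.
anomaly_score = -model.decision_function((test[feats] - mean) / std)
auc = roc_auc_score(test["is_attack"], anomaly_score)
```

Splitting by cycle rather than at random keeps every evaluation window later in time than the training data, which avoids leaking future behaviour into the fitted statistics.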

Attack Types Covered

xmrig, revshell, distroless_revshell, k8sapi, suid_escalation, ld_preload

Citation

In progress

License

CC BY 4.0
