CICIDS-2017 SOC Tier-1 Intrusion Detector

Calibrated LightGBM for 5-class network intrusion detection trained on the official CICIDS-2017 dataset.

Performance (calibrated, temporal test split)

| Class | Precision | Recall | F1 |
|---|---|---|---|
| Normal | 0.98 | 1.00 | 0.99 |
| DoS | 1.00 | 0.85 | 0.92 |
| PortScan | 0.95 | 1.00 | 0.97 |
| Brute Force | 0.99 | 0.99 | 0.99 |
| Web Attack | 0.86 | 0.91 | 0.88 |

Macro F1: 0.9518

Web Attack precision improved from 0.19 to 0.86 relative to the standard baseline.
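
As a quick consistency check on the reported figures, the per-class F1 values and the macro average can be recomputed from the precision and recall columns. This is only a sketch: the inputs are the two-decimal values from the table, so the result matches the reported 0.9518 to about two decimal places, not exactly. The names `REPORTED` and `f1` are illustrative and not part of the released artifacts.

```python
# (precision, recall) per class, copied from the performance table
REPORTED = {
    "Normal":      (0.98, 1.00),
    "DoS":         (1.00, 0.85),
    "PortScan":    (0.95, 1.00),
    "Brute Force": (0.99, 0.99),
    "Web Attack":  (0.86, 0.91),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Macro F1 is the unweighted mean of the per-class F1 scores,
# so rare classes such as Web Attack count as much as Normal.
per_class_f1 = {name: f1(p, r) for name, (p, r) in REPORTED.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

print({name: round(score, 2) for name, score in per_class_f1.items()})
print(round(macro_f1, 2))
```

Because the average is unweighted, the DoS recall of 0.85 and the Web Attack precision of 0.86 are what pull the macro score below the per-class values for Normal and Brute Force.
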

Artifacts

| File | Description |
|---|---|
| `tier1_lgbm_calibrated.pkl` | Calibrated LightGBM (isotonic, natural-distribution calibration set) |
| `scaler.pkl` | StandardScaler fitted on training data only |
| `feature_selector.pkl` | RF-based SelectFromModel (24 of 71 features kept) |
| `selected_features.pkl` | List of 24 selected feature names |
| `feature_cols.pkl` | Full list of 71 input features (pre-selection) |
| `label_encoder.pkl` | LabelEncoder: Brute Force=0, DoS=1, Normal=2, PortScan=3, Web Attack=4 |
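
The integer codes listed for `label_encoder.pkl` are consistent with scikit-learn's `LabelEncoder`, which numbers classes in sorted order. The sketch below reproduces that mapping in plain Python without loading the pickle. `CLASS_NAMES`, `label_to_id`, and `id_to_label` are illustrative names, and the real encoder remains the source of truth.

```python
# The five class names from the performance table, in arbitrary order.
CLASS_NAMES = ["Normal", "DoS", "PortScan", "Brute Force", "Web Attack"]

# Sorting the names alphabetically and enumerating them yields the
# codes listed for label_encoder.pkl.
label_to_id = {name: idx for idx, name in enumerate(sorted(CLASS_NAMES))}
id_to_label = {idx: name for name, idx in label_to_id.items()}

print(label_to_id)
# {'Brute Force': 0, 'DoS': 1, 'Normal': 2, 'PortScan': 3, 'Web Attack': 4}
```

The same order would normally apply to the columns of `predict_proba`, so column 2 would be P(Normal); verifying this against `model.classes_` before relying on it is advisable.
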

Usage

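```python
import joblib
import numpy as np

le           = joblib.load('label_encoder.pkl')
scaler       = joblib.load('scaler.pkl')
selector     = joblib.load('feature_selector.pkl')
model        = joblib.load('tier1_lgbm_calibrated.pkl')
feature_cols = joblib.load('feature_cols.pkl')   # 71 input feature names (pre-selection)

# flows: DataFrame containing the 71 CICFlowMeter features named in feature_cols.
# Indexing by feature_cols fixes the column order (assumed to match the order used to fit the scaler).
X = scaler.transform(flows[feature_cols].to_numpy(dtype=np.float32))
X = selector.transform(X)                         # keeps the 24 selected features
preds  = le.inverse_transform(model.predict(X))   # class names rather than integer codes
probas = model.predict_proba(X)   # calibrated; P(Normal)~0.83 for benign traffic
```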
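
Since the probabilities are calibrated, a Tier-1 workflow could threshold P(Normal) to decide which flows to escalate. The following is a minimal sketch of that idea, not part of the released model: the `triage` function, the `escalate_below` threshold of 0.5, and the assumption that probability columns follow the label encoder order (Brute Force, DoS, Normal, PortScan, Web Attack) are all hypothetical and should be checked against `model.classes_` and tuned on real traffic.

```python
# Assumed column order of predict_proba, taken from the label encoder mapping.
CLASSES = ["Brute Force", "DoS", "Normal", "PortScan", "Web Attack"]
NORMAL_IDX = CLASSES.index("Normal")

def triage(probas, escalate_below=0.5):
    """Return one (action, label, probability) tuple per flow.

    A flow passes when P(Normal) >= escalate_below (a hypothetical cutoff);
    otherwise it is escalated with its most likely attack class.
    """
    decisions = []
    for row in probas:
        p_normal = row[NORMAL_IDX]
        if p_normal >= escalate_below:
            decisions.append(("pass", "Normal", p_normal))
        else:
            # Most probable class among the four attack classes.
            attack_idx = max(
                (i for i in range(len(row)) if i != NORMAL_IDX),
                key=lambda i: row[i],
            )
            decisions.append(("escalate", CLASSES[attack_idx], row[attack_idx]))
    return decisions

# Two made-up probability rows standing in for model.predict_proba(X).
example = [
    [0.01, 0.02, 0.83, 0.04, 0.10],
    [0.05, 0.70, 0.10, 0.10, 0.05],
]
print(triage(example))
# [('pass', 'Normal', 0.83), ('escalate', 'DoS', 0.7)]
```

The function accepts any sequence of rows, so the array returned by `predict_proba` can be passed directly.
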

Key design decisions

- **Temporal split**: within each daily file, each class is cut chronologically 80/20, so test flows always come after the training flows of the same class and day
- **SMOTE**: Web Attack oversampled from 1.5k to 5k and Brute Force from 7k to 15k, before undersampling
- **Early stopping**: uses a balanced, resampled eval set (not natural-distribution) so all 5 classes
  contribute equally to the stopping signal
- **Calibration**: `FrozenEstimator` + isotonic regression on a natural-distribution validation set (83% Normal)
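
The temporal split described in the list above can be sketched in plain Python. This is one reading of "per-class 80/20 chronological cut per daily file", not the author's actual code: the `temporal_split` function and the `(timestamp, label)` record layout are hypothetical, and the function is meant to be called once per daily file.

```python
from collections import defaultdict

def temporal_split(flows, train_frac=0.8):
    """Split one day's flows chronologically, separately for each class.

    `flows` is a list of (timestamp, label) pairs (hypothetical layout).
    Returns (train, test), where each class contributes its earliest
    `train_frac` of flows to train and the remainder to test.
    """
    by_class = defaultdict(list)
    for record in flows:
        by_class[record[1]].append(record)

    train, test = [], []
    for records in by_class.values():
        records.sort(key=lambda r: r[0])        # oldest flow first
        cut = int(len(records) * train_frac)    # index of the 80% boundary
        train.extend(records[:cut])             # earlier flows go to training
        test.extend(records[cut:])              # later flows are held out
    return train, test

# Toy day: 10 Normal flows and 5 DoS flows with integer timestamps.
day = [(t, "Normal") for t in range(10)] + [(t, "DoS") for t in range(100, 105)]
train, test = temporal_split(day)
print(len(train), len(test))
# 12 3
```

Cutting per class keeps rare classes represented on both sides of the split, while the chronological order avoids training on flows that occur after the test flows of the same class.
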