# Explainable IDS Full Pipeline — Code Walkthrough

This document explains the notebook `explainable_ids_full_pipeline.ipynb` in detailed, practical terms. The goal is to understand **what each line or block does**, **why it exists**, and **how it connects to the project deliverables**:

1. Train an IDS model.
2. Explain IDS predictions.
3. Evaluate explanation stability and faithfulness.
4. Analyze security/adversarial risks.

The notebook is organized into seven main parts:

- setup and imports,
- dataset loading and preprocessing,
- model definitions,
- model training and evaluation,
- SHAP explanations,
- LIME explanations,
- stability, faithfulness, and security analysis.

---

## Big Picture Before Reading the Code

The project is an **Explainable Intrusion Detection System (X-IDS)**.

The dataset is **NSL-KDD**, where each row is a network connection. Each connection has 41 features such as protocol, service, duration, bytes, login status, error rates, and host-level statistics.

The target label is binary:

- `normal`
- `anomaly`

The notebook trains three neural models:

- **MLP**: a standard feed-forward network for tabular data.
- **LSTM**: treats the 41 features like a sequence.
- **1D-CNN**: treats the 41 features like a one-dimensional signal.

Then it explains predictions using:

- **SHAP**: feature contribution values based on Shapley values.
- **LIME**: local surrogate explanations based on perturbations.

Then it asks:

- Are explanations stable?
- Are explanations faithful?
- Are important features manipulable by attackers?

---

# Cell 2 — Install Dependencies

```python
!pip install -q torch numpy pandas scikit-learn datasets shap lime matplotlib scipy
```

### What it does

This line installs all Python packages needed in Google Colab.

- `torch`: PyTorch, used to build and train neural networks.
- `numpy`: numerical arrays and mathematical operations.
- `pandas`: table/dataframe manipulation.
- `scikit-learn`: preprocessing and metrics.
- `datasets`: Hugging Face library to load NSL-KDD.
- `shap`: SHAP explanations.
- `lime`: LIME explanations.
- `matplotlib`: plots and figures.
- `scipy`: statistics such as Pearson and Spearman correlations.

### Why it matters

This prepares the environment. Without these libraries, the rest of the notebook cannot run.

### Mapping to the project

This supports **all tasks** because it installs the tools for training, explaining, evaluating, and plotting.

---

# Cell 3 — Imports, Reproducibility, and Device Setup

```python
import os, sys, json, time, random, pickle
```

Imports standard Python utilities.

- `os`, `sys`: system/file utilities.
- `json`: could be used for saving structured results.
- `time`: used to measure training time.
- `random`: Python random generator.
- `pickle`: can save/load Python objects.

```python
import numpy as np
```

Imports NumPy as `np`. Almost all numerical arrays in preprocessing, SHAP, LIME, and metrics use NumPy.

```python
import pandas as pd
```

Imports pandas as `pd`. The NSL-KDD dataset is converted to pandas DataFrames so we can manipulate columns easily.

```python
import torch
```

Imports the main PyTorch library.

```python
import torch.nn as nn
```

Imports the PyTorch neural-network module as `nn`. This is used for layers like `Linear`, `LSTM`, `Conv1d`, `BatchNorm1d`, `Dropout`, and for the `CrossEntropyLoss` loss.

```python
from torch.utils.data import TensorDataset, DataLoader
```

Imports utilities to package arrays into datasets and mini-batches.

- `TensorDataset`: wraps tensors `(X, y)` together.
- `DataLoader`: creates batches for training and testing.

```python
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
```

Imports preprocessing tools.

- `LabelEncoder`: converts categorical strings to integers.
- `MinMaxScaler`: scales numerical features into `[0, 1]`.

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, average_precision_score
```

Imports evaluation metrics.
- `classification_report`: precision, recall, F1-score.
- `confusion_matrix`: counts correct/incorrect predictions by class.
- `roc_auc_score`: ROC-AUC ranking metric.
- `average_precision_score`: PR-AUC / average precision.

```python
from datasets import load_dataset
```

Imports the Hugging Face dataset loader. Used to download/load NSL-KDD.

```python
import shap
```

Imports the SHAP explainability library.

```python
from lime import lime_tabular
```

Imports the LIME tabular explainer.

```python
from scipy.stats import spearmanr, pearsonr
```

Imports statistical correlation functions.

- `spearmanr`: rank correlation. Used for comparing feature rankings and LIME stability.
- `pearsonr`: linear correlation. Used for SHAP perturbation stability.

```python
import matplotlib.pyplot as plt
```

Imports the plotting interface.

```python
import warnings
warnings.filterwarnings('ignore')
```

Suppresses warning messages to keep the Colab output cleaner. The trade-off is that deprecation or numerical warnings from the libraries are hidden as well.

### Reproducibility block

```python
SEED = 42
```

Defines the random seed. A seed is a fixed starting point for randomness.

```python
random.seed(SEED)
```

Fixes Python's built-in random generator.

```python
np.random.seed(SEED)
```

Fixes NumPy randomness. This affects random sample selection for SHAP/LIME and stability tests.

```python
torch.manual_seed(SEED)
```

Fixes PyTorch randomness, such as weight initialization, dropout masks, and `DataLoader` shuffling.

```python
torch.backends.cudnn.deterministic = True
```

Asks cuDNN to use deterministic algorithms where possible. This improves reproducibility on GPU, although it does not guarantee bit-identical results for every operation.

```python
torch.backends.cudnn.benchmark = False
```

Disables cuDNN benchmarking. Benchmarking can choose different algorithms depending on runtime conditions, which hurts reproducibility.

### Device selection

```python
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

Checks if a GPU is available. If yes, training uses the CUDA GPU; otherwise it uses the CPU.

```python
print(f'Device: {DEVICE}')
```

Prints the selected device.
```python
if DEVICE.type == 'cuda':
    print(f'GPU: {torch.cuda.get_device_name(0)}')
```

If running on GPU, prints the GPU name. In the final run it was a Tesla T4.

### Mapping to the project

This cell establishes reproducibility and compute setup. In an academic report, reproducibility is important because results should be repeatable.

---

# Cell 5 — Feature Names, Dataset Loading, and Class Distribution

```python
FEATURE_NAMES = [ ... ]
```

This list contains the 41 NSL-KDD feature names in the exact order used by the dataset and model.

The list is not just cosmetic. It is needed for:

- selecting feature columns from the DataFrame,
- preserving consistent input order,
- labeling SHAP plots,
- labeling LIME explanations,
- interpreting security implications.

### Lines 1–16 — NSL-KDD features

The features include:

- Basic connection features: `duration`, `protocol_type`, `service`, `flag`, `src_bytes`, `dst_bytes`.
- Content features: `hot`, `num_failed_logins`, `logged_in`, `root_shell`, etc.
- Time-based traffic features: `count`, `srv_count`, `serror_rate`, `rerror_rate`, etc.
- Host-based traffic features: `dst_host_count`, `dst_host_srv_count`, `dst_host_*` rates.

Why this matters: later, when SHAP says `logged_in` is important, we know exactly which IDS feature influenced the model.

```python
CATEGORICAL_COLS = ['protocol_type', 'service', 'flag']
```

Defines the three categorical columns. These contain strings, not numbers, so they must be encoded before feeding them into neural networks.

```python
ds = load_dataset('Mireu-Lab/NSL-KDD')
```

Loads NSL-KDD from Hugging Face.

```python
df_train = ds['train'].to_pandas()
df_test = ds['test'].to_pandas()
```

Converts the train and test splits into pandas DataFrames. Pandas makes column operations easier.

```python
print(f'Train: {len(df_train)} | Test: {len(df_test)}')
```

Prints dataset sizes.
Final output:

- Train: 151,165
- Test: 34,394

```python
print('\nTrain distribution:')
print(df_train['class'].value_counts())
```

Prints how many normal/anomaly samples exist in training.

```python
print('\nTest distribution:')
print(df_test['class'].value_counts())
```

Prints the class distribution in the test set.

### Why class distribution matters

The train and test distributions are different:

- Train has more normal than anomaly.
- Test has more anomaly than normal.

This matters because the model must generalize under distribution shift.

### Mapping to project

This cell supports the **dataset understanding** part of the report. It documents what data we used and shows imbalance/distribution shift.

---

# Cell 6 — Target Encoding, Categorical Encoding, and Scaling

```python
# Encode target (binary: anomaly=0, normal=1)
```

Comment explaining the binary label setup.

```python
class_names = ['anomaly', 'normal']
```

Defines readable class names. This is used later in classification reports and LIME explanations.

```python
le_y = LabelEncoder()
```

Creates a label encoder for target labels.

```python
y_train = le_y.fit_transform(df_train['class'].values)
```

Fits the encoder on the training labels and transforms them into integers.

In this dataset, the final encoding is:

- anomaly = 0
- normal = 1

```python
y_test = le_y.transform(df_test['class'].values)
```

Transforms test labels using the same encoder learned from training.

Important: we do not fit on test labels, because the test set must remain unseen.

```python
df_tr, df_te = df_train.copy(), df_test.copy()
```

Creates copies of the train and test DataFrames so the original data remains unchanged.

```python
label_encoders = {}
```

Creates a dictionary to store encoders for each categorical feature.

```python
for col in CATEGORICAL_COLS:
```

Loops over the categorical columns: `protocol_type`, `service`, `flag`.

```python
le = LabelEncoder()
```

Creates a new encoder for the current categorical column.
```python
le.fit(df_tr[col])
```

Fits the encoder only on training categories.

```python
known = set(le.classes_)
```

Stores categories seen during training.

```python
df_te[col] = df_te[col].apply(lambda x: x if x in known else le.classes_[0])
```

Handles possible unknown categories in test data. If a test category was not seen during training, it is replaced by the first known class (the alphabetically first one, because `classes_` is sorted).

Why: `LabelEncoder` cannot transform unseen labels. This prevents runtime errors.

```python
df_tr[col] = le.transform(df_tr[col])
```

Transforms training categorical values into integers.

```python
df_te[col] = le.transform(df_te[col])
```

Transforms test categorical values using the same encoder.

```python
label_encoders[col] = le
```

Stores the encoder for later inspection or inverse transformation.

```python
print(f'Encoded {col}: {len(le.classes_)} categories')
```

Prints how many categories each column has.

Final output:

- protocol_type: 3 categories
- service: 70 categories
- flag: 11 categories

### Scaling

```python
scaler = MinMaxScaler()
```

Creates a scaler that maps each feature to [0, 1], using the minimum and maximum observed when it is fitted.

```python
X_train = scaler.fit_transform(df_tr[FEATURE_NAMES].values.astype(np.float32))
```

Takes training features, converts them to float32, fits the scaler on training data, and transforms training features.

Important: fit only on training data.

```python
X_test = scaler.transform(df_te[FEATURE_NAMES].values.astype(np.float32))
```

Transforms test features using the training scaler. Again, no fitting on test data to avoid data leakage. A consequence is that test values outside the training range can land slightly below 0 or above 1.

```python
print(f'\nX_train: {X_train.shape} | X_test: {X_test.shape}')
```

Prints feature matrix shapes.

Final output:

- X_train: (151165, 41)
- X_test: (34394, 41)

```python
print(f'y_train: {np.bincount(y_train)} | y_test: {np.bincount(y_test)}')
```

Prints encoded class counts.

### Why this cell is essential

Neural networks cannot directly process strings or unscaled heterogeneous features.
This cell converts the raw dataset into clean numerical arrays, which Cell 10 later wraps as PyTorch tensors.

### Mapping to project

This is the **preprocessing pipeline** in the report.

---

# Cell 8 — Model Definitions

This cell defines the three deep learning models.

---

## MLP_IDS

```python
class MLP_IDS(nn.Module):
```

Defines a PyTorch class for the MLP model. It inherits from `nn.Module`, which is required for PyTorch models.

```python
def __init__(self, in_dim=41, num_classes=2):
```

Constructor. Input dimension is 41 because NSL-KDD has 41 features. Number of classes is 2: anomaly and normal.

```python
super().__init__()
```

Initializes the parent PyTorch module.

```python
self.net = nn.Sequential(
```

Creates a sequence of layers that will run one after another.

```python
nn.Linear(in_dim, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),
```

First hidden block:

- `Linear(41, 256)`: maps 41 input features to 256 hidden units.
- `BatchNorm1d(256)`: normalizes hidden activations across the batch, which stabilizes training.
- `ReLU()`: adds non-linearity.
- `Dropout(0.3)`: randomly drops 30% of activations during training to reduce overfitting.

```python
nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
```

Second hidden block. Reduces the representation from 256 to 128, with a lower dropout rate of 20%.

```python
nn.Linear(128, 64), nn.ReLU(),
```

Third hidden block. Reduces from 128 to 64, this time without batch normalization or dropout.

```python
nn.Linear(64, num_classes)
```

Output layer. Produces two logits: one for anomaly and one for normal.

```python
)
```

Ends the sequential model.

```python
for m in self.modules():
```

Loops through all modules/layers inside the model.

```python
if isinstance(m, nn.Linear):
```

Checks if the current module is a linear layer.

```python
nn.init.xavier_uniform_(m.weight)
```

Initializes weights using Xavier uniform initialization. This helps gradients flow well at the start of training.

```python
nn.init.zeros_(m.bias)
```

Initializes biases to zero.

```python
def forward(self, x):
    return self.net(x)
```

Defines the forward pass. Input `x` passes through `self.net`.
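As a rough sanity check on the MLP layer sizes described above, the number of trainable parameters can be worked out by hand. The sketch below is plain Python rather than notebook code: `linear_params` and `batchnorm_params` are hypothetical helper names, and it assumes the defaults `in_dim=41` and `num_classes=2`, with biases enabled and affine batch normalization (the PyTorch defaults).

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """One learnable scale and one learnable shift per feature."""
    return 2 * n_features


# Layer sizes taken from the MLP_IDS walkthrough (assumed defaults).
layers = [
    linear_params(41, 256), batchnorm_params(256),
    linear_params(256, 128), batchnorm_params(128),
    linear_params(128, 64),
    linear_params(64, 2),
]
total = sum(layers)
print(f"{total:,}")  # 52,802
```

`ReLU` and `Dropout` have no parameters, and the running mean and variance of batch normalization are buffers rather than parameters, so none of them enter the count.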
```python
def count_parameters(self):
    return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts trainable parameters. Used for reporting model size.

### Why MLP is used

MLP is the simplest strong baseline for tabular data. If a complex model beats the MLP, that suggests the extra architecture has value.

---

## LSTM_IDS

```python
class LSTM_IDS(nn.Module):
```

Defines the LSTM model class.

```python
def __init__(self, in_dim=41, hidden_dim=64, num_layers=2, num_classes=2):
```

Constructor. It uses 41 features, hidden size 64, 2 LSTM layers, and 2 output classes.

```python
super().__init__()
```

Initializes the parent module.

```python
self.lstm = nn.LSTM(1, hidden_dim, num_layers, batch_first=True, dropout=0.2)
```

Creates an LSTM.

Important detail: each feature is treated as one timestep with one value. So the input shape becomes:

```text
batch_size × 41 × 1
```

- `input_size=1`: each timestep contains one feature value.
- `hidden_dim=64`: LSTM hidden representation size.
- `num_layers=2`: stacked LSTM layers.
- `batch_first=True`: batch dimension comes first.
- `dropout=0.2`: dropout between LSTM layers.

```python
self.fc = nn.Sequential(nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, num_classes))
```

Creates a small classifier after the LSTM.

- 64 hidden state → 32 hidden units → 2 output classes.

```python
def forward(self, x):
```

Defines the forward pass.

```python
out, (h_n, _) = self.lstm(x.unsqueeze(-1))
```

`x` originally has shape:

```text
batch_size × 41
```

`x.unsqueeze(-1)` changes it to:

```text
batch_size × 41 × 1
```

The LSTM returns:

- `out`: output at all timesteps.
- `h_n`: final hidden states.
- `_`: cell states, ignored.

```python
return self.fc(h_n[-1])
```

Uses the final hidden state from the last LSTM layer and feeds it into the classifier.

```python
def count_parameters(self):
    return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts trainable parameters.
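The shape changes in the LSTM forward pass can be checked on random data. This is a minimal sketch, not the notebook's code: it rebuilds the same layer configuration as standalone objects (`lstm`, `fc`) and feeds a made-up batch of 4 samples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer configuration as LSTM_IDS, built standalone for illustration.
lstm = nn.LSTM(1, 64, 2, batch_first=True, dropout=0.2)
fc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.rand(4, 41)          # hypothetical batch: 4 connections, 41 features
seq = x.unsqueeze(-1)          # 41 timesteps with one value each
out, (h_n, c_n) = lstm(seq)    # h_n holds the final hidden state of each layer
logits = fc(h_n[-1])           # last layer's final hidden state -> 2 logits

print(seq.shape)     # torch.Size([4, 41, 1])
print(out.shape)     # torch.Size([4, 41, 64])
print(h_n.shape)     # torch.Size([2, 4, 64])
print(logits.shape)  # torch.Size([4, 2])
```

Note that `h_n` is laid out as layers × batch × hidden even with `batch_first=True`, which is why `h_n[-1]` selects the last layer rather than the last sample. For a unidirectional LSTM it matches the last timestep of `out`.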
### Why LSTM is used

NSL-KDD is not a time series, but its features follow a fixed column order in which related features sit next to each other (basic, content, time-based, and host-based groups). The LSTM may learn dependencies across these feature groups, although the ordering itself carries no temporal meaning.

---

## CNN1D_IDS

```python
class CNN1D_IDS(nn.Module):
```

Defines the 1D-CNN model.

```python
def __init__(self, in_dim=41, num_classes=2):
```

Constructor with 41 input features and 2 output classes.

```python
super().__init__()
```

Initializes the parent module.

```python
self.conv = nn.Sequential(
```

Creates the convolutional feature extractor.

```python
nn.Conv1d(1, 64, 3, padding=1), nn.BatchNorm1d(64), nn.ReLU(),
```

First convolution block:

- input channels = 1,
- output channels = 64,
- kernel size = 3,
- padding = 1 keeps length 41.

This learns local patterns across neighboring features.

```python
nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
```

Second convolution block, increasing channels from 64 to 128.

```python
nn.AdaptiveAvgPool1d(8)
```

Compresses the sequence length to 8, regardless of input length.

```python
)
```

Ends the convolution block.

```python
self.fc = nn.Sequential(nn.Linear(128*8, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, num_classes))
```

Classifier after convolution:

- Flattened size = 128 channels × 8 pooled positions = 1024.
- Dense layer to 64.
- ReLU.
- Dropout.
- Output layer to 2 classes.

```python
def forward(self, x):
```

Defines the forward pass.

```python
x = self.conv(x.unsqueeze(1))
```

Original `x` shape is:

```text
batch_size × 41
```

`x.unsqueeze(1)` gives:

```text
batch_size × 1 × 41
```

This is the format `Conv1d` expects: batch, channels, length.

```python
return self.fc(x.view(x.size(0), -1))
```

Flattens the convolution output to 1024 values per sample and feeds it to the classifier.

```python
def count_parameters(self):
    return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts parameters.

### Final model loop

```python
for name, cls in [('MLP', MLP_IDS), ('LSTM', LSTM_IDS), ('CNN1D', CNN1D_IDS)]:
```

Loops over the three model classes.

```python
m = cls()
```

Instantiates each model.
```python
print(f'{name}: {m.count_parameters():,} parameters')
```

Prints model parameter counts.

### Mapping to project

This cell implements the **Train model** requirement and sets up model comparison.

---

# Cell 10 — Training All Models

This is the largest and most important training cell.

```python
EPOCHS = 50
BATCH_SIZE = 256
LR = 1e-3
```

Defines training hyperparameters:

- train for 50 epochs,
- use mini-batches of 256 samples,
- learning rate is 0.001.

```python
train_ds = TensorDataset(torch.FloatTensor(X_train), torch.LongTensor(y_train))
```

Converts training NumPy arrays into PyTorch tensors and bundles features/labels together.

- Features become float tensors.
- Labels become long integer tensors, as required by `CrossEntropyLoss`.

```python
test_ds = TensorDataset(torch.FloatTensor(X_test), torch.LongTensor(y_test))
```

Same for test data.

```python
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
```

Creates mini-batches for training and shuffles data each epoch.

```python
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)
```

Creates test batches. No shuffle is needed because evaluation order does not matter.

### Class weights

```python
counts = np.bincount(y_train)
```

Counts how many examples exist per class.

```python
weights = 1.0 / counts.astype(np.float32)
```

Creates inverse-frequency weights. Smaller classes get larger weight.

```python
weights = weights / weights.sum() * len(weights)
```

Rescales the weights so they sum to the number of classes, which gives them an average of 1. With two classes, a perfectly balanced dataset would yield weights of 1.0 and 1.0.

```python
class_weights = torch.FloatTensor(weights).to(DEVICE)
```

Converts the weights to a PyTorch tensor and moves them to GPU/CPU.

### Why class weights?

Class imbalance can make the model favor the majority class. Weighted loss penalizes mistakes on underrepresented classes more.

---

## train_model function

```python
def train_model(model, model_name):
```

Defines a reusable function to train any of the three models.

```python
print(...)
```

Prints a header showing which model is being trained.

```python
model.to(DEVICE)
```

Moves the model to GPU or CPU.

```python
criterion = nn.CrossEntropyLoss(weight=class_weights)
```

Defines classification loss with class weights.

`CrossEntropyLoss` expects raw logits, so the model does not need Softmax during training.

```python
optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=1e-4)
```

Creates the Adam optimizer.

- `lr=1e-3`: learning rate.
- `weight_decay=1e-4`: L2 regularization to reduce overfitting.

```python
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)
```

Creates a learning-rate scheduler. If the monitored loss has not improved for more than 5 epochs, the learning rate is halved.

```python
best_f1, history = 0, {'train_loss': [], 'test_acc': []}
```

Initializes best F1 and stores training history.

```python
best_state = None
```

Will store the best model weights.

```python
t0 = time.time()
```

Starts timing training.

```python
for epoch in range(EPOCHS):
```

Training loop over 50 epochs.

```python
model.train()
```

Sets the model to training mode. Enables dropout and batchnorm training behavior.

```python
total_loss = 0
```

Initializes the epoch loss accumulator.

```python
for xb, yb in train_loader:
```

Loops over training mini-batches.

```python
xb, yb = xb.to(DEVICE), yb.to(DEVICE)
```

Moves the batch to GPU/CPU.

```python
optimizer.zero_grad()
```

Clears old gradients.

```python
loss = criterion(model(xb), yb)
```

Runs the model forward pass and computes cross-entropy loss.

```python
loss.backward()
```

Backpropagates gradients.

```python
optimizer.step()
```

Updates model weights.

```python
total_loss += loss.item() * len(yb)
```

`loss.item()` is the mean loss over the batch, so multiplying by the batch size converts it back to a batch total before adding it to the epoch loss. This keeps the final, smaller batch from being over-weighted when the epoch average is computed.

### Evaluation inside each epoch

```python
model.eval()
```

Sets the model to evaluation mode. Dropout is disabled, and batchnorm uses the running statistics accumulated during training.

```python
preds, probs, labels = [], [], []
```

Creates lists to collect predictions, probabilities, and labels.
```python
with torch.no_grad():
```

Disables gradient computation to save memory and speed up evaluation.

```python
for xb, yb in test_loader:
```

Loops through test batches.

```python
xb = xb.to(DEVICE)
```

Moves features to GPU/CPU.

```python
out = model(xb)
```

Gets raw logits.

```python
preds.append(out.argmax(1).cpu().numpy())
```

Predicted class is the index of the largest logit.

```python
probs.append(torch.softmax(out, 1).cpu().numpy())
```

Converts logits to class probabilities.

```python
labels.append(yb.numpy())
```

Stores true labels.

```python
preds = np.concatenate(preds)
probs = np.concatenate(probs)
labels = np.concatenate(labels)
```

Combines batch arrays into full test arrays.

```python
report = classification_report(labels, preds, output_dict=True)
```

Computes precision, recall, and F1 per class plus averages, returned as a nested dictionary.

```python
wf1 = report['weighted avg']['f1-score']
```

Extracts the weighted F1-score.

```python
acc = report['accuracy']
```

Extracts accuracy.

```python
test_loss = total_loss / len(y_train)
```

Despite the variable name, this is actually the average training loss for the epoch.

```python
scheduler.step(test_loss)
```

Updates the scheduler with this training loss, so the learning rate is reduced when the training loss stops improving.

```python
history['train_loss'].append(total_loss / len(y_train))
history['test_acc'].append(acc)
```

Stores loss and accuracy for plots.

```python
if wf1 > best_f1:
```

Checks if the current model is the best so far. Note that the comparison uses F1 measured on the test set, so the checkpoint is selected on the same data the final metrics are reported on. The reported test scores are therefore likely somewhat optimistic.

```python
best_f1 = wf1
```

Updates best F1.

```python
best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
```

Saves a copy of the best model weights on CPU.

```python
if (epoch+1) % 10 == 0 or epoch == 0:
```

Prints progress at epoch 1 and every 10 epochs.

```python
print(...)
```

Shows epoch, loss, accuracy, and F1.

### Final evaluation

```python
dt = time.time() - t0
```

Measures total training time.

```python
model.load_state_dict(best_state)
```

Restores the best model weights.

```python
model.eval()
```

Sets evaluation mode.
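The prediction steps used in evaluation (logits to predicted class, logits to probabilities) can be reproduced with NumPy on a few made-up logits. This is an illustrative sketch only: `softmax` is a hand-written helper standing in for the `torch.softmax` call the notebook uses, and the logit values are invented.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for three samples; column 0 = anomaly, column 1 = normal.
logits = np.array([[2.0, 0.5],
                   [-1.0, 1.0],
                   [0.3, 0.3]])

probs = softmax(logits)          # each row sums to 1
preds = logits.argmax(axis=1)    # index of the largest logit per row

print(preds.tolist())            # [0, 1, 0]
print(probs[:, 1])               # probability of class 1 (normal)
```

Because softmax preserves the ordering of the logits, taking `argmax` of the logits or of the probabilities gives the same predicted class. Column 1 of `probs` is the score that the ROC-AUC and PR-AUC calls consume.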
The next block repeats the final evaluation on the test set to compute final metrics.

```python
roc = roc_auc_score(labels, probs[:, 1])
```

Computes ROC-AUC using the probability of class 1 (`normal`).

```python
pr = average_precision_score(labels, probs[:, 1])
```

Computes PR-AUC / average precision, again with `normal` as the positive class.

```python
print(...)
print(classification_report(...))
print(confusion_matrix(...))
```

Prints final metrics, the per-class report, and the confusion matrix.

```python
return model, {...}
```

Returns the trained model and a result dictionary.

### Training all models

```python
models = {}
results = {}
```

Creates dictionaries to store models and results.

```python
for name, cls in [('mlp', MLP_IDS), ('lstm', LSTM_IDS), ('cnn1d', CNN1D_IDS)]:
```

Loops over model classes.

```python
models[name], results[name] = train_model(cls(), name.upper())
```

Instantiates, trains, and stores each model.

### Mapping to project

This cell implements the **Train model** part and produces the model comparison results.

---

# Cells 11 and 12 — Model Summary and Training Curves

## Cell 11

```python
print(f'{"Model":<8} {"Params":>8} {"W-F1":>8} {"ROC-AUC":>9} {"PR-AUC":>8} {"Time":>8}')
```

Prints the table header.

```python
print('-'*50)
```

Prints a separator line.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
```

Loops over the three trained models.

```python
r = results[name]
```

Gets the metric dictionary.

```python
p = models[name].count_parameters()
```

Gets the parameter count.

```python
print(...)
```

Prints model name, parameters, F1, ROC-AUC, PR-AUC, and time.

### Why this matters

This is the main quantitative result table in the report.

## Cell 12

```python
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
```

Creates two side-by-side plots.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
```

Loops over models.

```python
axes[0].plot(results[name]['history']['train_loss'], label=name.upper())
```

Plots training loss over epochs.
```python
axes[1].plot(results[name]['history']['test_acc'], label=name.upper())
```

Plots test accuracy over epochs.

```python
axes[0].set_xlabel(...); ...
```

Labels the first plot.

```python
axes[1].set_xlabel(...); ...
```

Labels the second plot.

```python
plt.tight_layout(); plt.show()
```

Adjusts spacing and displays plots.

### Mapping to project

These plots support training analysis and make the report/presentation visual.

---

# Cell 14 — SHAP Setup and SHAP Value Computation

```python
mlp_cpu = models['mlp'].cpu().eval()
```

Moves the trained MLP to CPU and sets evaluation mode.

Why MLP? The project uses the MLP for SHAP explanation because it is a clean tabular baseline and easier to explain consistently.

```python
def predict_fn(X):
```

Defines a prediction wrapper for SHAP and LIME.

```python
with torch.no_grad():
```

No gradients are needed for explanation queries.

```python
return torch.softmax(mlp_cpu(torch.FloatTensor(X)), 1).numpy()
```

Converts NumPy input to a PyTorch tensor, runs the MLP, applies Softmax, and returns probabilities.

SHAP and LIME need a function that takes NumPy arrays and returns prediction probabilities.

```python
bg_idx = np.random.choice(len(X_train), 100, replace=False)
```

Randomly selects 100 training samples as SHAP background data. Background data represents the baseline distribution.

```python
exp_idx = np.random.choice(len(X_test), 150, replace=False)
```

Randomly selects 150 test samples to explain.

Why sample? Kernel SHAP is expensive; explaining the entire test set would take too long.

```python
explainer = shap.KernelExplainer(predict_fn, X_train[bg_idx])
```

Creates a model-agnostic SHAP explainer using the prediction function and background data.

```python
print('Computing SHAP values...')
```

Progress message.

```python
shap_values_raw = explainer.shap_values(X_test[exp_idx], nsamples=200, silent=True)
```

Computes SHAP values for the 150 test samples. `nsamples=200` is the number of perturbed feature coalitions evaluated per explained sample; with 41 features this is a fairly small budget, so the resulting values are approximate.
```python
if isinstance(shap_values_raw, list):
    shap_vals_anomaly = shap_values_raw[0]
elif shap_values_raw.ndim == 3:
    shap_vals_anomaly = shap_values_raw[:, :, 0]
else:
    shap_vals_anomaly = shap_values_raw
```

Handles different SHAP library output formats.

- Older SHAP versions return a list with one array per class.
- Newer SHAP versions may return a 3D array of shape (samples, features, classes).
- In both cases the code extracts class 0: anomaly.

```python
print(f'Done! Shape: {shap_vals_anomaly.shape}')
```

Prints the SHAP array shape. Final shape is `(150, 41)`.

### Mapping to project

This is the core of **Explain predictions** using SHAP.

---

# Cells 15–18 — SHAP Global and Local Explanations

## Cell 15 — Feature Importance

```python
mean_abs_shap = np.abs(shap_vals_anomaly).mean(axis=0)
```

Takes absolute SHAP values and averages across samples.

Why absolute value? We care about the magnitude of influence, regardless of direction. Without it, positive and negative contributions of the same feature would cancel out.

```python
feature_importance = sorted(zip(FEATURE_NAMES, mean_abs_shap), key=lambda x: x[1], reverse=True)
```

Pairs each feature name with its importance and sorts in descending order.

```python
print('Top 15 features...')
```

Prints a heading.

```python
for i, (f, v) in enumerate(feature_importance[:15]):
    print(...)
```

Prints the top 15 features.

## Cell 16 — SHAP Summary Plot

```python
shap.summary_plot(shap_vals_anomaly, X_test[exp_idx], feature_names=FEATURE_NAMES, max_display=15)
```

Creates the SHAP summary plot.

It shows:

- feature importance,
- direction of feature effect,
- distribution of SHAP values,
- top 15 features.

## Cell 17 — SHAP Bar Plot

```python
plt.figure(figsize=(10, 6))
```

Creates the figure.

```python
top15 = feature_importance[:15]
```

Selects the top 15 SHAP features.

```python
plt.barh(range(15), [v for _, v in top15][::-1], color='steelblue')
```

Draws a horizontal bar chart. `[::-1]` reverses the order so the most important feature appears at the top visually.

```python
plt.yticks(range(15), [f for f, _ in top15][::-1])
```

Labels bars with feature names.
```python
plt.xlabel('Mean |SHAP value|')
plt.title('Top 15 Features — MLP (Anomaly Class)')
plt.tight_layout(); plt.show()
```

Adds labels and a title, then displays the plot.

## Cell 18 — Local SHAP Explanation

```python
idx = 0
```

Selects the first explained test sample.

```python
pred = predict_fn(X_test[exp_idx[idx:idx+1]])
```

Gets prediction probabilities for that sample.

```python
print(f'Sample prediction: anomaly={pred[0][0]:.3f}, normal={pred[0][1]:.3f}')
```

Prints predicted probabilities.

```python
print(f'True label: {class_names[y_test[exp_idx[idx]]]}')
```

Prints the true label.

```python
ev = explainer.expected_value
```

Gets the SHAP baseline expected output.

```python
ev0 = ev[0] if isinstance(ev, (list, np.ndarray)) else ev
```

Handles the expected value format.

```python
shap.force_plot(ev0, shap_vals_anomaly[idx], X_test[exp_idx[idx]], feature_names=FEATURE_NAMES, matplotlib=True)
```

Creates a force plot for one prediction.

### Mapping to project

These cells produce the **explanation analysis deliverable**.

---

# Cell 20 — LIME Explanation Analysis

```python
lime_explainer = lime_tabular.LimeTabularExplainer(...)
```

Creates a LIME explainer for tabular data.

```python
X_train
```

Training data is used by LIME to understand feature distributions.

```python
feature_names=FEATURE_NAMES
```

Gives readable feature names.

```python
class_names=class_names
```

Gives readable class labels.

```python
discretize_continuous=True
```

LIME bins continuous features into intervals, making explanations more interpretable.

```python
random_state=SEED
```

Makes LIME sampling reproducible.

```python
n_lime = 30
```

Number of test samples to explain.

```python
lime_idx = np.random.choice(len(X_test), n_lime, replace=False)
```

Randomly selects 30 test samples.

```python
all_top_features = {}
```

Dictionary to count how often each feature appears in LIME top explanations.

```python
for i, idx in enumerate(lime_idx):
```

Loops over selected samples.
```python
exp = lime_explainer.explain_instance(X_test[idx], predict_fn, num_features=10, top_labels=1)
```

Generates a LIME explanation for one sample.

- `num_features=10`: keep the top 10 features.
- `top_labels=1`: explain the predicted class.

```python
pred_class = np.argmax(predict_fn(X_test[idx].reshape(1, -1)))
```

Gets the predicted class for that sample.

```python
for fw in exp.as_list(label=pred_class):
```

Loops over feature-weight pairs in the LIME explanation.

```python
fname = fw[0].split(' ')[0]
```

Extracts the feature name from LIME's text rule by taking the first space-separated token. With discretized features, LIME rules take forms such as `logged_in <= 0.00` or `0.25 < count <= 0.50`. The first-token approach works for one-sided rules, but for two-sided rules it returns the lower bound (`0.25`) instead of the feature name, so those entries can end up counted under a number rather than a feature. The frequency counts below should be read with that limitation in mind.

```python
all_top_features[fname] = all_top_features.get(fname, 0) + 1
```

Counts how often this feature appears.

```python
if (i+1) % 10 == 0:
    print(...)
```

Progress every 10 samples.

```python
lime_sorted = sorted(all_top_features.items(), key=lambda x: x[1], reverse=True)
```

Sorts features by frequency.

```python
for f, c in lime_sorted[:10]:
    print(...)
```

Prints the top 10 LIME features.

### Mapping to project

This implements the **Apply explainability** task using LIME.

---

# Cell 21 — SHAP vs LIME Comparison

```python
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
```

Creates two side-by-side plots.

```python
top10_shap = feature_importance[:10]
```

Gets the top 10 SHAP features.

```python
axes[0].barh(...)
```

Plots the SHAP top 10.

```python
top10_lime = lime_sorted[:10]
```

Gets the top 10 LIME features.

```python
axes[1].barh(...)
```

Plots the LIME top 10.

```python
plt.suptitle('SHAP vs LIME Feature Rankings', fontsize=14)
plt.tight_layout(); plt.show()
```

Displays the comparison plot.

### Rank correlation

```python
shap_ranks = {f: i for i, (f, _) in enumerate(feature_importance[:20])}
```

Creates a dictionary mapping each of the top 20 SHAP features to its rank.

```python
lime_ranks = {f: i for i, (f, _) in enumerate(lime_sorted[:20])}
```

Creates a dictionary mapping each of the top 20 LIME features to its rank.

```python
common = set(shap_ranks.keys()) & set(lime_ranks.keys())
```

Finds features appearing in both top-20 lists.
```python
if len(common) >= 5:
```

Only computes the correlation if enough overlap exists.

```python
rho, p = spearmanr([...], [...])
```

Computes the Spearman rank correlation between SHAP and LIME rankings over the shared features.

```python
print(...)
```

Prints the result.

Final result:

```text
Spearman correlation = 0.0714
p = 0.8665
```

### Interpretation

A correlation near zero with p ≈ 0.87 means the two rankings show no detectable agreement on the shared features: SHAP and LIME order them in essentially unrelated ways. This is a key finding: explanations depend on method choice.

---

# Cell 23 — SHAP Stability Evaluation

```python
def compute_shap_stability(explainer, sample, epsilon, n_perturbs=10):
```

Defines a function to evaluate how stable SHAP is under perturbations.

```python
rng = np.random.RandomState(SEED)
```

Creates a deterministic random generator.

```python
base = np.array(explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True))
```

Computes the original SHAP explanation for the sample.

```python
base = base[0].flatten() if isinstance(base, list) else base.flatten()
```

Flattens the SHAP values into one vector. Because `base` has already been wrapped in `np.array`, the `isinstance(base, list)` branch never fires, so the flattened vector contains whatever the explainer returned, which may include values for both classes.

```python
max_delta, pccs = 0, []
```

Initializes the maximum explanation change and the list of correlations.

```python
for _ in range(n_perturbs):
```

Repeats the perturbation several times.

```python
noise = rng.uniform(-epsilon, epsilon, sample.shape)
```

Creates random noise bounded by epsilon.

```python
perturbed = np.clip(sample + noise, 0, 1)
```

Adds the noise and clips features to the valid [0, 1] range.

```python
p_shap = np.array(explainer.shap_values(perturbed.reshape(1,-1), nsamples=100, silent=True))
```

Computes the SHAP explanation for the perturbed sample.

```python
p_shap = p_shap[0].flatten() if isinstance(p_shap, list) else p_shap.flatten()
```

Flattens the perturbed explanation in the same way.

```python
max_delta = max(max_delta, np.linalg.norm(p_shap - base))
```

Computes the L2 norm of the explanation shift and keeps the maximum over all perturbations. This is SENS_MAX.

```python
if np.std(base) > 1e-8 and np.std(p_shap) > 1e-8:
```

Skips the correlation if either vector has near-zero variance.
```python
pccs.append(pearsonr(base, p_shap)[0])
```

Computes the Pearson correlation between original and perturbed SHAP values.

```python
return max_delta, np.mean(pccs) if pccs else 0.0
```

Returns SENS_MAX and the average PCC.

### Running the test

```python
epsilons = [0.01, 0.03, 0.05]
```

Perturbation sizes.

```python
n_stability = 8
```

Number of samples used for the stability test.

```python
stability_idx = np.random.choice(len(X_test), n_stability, replace=False)
```

Randomly selects test samples.

```python
stability_results = {}
```

Stores results.

```python
for eps in epsilons:
```

Loops over perturbation sizes.

```python
sens_list, pcc_list = [], []
```

Stores metrics per sample.

```python
for i, idx in enumerate(stability_idx):
```

Loops over selected samples.

```python
sm, pc = compute_shap_stability(...)
```

Computes SENS_MAX and PCC.

```python
sens_list.append(sm); pcc_list.append(pc)
```

Stores results.

```python
stability_results[eps] = {'sens_max': np.mean(sens_list), 'pcc': np.mean(pcc_list)}
```

Stores average metrics.

```python
status = 'STABLE' if np.mean(pcc_list) > 0.6 else 'UNSTABLE'
```

Classifies explanation stability using a PCC threshold of 0.6.

### Mapping to project

This implements **Evaluate explanation stability**.

---

# Cell 24 — LIME Stochastic Stability

This evaluates whether LIME gives consistent explanations when run multiple times.

```python
lime_corrs = []
```

Stores the average correlation per sample.

```python
for i, idx in enumerate(stability_idx[:6]):
```

Uses the first 6 stability samples.

```python
weight_vecs = []
```

Stores LIME weight vectors from different seeds.

```python
for seed in range(10):
```

Runs LIME 10 times with different seeds.

```python
le_obj = lime_tabular.LimeTabularExplainer(..., random_state=seed)
```

Creates a new LIME explainer with a different random seed.

```python
exp = le_obj.explain_instance(..., num_features=len(FEATURE_NAMES))
```

Explains the sample using all features.
```python
w = np.zeros(len(FEATURE_NAMES))
```

Creates a zero vector of feature weights.

```python
for key, val in dict(exp.as_list()).items():
```

Loops over LIME explanation terms.

```python
for j, fn in enumerate(FEATURE_NAMES):
    if fn in key: w[j] = val; break
```

Maps LIME text rules back to feature indices by substring match, stopping at the first feature name found in the rule. Because some names contain others (for example, `count` is a substring of `srv_count` and `dst_host_count`), a rule can be assigned to the wrong index depending on the order of `FEATURE_NAMES`.

```python
weight_vecs.append(w)
```

Stores one explanation vector.

```python
corrs = []
```

Stores pairwise correlations.

```python
for a in range(10):
    for b in range(a+1, 10):
```

Compares all pairs of the 10 runs.

```python
if np.std(weight_vecs[a]) > 1e-8 and np.std(weight_vecs[b]) > 1e-8:
```

Skips pairs where either vector is near-constant, since the correlation would be undefined.

```python
corrs.append(spearmanr(weight_vecs[a], weight_vecs[b])[0])
```

Computes the Spearman correlation between two LIME runs.

```python
mc = np.mean(corrs) if corrs else 0
```

Mean correlation for this sample.

```python
lime_corrs.append(mc)
```

Stores it.

```python
lime_status = 'STABLE' if np.mean(lime_corrs) > 0.6 else 'UNSTABLE'
```

Classifies LIME stability with the same 0.6 threshold.

### Mapping to project

This tests whether LIME explanations are reliable despite LIME randomness.

---

# Cell 25 — Faithfulness Evaluation

Faithfulness asks: do the important features actually matter to the model?

```python
def get_shap_for_class(shap_values, class_idx=0):
```

Helper function for SHAP output formats.

```python
if isinstance(shap_values, list):
    return shap_values[class_idx]
```

Older SHAP format: a list with one array per class.

```python
elif isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
    return shap_values[:, :, class_idx]
```

Newer SHAP 3D format: samples × features × classes.

```python
else:
    return shap_values
```

Fallback.

```python
faith_results = {k: [] for k in [3, 5, 10]}
```

Creates result lists for top-3, top-5, and top-10 feature masking.

```python
for idx in stability_idx[:10]:
```

Loops over up to 10 samples; since `n_stability = 8`, this covers all 8 stability samples.

```python
sample = X_test[idx]
```

Gets one test sample.

```python
sv_raw = explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True)
```

Computes SHAP values.
```python
sv = get_shap_for_class(sv_raw, 0).flatten()
```

Extracts the anomaly-class SHAP vector.

```python
base_conf = predict_fn(sample.reshape(1,-1))[0]
```

Gets the original prediction probabilities.

```python
pred_cls = np.argmax(base_conf)
```

Gets the predicted class.

```python
for k in faith_results:
```

Loops over k = 3, 5, 10.

```python
masked = sample.copy()
```

Copies the sample.

```python
masked[np.argsort(np.abs(sv))[-k:]] = 0.0
```

Finds the top-k features by absolute SHAP value and masks them by setting them to 0.

```python
drop = base_conf[pred_cls] - predict_fn(masked.reshape(1,-1))[0][pred_cls]
```

Measures the confidence drop after masking.

```python
faith_results[k].append(float(drop))
```

Stores the confidence drop.

```python
for k, scores in faith_results.items():
    print(...)
```

Prints the average and standard deviation.

### Mapping to project

This implements **Evaluate explanation faithfulness**.

---

# Cell 26 — Stability Summary Plot

```python
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
```

Creates three plots side by side.

```python
eps_list = list(stability_results.keys())
```

Gets the epsilon values.

```python
axes[0].plot(eps_list, [stability_results[e]['sens_max'] for e in eps_list], ...)
```

Plots SENS_MAX vs epsilon.

```python
pcc_vals = [stability_results[e]['pcc'] for e in eps_list]
```

Gets the PCC values.

```python
colors = ['green' if p > 0.6 else 'red' for p in pcc_vals]
```

Green bars for stable, red for unstable.

```python
axes[1].bar(...)
```

Plots the PCC stability bars.

```python
axes[1].axhline(y=0.6, ...)
```

Draws the stability threshold line.

```python
ks = list(faith_results.keys())
```

Gets the masking sizes 3, 5, 10.

```python
axes[2].bar(...)
```

Plots the faithfulness confidence drop with error bars.

```python
plt.suptitle(...)
plt.tight_layout(); plt.show()
```

Adds the title and displays the figure.

### Mapping to project

This creates the figure used to summarize explanation reliability.
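The top-k masking test from Cell 25 can be tried in isolation. The sketch below is a toy under stated assumptions, not the notebook's code: `predict_fn` is a hypothetical two-class logistic model with random weights `w`, the attribution vector `w * x` merely stands in for the SHAP values, and `confidence_drop` is an illustrative helper name.

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 12
w = rng.normal(size=n_features)  # hypothetical model weights

def predict_fn(X):
    """Toy two-class model: returns [P(class 0), P(class 1)] for each row."""
    p0 = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.column_stack([p0, 1.0 - p0])

def confidence_drop(sample, attributions, k):
    """Zero the k features with the largest |attribution| and return
    the drop in confidence for the originally predicted class."""
    base_conf = predict_fn(sample.reshape(1, -1))[0]
    pred_cls = int(np.argmax(base_conf))
    masked = sample.copy()
    masked[np.argsort(np.abs(attributions))[-k:]] = 0.0
    new_conf = predict_fn(masked.reshape(1, -1))[0]
    return float(base_conf[pred_cls] - new_conf[pred_cls])

# Eight random samples in [0, 1], mimicking min-max-scaled inputs.
samples = rng.uniform(0, 1, size=(8, n_features))
faith_results = {k: [] for k in [3, 5, 10]}
for x in samples:
    attributions = w * x  # stand-in for the class-0 SHAP vector
    for k in faith_results:
        faith_results[k].append(confidence_drop(x, attributions, k))

for k, scores in faith_results.items():
    print(f"top-{k}: mean drop={np.mean(scores):.4f}, std={np.std(scores):.4f}")
```

A positive mean drop suggests the masked features were supporting the prediction. Masking with `0.0` corresponds to the minimum of each min-max-scaled feature, so the measured drop depends on that baseline choice.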
---

# Cell 28 — Security Implications / Feature Manipulability

```python
manipulable = {...}
```

Defines features that attackers may directly influence.

Examples:

- `src_bytes`
- `dst_bytes`
- `hot`
- `num_failed_logins`
- `duration`

```python
partial = {...}
```

Defines partially manipulable features.

Examples:

- `count`
- `srv_count`
- `serror_rate`
- `rerror_rate`
- `protocol_type`
- `flag`

These can sometimes be influenced but not freely controlled.

```python
non_manip = {...}
```

Defines non-manipulable features such as host-level aggregated statistics.

Examples:

- `dst_host_count`
- `dst_host_srv_count`
- `dst_host_rerror_rate`
- `dst_host_serror_rate`

These are computed by IDS sensors or depend on broader traffic context.

```python
manip_count = {'Manipulable': 0, 'Partial': 0, 'Non-manipulable': 0}
```

Initializes counters.

```python
for i, (f, v) in enumerate(feature_importance[:15]):
```

Loops over the top 15 SHAP features.

```python
if f in manipulable:
    status = 'MANIPULABLE'
    manip_count['Manipulable'] += 1
```

Classifies the feature as manipulable.

```python
elif f in partial:
    status = 'PARTIAL'
    manip_count['Partial'] += 1
```

Classifies the feature as partially manipulable.

```python
else:
    status = 'NON-MANIPULABLE'
    manip_count['Non-manipulable'] += 1
```

Otherwise classifies it as non-manipulable. Note that this branch catches every feature not listed in `manipulable` or `partial`, whether or not it appears in `non_manip`.

```python
print(...)
```

Prints the feature, SHAP value, and manipulability status.

```python
print(f'\nSummary: {manip_count}')
```

Prints the count summary.

```python
if manip_count['Non-manipulable'] > manip_count['Manipulable']:
    print('-> Model relies more on non-manipulable features -> MORE ROBUST against evasion')
else:
    print('-> Model relies more on manipulable features -> LESS ROBUST against evasion')
```

Draws a simple security conclusion by comparing the two extreme counts; the `Partial` count is not used in the comparison.

### Mapping to project

This implements **Analyze security implications**.

---

# Cell 29 — Final Summary

```python
print('\n' + '='*60)
print('FINAL RESULTS SUMMARY')
print('='*60)
```

Prints the final report-style summary header.
```python
print(f'\n1. MODEL COMPARISON:')
```

Starts the model result section.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
    r = results[name]
    print(...)
```

Prints F1, ROC-AUC, and PR-AUC for all models.

```python
print(f'\n2. EXPLANATION STABILITY (SAFARI):')
```

Starts the stability section.

```python
for eps in epsilons:
    sr = stability_results[eps]
    status = 'STABLE' if sr['pcc'] > 0.6 else 'UNSTABLE'
    print(...)
```

Prints SHAP stability for each epsilon.

```python
print(f' LIME: Spearman={np.mean(lime_corrs):.4f} ...')
```

Prints LIME stability.

```python
print(f'\n3. FAITHFULNESS:')
```

Starts the faithfulness section.

```python
for k in [3, 5, 10]:
    print(...)
```

Prints the confidence drop after masking the top-k SHAP features.

```python
print(f'\n4. SECURITY: Top features manipulability = {manip_count}')
```

Prints the security summary.

```python
print('\nDone!')
```

End message.

### Mapping to project

This cell packages all deliverable results:

- model comparison,
- stability,
- faithfulness,
- security analysis.

---

# How the Whole Notebook Maps to the Project Requirements

| Teacher requirement | Notebook cells | What was done |
|---|---|---|
| Train model | Cells 5–12 | Load data, preprocess, train MLP/LSTM/CNN, evaluate metrics |
| Explain predictions | Cells 14–21 | SHAP and LIME explanations, feature rankings, local explanations |
| Evaluate stability and faithfulness | Cells 23–26 | SHAP perturbation stability, LIME stochastic stability, top-k masking faithfulness |
| Analyze risks | Cell 28 | Feature manipulability and evasion risk analysis |
| Expected output | Cell 29 + figures | Final result summary, plots, explanation/security analysis |

---

# The Main Story You Should Understand

The code starts with raw NSL-KDD network connection records. It converts them into numerical normalized feature vectors.

Then it trains three neural IDS models and compares them. The LSTM performs best.

After training, the notebook does not stop at accuracy. It asks: why did the model make its decisions?
SHAP and LIME are used to identify important features. SHAP finds features like `logged_in` and error-rate statistics. LIME finds some overlapping but different features. Their low Spearman correlation shows that XAI methods can disagree.

Then the notebook asks whether the explanations are reliable. SHAP is stable only for very small perturbations. LIME is borderline stable. Feature masking shows SHAP explanations are reasonably faithful because removing top SHAP features reduces prediction confidence.

Finally, the code asks whether explanations are safe. If top features are manipulable by attackers, explanations can leak evasion strategies. The model relies on several non-manipulable or partially manipulable features, which is a positive sign, but explanation access should still be controlled.

---

# Key Things to Say if Asked About the Code

1. The preprocessing avoids data leakage by fitting encoders/scalers on training data and transforming test data.
2. The three models are compared fairly because they use the same dataset, preprocessing, and training setup.
3. Weighted F1 is important because class distributions are not perfectly balanced and train/test distributions differ.
4. SHAP gives global and local feature importance.
5. LIME gives local surrogate explanations.
6. SHAP and LIME disagree, which is an important result, not a failure.
7. Stability is evaluated because explanations must be consistent to be trusted.
8. Faithfulness is evaluated because important explanation features should actually affect predictions.
9. Security analysis checks whether important features can be manipulated by attackers.
10. The whole project is not just IDS accuracy; it is IDS + explanation + reliability + security.
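Point 1 in the list above (fit on training data, transform test data) can be illustrated with a tiny sketch. The two DataFrames are invented miniature examples rather than real NSL-KDD rows, and scaling only `src_bytes` is an assumption made for brevity; the notebook's actual column handling may differ.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical miniature train/test frames (not real NSL-KDD rows).
train = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp", "tcp"],
                      "src_bytes": [0.0, 100.0, 50.0, 200.0]})
test = pd.DataFrame({"protocol_type": ["udp", "tcp"],
                     "src_bytes": [400.0, 20.0]})

# Fit the encoder and scaler on the training split only.
enc = LabelEncoder().fit(train["protocol_type"])
scaler = MinMaxScaler().fit(train[["src_bytes"]])

# Reuse the fitted objects to transform both splits.
X_train = np.column_stack([enc.transform(train["protocol_type"]),
                           scaler.transform(train[["src_bytes"]]).ravel()])
X_test = np.column_stack([enc.transform(test["protocol_type"]),
                          scaler.transform(test[["src_bytes"]]).ravel()])
print(X_test)
```

Because the scaler only saw training values, a test value above the training maximum (400 against a maximum of 200 here) maps outside `[0, 1]`. That is the expected price of avoiding leakage. Likewise, `LabelEncoder.transform` raises a `ValueError` on categories that never appeared in training, so unseen test categories need explicit handling.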