Explainable IDS Full Pipeline — Code Walkthrough

This document explains the notebook explainable_ids_full_pipeline.ipynb in detailed, practical terms. The goal is to understand what each line or block does, why it exists, and how it connects to the project deliverables:

Train an IDS model.
Explain IDS predictions.
Evaluate explanation stability and faithfulness.
Analyze security/adversarial risks.

The notebook is organized into seven main parts:

setup and imports,
dataset loading and preprocessing,
model definitions,
model training and evaluation,
SHAP explanations,
LIME explanations,
stability, faithfulness, and security analysis.

Big Picture Before Reading the Code

The project is an Explainable Intrusion Detection System (X-IDS).

The dataset is NSL-KDD, where each row is a network connection. Each connection has 41 features such as protocol, service, duration, bytes, login status, error rates, and host-level statistics. The target label is binary:

normal
anomaly

The notebook trains three neural models:

MLP: a standard feed-forward network for tabular data.
LSTM: treats the 41 features like a sequence.
1D-CNN: treats the 41 features like a one-dimensional signal.

Then it explains predictions using:

SHAP: feature contribution values based on Shapley values.
LIME: local surrogate explanations based on perturbations.

Then it asks:

Are explanations stable?
Are explanations faithful?
Are important features manipulable by attackers?

Cell 2 — Install Dependencies

!pip install -q torch numpy pandas scikit-learn datasets shap lime matplotlib scipy

What it does

This line installs all Python packages needed in Google Colab.

torch: PyTorch, used to build and train neural networks.
numpy: numerical arrays and mathematical operations.
pandas: table/dataframe manipulation.
scikit-learn: preprocessing and metrics.
datasets: Hugging Face library to load NSL-KDD.
shap: SHAP explanations.
lime: LIME explanations.
matplotlib: plots and figures.
scipy: statistics such as Pearson and Spearman correlations.

Why it matters

This prepares the environment. Without these libraries, the rest of the notebook cannot run.

Mapping to the project

This supports all tasks because it installs the tools for training, explaining, evaluating, and plotting.

Cell 3 — Imports, Reproducibility, and Device Setup

import os, sys, json, time, random, pickle

Imports standard Python utilities.

os, sys: system/file utilities.
json: could be used for saving structured results.
time: used to measure training time.
random: Python random generator.
pickle: can save/load Python objects.

import numpy as np

Imports NumPy as np. Almost all numerical arrays in preprocessing, SHAP, LIME, and metrics use NumPy.

import pandas as pd

Imports pandas as pd. The NSL-KDD dataset is converted to pandas DataFrames so we can manipulate columns easily.

import torch

Imports PyTorch main library.

import torch.nn as nn

Imports PyTorch neural-network module as nn. This is used for layers like Linear, LSTM, Conv1d, BatchNorm, Dropout, and CrossEntropyLoss.

from torch.utils.data import TensorDataset, DataLoader

Imports utilities to package arrays into datasets and mini-batches.

TensorDataset: wraps tensors (X, y) together.
DataLoader: creates batches for training and testing.

from sklearn.preprocessing import LabelEncoder, MinMaxScaler

Imports preprocessing tools.

LabelEncoder: converts categorical strings to integers.
MinMaxScaler: scales numerical features into [0, 1].

from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, average_precision_score

Imports evaluation metrics.

classification_report: precision, recall, F1-score.
confusion_matrix: counts correct/incorrect predictions by class.
roc_auc_score: ROC-AUC ranking metric.
average_precision_score: PR-AUC / average precision.

from datasets import load_dataset

Imports Hugging Face dataset loader. Used to download/load NSL-KDD.

import shap

Imports SHAP explainability library.

from lime import lime_tabular

Imports LIME tabular explainer.

from scipy.stats import spearmanr, pearsonr

Imports statistical correlation functions.

spearmanr: rank correlation. Used for comparing feature rankings and LIME stability.
pearsonr: linear correlation. Used for SHAP perturbation stability.

import matplotlib.pyplot as plt

Imports plotting interface.

import warnings
warnings.filterwarnings('ignore')

Suppresses warning messages to keep the Colab output cleaner.

Reproducibility block

SEED = 42

Defines the random seed. A seed is a fixed starting point for randomness.

random.seed(SEED)

Fixes Python's built-in random generator.

np.random.seed(SEED)

Fixes NumPy randomness. This affects random sample selection for SHAP/LIME and stability tests.

torch.manual_seed(SEED)

Fixes PyTorch randomness, such as weight initialization and training randomness.

torch.backends.cudnn.deterministic = True

Asks cuDNN to use deterministic algorithms where they exist. This improves reproducibility, although operations outside cuDNN can still be nondeterministic.

torch.backends.cudnn.benchmark = False

Disables cuDNN benchmarking. Benchmarking can choose different algorithms depending on runtime conditions, which hurts reproducibility.

Device selection

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Checks if a GPU is available. If yes, training uses CUDA GPU; otherwise it uses CPU.

print(f'Device: {DEVICE}')

Prints the selected device.

if DEVICE.type == 'cuda':
    print(f'GPU: {torch.cuda.get_device_name(0)}')

If running on GPU, prints the GPU name. In the final run it was Tesla T4.

Mapping to the project

This cell establishes reproducibility and compute setup. In an academic report, reproducibility is important because results should be repeatable.

Cell 5 — Feature Names, Dataset Loading, and Class Distribution

FEATURE_NAMES = [ ... ]

This list contains the 41 NSL-KDD feature names in the exact order used by the dataset and model.

The list is not just cosmetic. It is needed for:

selecting feature columns from the DataFrame,
preserving consistent input order,
labeling SHAP plots,
labeling LIME explanations,
interpreting security implications.

Lines 1–16 — NSL-KDD features

The features include:

Basic connection features: duration, protocol_type, service, flag, src_bytes, dst_bytes.
Content features: hot, num_failed_logins, logged_in, root_shell, etc.
Time-based traffic features: count, srv_count, serror_rate, rerror_rate, etc.
Host-based traffic features: dst_host_count, dst_host_srv_count, dst_host_* rates.

Why this matters: later, when SHAP says logged_in is important, we know exactly which IDS feature influenced the model.

CATEGORICAL_COLS = ['protocol_type', 'service', 'flag']

Defines the three categorical columns. These contain strings, not numbers, so they must be encoded before feeding them into neural networks.

ds = load_dataset('Mireu-Lab/NSL-KDD')

Loads NSL-KDD from Hugging Face.

df_train = ds['train'].to_pandas()
df_test = ds['test'].to_pandas()

Converts train and test splits into pandas DataFrames. Pandas makes column operations easier.

print(f'Train: {len(df_train)} | Test: {len(df_test)}')

Prints dataset sizes.

Final output:

Train: 151,165
Test: 34,394

print('\nTrain distribution:')
print(df_train['class'].value_counts())

Prints how many normal/anomaly samples exist in training.

print('\nTest distribution:')
print(df_test['class'].value_counts())

Prints class distribution in the test set.

Why class distribution matters

The train and test distributions are different:

Train has more normal than anomaly.
Test has more anomaly than normal.

This matters because the model must generalize under distribution shift.

Mapping to project

This cell supports the dataset understanding part of the report. It documents which data we used and shows the class imbalance and the train/test distribution shift.

Cell 6 — Target Encoding, Categorical Encoding, and Scaling

# Encode target (binary: anomaly=0, normal=1)

Comment explaining the binary label setup.

class_names = ['anomaly', 'normal']

Defines readable class names. This is used later in classification reports and LIME explanations.

le_y = LabelEncoder()

Creates a label encoder for target labels.

y_train = le_y.fit_transform(df_train['class'].values)

Fits the encoder on the training labels and transforms them into integers.

In this dataset, the final encoding is:

anomaly = 0
normal = 1

y_test = le_y.transform(df_test['class'].values)

Transforms test labels using the same encoder learned from training.

Important: we do not fit on test labels, because the test set must remain unseen.

df_tr, df_te = df_train.copy(), df_test.copy()

Creates copies of the train and test DataFrames so original data remains unchanged.

label_encoders = {}

Creates a dictionary to store encoders for each categorical feature.

for col in CATEGORICAL_COLS:

Loops over the categorical columns: protocol_type, service, flag.

    le = LabelEncoder()

Creates a new encoder for the current categorical column.

    le.fit(df_tr[col])

Fits the encoder only on training categories.

    known = set(le.classes_)

Stores categories seen during training.

    df_te[col] = df_te[col].apply(lambda x: x if x in known else le.classes_[0])

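Handles possible unknown categories in test data. If a test category was not seen during training, it is replaced by le.classes_[0], which is the alphabetically first training category because LabelEncoder stores its classes sorted.

Why: LabelEncoder.transform raises a ValueError on labels it did not see during fit. The substitution prevents that error, at the cost of mapping every unseen category to an arbitrary known one.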

    df_tr[col] = le.transform(df_tr[col])

Transforms training categorical values into integers.

    df_te[col] = le.transform(df_te[col])

Transforms test categorical values using the same encoder.

    label_encoders[col] = le

Stores the encoder for later inspection or inverse transformation.

    print(f'Encoded {col}: {len(le.classes_)} categories')

Prints how many categories each column has.

Final output:

protocol_type: 3 categories
service: 70 categories
flag: 11 categories
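
The unseen-category fallback used in the loop above can be sketched on toy data. The service values below are made up for illustration; only the LabelEncoder calls mirror the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# Toy values, assumed for illustration only (not taken from NSL-KDD).
train_services = ["http", "ftp", "smtp", "http"]
test_services = ["http", "gopher", "smtp"]  # "gopher" never appears in training

le = LabelEncoder()
le.fit(train_services)        # classes_ is stored sorted: ftp, http, smtp
known = set(le.classes_)
fallback = le.classes_[0]     # alphabetically first training category

# Replace unseen categories before calling transform, as the notebook does.
test_clean = [s if s in known else fallback for s in test_services]
encoded = le.transform(test_clean)

print(le.classes_.tolist())   # ['ftp', 'http', 'smtp']
print(encoded.tolist())       # [1, 0, 2]
```

Here "gopher" is encoded as 0, the same code as "ftp", so the model cannot tell the two apart. A dedicated "unknown" category would be one alternative design.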

Scaling

scaler = MinMaxScaler()

Creates a scaler that linearly rescales each feature so that its training minimum maps to 0 and its training maximum maps to 1.

X_train = scaler.fit_transform(df_tr[FEATURE_NAMES].values.astype(np.float32))

Takes training features, converts them to float32, fits the scaler on training data, and transforms training features.

Important: fit only on training data.

X_test = scaler.transform(df_te[FEATURE_NAMES].values.astype(np.float32))

Transforms test features using the training scaler.

Again, no fitting on test data to avoid data leakage. A side effect is that test values outside the training range land outside [0, 1].

print(f'\nX_train: {X_train.shape} | X_test: {X_test.shape}')

Prints feature matrix shapes.

Final output:

X_train: (151165, 41)
X_test: (34394, 41)
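
A minimal sketch of the fit-on-train, transform-on-test pattern, using a single made-up feature. It also shows that a test value above the training maximum is mapped above 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data, assumed for illustration only.
train = np.array([[0.0], [5.0], [10.0]], dtype=np.float32)
test = np.array([[2.5], [20.0]], dtype=np.float32)   # 20.0 exceeds the training max

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)   # min and max are learned from train only
test_scaled = scaler.transform(test)         # reuses the training min and max

# Same result by hand: (x - train_min) / (train_max - train_min)
manual = (test - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))

print(test_scaled.ravel())   # roughly [0.25, 2.0]; the second value is above 1
```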

print(f'y_train: {np.bincount(y_train)} | y_test: {np.bincount(y_test)}')

Prints encoded class counts.

Why this cell is essential

Neural networks cannot directly process strings or unscaled heterogeneous features. This cell converts the raw dataset into clean numerical tensors.

Mapping to project

This is the preprocessing pipeline in the report.

Cell 8 — Model Definitions

This cell defines the three deep learning models.

MLP_IDS

class MLP_IDS(nn.Module):

Defines a PyTorch class for the MLP model. It inherits from nn.Module, which is required for PyTorch models.

    def __init__(self, in_dim=41, num_classes=2):

Constructor. Input dimension is 41 because NSL-KDD has 41 features. Number of classes is 2: anomaly and normal.

        super().__init__()

Initializes the parent PyTorch module.

        self.net = nn.Sequential(

Creates a sequence of layers that will run one after another.

            nn.Linear(in_dim, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),

First hidden block:

Linear(41, 256): maps 41 input features to 256 hidden units.
BatchNorm1d(256): stabilizes hidden activations.
ReLU(): adds non-linearity.
Dropout(0.3): randomly drops 30% of activations during training to reduce overfitting.

            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),

Second hidden block. Reduces representation from 256 to 128.

            nn.Linear(128, 64), nn.ReLU(),

Third hidden block. Reduces from 128 to 64.

            nn.Linear(64, num_classes)

Output layer. Produces two logits: one for anomaly and one for normal.

The closing parenthesis ends the nn.Sequential definition.

        for m in self.modules():

Loops through all modules/layers inside the model.

            if isinstance(m, nn.Linear):

Checks if the current module is a linear layer.

                nn.init.xavier_uniform_(m.weight)

Initializes weights using Xavier uniform initialization. This helps gradients flow well at the start of training.

                nn.init.zeros_(m.bias)

Initializes biases to zero.

    def forward(self, x): return self.net(x)

Defines the forward pass. Input x passes through self.net.

    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)

Counts trainable parameters. Used for reporting model size.
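
As a sanity check on what count_parameters should report for the MLP, the count can be worked out by hand from the layer sizes listed above. This is a sketch that assumes the layers are exactly as shown; the helper names are illustrative and not part of the notebook.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of an nn.Linear(n_in, n_out) layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Learnable scale and shift of nn.BatchNorm1d(n_features).

    The running mean and variance are buffers, not trainable parameters.
    """
    return 2 * n_features

mlp_total = (
    linear_params(41, 256) + batchnorm_params(256)
    + linear_params(256, 128) + batchnorm_params(128)
    + linear_params(128, 64)
    + linear_params(64, 2)
)
print(mlp_total)  # 52802
```

ReLU and Dropout contribute nothing because they have no learnable weights.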

Why MLP is used

MLP is the simplest strong baseline for tabular data. If a complex model beats the MLP, that suggests the extra architecture has value.

LSTM_IDS

class LSTM_IDS(nn.Module):

Defines the LSTM model class.

    def __init__(self, in_dim=41, hidden_dim=64, num_layers=2, num_classes=2):

Constructor. It uses 41 features, hidden size 64, 2 LSTM layers, and 2 output classes.

        super().__init__()

Initializes parent module.

        self.lstm = nn.LSTM(1, hidden_dim, num_layers, batch_first=True, dropout=0.2)

Creates an LSTM.

Important detail: each feature is treated as one timestep with one value. So input shape becomes:

batch_size × 41 × 1

input_size=1: each timestep contains one feature value.
hidden_dim=64: LSTM hidden representation size.
num_layers=2: stacked LSTM layers.
batch_first=True: batch dimension comes first.
dropout=0.2: dropout between LSTM layers.

        self.fc = nn.Sequential(nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, num_classes))

Creates a small classifier after the LSTM.

64 hidden state → 32 hidden units → 2 output classes.

    def forward(self, x):

Defines forward pass.

        out, (h_n, _) = self.lstm(x.unsqueeze(-1))

x originally has shape:

batch_size × 41

x.unsqueeze(-1) changes it to:

batch_size × 41 × 1

The LSTM returns:

out: output at all timesteps.
h_n: final hidden states.
_: cell states, ignored.

        return self.fc(h_n[-1])

Uses the final hidden state from the last LSTM layer and feeds it into the classifier.
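
The tensor shapes described above can be checked with a small standalone sketch. It uses the same nn.LSTM arguments as the notebook but a random toy batch of 4 samples, which is an assumption made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same constructor arguments as the notebook's LSTM layer.
lstm = nn.LSTM(1, 64, 2, batch_first=True, dropout=0.2)
lstm.eval()  # disable inter-layer dropout for a deterministic check

x = torch.rand(4, 41)      # toy batch: 4 samples, 41 features
seq = x.unsqueeze(-1)      # (4, 41, 1): 41 timesteps, 1 value each

with torch.no_grad():
    out, (h_n, c_n) = lstm(seq)

print(seq.shape)   # torch.Size([4, 41, 1])
print(out.shape)   # torch.Size([4, 41, 64])
print(h_n.shape)   # torch.Size([2, 4, 64])

# h_n[-1] is the top layer's hidden state at the last timestep,
# which holds the same values as the last timestep of out.
same = torch.allclose(h_n[-1], out[:, -1, :])
print(same)        # True
```

So feeding h_n[-1] to the classifier is equivalent to using out[:, -1, :] for this unidirectional LSTM.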

    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)

Counts trainable parameters.

Why LSTM is used

Even though NSL-KDD is not a time series, the features have an order and groups. LSTM may learn dependencies across these feature groups.

CNN1D_IDS

class CNN1D_IDS(nn.Module):

Defines the 1D-CNN model.

    def __init__(self, in_dim=41, num_classes=2):

Constructor with 41 input features and 2 output classes.

        super().__init__()

Initializes parent module.

        self.conv = nn.Sequential(

Creates convolutional feature extractor.

            nn.Conv1d(1, 64, 3, padding=1), nn.BatchNorm1d(64), nn.ReLU(),

First convolution block:

input channels = 1,
output channels = 64,
kernel size = 3,
padding = 1 keeps length 41.

This learns local patterns across neighboring features.

            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),

Second convolution block, increasing channels from 64 to 128.

            nn.AdaptiveAvgPool1d(8)

Compresses the sequence length to 8, regardless of input length.

The closing parenthesis ends the convolutional nn.Sequential block.

        self.fc = nn.Sequential(nn.Linear(128*8, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, num_classes))

Classifier after convolution:

Flattened size = 128 channels × 8 pooled positions.
Dense layer to 64.
ReLU.
Dropout.
Output layer to 2 classes.

    def forward(self, x):

Defines forward pass.

        x = self.conv(x.unsqueeze(1))

Original x shape is:

batch_size × 41

x.unsqueeze(1) gives:

batch_size × 1 × 41

This is the (batch, channels, length) layout that Conv1d expects.

        return self.fc(x.view(x.size(0), -1))

Flattens convolution output and feeds it to classifier.
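
The sizes used in this model (length 41 preserved by padding, then 128 × 8 inputs to the classifier) follow from the Conv1d output-length formula. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1D convolution, per the formula in the PyTorch Conv1d docs."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

n_features = 41
after_conv1 = conv1d_out_len(n_features, kernel_size=3, padding=1)    # stays 41
after_conv2 = conv1d_out_len(after_conv1, kernel_size=3, padding=1)   # stays 41
pooled_len = 8                   # AdaptiveAvgPool1d(8) fixes the output length
flat_size = 128 * pooled_len     # channels x positions fed to the first Linear

print(after_conv1, after_conv2, flat_size)  # 41 41 1024
```

Without padding, each kernel-size-3 convolution would shorten the sequence by 2.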

    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)

Counts parameters.

Final model loop

for name, cls in [('MLP', MLP_IDS), ('LSTM', LSTM_IDS), ('CNN1D', CNN1D_IDS)]:

Loops over the three model classes.

    m = cls()

Instantiates each model.

    print(f'{name}: {m.count_parameters():,} parameters')

Prints model parameter counts.

Mapping to project

This cell implements the Train model requirement and sets up model comparison.

Cell 10 — Training All Models

This is the largest and most important training cell.

EPOCHS = 50
BATCH_SIZE = 256
LR = 1e-3

Defines training hyperparameters:

train for 50 epochs,
use mini-batches of 256 samples,
learning rate is 0.001.

train_ds = TensorDataset(torch.FloatTensor(X_train), torch.LongTensor(y_train))

Converts training NumPy arrays into PyTorch tensors and bundles features/labels together.

Features become float tensors.
Labels become long integer tensors required by CrossEntropyLoss.

test_ds = TensorDataset(torch.FloatTensor(X_test), torch.LongTensor(y_test))

Same for test data.

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)

Creates mini-batches for training and shuffles data each epoch.

test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)

Creates test batches. No shuffle is needed because evaluation order does not matter.

Class weights

counts = np.bincount(y_train)

Counts how many examples exist per class.

weights = 1.0 / counts.astype(np.float32)

Creates inverse-frequency weights. Smaller classes get larger weight.

weights = weights / weights.sum() * len(weights)

Rescales the weights so they sum to the number of classes (here 2), which makes their average 1. The ratio between the two class weights is unchanged.

class_weights = torch.FloatTensor(weights).to(DEVICE)

Converts weights to PyTorch tensor and moves them to GPU/CPU.

Why class weights?

Class imbalance can make the model favor the majority class. Weighted loss penalizes mistakes on underrepresented classes more.
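
The weight computation above can be traced on a small example. The class counts here are hypothetical and chosen for round numbers; they are not the NSL-KDD counts.

```python
import numpy as np

# Hypothetical class counts, chosen for illustration.
counts = np.array([30, 70])

weights = 1.0 / counts.astype(np.float32)          # inverse frequency
weights = weights / weights.sum() * len(weights)   # rescale so the weights sum to 2

print(weights)        # approximately [1.4, 0.6]
print(weights.sum())  # approximately 2.0
```

The minority class (30 samples) ends up with the larger weight, so its errors count more in the loss.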

train_model function

def train_model(model, model_name):

Defines a reusable function to train any of the three models.

    print(...)

Prints a header showing which model is being trained.

    model.to(DEVICE)

Moves model to GPU or CPU.

    criterion = nn.CrossEntropyLoss(weight=class_weights)

Defines classification loss with class weights.

CrossEntropyLoss expects raw logits, so the model does not need Softmax during training.

    optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=1e-4)

Creates Adam optimizer.

lr=1e-3: learning rate.
weight_decay=1e-4: L2 regularization to reduce overfitting.

    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)

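Creates a learning-rate scheduler. When the value passed to scheduler.step() (here the average training loss) has not improved for more than 5 consecutive epochs, the learning rate is halved.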

    best_f1, history = 0, {'train_loss': [], 'test_acc': []}

Initializes best F1 and stores training history.

    best_state = None

Will store the best model weights.

    t0 = time.time()

Starts timing training.

    for epoch in range(EPOCHS):

Training loop over 50 epochs.

        model.train()

Sets model to training mode. Enables dropout and batchnorm training behavior.

        total_loss = 0

Initializes epoch loss accumulator.

        for xb, yb in train_loader:

Loops over training mini-batches.

            xb, yb = xb.to(DEVICE), yb.to(DEVICE)

Moves batch to GPU/CPU.

            optimizer.zero_grad()

Clears old gradients.

            loss = criterion(model(xb), yb)

Runs model forward pass and computes cross-entropy loss.

            loss.backward()

Backpropagates gradients.

            optimizer.step()

Updates model weights.

            total_loss += loss.item() * len(yb)

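Accumulates the epoch loss. loss.item() is the (class-weighted) mean loss of the batch, so multiplying by len(yb) turns it back into a batch total before the sum is divided by len(y_train) later.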

Evaluation inside each epoch

        model.eval()

Sets model to evaluation mode. Dropout is disabled, and BatchNorm uses the running mean and variance collected during training.

        preds, probs, labels = [], [], []

Creates lists to collect predictions, probabilities, and labels.

        with torch.no_grad():

Disables gradient computation to save memory and speed up evaluation.

            for xb, yb in test_loader:

Loops through test batches.

                xb = xb.to(DEVICE)

Moves features to GPU/CPU.

                out = model(xb)

Gets raw logits.

                preds.append(out.argmax(1).cpu().numpy())

Predicted class is the index of the largest logit.

                probs.append(torch.softmax(out, 1).cpu().numpy())

Converts logits to class probabilities.

                labels.append(yb.numpy())

Stores true labels.

        preds = np.concatenate(preds)
        probs = np.concatenate(probs)
        labels = np.concatenate(labels)

Combines batch arrays into full test arrays.

        report = classification_report(labels, preds, output_dict=True)

Computes precision, recall, F1, etc.

        wf1 = report['weighted avg']['f1-score']

Extracts weighted F1-score.

        acc = report['accuracy']

Extracts accuracy.

        test_loss = total_loss / len(y_train)

Despite the variable name, this is actually average training loss for the epoch.

        scheduler.step(test_loss)

Updates scheduler based on loss.

        history['train_loss'].append(total_loss / len(y_train))
        history['test_acc'].append(acc)

Stores loss and accuracy for plots.

        if wf1 > best_f1:

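Checks if current model is best so far. Note that "best" is judged by weighted F1 on the test set, so the checkpoint choice has seen test data and the final test metrics are likely somewhat optimistic.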

            best_f1 = wf1

Updates best F1.

            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}

Saves a copy of best model weights on CPU.
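
Why the .clone() matters can be shown with a tiny stand-in model. This is a sketch under the assumption that the model is on CPU, where v.cpu() returns the same tensor rather than a copy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)   # tiny stand-in for the IDS models

# state_dict() returns tensors that share storage with the live parameters.
aliased = {k: v for k, v in model.state_dict().items()}
snapshot = {k: v.cpu().clone() for k, v in model.state_dict().items()}

# Simulate further training by changing the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

alias_changed = torch.equal(aliased["weight"], model.weight)        # True: follows the model
snapshot_kept = not torch.equal(snapshot["weight"], model.weight)   # True: frozen copy

# Restoring the snapshot brings back the earlier weights.
model.load_state_dict(snapshot)
restored = torch.equal(model.weight, snapshot["weight"])
print(alias_changed, snapshot_kept, restored)  # True True True
```

Without the clone, the "best" weights would silently track later epochs.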

        if (epoch+1) % 10 == 0 or epoch == 0:

Prints progress at epoch 1 and every 10 epochs.

            print(...)

Shows epoch, loss, accuracy, and F1.

Final evaluation

    dt = time.time() - t0

Measures total training time.

    model.load_state_dict(best_state)

Restores best model weights.

    model.eval()

Sets evaluation mode.

The next block repeats final evaluation on the test set to compute final metrics.

    roc = roc_auc_score(labels, probs[:, 1])

Computes ROC-AUC using probability of class 1 (normal).

    pr = average_precision_score(labels, probs[:, 1])

Computes PR-AUC / average precision.
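
Passing probs[:, 1] makes class 1 (normal) the positive class for both metrics. The sketch below, on made-up labels and probabilities, compares this with scoring the anomaly class as positive: ROC-AUC is the same either way, while average precision generally is not.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Toy labels and probabilities, assumed for illustration (0 = anomaly, 1 = normal).
labels = np.array([0, 0, 0, 1, 1, 0, 1, 1])
p_normal = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.2, 0.9, 0.7])
p_anomaly = 1.0 - p_normal

# Notebook convention: class 1 (normal) is the positive class.
roc_normal = roc_auc_score(labels, p_normal)
pr_normal = average_precision_score(labels, p_normal)

# Alternative: score the anomaly class as positive.
roc_anomaly = roc_auc_score(1 - labels, p_anomaly)
pr_anomaly = average_precision_score(1 - labels, p_anomaly)

print(f"ROC-AUC normal={roc_normal:.4f} anomaly={roc_anomaly:.4f}")
print(f"PR-AUC  normal={pr_normal:.4f} anomaly={pr_anomaly:.4f}")
```

For an IDS report it may be worth stating explicitly which class the PR-AUC refers to, since attack detection is usually the class of interest.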

    print(...)
    print(classification_report(...))
    print(confusion_matrix(...))

Prints final metrics, per-class report, and confusion matrix.

    return model, {...}

Returns trained model and result dictionary.

Training all models

models = {}
results = {}

Creates dictionaries to store models and results.

for name, cls in [('mlp', MLP_IDS), ('lstm', LSTM_IDS), ('cnn1d', CNN1D_IDS)]:

Loops over model classes.

    models[name], results[name] = train_model(cls(), name.upper())

Instantiates, trains, and stores each model.

Mapping to project

This cell implements the Train model part and produces the model comparison results.

Cells 11 and 12 — Model Summary and Training Curves

Cell 11

print(f'{"Model":<8} {"Params":>8} {"W-F1":>8} {"ROC-AUC":>9} {"PR-AUC":>8} {"Time":>8}')

Prints table header.

print('-'*50)

Prints separator line.

for name in ['mlp', 'lstm', 'cnn1d']:

Loops over the three trained models.

    r = results[name]

Gets metric dictionary.

    p = models[name].count_parameters()

Gets parameter count.

    print(...)

Prints model name, parameters, F1, ROC-AUC, PR-AUC, and time.

Why this matters

This is the main quantitative result table in the report.

Cell 12

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

Creates two side-by-side plots.

for name in ['mlp', 'lstm', 'cnn1d']:

Loops over models.

    axes[0].plot(results[name]['history']['train_loss'], label=name.upper())

Plots training loss over epochs.

    axes[1].plot(results[name]['history']['test_acc'], label=name.upper())

Plots test accuracy over epochs.

axes[0].set_xlabel(...); ...

Labels first plot.

axes[1].set_xlabel(...); ...

Labels second plot.

plt.tight_layout(); plt.show()

Adjusts spacing and displays plots.

Mapping to project

These plots support training analysis and make the report/presentation visual.

Cell 14 — SHAP Setup and SHAP Value Computation

mlp_cpu = models['mlp'].cpu().eval()

Moves the trained MLP to CPU and sets evaluation mode.

Why MLP? The project uses MLP for SHAP explanation because it is a clean tabular baseline and easier to explain consistently.

def predict_fn(X):

Defines a prediction wrapper for SHAP and LIME.

    with torch.no_grad():

No gradients are needed for explanation queries.

        return torch.softmax(mlp_cpu(torch.FloatTensor(X)), 1).numpy()

Converts NumPy input to PyTorch tensor, runs the MLP, applies Softmax, and returns probabilities.

SHAP and LIME need a function that takes NumPy arrays and returns prediction probabilities.

bg_idx = np.random.choice(len(X_train), 100, replace=False)

Randomly selects 100 training samples as SHAP background data.

Background data represents the baseline distribution.

exp_idx = np.random.choice(len(X_test), 150, replace=False)

Randomly selects 150 test samples to explain.

Why sample? Kernel SHAP is expensive; explaining the entire test set would take too long.

explainer = shap.KernelExplainer(predict_fn, X_train[bg_idx])

Creates a model-agnostic SHAP explainer using prediction function and background data.

print('Computing SHAP values...')

Progress message.

shap_values_raw = explainer.shap_values(X_test[exp_idx], nsamples=200, silent=True)

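Computes SHAP values for the 150 selected test samples. nsamples=200 is the number of perturbed feature coalitions Kernel SHAP evaluates per explained sample; more samples give a better approximation at higher cost.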

if isinstance(shap_values_raw, list):
    shap_vals_anomaly = shap_values_raw[0]
elif shap_values_raw.ndim == 3:
    shap_vals_anomaly = shap_values_raw[:, :, 0]
else:
    shap_vals_anomaly = shap_values_raw

Handles different SHAP library output formats.

Older SHAP returns a list per class.
Newer SHAP may return a 3D array.
The code extracts class 0: anomaly.

print(f'Done! Shape: {shap_vals_anomaly.shape}')

Prints SHAP array shape. Final shape is (150, 41).
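
The format handling above can be wrapped in a small helper and exercised without running SHAP. The function name and the random stand-in arrays are assumptions made for illustration.

```python
import numpy as np

def select_class_shap(raw, class_idx=0):
    """Return a (n_samples, n_features) array for one class.

    Handles the layouts discussed above: a list with one array per class,
    a 3D array shaped (n_samples, n_features, n_classes), or a plain 2D array.
    """
    if isinstance(raw, list):
        return np.asarray(raw[class_idx])
    raw = np.asarray(raw)
    if raw.ndim == 3:
        return raw[:, :, class_idx]
    return raw

# Fake SHAP outputs (random numbers) just to exercise the three branches.
rng = np.random.default_rng(0)
per_class = [rng.normal(size=(5, 41)), rng.normal(size=(5, 41))]
as_list = select_class_shap(per_class)
as_3d = select_class_shap(np.stack(per_class, axis=-1))
as_2d = select_class_shap(per_class[0])

print(as_list.shape, as_3d.shape, as_2d.shape)  # (5, 41) (5, 41) (5, 41)
```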

Mapping to project

This is the core of Explain predictions using SHAP.

Cells 15–18 — SHAP Global and Local Explanations

Cell 15 — Feature Importance

mean_abs_shap = np.abs(shap_vals_anomaly).mean(axis=0)

Takes absolute SHAP values and averages across samples.

Why absolute value? We care about magnitude of influence, regardless of direction.

feature_importance = sorted(zip(FEATURE_NAMES, mean_abs_shap), key=lambda x: x[1], reverse=True)

Pairs each feature name with its importance and sorts descending.
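
A toy illustration of why the absolute value is taken before averaging. The SHAP values and feature names below are invented; the point is that a signed mean lets opposite effects cancel.

```python
import numpy as np

# Toy SHAP matrix: 4 samples x 2 features, values invented for illustration.
toy_shap = np.array([
    [0.5, 0.1],
    [-0.5, 0.1],
    [0.4, 0.1],
    [-0.4, 0.1],
])
names = ["feature_a", "feature_b"]   # hypothetical feature names

signed_mean = toy_shap.mean(axis=0)         # positive and negative effects cancel
mean_abs = np.abs(toy_shap).mean(axis=0)    # magnitude of influence

ranking = sorted(zip(names, mean_abs), key=lambda x: x[1], reverse=True)
print(signed_mean)     # feature_a averages to about 0 despite large effects
print(ranking[0][0])   # feature_a
```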

print('Top 15 features...')

Prints heading.

for i, (f, v) in enumerate(feature_importance[:15]):
    print(...)

Prints top 15 features.

Cell 16 — SHAP Summary Plot

shap.summary_plot(shap_vals_anomaly, X_test[exp_idx], feature_names=FEATURE_NAMES, max_display=15)

Creates SHAP summary plot.

It shows:

feature importance,
direction of feature effect,
distribution of SHAP values,
top 15 features.

Cell 17 — SHAP Bar Plot

plt.figure(figsize=(10, 6))

Creates figure.

top15 = feature_importance[:15]

Selects top 15 SHAP features.

plt.barh(range(15), [v for _, v in top15][::-1], color='steelblue')

Draws horizontal bar chart. [::-1] reverses order so most important appears at top visually.

plt.yticks(range(15), [f for f, _ in top15][::-1])

Labels bars with feature names.

plt.xlabel('Mean |SHAP value|')
plt.title('Top 15 Features — MLP (Anomaly Class)')
plt.tight_layout(); plt.show()

Adds labels, title, and displays plot.

Cell 18 — Local SHAP Explanation

idx = 0

Selects first explained test sample.

pred = predict_fn(X_test[exp_idx[idx:idx+1]])

Gets prediction probabilities for that sample.

print(f'Sample prediction: anomaly={pred[0][0]:.3f}, normal={pred[0][1]:.3f}')

Prints predicted probabilities.

print(f'True label: {class_names[y_test[exp_idx[idx]]]}')

Prints true label.

ev = explainer.expected_value

Gets SHAP baseline expected output.

ev0 = ev[0] if isinstance(ev, (list, np.ndarray)) else ev

Handles expected value format.

shap.force_plot(ev0, shap_vals_anomaly[idx], X_test[exp_idx[idx]], feature_names=FEATURE_NAMES, matplotlib=True)

Creates force plot for one prediction.

Mapping to project

These cells produce the explanation analysis deliverable.

Cell 20 — LIME Explanation Analysis

lime_explainer = lime_tabular.LimeTabularExplainer(...)

Creates a LIME explainer for tabular data.

X_train

Training data is used by LIME to understand feature distributions.

feature_names=FEATURE_NAMES

Gives readable feature names.

class_names=class_names

Gives readable class labels.

discretize_continuous=True

LIME bins continuous features into intervals, making explanations more interpretable.

random_state=SEED

Makes LIME sampling reproducible.

n_lime = 30

Number of test samples to explain.

lime_idx = np.random.choice(len(X_test), n_lime, replace=False)

Randomly selects 30 test samples.

all_top_features = {}

Dictionary to count how often each feature appears in LIME top explanations.

for i, idx in enumerate(lime_idx):

Loops over selected samples.

exp = lime_explainer.explain_instance(X_test[idx], predict_fn, num_features=10, top_labels=1)

Generates a LIME explanation for one sample.

num_features=10: keep top 10 features.
top_labels=1: explain predicted class.

pred_class = np.argmax(predict_fn(X_test[idx].reshape(1, -1)))

Gets predicted class for that sample.

for fw in exp.as_list(label=pred_class):

Loops over feature-weight pairs in the LIME explanation.

fname = fw[0].split(' ')[0]

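Extracts the feature name from LIME's text rule by taking the first space-separated token. This works for one-sided rules such as logged_in <= 0.00, but for two-sided interval rules such as 0.00 < src_bytes <= 0.50 the first token is the lower bound rather than the feature name, so those entries are counted under a number instead of a feature.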

all_top_features[fname] = all_top_features.get(fname, 0) + 1

Counts how often this feature appears.
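
A more defensive way to pull the feature name out of a LIME rule is to look for a token that matches a known feature name. This is a sketch, not the notebook's code; the helper name and the threshold values in the example rules are illustrative.

```python
def rule_to_feature(rule, feature_names):
    """Return the feature name mentioned in a LIME rule string, or None."""
    known = set(feature_names)
    for token in rule.split():
        if token in known:
            return token
    return None

# Illustrative rule strings in the shapes LIME produces for discretized features.
features = ["src_bytes", "logged_in", "count"]
rules = ["logged_in <= 0.00", "0.00 < src_bytes <= 0.50", "count > 0.28"]

naive = [r.split(" ")[0] for r in rules]
robust = [rule_to_feature(r, features) for r in rules]

print(naive)   # ['logged_in', '0.00', 'count']
print(robust)  # ['logged_in', 'src_bytes', 'count']
```

With the first-token approach, the interval rule is counted under '0.00', which would distort the LIME frequency ranking.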

if (i+1) % 10 == 0:
    print(...)

Progress every 10 samples.

lime_sorted = sorted(all_top_features.items(), key=lambda x: x[1], reverse=True)

Sorts features by frequency.

for f, c in lime_sorted[:10]:
    print(...)

Prints top 10 LIME features.

Mapping to project

This implements the Apply explainability task using LIME.

Cell 21 — SHAP vs LIME Comparison

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

Creates two side-by-side plots.

top10_shap = feature_importance[:10]

Gets top 10 SHAP features.

axes[0].barh(...)

Plots SHAP top 10.

top10_lime = lime_sorted[:10]

Gets top 10 LIME features.

axes[1].barh(...)

Plots LIME top 10.

plt.suptitle('SHAP vs LIME Feature Rankings', fontsize=14)
plt.tight_layout(); plt.show()

Displays comparison plot.

Rank correlation

shap_ranks = {f: i for i, (f, _) in enumerate(feature_importance[:20])}

Creates dictionary mapping SHAP feature to rank.

lime_ranks = {f: i for i, (f, _) in enumerate(lime_sorted[:20])}

Creates dictionary mapping LIME feature to rank.

common = set(shap_ranks.keys()) & set(lime_ranks.keys())

Finds features appearing in both top-20 lists.

if len(common) >= 5:

Only compute correlation if enough overlap exists.

rho, p = spearmanr([...], [...])

Computes Spearman rank correlation between SHAP and LIME rankings.

print(...)

Prints result.
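
The rank-comparison steps above can be sketched end to end. The two orderings below are hypothetical, and the arguments passed to spearmanr are an assumption, since the notebook's own arguments are elided here.

```python
from scipy.stats import spearmanr

# Hypothetical top-k orderings, invented for illustration (most important first).
shap_order = ["src_bytes", "logged_in", "count", "flag", "service", "dst_bytes"]
lime_order = ["logged_in", "flag", "src_bytes", "duration", "count", "service"]

shap_ranks = {f: i for i, f in enumerate(shap_order)}
lime_ranks = {f: i for i, f in enumerate(lime_order)}

# Only features present in both lists can be compared.
common = sorted(set(shap_ranks) & set(lime_ranks))

rho, p = spearmanr(
    [shap_ranks[f] for f in common],
    [lime_ranks[f] for f in common],
)
print(f"{len(common)} common features, rho={rho:.3f}, p={p:.3f}")
```

Because the correlation is computed only on the overlap, it is based on few points, and the p-value should be read with that in mind.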

Final result:

Spearman correlation = 0.0714
p = 0.8665

Interpretation

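SHAP and LIME show essentially no agreement in how they rank features: the correlation is close to zero and, with p = 0.87, not distinguishable from chance on the features common to both top-20 lists. This is a key finding: explanations depend on method choice.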

Cell 23 — SHAP Stability Evaluation

def compute_shap_stability(explainer, sample, epsilon, n_perturbs=10):

Defines function to evaluate how stable SHAP is under perturbations.

    rng = np.random.RandomState(SEED)

Creates deterministic random generator.

    base = np.array(explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True))

Computes original SHAP explanation for the sample.

    base = base[0].flatten() if isinstance(base, list) else base.flatten()

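Flattens SHAP values into one vector. Because the result was already wrapped in np.array on the previous line, the isinstance(base, list) branch never runs; base.flatten() is always used, so the vector contains the values for both classes, not only the anomaly class.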

    max_delta, pccs = 0, []

Initializes maximum explanation change and list of correlations.

    for _ in range(n_perturbs):

Repeats perturbation several times.

        noise = rng.uniform(-epsilon, epsilon, sample.shape)

Creates random noise bounded by epsilon.

        perturbed = np.clip(sample + noise, 0, 1)

Adds noise and clips features to valid [0,1] range.

        p_shap = np.array(explainer.shap_values(perturbed.reshape(1,-1), nsamples=100, silent=True))

Computes SHAP explanation for perturbed sample.

        p_shap = p_shap[0].flatten() if isinstance(p_shap, list) else p_shap.flatten()

Flattens perturbed explanation.

        max_delta = max(max_delta, np.linalg.norm(p_shap - base))

Computes the L2 (Euclidean) distance between the perturbed and original explanations and keeps the largest value seen so far. This is SENS_MAX.

        if np.std(base) > 1e-8 and np.std(p_shap) > 1e-8:

Avoids correlation if vector has near-zero variance.

            pccs.append(pearsonr(base, p_shap)[0])

Computes Pearson correlation between original and perturbed SHAP values.

    return max_delta, np.mean(pccs) if pccs else 0.0

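Returns SENS_MAX and the average PCC. If no correlation could be computed (every comparison was skipped by the variance check), the PCC is reported as 0.0.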
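To see how SENS_MAX and PCC behave without running SHAP, the loop above can be sketched with a stand-in explainer. Everything named here is an assumption for illustration: `W`, `toy_explain`, and `stability` are hypothetical, the attribution formula is a simple gradient-times-input for a logistic model (not SHAP), and `np.corrcoef` replaces `pearsonr`.

```python
import numpy as np

# Hypothetical weights of a toy logistic model (not the notebook's model).
W = np.array([2.0, -1.0, 0.5, 0.0])


def toy_explain(x):
    """Stand-in attribution: weight * input * p * (1 - p). This is not SHAP."""
    p = 1.0 / (1.0 + np.exp(-(W @ x)))
    return W * x * p * (1.0 - p)


def stability(explain_fn, sample, epsilon, n_perturbs=10, seed=0):
    """Return (SENS_MAX, mean PCC) for one sample, mirroring the loop above."""
    rng = np.random.RandomState(seed)
    base = explain_fn(sample)
    max_delta, pccs = 0.0, []
    for _ in range(n_perturbs):
        # Bounded uniform noise, then clip back into the scaled [0, 1] range.
        noise = rng.uniform(-epsilon, epsilon, sample.shape)
        perturbed = np.clip(sample + noise, 0, 1)
        attr = explain_fn(perturbed)
        # Largest L2 shift of the explanation seen so far.
        max_delta = max(max_delta, float(np.linalg.norm(attr - base)))
        # Pearson correlation is undefined for (near-)constant vectors.
        if np.std(base) > 1e-8 and np.std(attr) > 1e-8:
            pccs.append(float(np.corrcoef(base, attr)[0, 1]))
    return max_delta, (float(np.mean(pccs)) if pccs else 0.0)


sample = np.array([0.2, 0.8, 0.5, 0.1])
small = stability(toy_explain, sample, epsilon=0.01)
large = stability(toy_explain, sample, epsilon=0.05)
print("eps=0.01 -> SENS_MAX=%.5f, PCC=%.4f" % small)
print("eps=0.05 -> SENS_MAX=%.5f, PCC=%.4f" % large)
```

A low SENS_MAX together with a PCC close to 1 means the explanation barely moves under noise. In this sketch, as in the notebook's function, the generator is re-seeded on every call, so each sample sees the same noise pattern scaled by epsilon.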

Running the test

epsilons = [0.01, 0.03, 0.05]

Perturbation sizes.

n_stability = 8

Number of samples used for stability test.

stability_idx = np.random.choice(len(X_test), n_stability, replace=False)

Randomly selects test samples.

stability_results = {}

Stores results.

for eps in epsilons:

Loops over perturbation sizes.

    sens_list, pcc_list = [], []

Stores metrics per sample.

    for i, idx in enumerate(stability_idx):

Loops over selected samples.

        sm, pc = compute_shap_stability(...)

Computes SENS_MAX and PCC.

        sens_list.append(sm); pcc_list.append(pc)

Stores results.

    stability_results[eps] = {'sens_max': np.mean(sens_list), 'pcc': np.mean(pcc_list)}

Stores average metrics.

    status = 'STABLE' if np.mean(pcc_list) > 0.6 else 'UNSTABLE'

Classifies explanation stability using threshold 0.6.

Mapping to project

This implements Evaluate explanation stability.

Cell 24 — LIME Stochastic Stability

This evaluates whether LIME gives consistent explanations when run multiple times.

lime_corrs = []

Stores average correlation per sample.

for i, idx in enumerate(stability_idx[:6]):

Uses first 6 stability samples.

    weight_vecs = []

Stores LIME weight vectors from different seeds.

    for seed in range(10):

Runs LIME 10 times with different seeds.

        le_obj = lime_tabular.LimeTabularExplainer(..., random_state=seed)

Creates a new LIME explainer with a different random seed.

        exp = le_obj.explain_instance(..., num_features=len(FEATURE_NAMES))

Explains the sample using all features.

        w = np.zeros(len(FEATURE_NAMES))

Creates a zero vector of feature weights.

        for key, val in dict(exp.as_list()).items():

Loops over LIME explanation terms.

            for j, fn in enumerate(FEATURE_NAMES):
                if fn in key: w[j] = val; break

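Maps LIME text rules back to feature indices by substring matching, stopping at the first feature name found inside the rule. This can assign a weight to the wrong feature when one name is contained in another: for example, count is a substring of srv_count and dst_host_count, so a rule about dst_host_count matches count first if count appears earlier in FEATURE_NAMES.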

        weight_vecs.append(w)

Stores one explanation vector.
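The mapping from rule text to feature index is worth checking on a small case, because several NSL-KDD names contain each other. The sketch below is an illustration under stated assumptions: `FEATURE_NAMES` is a four-name subset, the rule strings and weights are made up, and `map_by_longest_name` is a hypothetical alternative, not the notebook's code.

```python
# Subset of NSL-KDD column names, kept in dataset order (assumed for illustration).
FEATURE_NAMES = ["count", "srv_count", "serror_rate", "dst_host_count"]

# Rule strings in the style returned by exp.as_list(); the weights are made up.
lime_terms = {"dst_host_count > 0.50": 0.30, "srv_count <= 0.10": -0.20}


def map_by_substring(terms, names):
    """First-substring-match mapping, as in the loop above."""
    w = [0.0] * len(names)
    for key, val in terms.items():
        for j, fn in enumerate(names):
            if fn in key:
                w[j] = val
                break
    return w


def map_by_longest_name(terms, names):
    """Hypothetical variant: prefer the longest feature name found in the rule."""
    w = [0.0] * len(names)
    for key, val in terms.items():
        matches = [j for j, fn in enumerate(names) if fn in key]
        if matches:
            w[max(matches, key=lambda j: len(names[j]))] = val
    return w


naive = map_by_substring(lime_terms, FEATURE_NAMES)
fixed = map_by_longest_name(lime_terms, FEATURE_NAMES)
print("first match :", naive)
print("longest name:", fixed)
```

With first-match mapping both rules land on count, and the second overwrites the first. LIME's explanation object also provides as_map(), which returns (feature index, weight) pairs per label and avoids text matching entirely; whether the notebook could switch to it depends on how the explainer is configured.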

    corrs = []

Stores pairwise correlations.

    for a in range(10):
        for b in range(a+1, 10):

Compares all 45 distinct pairs of the 10 runs.

            if np.std(weight_vecs[a]) > 1e-8 and np.std(weight_vecs[b]) > 1e-8:

Avoids invalid correlation.

                corrs.append(spearmanr(weight_vecs[a], weight_vecs[b])[0])

Computes Spearman correlation between two LIME runs.

    mc = np.mean(corrs) if corrs else 0

Mean correlation for this sample.

    lime_corrs.append(mc)

Stores it.

lime_status = 'STABLE' if np.mean(lime_corrs) > 0.6 else 'UNSTABLE'

Classifies LIME stability.

Mapping to project

This tests whether LIME explanations are reliable despite LIME randomness.

Cell 25 — Faithfulness Evaluation

Faithfulness asks: do the important features actually matter to the model?

def get_shap_for_class(shap_values, class_idx=0):

Helper function for SHAP output formats.

    if isinstance(shap_values, list):
        return shap_values[class_idx]

Older SHAP format.

    elif isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
        return shap_values[:, :, class_idx]

Newer SHAP 3D format.

    else:
        return shap_values

Fallback.
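The three branches are easier to follow with concrete shapes. The sketch below repeats the helper and feeds it made-up arrays (1 sample, 41 features, 2 classes); the variable names and values are assumptions for illustration. It also shows what happens when a list-style output is wrapped in np.array before the list check, as Cell 23 does.

```python
import numpy as np


def get_shap_for_class(shap_values, class_idx=0):
    """Return the (n_samples, n_features) SHAP array for one class."""
    if isinstance(shap_values, list):
        return shap_values[class_idx]          # list with one array per class
    elif isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
        return shap_values[:, :, class_idx]    # (samples, features, classes)
    else:
        return shap_values                     # already a single 2D array


# Made-up values: class 0 is all 0.0 and class 1 is all 1.0.
list_format = [np.zeros((1, 41)), np.ones((1, 41))]
array_format = np.stack(list_format, axis=-1)  # shape (1, 41, 2)

from_list = get_shap_for_class(list_format, 1)
from_array = get_shap_for_class(array_format, 1)
print(from_list.shape, from_array.shape)

# Wrapping the list in np.array first removes the list type, so a later
# isinstance(..., list) check is False and flatten() returns both classes.
wrapped = np.array(list_format)                # shape (2, 1, 41)
flat = wrapped.flatten()
print(isinstance(wrapped, list), flat.shape)
```

Both formats give the same (1, 41) array for a chosen class, while the wrapped-and-flattened version has 82 values.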

faith_results = {k: [] for k in [3, 5, 10]}

Creates result lists for top-3, top-5, and top-10 feature masking.

for idx in stability_idx[:10]:

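Loops over up to 10 samples. Because stability_idx was created with n_stability = 8, the slice [:10] returns all 8 selected samples.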

    sample = X_test[idx]

Gets one test sample.

    sv_raw = explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True)

Computes SHAP values.

    sv = get_shap_for_class(sv_raw, 0).flatten()

Extracts anomaly-class SHAP vector.

    base_conf = predict_fn(sample.reshape(1,-1))[0]

Gets original prediction probabilities.

    pred_cls = np.argmax(base_conf)

Gets predicted class.

    for k in faith_results:

Loops over k = 3, 5, 10.

        masked = sample.copy()

Copies sample.

        masked[np.argsort(np.abs(sv))[-k:]] = 0.0

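Finds the k features with the largest absolute SHAP values (np.argsort sorts in ascending order, so the last k indices are the largest) and masks them by setting them to 0. If the features are scaled to [0, 1], as the clipping in Cell 23 assumes, 0 is each feature's minimum scaled value rather than a missing value.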

        drop = base_conf[pred_cls] - predict_fn(masked.reshape(1,-1))[0][pred_cls]

Measures confidence drop after masking.

        faith_results[k].append(float(drop))

Stores confidence drop.

for k, scores in faith_results.items():
    print(...)

Prints average and standard deviation.
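The masking test can be tried on a toy model to see what a confidence drop looks like. This is a minimal sketch under stated assumptions: `W`, this `predict_fn`, and `confidence_drop` are hypothetical, the model is a small logistic function rather than the trained IDS, and weight times input stands in for SHAP values.

```python
import numpy as np

# Hypothetical weights of a toy two-class model (not the trained IDS).
W = np.array([3.0, -2.0, 0.5, 0.1, 0.0])


def predict_fn(X):
    """Return [P(class 0), P(class 1)] for each row of X."""
    p1 = 1.0 / (1.0 + np.exp(-(X @ W - 0.5)))
    return np.column_stack([1.0 - p1, p1])


def confidence_drop(sample, attributions, k):
    """Mask the k features with the largest |attribution|; return the confidence drop."""
    base_conf = predict_fn(sample.reshape(1, -1))[0]
    pred_cls = int(np.argmax(base_conf))
    masked = sample.copy()
    # argsort is ascending, so the last k indices are the most important features.
    masked[np.argsort(np.abs(attributions))[-k:]] = 0.0
    new_conf = predict_fn(masked.reshape(1, -1))[0]
    return float(base_conf[pred_cls] - new_conf[pred_cls])


sample = np.array([0.9, 0.1, 0.6, 0.4, 0.7])
attributions = W * sample  # stand-in for SHAP values
drops = {k: confidence_drop(sample, attributions, k) for k in (1, 2, 3)}
for k, d in drops.items():
    print(f"top-{k} masked: confidence drop = {d:.4f}")
```

A large positive drop means the masked features really supported the prediction. The drop does not have to grow with k: masking a feature that pushed against the predicted class can raise the confidence again.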

Mapping to project

This implements Evaluate explanation faithfulness.

Cell 26 — Stability Summary Plot

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

Creates three plots side by side.

eps_list = list(stability_results.keys())

Gets epsilon values.

axes[0].plot(eps_list, [stability_results[e]['sens_max'] for e in eps_list], ...)

Plots SENS_MAX vs epsilon.

pcc_vals = [stability_results[e]['pcc'] for e in eps_list]

Gets PCC values.

colors = ['green' if p > 0.6 else 'red' for p in pcc_vals]

Green bars for stable, red for unstable.

axes[1].bar(...)

Plots PCC stability bars.

axes[1].axhline(y=0.6, ...)

Draws stability threshold line.

ks = list(faith_results.keys())

Gets masking sizes 3, 5, 10.

axes[2].bar(...)

Plots faithfulness confidence drop with error bars.

plt.suptitle(...)
plt.tight_layout(); plt.show()

Adds title and displays.

Mapping to project

This creates the figure used to summarize explanation reliability.

Cell 28 — Security Implications / Feature Manipulability

manipulable = {...}

Defines features that attackers may directly influence.

Examples:

src_bytes
dst_bytes
hot
num_failed_logins
duration

partial = {...}

Defines partially manipulable features.

Examples:

count
srv_count
serror_rate
rerror_rate
protocol_type
flag

These can sometimes be influenced but not freely controlled.

non_manip = {...}

Defines non-manipulable features such as host-level aggregated statistics.

Examples:

dst_host_count
dst_host_srv_count
dst_host_rerror_rate
dst_host_serror_rate

These are computed by IDS sensors or depend on broader traffic context.

manip_count = {'Manipulable': 0, 'Partial': 0, 'Non-manipulable': 0}

Initializes counters.

for i, (f, v) in enumerate(feature_importance[:15]):

Loops over top 15 SHAP features.

    if f in manipulable:
        status = 'MANIPULABLE'
        manip_count['Manipulable'] += 1

Classifies feature as manipulable.

    elif f in partial:
        status = 'PARTIAL'
        manip_count['Partial'] += 1

Classifies feature as partially manipulable.

    else:
        status = 'NON-MANIPULABLE'
        manip_count['Non-manipulable'] += 1

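Otherwise classifies the feature as non-manipulable. Note that the non_manip set is never checked here: any feature missing from both manipulable and partial falls into this branch.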

    print(...)

Prints feature, SHAP value, and manipulability status.

print(f'\nSummary: {manip_count}')

Prints count summary.

if manip_count['Non-manipulable'] > manip_count['Manipulable']:
    print('-> Model relies more on non-manipulable features -> MORE ROBUST against evasion')
else:
    print('-> Model relies more on manipulable features -> LESS ROBUST against evasion')

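A simple security conclusion that compares only the manipulable and non-manipulable counts. Partially manipulable features are ignored, and a tie is reported as LESS ROBUST.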
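The counting logic can be sketched with an explicit check against all three sets, which makes features that belong to none of them visible. The sets below contain only the example features listed in this walkthrough, not the notebook's full definitions, and `classify`, `top_features`, and the "Unclassified" label are hypothetical.

```python
from collections import Counter

# Only the example features named in this walkthrough; the notebook's sets are larger.
manipulable = {"src_bytes", "dst_bytes", "hot", "num_failed_logins", "duration"}
partial = {"count", "srv_count", "serror_rate", "rerror_rate", "protocol_type", "flag"}
non_manip = {"dst_host_count", "dst_host_srv_count",
             "dst_host_rerror_rate", "dst_host_serror_rate"}


def classify(feature):
    """Return the manipulability category, or 'Unclassified' if no set lists it."""
    if feature in manipulable:
        return "Manipulable"
    if feature in partial:
        return "Partial"
    if feature in non_manip:
        return "Non-manipulable"
    return "Unclassified"


# Hypothetical top features, standing in for feature_importance[:15].
top_features = ["logged_in", "src_bytes", "dst_host_count", "flag", "same_srv_rate"]
counts = Counter(classify(f) for f in top_features)
print(dict(counts))
```

With the notebook's if/elif/else, the two features reported here as Unclassified would be counted as non-manipulable, which pushes the final verdict toward MORE ROBUST.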

Mapping to project

This implements Analyze security implications.

Cell 29 — Final Summary

print('\n' + '='*60)
print('FINAL RESULTS SUMMARY')
print('='*60)

Prints final report-style summary header.

print(f'\n1. MODEL COMPARISON:')

Starts model result section.

for name in ['mlp', 'lstm', 'cnn1d']:
    r = results[name]
    print(...)

Prints F1, ROC-AUC, and PR-AUC for all models.

print(f'\n2. EXPLANATION STABILITY (SAFARI):')

Starts stability section.

for eps in epsilons:
    sr = stability_results[eps]
    status = 'STABLE' if sr['pcc'] > 0.6 else 'UNSTABLE'
    print(...)

Prints SHAP stability for each epsilon.

print(f'   LIME: Spearman={np.mean(lime_corrs):.4f} ...')

Prints LIME stability.

print(f'\n3. FAITHFULNESS:')

Starts faithfulness section.

for k in [3, 5, 10]:
    print(...)

Prints confidence drop after masking top-k SHAP features.

print(f'\n4. SECURITY: Top features manipulability = {manip_count}')

Prints security summary.

print('\nDone!')

End message.

Mapping to project

This cell packages all deliverable results:

model comparison,
stability,
faithfulness,
security analysis.

How the Whole Notebook Maps to the Project Requirements

| Teacher requirement | Notebook cells | What was done |
| --- | --- | --- |
| Train model | Cells 5–12 | Load data, preprocess, train MLP/LSTM/CNN, evaluate metrics |
| Explain predictions | Cells 14–21 | SHAP and LIME explanations, feature rankings, local explanations |
| Evaluate stability | Cells 23–26 | SHAP perturbation stability, LIME stochastic stability |
| Analyze risks | Cell 28 | Feature manipulability and evasion risk analysis |
| Expected output | Cell 29 + figures | Final result summary, plots, explanation/security analysis |

The Main Story You Should Understand

The code starts with raw NSL-KDD network connection records. It converts them into numerical normalized feature vectors. Then it trains three neural IDS models and compares them. The LSTM performs best.

After training, the notebook does not stop at accuracy. It asks: why did the model make its decisions? SHAP and LIME are used to identify important features. SHAP finds features like logged_in and error-rate statistics. LIME finds some overlapping but different features. Their low Spearman correlation shows that XAI methods can disagree.

Then the notebook asks whether the explanations are reliable. SHAP is stable only for very small perturbations. LIME is borderline stable. Feature masking shows SHAP explanations are reasonably faithful because removing top SHAP features reduces prediction confidence.

Finally, the code asks whether explanations are safe. If top features are manipulable by attackers, explanations can leak evasion strategies. The model relies on several non-manipulable or partially manipulable features, which is a positive sign, but explanation access should still be controlled.

Key Things to Say if Asked About the Code

The preprocessing avoids data leakage by fitting encoders/scalers on training data and transforming test data.
The three models are compared fairly because they use the same dataset, preprocessing, and training setup.
Weighted F1 is important because class distributions are not perfectly balanced and train/test distributions differ.
SHAP gives global and local feature importance.
LIME gives local surrogate explanations.
SHAP and LIME disagree, which is an important result, not a failure.
Stability is evaluated because explanations must be consistent to be trusted.
Faithfulness is evaluated because important explanation features should actually affect predictions.
Security analysis checks whether important features can be manipulated by attackers.
The whole project is not just IDS accuracy; it is IDS + explanation + reliability + security.