# Explainable IDS Full Pipeline — Code Walkthrough

This document walks through the notebook `explainable_ids_full_pipeline.ipynb` in detailed, practical terms. The goal is to understand **what each line or block does**, **why it exists**, and **how it connects to the project deliverables**:

1. Train an IDS model.
2. Explain IDS predictions.
3. Evaluate explanation stability and faithfulness.
4. Analyze security/adversarial risks.

The notebook is organized into seven main parts:

- setup and imports,
- dataset loading and preprocessing,
- model definitions,
- model training and evaluation,
- SHAP explanations,
- LIME explanations,
- stability, faithfulness, and security analysis.

---

## Big Picture Before Reading the Code

The project is an **Explainable Intrusion Detection System (X-IDS)**.

The dataset is **NSL-KDD**, where each row is a network connection. Each connection has 41 features such as protocol, service, duration, bytes, login status, error rates, and host-level statistics. The target label is binary:

- `normal`
- `anomaly`

The notebook trains three neural models:

- **MLP**: a standard feed-forward network for tabular data.
- **LSTM**: treats the 41 features like a sequence.
- **1D-CNN**: treats the 41 features like a one-dimensional signal.

Then it explains predictions using:

- **SHAP**: feature contribution values based on Shapley values.
- **LIME**: local surrogate explanations based on perturbations.

Then it asks:

- Are explanations stable?
- Are explanations faithful?
- Are important features manipulable by attackers?

---

# Cell 2 — Install Dependencies

```python
!pip install -q torch numpy pandas scikit-learn datasets shap lime matplotlib scipy
```

### What it does

This line installs all Python packages needed in Google Colab.

- `torch`: PyTorch, used to build and train neural networks.
- `numpy`: numerical arrays and mathematical operations.
- `pandas`: table/dataframe manipulation.
- `scikit-learn`: preprocessing and metrics.
- `datasets`: Hugging Face library to load NSL-KDD.
- `shap`: SHAP explanations.
- `lime`: LIME explanations.
- `matplotlib`: plots and figures.
- `scipy`: statistics such as Pearson and Spearman correlations.

### Why it matters

This prepares the environment. Without these libraries, the rest of the notebook cannot run.

### Mapping to the project

This supports **all tasks** because it installs the tools for training, explaining, evaluating, and plotting.

---

# Cell 3 — Imports, Reproducibility, and Device Setup

```python
import os, sys, json, time, random, pickle
```

Imports standard Python utilities.

- `os`, `sys`: system/file utilities.
- `json`: could be used for saving structured results.
- `time`: used to measure training time.
- `random`: Python random generator.
- `pickle`: can save/load Python objects.

```python
import numpy as np
```

Imports NumPy as `np`. Almost all numerical arrays in preprocessing, SHAP, LIME, and metrics use NumPy.

```python
import pandas as pd
```

Imports pandas as `pd`. The NSL-KDD dataset is converted to pandas DataFrames so we can manipulate columns easily.

```python
import torch
```

Imports PyTorch main library.

```python
import torch.nn as nn
```

Imports PyTorch's neural-network module as `nn`. This is used for layers like `Linear`, `LSTM`, `Conv1d`, `BatchNorm1d`, and `Dropout`, and for the `CrossEntropyLoss` loss function.

```python
from torch.utils.data import TensorDataset, DataLoader
```

Imports utilities to package arrays into datasets and mini-batches.

- `TensorDataset`: wraps tensors `(X, y)` together.
- `DataLoader`: creates batches for training and testing.

```python
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
```

Imports preprocessing tools.

- `LabelEncoder`: converts categorical strings to integers.
- `MinMaxScaler`: scales numerical features into `[0, 1]`.

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, average_precision_score
```

Imports evaluation metrics.

- `classification_report`: precision, recall, F1-score.
- `confusion_matrix`: counts correct/incorrect predictions by class.
- `roc_auc_score`: ROC-AUC ranking metric.
- `average_precision_score`: PR-AUC / average precision.

```python
from datasets import load_dataset
```

Imports Hugging Face dataset loader. Used to download/load NSL-KDD.

```python
import shap
```

Imports SHAP explainability library.

```python
from lime import lime_tabular
```

Imports LIME tabular explainer.

```python
from scipy.stats import spearmanr, pearsonr
```

Imports statistical correlation functions.

- `spearmanr`: rank correlation. Used for comparing feature rankings and LIME stability.
- `pearsonr`: linear correlation. Used for SHAP perturbation stability.

```python
import matplotlib.pyplot as plt
```

Imports plotting interface.

```python
import warnings
warnings.filterwarnings('ignore')
```

Suppresses warning messages to keep the Colab output cleaner.

### Reproducibility block

```python
SEED = 42
```

Defines the random seed. A seed is a fixed starting point for randomness.

```python
random.seed(SEED)
```

Fixes Python's built-in random generator.

```python
np.random.seed(SEED)
```

Fixes NumPy randomness. This affects random sample selection for SHAP/LIME and stability tests.

```python
torch.manual_seed(SEED)
```

Seeds PyTorch's random number generators (CPU and CUDA). This controls weight initialization, dropout masks, and the shuffling order of the training `DataLoader`.

```python
torch.backends.cudnn.deterministic = True
```

Asks cuDNN to use deterministic algorithms where they are available. This improves reproducibility on GPU, although some CUDA operations outside cuDNN can still be non-deterministic.

```python
torch.backends.cudnn.benchmark = False
```

Disables cuDNN benchmarking. With benchmarking enabled, cuDNN times several algorithms at runtime and picks the fastest, so the chosen algorithm can differ between runs, which hurts reproducibility.
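The reproducibility lines above can be bundled into one helper. This is a minimal sketch: the name `set_seed` is hypothetical and does not appear in the notebook, and the check at the end only demonstrates that re-seeding replays the same random draws on the current machine.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch (hypothetical helper, not in the notebook)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and any CUDA generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
first = (random.random(), np.random.rand(3), torch.rand(3))

set_seed(42)
second = (random.random(), np.random.rand(3), torch.rand(3))

# Re-seeding should replay identical draws from all three generators.
same_draws = (
    first[0] == second[0]
    and np.array_equal(first[1], second[1])
    and torch.equal(first[2], second[2])
)
print(same_draws)
```

Calling such a helper immediately before each experiment (for example before training each model) keeps one experiment's random draws from shifting the next one's.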

### Device selection

```python
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

Checks if a GPU is available. If yes, training uses CUDA GPU; otherwise it uses CPU.

```python
print(f'Device: {DEVICE}')
```

Prints the selected device.

```python
if DEVICE.type == 'cuda':
    print(f'GPU: {torch.cuda.get_device_name(0)}')
```

If running on a GPU, prints the GPU name. In the final run, this was a Tesla T4.

### Mapping to the project

This cell establishes reproducibility and compute setup. In an academic report, reproducibility is important because results should be repeatable.

---

# Cell 5 — Feature Names, Dataset Loading, and Class Distribution

```python
FEATURE_NAMES = [ ... ]
```

This list contains the 41 NSL-KDD feature names in the exact order used by the dataset and model.

The list is not just cosmetic. It is needed for:

- selecting feature columns from the DataFrame,
- preserving consistent input order,
- labeling SHAP plots,
- labeling LIME explanations,
- interpreting security implications.

### Lines 1–16 — NSL-KDD features

The features include:

- Basic connection features: `duration`, `protocol_type`, `service`, `flag`, `src_bytes`, `dst_bytes`.
- Content features: `hot`, `num_failed_logins`, `logged_in`, `root_shell`, etc.
- Time-based traffic features: `count`, `srv_count`, `serror_rate`, `rerror_rate`, etc.
- Host-based traffic features: `dst_host_count`, `dst_host_srv_count`, `dst_host_*` rates.

Why this matters: later, when SHAP says `logged_in` is important, we know exactly which IDS feature influenced the model.

```python
CATEGORICAL_COLS = ['protocol_type', 'service', 'flag']
```

Defines the three categorical columns. These contain strings, not numbers, so they must be encoded before feeding them into neural networks.

```python
ds = load_dataset('Mireu-Lab/NSL-KDD')
```

Loads NSL-KDD from Hugging Face.

```python
df_train = ds['train'].to_pandas()
df_test = ds['test'].to_pandas()
```

Converts train and test splits into pandas DataFrames. Pandas makes column operations easier.

```python
print(f'Train: {len(df_train)} | Test: {len(df_test)}')
```

Prints dataset sizes.

Final output:

- Train: 151,165
- Test: 34,394

```python
print('\nTrain distribution:')
print(df_train['class'].value_counts())
```

Prints how many normal/anomaly samples exist in training.

```python
print('\nTest distribution:')
print(df_test['class'].value_counts())
```

Prints class distribution in the test set.

### Why class distribution matters

The train and test distributions are different:

- Train has more normal than anomaly.
- Test has more anomaly than normal.

This matters because the model must generalize under distribution shift.

### Mapping to project

This cell supports the **dataset understanding** part of the report. It proves what data we used and shows imbalance/distribution shift.

---

# Cell 6 — Target Encoding, Categorical Encoding, and Scaling

```python
# Encode target (binary: anomaly=0, normal=1)
```

Comment explaining the binary label setup.

```python
class_names = ['anomaly', 'normal']
```

Defines readable class names. This is used later in classification reports and LIME explanations.

```python
le_y = LabelEncoder()
```

Creates a label encoder for target labels.

```python
y_train = le_y.fit_transform(df_train['class'].values)
```

Fits the encoder on the training labels and transforms them into integers.

In this dataset, the final encoding is:

- anomaly = 0
- normal = 1

```python
y_test = le_y.transform(df_test['class'].values)
```

Transforms test labels using the same encoder learned from training.

Important: we do not fit on test labels, because the test set must remain unseen.

```python
df_tr, df_te = df_train.copy(), df_test.copy()
```

Creates copies of the train and test DataFrames so original data remains unchanged.

```python
label_encoders = {}
```

Creates a dictionary to store encoders for each categorical feature.

```python
for col in CATEGORICAL_COLS:
```

Loops over the categorical columns: protocol_type, service, flag.

```python
    le = LabelEncoder()
```

Creates a new encoder for the current categorical column.

```python
    le.fit(df_tr[col])
```

Fits the encoder only on training categories.

```python
    known = set(le.classes_)
```

Stores categories seen during training.

```python
    df_te[col] = df_te[col].apply(lambda x: x if x in known else le.classes_[0])
```

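Handles possible unknown categories in test data. If a test category was not seen during training, it is replaced by the first known class. Because `LabelEncoder` sorts its classes, this is the alphabetically first training category, not a dedicated "unknown" value.

Why: `LabelEncoder` raises an error when asked to transform unseen labels. The replacement prevents that runtime error, at the cost of mapping unknown categories onto a real one.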
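The fallback behaviour can be seen on a toy column. This is a small sketch with made-up values (`sctp` stands in for a category that is absent from training); it follows the same steps as the notebook loop but is not taken from it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: "sctp" never appears in the training column.
train_col = pd.Series(["tcp", "udp", "icmp", "tcp"])
test_col = pd.Series(["tcp", "sctp"])

le = LabelEncoder()
le.fit(train_col)

known = set(le.classes_)
fallback = le.classes_[0]  # classes_ is sorted, so this is the alphabetically first one

# Replace unseen categories before transforming, as the notebook does.
test_clean = test_col.apply(lambda x: x if x in known else fallback)
encoded_test = le.transform(test_clean)

print(list(le.classes_), fallback, encoded_test.tolist())
```

Here the unseen `sctp` is encoded exactly like `icmp`, which is worth keeping in mind when interpreting explanations that involve categorical features.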

```python
    df_tr[col] = le.transform(df_tr[col])
```

Transforms training categorical values into integers.

```python
    df_te[col] = le.transform(df_te[col])
```

Transforms test categorical values using the same encoder.

```python
    label_encoders[col] = le
```

Stores the encoder for later inspection or inverse transformation.

```python
    print(f'Encoded {col}: {len(le.classes_)} categories')
```

Prints how many categories each column has.

Final output:

- protocol_type: 3 categories
- service: 70 categories
- flag: 11 categories

### Scaling

```python
scaler = MinMaxScaler()
```

Creates a scaler that maps each feature to `[0, 1]` using the minimum and maximum observed when the scaler is fitted.

```python
X_train = scaler.fit_transform(df_tr[FEATURE_NAMES].values.astype(np.float32))
```

Takes training features, converts them to float32, fits the scaler on training data, and transforms training features.

Important: fit only on training data.

```python
X_test = scaler.transform(df_te[FEATURE_NAMES].values.astype(np.float32))
```

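Transforms test features using the training scaler.

Again, no fitting on test data to avoid data leakage. Because the minimum and maximum come from the training set, a test value outside the training range scales to a value below 0 or above 1.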
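A one-feature toy example makes the fit-on-train rule concrete. The numbers are invented for illustration; only the `MinMaxScaler` usage mirrors the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data: the training range is [0, 10].
train = np.array([[0.0], [10.0]], dtype=np.float32)
test = np.array([[5.0], [20.0]], dtype=np.float32)

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # min and max learned from training data only
test_scaled = scaler.transform(test)        # reuses the training min and max

print(train_scaled.ravel(), test_scaled.ravel())
```

The test value 20 lies outside the training range, so it is mapped above 1. That is expected behaviour when the scaler is not refitted on test data.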

```python
print(f'\nX_train: {X_train.shape} | X_test: {X_test.shape}')
```

Prints feature matrix shapes.

Final output:

- X_train: (151165, 41)
- X_test: (34394, 41)

```python
print(f'y_train: {np.bincount(y_train)} | y_test: {np.bincount(y_test)}')
```

Prints encoded class counts.

### Why this cell is essential

Neural networks cannot directly process strings or unscaled heterogeneous features. This cell converts the raw dataset into clean numerical arrays, which the training cell later wraps in tensors.

### Mapping to project

This is the **preprocessing pipeline** in the report.

---

# Cell 8 — Model Definitions

This cell defines the three deep learning models.

---

## MLP_IDS

```python
class MLP_IDS(nn.Module):
```

Defines a PyTorch class for the MLP model. It inherits from `nn.Module`, which is required for PyTorch models.

```python
    def __init__(self, in_dim=41, num_classes=2):
```

Constructor. Input dimension is 41 because NSL-KDD has 41 features. Number of classes is 2: anomaly and normal.

```python
        super().__init__()
```

Initializes the parent PyTorch module.

```python
        self.net = nn.Sequential(
```

Creates a sequence of layers that will run one after another.

```python
            nn.Linear(in_dim, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),
```

First hidden block:

- `Linear(41, 256)`: maps 41 input features to 256 hidden units.
- `BatchNorm1d(256)`: stabilizes hidden activations.
- `ReLU()`: adds non-linearity.
- `Dropout(0.3)`: randomly drops 30% of activations during training to reduce overfitting.

```python
            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
```

Second hidden block. Reduces representation from 256 to 128.

```python
            nn.Linear(128, 64), nn.ReLU(),
```

Third hidden block. Reduces from 128 to 64.

```python
            nn.Linear(64, num_classes)
```

Output layer. Produces two logits: one for anomaly and one for normal.

```python
        )
```

Ends the sequential model.

```python
        for m in self.modules():
```

Loops through all modules/layers inside the model.

```python
            if isinstance(m, nn.Linear):
```

Checks if the current module is a linear layer.

```python
                nn.init.xavier_uniform_(m.weight)
```

Initializes weights using Xavier uniform initialization. This helps gradients flow well at the start of training.

```python
                nn.init.zeros_(m.bias)
```

Initializes biases to zero.

```python
    def forward(self, x): return self.net(x)
```

Defines the forward pass. Input `x` passes through `self.net`.

```python
    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts trainable parameters. Used for reporting model size.
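As a sanity check on what `count_parameters` should report for the MLP, the count can be derived by hand from the layer sizes shown above. The helper names below are hypothetical; the arithmetic assumes the default `in_dim=41` and `num_classes=2`.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift; running mean/variance are buffers, not parameters."""
    return 2 * n_features


mlp_total = (
    linear_params(41, 256) + batchnorm_params(256)
    + linear_params(256, 128) + batchnorm_params(128)
    + linear_params(128, 64)
    + linear_params(64, 2)
)
print(f"{mlp_total:,}")
```

ReLU and Dropout contribute no parameters, so only the linear and batch-normalization layers appear in the sum.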

### Why MLP is used

MLP is the simplest strong baseline for tabular data. If a complex model beats the MLP, that suggests the extra architecture has value.

---

## LSTM_IDS

```python
class LSTM_IDS(nn.Module):
```

Defines the LSTM model class.

```python
    def __init__(self, in_dim=41, hidden_dim=64, num_layers=2, num_classes=2):
```

Constructor. It uses 41 features, hidden size 64, 2 LSTM layers, and 2 output classes.

```python
        super().__init__()
```

Initializes parent module.

```python
        self.lstm = nn.LSTM(1, hidden_dim, num_layers, batch_first=True, dropout=0.2)
```

Creates an LSTM.

Important detail: each feature is treated as one timestep with one value. So input shape becomes:

```text
batch_size × 41 × 1
```

- `input_size=1`: each timestep contains one feature value.
- `hidden_dim=64`: LSTM hidden representation size.
- `num_layers=2`: stacked LSTM layers.
- `batch_first=True`: batch dimension comes first.
- `dropout=0.2`: dropout between LSTM layers.

```python
        self.fc = nn.Sequential(nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, num_classes))
```

Creates a small classifier after the LSTM.

- 64 hidden state → 32 hidden units → 2 output classes.

```python
    def forward(self, x):
```

Defines forward pass.

```python
        out, (h_n, _) = self.lstm(x.unsqueeze(-1))
```

`x` originally has shape:

```text
batch_size × 41
```

`x.unsqueeze(-1)` changes it to:

```text
batch_size × 41 × 1
```

The LSTM returns:

- `out`: output at all timesteps.
- `h_n`: final hidden states.
- `_`: cell states, ignored.

```python
        return self.fc(h_n[-1])
```

Uses the final hidden state from the last LSTM layer and feeds it into the classifier.
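The tensor shapes in this forward pass can be checked on a random batch. This sketch rebuilds only the LSTM layer with the same constructor arguments; the batch size of 4 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(1, 64, 2, batch_first=True, dropout=0.2)
lstm.eval()  # disable the inter-layer dropout for a deterministic check

x = torch.rand(4, 41)  # 4 connections, 41 features each
with torch.no_grad():
    out, (h_n, c_n) = lstm(x.unsqueeze(-1))  # input becomes 4 x 41 x 1

shapes = (tuple(out.shape), tuple(h_n.shape), tuple(h_n[-1].shape))
print(shapes)

# For a unidirectional LSTM, the last layer's final hidden state equals
# the output at the last timestep.
matches_last_step = torch.allclose(out[:, -1, :], h_n[-1])
print(matches_last_step)
```

`h_n` has one slice per stacked layer, which is why the model indexes `h_n[-1]` to get the top layer's summary of all 41 "timesteps".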

```python
    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts trainable parameters.

### Why LSTM is used

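Even though NSL-KDD is not a time series, the features follow a fixed column order and fall into groups (basic, content, time-based, host-based). The LSTM may learn dependencies across these groups, although the ordering is a dataset convention rather than a temporal sequence.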

---

## CNN1D_IDS

```python
class CNN1D_IDS(nn.Module):
```

Defines the 1D-CNN model.

```python
    def __init__(self, in_dim=41, num_classes=2):
```

Constructor with 41 input features and 2 output classes.

```python
        super().__init__()
```

Initializes parent module.

```python
        self.conv = nn.Sequential(
```

Creates convolutional feature extractor.

```python
            nn.Conv1d(1, 64, 3, padding=1), nn.BatchNorm1d(64), nn.ReLU(),
```

First convolution block:

- input channels = 1,
- output channels = 64,
- kernel size = 3,
- padding = 1 keeps length 41.

This learns local patterns across neighboring features.

```python
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
```

Second convolution block, increasing channels from 64 to 128.

```python
            nn.AdaptiveAvgPool1d(8)
```

Compresses the sequence length to 8, regardless of input length.

```python
        )
```

Ends convolution block.

```python
        self.fc = nn.Sequential(nn.Linear(128*8, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, num_classes))
```

Classifier after convolution:

- Flattened size = 128 channels × 8 pooled positions.
- Dense layer to 64.
- ReLU.
- Dropout.
- Output layer to 2 classes.

```python
    def forward(self, x):
```

Defines forward pass.

```python
        x = self.conv(x.unsqueeze(1))
```

Original `x` shape is:

```text
batch_size × 41
```

`x.unsqueeze(1)` gives:

```text
batch_size × 1 × 41
```

This is the format Conv1d expects.

```python
        return self.fc(x.view(x.size(0), -1))
```

Flattens convolution output and feeds it to classifier.
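The flattened size `128*8` follows from the standard Conv1d output-length formula. The helper below is a hypothetical illustration of that arithmetic for the layer settings used in this model.

```python
def conv1d_out_len(length: int, kernel: int, padding: int = 0,
                   stride: int = 1, dilation: int = 1) -> int:
    """Output length of a 1D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


length = 41
length = conv1d_out_len(length, kernel=3, padding=1)  # first conv block
length = conv1d_out_len(length, kernel=3, padding=1)  # second conv block
pooled_len = 8                                        # AdaptiveAvgPool1d(8) fixes the length
flat_features = 128 * pooled_len                      # channels x pooled positions

print(length, flat_features)
```

With kernel size 3 and padding 1 the length stays at 41 through both convolutions, and the adaptive pooling step is what makes the input size of the first `Linear` layer independent of the feature count.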

```python
    def count_parameters(self): return sum(p.numel() for p in self.parameters() if p.requires_grad)
```

Counts parameters.

### Final model loop

```python
for name, cls in [('MLP', MLP_IDS), ('LSTM', LSTM_IDS), ('CNN1D', CNN1D_IDS)]:
```

Loops over the three model classes.

```python
    m = cls()
```

Instantiates each model.

```python
    print(f'{name}: {m.count_parameters():,} parameters')
```

Prints model parameter counts.

### Mapping to project

This cell implements the **Train model** requirement and sets up model comparison.

---

# Cell 10 — Training All Models

This is the largest and most important training cell.

```python
EPOCHS = 50
BATCH_SIZE = 256
LR = 1e-3
```

Defines training hyperparameters:

- train for 50 epochs,
- use mini-batches of 256 samples,
- learning rate is 0.001.

```python
train_ds = TensorDataset(torch.FloatTensor(X_train), torch.LongTensor(y_train))
```

Converts training NumPy arrays into PyTorch tensors and bundles features/labels together.

- Features become float tensors.
- Labels become long integer tensors required by CrossEntropyLoss.

```python
test_ds = TensorDataset(torch.FloatTensor(X_test), torch.LongTensor(y_test))
```

Same for test data.

```python
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
```

Creates mini-batches for training and shuffles data each epoch.

```python
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)
```

Creates test batches. No shuffle is needed because evaluation order does not matter.

### Class weights

```python
counts = np.bincount(y_train)
```

Counts how many examples exist per class.

```python
weights = 1.0 / counts.astype(np.float32)
```

Creates inverse-frequency weights. Smaller classes get larger weight.

```python
weights = weights / weights.sum() * len(weights)
```

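Rescales the weights so that they sum to the number of classes, which makes their mean equal to 1. The relative ratio between classes is unchanged; only the overall scale of the loss is kept comparable to the unweighted case.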

```python
class_weights = torch.FloatTensor(weights).to(DEVICE)
```

Converts weights to PyTorch tensor and moves them to GPU/CPU.

### Why class weights?

Class imbalance can make the model favor the majority class. Weighted loss penalizes mistakes on underrepresented classes more.
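The three weight lines can be traced on a small example. The class counts here are invented for illustration and are not the NSL-KDD counts.

```python
import numpy as np

# Hypothetical class counts: class 0 is three times more frequent than class 1.
counts = np.array([300, 100])

weights = 1.0 / counts.astype(np.float32)         # inverse frequency
weights = weights / weights.sum() * len(weights)  # rescale so the weights sum to 2

print(weights, weights.sum())
```

The rarer class ends up with three times the weight of the common one, matching the inverse of the 3:1 count ratio.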

---

## train_model function

```python
def train_model(model, model_name):
```

Defines a reusable function to train any of the three models.

```python
    print(...)
```

Prints a header showing which model is being trained.

```python
    model.to(DEVICE)
```

Moves model to GPU or CPU.

```python
    criterion = nn.CrossEntropyLoss(weight=class_weights)
```

Defines classification loss with class weights.

CrossEntropyLoss expects raw logits, so the model does not need Softmax during training.

```python
    optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=1e-4)
```

Creates Adam optimizer.

- `lr=1e-3`: learning rate.
- `weight_decay=1e-4`: L2 regularization to reduce overfitting.

```python
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)
```

Creates a learning-rate scheduler. If the monitored loss has not improved for more than 5 consecutive epochs (`patience=5`), the learning rate is halved (`factor=0.5`).
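The scheduler's behaviour can be observed in isolation by feeding it a loss that never improves. This is a sketch with a dummy parameter and a constant loss of 1.0, both chosen only for illustration.

```python
import torch

# Dummy parameter so that an optimizer can be constructed.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)

lr_history = []
for epoch in range(20):
    optimizer.step()     # no gradients exist, so this does not change the parameter
    scheduler.step(1.0)  # report a loss that never improves
    lr_history.append(optimizer.param_groups[0]["lr"])

print(lr_history[0], lr_history[-1])
```

The learning rate starts at `1e-3` and is reduced in steps as the plateau continues. In a real run the reductions depend on how the monitored loss evolves.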

```python
    best_f1, history = 0, {'train_loss': [], 'test_acc': []}
```

Initializes best F1 and stores training history.

```python
    best_state = None
```

Will store the best model weights.

```python
    t0 = time.time()
```

Starts timing training.

```python
    for epoch in range(EPOCHS):
```

Training loop over 50 epochs.

```python
        model.train()
```

Sets model to training mode. Enables dropout and batchnorm training behavior.

```python
        total_loss = 0
```

Initializes epoch loss accumulator.

```python
        for xb, yb in train_loader:
```

Loops over training mini-batches.

```python
            xb, yb = xb.to(DEVICE), yb.to(DEVICE)
```

Moves batch to GPU/CPU.

```python
            optimizer.zero_grad()
```

Clears old gradients.

```python
            loss = criterion(model(xb), yb)
```

Runs model forward pass and computes cross-entropy loss.

```python
            loss.backward()
```

Backpropagates gradients.

```python
            optimizer.step()
```

Updates model weights.

```python
            total_loss += loss.item() * len(yb)
```

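Multiplies the batch-mean loss by the batch size and adds it to the running total, so that dividing by the number of training samples later yields a per-sample average for the epoch.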
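The reason for multiplying by `len(yb)` shows up when batches have different sizes, as the last batch of an epoch usually does. The sketch below uses invented per-sample losses and plain means. With a class-weighted `CrossEntropyLoss` the batch mean is a weight-normalized mean, so the notebook's epoch value is an approximation of this idea rather than an exact per-sample average.

```python
import numpy as np

# Hypothetical per-sample losses, split into batches of size 3 and 2.
per_sample_loss = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
batches = [per_sample_loss[:3], per_sample_loss[3:]]

total_loss = 0.0
for batch in batches:
    batch_mean = batch.mean()              # what a mean-reduced loss returns
    total_loss += batch_mean * len(batch)  # undo the mean before accumulating

epoch_loss = total_loss / len(per_sample_loss)
naive_mean_of_means = float(np.mean([b.mean() for b in batches]))

print(epoch_loss, naive_mean_of_means)
```

Averaging the batch means directly would give the smaller final batch too much influence. Accumulating size-weighted sums avoids that.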

### Evaluation inside each epoch

```python
        model.eval()
```

Sets the model to evaluation mode. Dropout is disabled, and batch normalization uses its running statistics instead of per-batch statistics.

```python
        preds, probs, labels = [], [], []
```

Creates lists to collect predictions, probabilities, and labels.

```python
        with torch.no_grad():
```

Disables gradient computation to save memory and speed up evaluation.

```python
            for xb, yb in test_loader:
```

Loops through test batches.

```python
                xb = xb.to(DEVICE)
```

Moves features to GPU/CPU.

```python
                out = model(xb)
```

Gets raw logits.

```python
                preds.append(out.argmax(1).cpu().numpy())
```

Predicted class is the index of the largest logit.

```python
                probs.append(torch.softmax(out, 1).cpu().numpy())
```

Converts logits to class probabilities.
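Taking `argmax` on the logits and on the softmax probabilities gives the same predicted class, because softmax preserves the ordering within each row. A NumPy sketch with made-up logits (the `softmax` function here is a local stand-in for `torch.softmax`):

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for two connections: columns are (anomaly, normal).
logits = np.array([[2.0, -1.0], [0.3, 0.9]])
probs = softmax(logits)

pred_from_logits = logits.argmax(axis=1)
pred_from_probs = probs.argmax(axis=1)

print(probs.sum(axis=1), pred_from_logits, pred_from_probs)
```

The probabilities are still needed separately, since ROC-AUC and PR-AUC rank samples by score rather than by hard prediction.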

```python
                labels.append(yb.numpy())
```

Stores true labels.

```python
        preds = np.concatenate(preds)
        probs = np.concatenate(probs)
        labels = np.concatenate(labels)
```

Combines batch arrays into full test arrays.

```python
        report = classification_report(labels, preds, output_dict=True)
```

Computes precision, recall, F1, etc.

```python
        wf1 = report['weighted avg']['f1-score']
```

Extracts weighted F1-score.

```python
        acc = report['accuracy']
```

Extracts accuracy.

```python
        test_loss = total_loss / len(y_train)
```

Despite the variable name, this is actually average training loss for the epoch.

```python
        scheduler.step(test_loss)
```

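Passes the average training loss to the scheduler, so learning-rate reductions are driven by training loss rather than by a held-out loss.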

```python
        history['train_loss'].append(total_loss / len(y_train))
        history['test_acc'].append(acc)
```

Stores loss and accuracy for plots.

```python
        if wf1 > best_f1:
```

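Checks whether the current epoch has the best weighted F1 so far. This F1 is computed on the test set, so checkpoint selection uses test data and the final test metrics are likely to be somewhat optimistic; a separate validation split would avoid this.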

```python
            best_f1 = wf1
```

Updates best F1.

```python
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
```

Saves a copy of best model weights on CPU.

```python
        if (epoch+1) % 10 == 0 or epoch == 0:
```

Prints progress at epoch 1 and every 10 epochs.

```python
            print(...)
```

Shows epoch, loss, accuracy, and F1.

### Final evaluation

```python
    dt = time.time() - t0
```

Measures total training time.

```python
    model.load_state_dict(best_state)
```

Restores best model weights.

```python
    model.eval()
```

Sets evaluation mode.

The next block repeats final evaluation on the test set to compute final metrics.

```python
    roc = roc_auc_score(labels, probs[:, 1])
```

Computes ROC-AUC using probability of class 1 (`normal`).

```python
    pr = average_precision_score(labels, probs[:, 1])
```

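Computes PR-AUC / average precision, again with class 1 (`normal`) as the positive class. Unlike ROC-AUC, average precision changes when the positive class is swapped, so this value describes how well `normal` traffic is ranked, not attack detection.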
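The effect of the positive-class choice can be seen on a toy example. The labels and probabilities below are invented; the anomaly-as-positive variant (inverted labels with `probs[:, 0]`) is an illustration and is not something the notebook computes.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy data: 1 = normal, 0 = anomaly; columns of probs are (anomaly, normal).
labels = np.array([1, 1, 1, 0])
probs = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]])

# As in the notebook: class 1 (normal) is the positive class.
roc_normal = roc_auc_score(labels, probs[:, 1])
pr_normal = average_precision_score(labels, probs[:, 1])

# Alternative view: treat anomaly as the positive class.
roc_anomaly = roc_auc_score(1 - labels, probs[:, 0])
pr_anomaly = average_precision_score(1 - labels, probs[:, 0])

print(roc_normal, roc_anomaly)
print(pr_normal, pr_anomaly)
```

ROC-AUC is identical under both views, while average precision differs noticeably. For an IDS report it is therefore worth stating which class the PR-AUC refers to.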

```python
    print(...)
    print(classification_report(...))
    print(confusion_matrix(...))
```

Prints final metrics, per-class report, and confusion matrix.

```python
    return model, {...}
```

Returns trained model and result dictionary.

### Training all models

```python
models = {}
results = {}
```

Creates dictionaries to store models and results.

```python
for name, cls in [('mlp', MLP_IDS), ('lstm', LSTM_IDS), ('cnn1d', CNN1D_IDS)]:
```

Loops over model classes.

```python
    models[name], results[name] = train_model(cls(), name.upper())
```

Instantiates, trains, and stores each model.

### Mapping to project

This cell implements the **Train model** part and produces the model comparison results.

---

# Cells 11 and 12 — Model Summary and Training Curves

## Cell 11

```python
print(f'{"Model":<8} {"Params":>8} {"W-F1":>8} {"ROC-AUC":>9} {"PR-AUC":>8} {"Time":>8}')
```

Prints table header.

```python
print('-'*50)
```

Prints separator line.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
```

Loops over the three trained models.

```python
    r = results[name]
```

Gets metric dictionary.

```python
    p = models[name].count_parameters()
```

Gets parameter count.

```python
    print(...)
```

Prints model name, parameters, F1, ROC-AUC, PR-AUC, and time.

### Why this matters

This is the main quantitative result table in the report.

## Cell 12

```python
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
```

Creates two side-by-side plots.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
```

Loops over models.

```python
    axes[0].plot(results[name]['history']['train_loss'], label=name.upper())
```

Plots training loss over epochs.

```python
    axes[1].plot(results[name]['history']['test_acc'], label=name.upper())
```

Plots test accuracy over epochs.

```python
axes[0].set_xlabel(...); ...
```

Labels first plot.

```python
axes[1].set_xlabel(...); ...
```

Labels second plot.

```python
plt.tight_layout(); plt.show()
```

Adjusts spacing and displays plots.

### Mapping to project

These plots support training analysis and make the report/presentation visual.

---

# Cell 14 — SHAP Setup and SHAP Value Computation

```python
mlp_cpu = models['mlp'].cpu().eval()
```

Moves the trained MLP to CPU and sets evaluation mode.

Why MLP? The project uses MLP for SHAP explanation because it is a clean tabular baseline and easier to explain consistently.

```python
def predict_fn(X):
```

Defines a prediction wrapper for SHAP and LIME.

```python
    with torch.no_grad():
```

No gradients are needed for explanation queries.

```python
        return torch.softmax(mlp_cpu(torch.FloatTensor(X)), 1).numpy()
```

Converts NumPy input to PyTorch tensor, runs the MLP, applies Softmax, and returns probabilities.

SHAP and LIME need a function that takes NumPy arrays and returns prediction probabilities.

```python
bg_idx = np.random.choice(len(X_train), 100, replace=False)
```

Randomly selects 100 training samples as SHAP background data.

Background data represents the baseline distribution.

```python
exp_idx = np.random.choice(len(X_test), 150, replace=False)
```

Randomly selects 150 test samples to explain.

Why sample? Kernel SHAP is expensive; explaining the entire test set would take too long.

```python
explainer = shap.KernelExplainer(predict_fn, X_train[bg_idx])
```

Creates a model-agnostic SHAP explainer using prediction function and background data.

```python
print('Computing SHAP values...')
```

Progress message.

```python
shap_values_raw = explainer.shap_values(X_test[exp_idx], nsamples=200, silent=True)
```

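Computes SHAP values for the 150 selected test samples. `nsamples=200` is the number of perturbed feature coalitions evaluated per explained sample; more samples give lower-variance estimates at higher cost.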

```python
if isinstance(shap_values_raw, list):
    shap_vals_anomaly = shap_values_raw[0]
elif shap_values_raw.ndim == 3:
    shap_vals_anomaly = shap_values_raw[:, :, 0]
else:
    shap_vals_anomaly = shap_values_raw
```

Handles different SHAP library output formats.

- Older SHAP versions return a list with one 2D array per class.
- Newer SHAP versions may return a single 3D array of shape `(samples, features, classes)`.
- The code extracts class 0: anomaly.
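The same branching can be wrapped in a small function and exercised on dummy arrays of each layout. The function name `select_class` is hypothetical, and the zero-filled arrays only stand in for real SHAP output.

```python
import numpy as np


def select_class(shap_values_raw, class_index: int = 0) -> np.ndarray:
    """Return a (samples, features) array for one class, whatever the input layout."""
    if isinstance(shap_values_raw, list):
        return np.asarray(shap_values_raw[class_index])  # list of per-class arrays
    arr = np.asarray(shap_values_raw)
    if arr.ndim == 3:
        return arr[:, :, class_index]                    # (samples, features, classes)
    return arr                                           # already 2D


n_samples, n_features, n_classes = 5, 41, 2
as_list = [np.zeros((n_samples, n_features)) for _ in range(n_classes)]
as_3d = np.zeros((n_samples, n_features, n_classes))
as_2d = np.zeros((n_samples, n_features))

shapes = [select_class(v).shape for v in (as_list, as_3d, as_2d)]
print(shapes)
```

All three layouts reduce to the same `(samples, features)` shape, which is what the later importance and plotting cells expect.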

```python
print(f'Done! Shape: {shap_vals_anomaly.shape}')
```

Prints SHAP array shape. Final shape is `(150, 41)`.

### Mapping to project

This is the core of **Explain predictions** using SHAP.

---

# Cells 15–18 — SHAP Global and Local Explanations

## Cell 15 — Feature Importance

```python
mean_abs_shap = np.abs(shap_vals_anomaly).mean(axis=0)
```

Takes absolute SHAP values and averages across samples.

Why absolute value? We care about magnitude of influence, regardless of direction.
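A two-sample toy example shows what would go wrong without the absolute value. The SHAP values below are invented for illustration.

```python
import numpy as np

# Hypothetical SHAP values: 2 samples x 2 features.
# Feature 0 pushes strongly in opposite directions; feature 1 pushes weakly one way.
toy_shap = np.array([[0.4, 0.1],
                     [-0.4, 0.1]])

signed_mean = toy_shap.mean(axis=0)       # opposite signs cancel out
mean_abs = np.abs(toy_shap).mean(axis=0)  # magnitude of influence

print(signed_mean, mean_abs)
```

With a signed mean, feature 0 would look irrelevant even though it has the largest effect on each individual prediction. The mean absolute value ranks it first.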

```python
feature_importance = sorted(zip(FEATURE_NAMES, mean_abs_shap), key=lambda x: x[1], reverse=True)
```

Pairs each feature name with its importance and sorts descending.

```python
print('Top 15 features...')
```

Prints heading.

```python
for i, (f, v) in enumerate(feature_importance[:15]):
    print(...)
```

Prints top 15 features.

## Cell 16 — SHAP Summary Plot

```python
shap.summary_plot(shap_vals_anomaly, X_test[exp_idx], feature_names=FEATURE_NAMES, max_display=15)
```

Creates SHAP summary plot.

It shows:

- feature importance,
- direction of feature effect,
- distribution of SHAP values,
- top 15 features.

## Cell 17 — SHAP Bar Plot

```python
plt.figure(figsize=(10, 6))
```

Creates figure.

```python
top15 = feature_importance[:15]
```

Selects top 15 SHAP features.

```python
plt.barh(range(15), [v for _, v in top15][::-1], color='steelblue')
```

Draws horizontal bar chart. `[::-1]` reverses order so most important appears at top visually.

```python
plt.yticks(range(15), [f for f, _ in top15][::-1])
```

Labels bars with feature names.

```python
plt.xlabel('Mean |SHAP value|')
plt.title('Top 15 Features — MLP (Anomaly Class)')
plt.tight_layout(); plt.show()
```

Adds labels, title, and displays plot.

## Cell 18 — Local SHAP Explanation

```python
idx = 0
```

Selects first explained test sample.

```python
pred = predict_fn(X_test[exp_idx[idx:idx+1]])
```

Gets prediction probabilities for that sample.

```python
print(f'Sample prediction: anomaly={pred[0][0]:.3f}, normal={pred[0][1]:.3f}')
```

Prints predicted probabilities.

```python
print(f'True label: {class_names[y_test[exp_idx[idx]]]}')
```

Prints true label.

```python
ev = explainer.expected_value
```

Gets SHAP baseline expected output.

```python
ev0 = ev[0] if isinstance(ev, (list, np.ndarray)) else ev
```

Handles expected value format.

```python
shap.force_plot(ev0, shap_vals_anomaly[idx], X_test[exp_idx[idx]], feature_names=FEATURE_NAMES, matplotlib=True)
```

Creates force plot for one prediction.

### Mapping to project

These cells produce the **explanation analysis deliverable**.

---

# Cell 20 — LIME Explanation Analysis

```python
lime_explainer = lime_tabular.LimeTabularExplainer(...)
```

Creates a LIME explainer for tabular data.

```python
X_train
```

Training data is used by LIME to understand feature distributions.

```python
feature_names=FEATURE_NAMES
```

Gives readable feature names.

```python
class_names=class_names
```

Gives readable class labels.

```python
discretize_continuous=True
```

LIME bins continuous features into intervals, making explanations more interpretable.

```python
random_state=SEED
```

Makes LIME sampling reproducible.
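To make the effect of `discretize_continuous=True` concrete, here is a minimal sketch of quartile binning, which is LIME's default `discretizer` setting. It is a simplified stand-in, not LIME's own code: the training column, the feature name, and the helper `bin_rule` are hypothetical.

```python
import numpy as np

# Hypothetical scaled training column for one feature (not NSL-KDD data).
train_col = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])

# Quartile edges estimated from the training data.
edges = np.percentile(train_col, [25, 50, 75])

def bin_rule(value, name="src_bytes"):
    """Return the interval rule that a value falls into (right-closed bins)."""
    b = int(np.searchsorted(edges, value))  # bin index 0..3
    if b == 0:
        return f"{name} <= {edges[0]:.2f}"
    if b == len(edges):
        return f"{name} > {edges[-1]:.2f}"
    return f"{edges[b - 1]:.2f} < {name} <= {edges[b]:.2f}"

for v in (0.05, 0.45, 0.95):
    print(v, "->", bin_rule(v))
```

The explanation then reports a weight per interval rule rather than per raw feature value, which is why the rule strings matter when feature names are parsed back out later.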

```python
n_lime = 30
```

Number of test samples to explain.

```python
lime_idx = np.random.choice(len(X_test), n_lime, replace=False)
```

Randomly selects 30 test samples.

```python
all_top_features = {}
```

Dictionary to count how often each feature appears in LIME top explanations.

```python
for i, idx in enumerate(lime_idx):
```

Loops over selected samples.

```python
exp = lime_explainer.explain_instance(X_test[idx], predict_fn, num_features=10, top_labels=1)
```

Generates a LIME explanation for one sample.

- `num_features=10`: keep top 10 features.
- `top_labels=1`: explain only the single most probable class, which is the predicted class.

```python
pred_class = np.argmax(predict_fn(X_test[idx].reshape(1, -1)))
```

Gets predicted class for that sample.

```python
for fw in exp.as_list(label=pred_class):
```

Loops over feature-weight pairs in the LIME explanation.

```python
fname = fw[0].split(' ')[0]
```

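Extracts the feature name from LIME's text rule by taking the first space-separated token. This works for one-sided rules such as `src_bytes <= 0.00`, but for two-sided rules such as `0.25 < src_bytes <= 0.50` the first token is the lower bound, not the feature name, so those entries end up counted under a numeric key.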

```python
all_top_features[fname] = all_top_features.get(fname, 0) + 1
```

Counts how often this feature appears.
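A more defensive way to recover the feature name is to tokenize the rule and keep the token that exactly matches a known feature. The sketch below is one possible approach, not the notebook's code: `rule_to_feature` is a hypothetical helper, the rule strings are made-up examples of LIME's interval format, and `FEATURE_NAMES` is shortened to five entries.

```python
import re
from collections import Counter

# Hypothetical subset of FEATURE_NAMES; the notebook's list has 41 entries.
FEATURE_NAMES = ["duration", "src_bytes", "count", "srv_count", "logged_in"]
KNOWN = set(FEATURE_NAMES)

def rule_to_feature(rule):
    """Return the feature name inside a LIME rule string, or None if not found."""
    # Split on whitespace and comparison characters, then keep the
    # first token that is an exact feature name.
    for token in re.split(r"[\s<>=]+", rule):
        if token in KNOWN:
            return token
    return None

# Made-up rule strings in LIME's one-sided and two-sided formats.
rules = ["src_bytes <= 0.00", "0.25 < srv_count <= 0.50", "logged_in > 0.50"]
counts = Counter(rule_to_feature(r) for r in rules)
print(counts)
```

Exact token matching also keeps `count` and `srv_count` apart, which a substring test would not.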

```python
if (i+1) % 10 == 0:
    print(...)
```

Progress every 10 samples.

```python
lime_sorted = sorted(all_top_features.items(), key=lambda x: x[1], reverse=True)
```

Sorts features by frequency.

```python
for f, c in lime_sorted[:10]:
    print(...)
```

Prints top 10 LIME features.

### Mapping to project

This implements the **Apply explainability** task using LIME.

---

# Cell 21 — SHAP vs LIME Comparison

```python
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
```

Creates two side-by-side plots.

```python
top10_shap = feature_importance[:10]
```

Gets top 10 SHAP features.

```python
axes[0].barh(...)
```

Plots SHAP top 10.

```python
top10_lime = lime_sorted[:10]
```

Gets top 10 LIME features.

```python
axes[1].barh(...)
```

Plots LIME top 10.

```python
plt.suptitle('SHAP vs LIME Feature Rankings', fontsize=14)
plt.tight_layout(); plt.show()
```

Displays comparison plot.

### Rank correlation

```python
shap_ranks = {f: i for i, (f, _) in enumerate(feature_importance[:20])}
```

Creates dictionary mapping SHAP feature to rank.

```python
lime_ranks = {f: i for i, (f, _) in enumerate(lime_sorted[:20])}
```

Creates dictionary mapping LIME feature to rank.

```python
common = set(shap_ranks.keys()) & set(lime_ranks.keys())
```

Finds features appearing in both top-20 lists.

```python
if len(common) >= 5:
```

Only compute correlation if enough overlap exists.

```python
rho, p = spearmanr([...], [...])
```

Computes Spearman rank correlation between SHAP and LIME rankings.

```python
print(...)
```

Prints result.
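The rank-correlation step can be sketched in plain Python. The two orderings below are hypothetical, not the notebook's output, and the helpers `rerank` and `spearman` are illustrative. The closed-form formula assumes tie-free ranks; `scipy.stats.spearmanr`, which the notebook uses, does the re-ranking internally and also handles ties.

```python
# Hypothetical rankings (best first); not the notebook's actual output.
shap_order = ["logged_in", "serror_rate", "count", "src_bytes", "flag", "duration"]
lime_order = ["count", "logged_in", "dst_bytes", "flag", "serror_rate", "src_bytes"]

shap_ranks = {f: i for i, f in enumerate(shap_order)}
lime_ranks = {f: i for i, f in enumerate(lime_order)}

# Only features present in both lists can be compared.
common = sorted(set(shap_ranks) & set(lime_ranks))

def rerank(ranks, names):
    """Re-rank the selected names 0..n-1 by their original rank."""
    ordered = sorted(names, key=lambda f: ranks[f])
    return {f: r for r, f in enumerate(ordered)}

def spearman(ranks_a, ranks_b, names):
    """Spearman rho for tie-free ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    a, b = rerank(ranks_a, names), rerank(ranks_b, names)
    n = len(names)
    d2 = sum((a[f] - b[f]) ** 2 for f in names)
    return 1 - 6 * d2 / (n * (n * n - 1))

rho = spearman(shap_ranks, lime_ranks, common)
print(f"{len(common)} common features, rho = {rho:.4f}")
```

Because the correlation only covers the overlap, a small `common` set makes the estimate noisy, which is why the notebook requires at least 5 shared features.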

Final result:

```text
Spearman correlation = 0.0714
p = 0.8665
```

### Interpretation

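A correlation of about 0.07 with p ≈ 0.87 means the two rankings are essentially unrelated for the features that appear in both top-20 lists. It does not mean the rankings are reversed, which would show up as a strongly negative value. This is a key finding: explanations depend on method choice. Part of the gap may come from how the rankings are built: SHAP ranks features by mean |SHAP value|, while LIME ranks them by how often they appear in a sample's top 10.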

---

# Cell 23 — SHAP Stability Evaluation

```python
def compute_shap_stability(explainer, sample, epsilon, n_perturbs=10):
```

Defines function to evaluate how stable SHAP is under perturbations.

```python
    rng = np.random.RandomState(SEED)
```

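Creates a deterministic random generator. Because it is re-seeded with `SEED` on every call, each sample and each epsilon sees the same sequence of noise draws, scaled by epsilon.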

```python
    base = np.array(explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True))
```

Computes original SHAP explanation for the sample.

```python
    base = base[0].flatten() if isinstance(base, list) else base.flatten()
```

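Flattens SHAP values into one vector. Note that `base` has already been converted with `np.array(...)` on the previous line, so the `isinstance(base, list)` check is always false and the `else` branch runs. The resulting vector therefore contains the values for every class the explainer returned, not only class 0.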

```python
    max_delta, pccs = 0, []
```

Initializes maximum explanation change and list of correlations.

```python
    for _ in range(n_perturbs):
```

Repeats perturbation several times.

```python
        noise = rng.uniform(-epsilon, epsilon, sample.shape)
```

Creates random noise bounded by epsilon.

```python
        perturbed = np.clip(sample + noise, 0, 1)
```

Adds noise and clips features to the [0, 1] range, which assumes the features were scaled to that range during preprocessing.

```python
        p_shap = np.array(explainer.shap_values(perturbed.reshape(1,-1), nsamples=100, silent=True))
```

Computes SHAP explanation for perturbed sample.

```python
        p_shap = p_shap[0].flatten() if isinstance(p_shap, list) else p_shap.flatten()
```

Flattens perturbed explanation.

```python
        max_delta = max(max_delta, np.linalg.norm(p_shap - base))
```

Computes the L2 norm of the difference between the perturbed and original explanations and keeps the largest value seen across perturbations. This is SENS_MAX.

```python
        if np.std(base) > 1e-8 and np.std(p_shap) > 1e-8:
```

Skips the correlation when either vector has near-zero variance, because Pearson correlation is undefined for a constant vector.

```python
            pccs.append(pearsonr(base, p_shap)[0])
```

Computes Pearson correlation between original and perturbed SHAP values.

```python
    return max_delta, np.mean(pccs) if pccs else 0.0
```

Returns SENS_MAX and the average PCC, or 0.0 for the PCC if no correlation could be computed.
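The same two metrics can be tried out without SHAP by swapping in a cheap attribution function. This is a minimal sketch under stated assumptions: `toy_explainer` (weight times input for a hypothetical linear model) stands in for `explainer.shap_values`, `np.corrcoef` replaces `pearsonr`, and the weights and sample are made up.

```python
import numpy as np

# Hypothetical linear model weights (not from the notebook).
W = np.array([1.0, -2.0, 0.5, 3.0])

def toy_explainer(x):
    # Stand-in for SHAP: attribution of a linear model with a zero baseline.
    return W * x

def stability(explain, sample, epsilon, n_perturbs=10, seed=0):
    rng = np.random.RandomState(seed)
    base = explain(sample)
    max_delta, pccs = 0.0, []
    for _ in range(n_perturbs):
        noise = rng.uniform(-epsilon, epsilon, sample.shape)
        perturbed = np.clip(sample + noise, 0, 1)
        attr = explain(perturbed)
        # SENS_MAX: largest L2 shift of the explanation.
        max_delta = max(max_delta, float(np.linalg.norm(attr - base)))
        # PCC: Pearson correlation between original and perturbed explanations.
        if np.std(base) > 1e-8 and np.std(attr) > 1e-8:
            pccs.append(np.corrcoef(base, attr)[0, 1])
    return max_delta, (float(np.mean(pccs)) if pccs else 0.0)

sample = np.array([0.2, 0.4, 0.6, 0.8])
results = {eps: stability(toy_explainer, sample, eps) for eps in (0.01, 0.03, 0.05)}
for eps, (sens, pcc) in results.items():
    print(f"eps={eps}: SENS_MAX={sens:.4f}, PCC={pcc:.4f}")
```

For this linear stand-in the explanation shift is bounded by `epsilon * ||W||`, so SENS_MAX grows with epsilon while PCC stays close to 1. A real neural model gives no such guarantee, which is what the notebook's test probes.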

### Running the test

```python
epsilons = [0.01, 0.03, 0.05]
```

Perturbation sizes.

```python
n_stability = 8
```

Number of samples used for stability test.

```python
stability_idx = np.random.choice(len(X_test), n_stability, replace=False)
```

Randomly selects test samples.

```python
stability_results = {}
```

Stores results.

```python
for eps in epsilons:
```

Loops over perturbation sizes.

```python
    sens_list, pcc_list = [], []
```

Stores metrics per sample.

```python
    for i, idx in enumerate(stability_idx):
```

Loops over selected samples.

```python
        sm, pc = compute_shap_stability(...)
```

Computes SENS_MAX and PCC.

```python
        sens_list.append(sm); pcc_list.append(pc)
```

Stores results.

```python
    stability_results[eps] = {'sens_max': np.mean(sens_list), 'pcc': np.mean(pcc_list)}
```

Stores average metrics.

```python
    status = 'STABLE' if np.mean(pcc_list) > 0.6 else 'UNSTABLE'
```

Labels the explanation `STABLE` when the mean PCC exceeds 0.6 and `UNSTABLE` otherwise.

### Mapping to project

This implements **Evaluate explanation stability**.

---

# Cell 24 — LIME Stochastic Stability

This evaluates whether LIME gives consistent explanations when run multiple times.

```python
lime_corrs = []
```

Stores average correlation per sample.

```python
for i, idx in enumerate(stability_idx[:6]):
```

Uses first 6 stability samples.

```python
    weight_vecs = []
```

Stores LIME weight vectors from different seeds.

```python
    for seed in range(10):
```

Runs LIME 10 times with different seeds.

```python
        le_obj = lime_tabular.LimeTabularExplainer(..., random_state=seed)
```

Creates a new LIME explainer with a different random seed.

```python
        exp = le_obj.explain_instance(..., num_features=len(FEATURE_NAMES))
```

Explains the sample using all features.

```python
        w = np.zeros(len(FEATURE_NAMES))
```

Creates a zero vector of feature weights.

```python
        for key, val in dict(exp.as_list()).items():
```

Loops over LIME explanation terms. With no `label` argument, `as_list()` returns the weights for class 1 (`normal`).

```python
            for j, fn in enumerate(FEATURE_NAMES):
                if fn in key: w[j] = val; break
```

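Maps LIME text rules back to feature indices using a substring test and stops at the first match. Because some feature names are contained in others (for example `count` in `srv_count` and `dst_host_count`, or `serror_rate` in `srv_serror_rate`), a rule can be assigned to the wrong index whenever the shorter name comes earlier in `FEATURE_NAMES`.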

```python
        weight_vecs.append(w)
```

Stores one explanation vector.

```python
    corrs = []
```

Stores pairwise correlations.

```python
    for a in range(10):
        for b in range(a+1, 10):
```

Compares all 45 unordered pairs of the 10 runs.

```python
            if np.std(weight_vecs[a]) > 1e-8 and np.std(weight_vecs[b]) > 1e-8:
```

Skips pairs in which either weight vector is constant, because the correlation is undefined there.

```python
                corrs.append(spearmanr(weight_vecs[a], weight_vecs[b])[0])
```

Computes Spearman correlation between two LIME runs.

```python
    mc = np.mean(corrs) if corrs else 0
```

Mean correlation for this sample.

```python
    lime_corrs.append(mc)
```

Stores it.
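The per-sample score is the mean Spearman correlation over all pairs of runs. The sketch below reproduces that aggregation with `itertools.combinations` on three made-up weight vectors. It assumes no tied weights, so ranks can be taken with a double `argsort`; the notebook's `spearmanr` call handles ties as well.

```python
from itertools import combinations
import numpy as np

def ranks(v):
    """Ranks 0..n-1 of a vector, assuming no tied values."""
    return np.argsort(np.argsort(v))

def mean_pairwise_spearman(weight_vecs):
    """Average Spearman correlation over all unordered pairs of runs."""
    corrs = []
    for a, b in combinations(weight_vecs, 2):
        if np.std(a) > 1e-8 and np.std(b) > 1e-8:
            # Spearman correlation is the Pearson correlation of the ranks.
            corrs.append(np.corrcoef(ranks(a), ranks(b))[0, 1])
    return float(np.mean(corrs)) if corrs else 0.0

# Three hypothetical LIME weight vectors for one sample (made-up numbers).
runs = [np.array([0.30, -0.10, 0.05, 0.20]),
        np.array([0.28, -0.12, 0.07, 0.18]),
        np.array([0.05, -0.10, 0.30, 0.20])]
score = mean_pairwise_spearman(runs)
print(f"mean pairwise Spearman = {score:.4f}")
```

Here the first two runs agree on the ordering while the third swaps two features, so the mean falls well below 1 even though the weights look similar in magnitude.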

```python
lime_status = 'STABLE' if np.mean(lime_corrs) > 0.6 else 'UNSTABLE'
```

Classifies LIME stability.

### Mapping to project

This tests whether LIME explanations are reliable despite LIME randomness.

---

# Cell 25 — Faithfulness Evaluation

Faithfulness asks: do the important features actually matter to the model?

```python
def get_shap_for_class(shap_values, class_idx=0):
```

Helper function that extracts one class's SHAP values regardless of which output format the installed SHAP version returns.

```python
    if isinstance(shap_values, list):
        return shap_values[class_idx]
```

Older SHAP format: a list with one `(n_samples, n_features)` array per class.

```python
    elif isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
        return shap_values[:, :, class_idx]
```

Newer SHAP format: a single 3D array of shape `(n_samples, n_features, n_classes)`.

```python
    else:
        return shap_values
```

Fallback: the values are already a single 2D array, so they are returned unchanged.
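A quick check of the helper on the three layouts, using zero-filled placeholder arrays in place of real SHAP output (the shapes are assumptions chosen to match one sample, 41 features, and 2 classes):

```python
import numpy as np

def get_shap_for_class(shap_values, class_idx=0):
    if isinstance(shap_values, list):
        return shap_values[class_idx]
    elif isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
        return shap_values[:, :, class_idx]
    else:
        return shap_values

n_samples, n_features, n_classes = 1, 41, 2

# Placeholder inputs for each layout (zeros, not real SHAP values).
layouts = {
    "list of per-class arrays": [np.zeros((n_samples, n_features)) for _ in range(n_classes)],
    "3D array": np.zeros((n_samples, n_features, n_classes)),
    "2D array": np.zeros((n_samples, n_features)),
}

shapes = {name: get_shap_for_class(values).shape for name, values in layouts.items()}
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

All three inputs come back as a `(1, 41)` array, so the later `.flatten()` call yields one value per feature.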

```python
faith_results = {k: [] for k in [3, 5, 10]}
```

Creates result lists for top-3, top-5, and top-10 feature masking.

```python
for idx in stability_idx[:10]:
```

The slice asks for up to 10 samples, but `stability_idx` holds only `n_stability = 8` indices, so the loop runs over all 8.

```python
    sample = X_test[idx]
```

Gets one test sample.

```python
    sv_raw = explainer.shap_values(sample.reshape(1,-1), nsamples=100, silent=True)
```

Computes SHAP values.

```python
    sv = get_shap_for_class(sv_raw, 0).flatten()
```

Extracts anomaly-class SHAP vector.

```python
    base_conf = predict_fn(sample.reshape(1,-1))[0]
```

Gets original prediction probabilities.

```python
    pred_cls = np.argmax(base_conf)
```

Gets predicted class.

```python
    for k in faith_results:
```

Loops over k = 3, 5, 10.

```python
        masked = sample.copy()
```

Copies sample.

```python
        masked[np.argsort(np.abs(sv))[-k:]] = 0.0
```

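Finds the `k` features with the largest absolute SHAP values and masks them by setting them to 0. Zero here is the bottom of the scaled range rather than a neutral baseline, so the measured drop also depends on how far each masked feature was from 0.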

```python
        drop = base_conf[pred_cls] - predict_fn(masked.reshape(1,-1))[0][pred_cls]
```

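Measures how much the probability of the originally predicted class falls after masking. A larger positive drop means the masked features mattered more to the prediction.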

```python
        faith_results[k].append(float(drop))
```

Stores confidence drop.

```python
for k, scores in faith_results.items():
    print(...)
```

Prints average and standard deviation.
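The masking test can be sketched end to end on a toy model. Everything here is an assumption for illustration: the logistic `predict_fn` is a hypothetical stand-in for the trained network, and weight times input stands in for the SHAP values.

```python
import numpy as np

# Hypothetical two-class model: P(class 0) = sigmoid(w . x - 2); not the notebook's MLP.
w = np.linspace(0.1, 1.2, 12)

def predict_fn(X):
    p0 = 1.0 / (1.0 + np.exp(-(X @ w - 2.0)))
    return np.column_stack([p0, 1.0 - p0])

sample = np.full(12, 0.8)

# Stand-in attribution (weight * input) used in place of SHAP values.
attributions = w * sample

base_conf = predict_fn(sample.reshape(1, -1))[0]
pred_cls = int(np.argmax(base_conf))

drops = {}
for k in (3, 5, 10):
    masked = sample.copy()
    # Zero out the k features with the largest absolute attribution.
    masked[np.argsort(np.abs(attributions))[-k:]] = 0.0
    new_conf = predict_fn(masked.reshape(1, -1))[0][pred_cls]
    drops[k] = float(base_conf[pred_cls] - new_conf)

print(pred_cls, {k: round(d, 3) for k, d in drops.items()})
```

For a faithful attribution the drop should grow as `k` increases, as it does in this toy case. A flat or negative drop would suggest the top-ranked features are not what the model relies on.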

### Mapping to project

This implements **Evaluate explanation faithfulness**.

---

# Cell 26 — Stability Summary Plot

```python
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
```

Creates three plots side by side.

```python
eps_list = list(stability_results.keys())
```

Gets epsilon values.

```python
axes[0].plot(eps_list, [stability_results[e]['sens_max'] for e in eps_list], ...)
```

Plots SENS_MAX vs epsilon.

```python
pcc_vals = [stability_results[e]['pcc'] for e in eps_list]
```

Gets PCC values.

```python
colors = ['green' if p > 0.6 else 'red' for p in pcc_vals]
```

Green bars for stable, red for unstable.

```python
axes[1].bar(...)
```

Plots PCC stability bars.

```python
axes[1].axhline(y=0.6, ...)
```

Draws stability threshold line.

```python
ks = list(faith_results.keys())
```

Gets masking sizes 3, 5, 10.

```python
axes[2].bar(...)
```

Plots faithfulness confidence drop with error bars.

```python
plt.suptitle(...)
plt.tight_layout(); plt.show()
```

Adds title and displays.

### Mapping to project

This creates the figure used to summarize explanation reliability.

---

# Cell 28 — Security Implications / Feature Manipulability

```python
manipulable = {...}
```

Defines features that attackers may directly influence.

Examples:

- `src_bytes`
- `dst_bytes`
- `hot`
- `num_failed_logins`
- `duration`

```python
partial = {...}
```

Defines partially manipulable features.

Examples:

- `count`
- `srv_count`
- `serror_rate`
- `rerror_rate`
- `protocol_type`
- `flag`

These can sometimes be influenced but not freely controlled.

```python
non_manip = {...}
```

Defines non-manipulable features such as host-level aggregated statistics.

Examples:

- `dst_host_count`
- `dst_host_srv_count`
- `dst_host_rerror_rate`
- `dst_host_serror_rate`

These are computed by IDS sensors or depend on broader traffic context.

```python
manip_count = {'Manipulable': 0, 'Partial': 0, 'Non-manipulable': 0}
```

Initializes counters.

```python
for i, (f, v) in enumerate(feature_importance[:15]):
```

Loops over top 15 SHAP features.

```python
    if f in manipulable:
        status = 'MANIPULABLE'
        manip_count['Manipulable'] += 1
```

Classifies feature as manipulable.

```python
    elif f in partial:
        status = 'PARTIAL'
        manip_count['Partial'] += 1
```

Classifies feature as partially manipulable.

```python
    else:
        status = 'NON-MANIPULABLE'
        manip_count['Non-manipulable'] += 1
```

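Otherwise classifies as non-manipulable. The `non_manip` set is not consulted here, so any feature missing from both `manipulable` and `partial` is counted as non-manipulable, including features that were never explicitly categorized.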

```python
    print(...)
```

Prints feature, SHAP value, and manipulability status.

```python
print(f'\nSummary: {manip_count}')
```

Prints count summary.

```python
if manip_count['Non-manipulable'] > manip_count['Manipulable']:
    print('-> Model relies more on non-manipulable features -> MORE ROBUST against evasion')
else:
    print('-> Model relies more on manipulable features -> LESS ROBUST against evasion')
```

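Simple security conclusion. It compares only the `Non-manipulable` and `Manipulable` counts: `Partial` features do not affect the verdict, and a tie is reported as less robust.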
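One way to make uncategorized features visible is to check all three sets explicitly and add a fourth bucket. This is a hypothetical variation, not the notebook's code: the sets contain only the example features named above (the real sets are elided and may hold more), the `Unclassified` bucket is an addition, and the top-feature list is made up.

```python
from collections import Counter

# Only the example features named in the text; the notebook's sets may hold more.
manipulable = {"src_bytes", "dst_bytes", "hot", "num_failed_logins", "duration"}
partial = {"count", "srv_count", "serror_rate", "rerror_rate", "protocol_type", "flag"}
non_manip = {"dst_host_count", "dst_host_srv_count",
             "dst_host_rerror_rate", "dst_host_serror_rate"}

def classify(feature):
    if feature in manipulable:
        return "Manipulable"
    if feature in partial:
        return "Partial"
    if feature in non_manip:
        return "Non-manipulable"
    return "Unclassified"  # hypothetical extra bucket, not in the notebook

# Hypothetical top features (not the notebook's actual ranking).
top_features = ["logged_in", "src_bytes", "count", "dst_host_count", "flag"]
manip_count = Counter(classify(f) for f in top_features)
print(dict(manip_count))
```

With the notebook's `else` branch, `logged_in` in this example would be counted as non-manipulable and would tilt the verdict toward "more robust".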

### Mapping to project

This implements **Analyze security implications**.

---

# Cell 29 — Final Summary

```python
print('\n' + '='*60)
print('FINAL RESULTS SUMMARY')
print('='*60)
```

Prints final report-style summary header.

```python
print(f'\n1. MODEL COMPARISON:')
```

Starts model result section.

```python
for name in ['mlp', 'lstm', 'cnn1d']:
    r = results[name]
    print(...)
```

Prints F1, ROC-AUC, and PR-AUC for all models.

```python
print(f'\n2. EXPLANATION STABILITY (SAFARI):')
```

Starts stability section.

```python
for eps in epsilons:
    sr = stability_results[eps]
    status = 'STABLE' if sr['pcc'] > 0.6 else 'UNSTABLE'
    print(...)
```

Prints SHAP stability for each epsilon.

```python
print(f'   LIME: Spearman={np.mean(lime_corrs):.4f} ...')
```

Prints LIME stability.

```python
print(f'\n3. FAITHFULNESS:')
```

Starts faithfulness section.

```python
for k in [3, 5, 10]:
    print(...)
```

Prints confidence drop after masking top-k SHAP features.

```python
print(f'\n4. SECURITY: Top features manipulability = {manip_count}')
```

Prints security summary.

```python
print('\nDone!')
```

End message.

### Mapping to project

This cell packages all deliverable results:

- model comparison,
- stability,
- faithfulness,
- security analysis.

---

# How the Whole Notebook Maps to the Project Requirements

| Teacher requirement | Notebook cells | What was done |
|---|---|---|
| Train model | Cells 5–12 | Load data, preprocess, train MLP/LSTM/CNN, evaluate metrics |
| Explain predictions | Cells 14–21 | SHAP and LIME explanations, feature rankings, local explanations |
| Evaluate stability and faithfulness | Cells 23–26 | SHAP perturbation stability, LIME stochastic stability, top-k masking faithfulness, summary plot |
| Analyze risks | Cell 28 | Feature manipulability and evasion risk analysis |
| Expected output | Cell 29 + figures | Final result summary, plots, explanation/security analysis |

---

# The Main Story You Should Understand

The code starts with raw NSL-KDD network connection records. It converts them into numerical normalized feature vectors. Then it trains three neural IDS models and compares them. The LSTM performs best.

After training, the notebook does not stop at accuracy. It asks: why did the model make its decisions? SHAP and LIME are used to identify important features. SHAP finds features like `logged_in` and error-rate statistics. LIME finds some overlapping but different features. Their low Spearman correlation shows that XAI methods can disagree.

Then the notebook asks whether the explanations are reliable. SHAP is stable only for very small perturbations. LIME is borderline stable. Feature masking shows SHAP explanations are reasonably faithful because removing top SHAP features reduces prediction confidence.

Finally, the code asks whether explanations are safe. If top features are manipulable by attackers, explanations can leak evasion strategies. The model relies on several non-manipulable or partially manipulable features, which is a positive sign, but explanation access should still be controlled.

---

# Key Things to Say if Asked About the Code

1. The preprocessing avoids data leakage by fitting encoders/scalers on training data and transforming test data.
2. The three models are compared fairly because they use the same dataset, preprocessing, and training setup.
3. Weighted F1 is important because class distributions are not perfectly balanced and train/test distributions differ.
4. SHAP gives global and local feature importance.
5. LIME gives local surrogate explanations.
6. SHAP and LIME disagree, which is an important result, not a failure.
7. Stability is evaluated because explanations must be consistent to be trusted.
8. Faithfulness is evaluated because important explanation features should actually affect predictions.
9. Security analysis checks whether important features can be manipulated by attackers.
10. The whole project is not just IDS accuracy; it is IDS + explanation + reliability + security.