# Exam Prep — Explainable Intrusion Detection System (X-IDS)

This document is your **deep oral-exam prep** for the presentation. It includes:

- a full pitch for every slide,
- the concepts you must know,
- likely teacher questions,
- strong answers,
- emergency short answers if you panic,
- and the key numbers from the final notebook.

Project repo: <https://huggingface.co/cathrica/deep-learning-project>  
Updated GitHub repo: <https://github.com/sloka1c/DPL>  
Final artifact: `explainable_ids_full_pipeline.ipynb - Colab.pdf`

---

## 0. The One-Minute Summary You Must Memorize

Our project is an **Explainable Intrusion Detection System**, or **X-IDS**. The goal is not only to detect whether a network connection is normal or anomalous, but also to explain **why** the model made that decision.

We used the **NSL-KDD** intrusion detection dataset, preprocessed 41 network features, and trained three deep learning models: **MLP**, **LSTM**, and **1D-CNN**. The best model was the **LSTM**, with a weighted F1-score of **0.7800**, ROC-AUC of **0.9434**, and PR-AUC of **0.9222**.

Then we applied two post-hoc explainability methods: **SHAP** and **LIME**. SHAP showed that `logged_in`, `dst_host_rerror_rate`, `protocol_type`, and other error-rate features were important. LIME produced different rankings, and the Spearman correlation between SHAP and LIME rankings was only **0.0714**, meaning the two rankings were essentially uncorrelated.

We also evaluated explanation stability. SHAP was stable for very small perturbations, with PCC **0.6293** at epsilon **0.01**, but became unstable at larger perturbations. LIME was borderline stable with mean Spearman **0.6054**. Finally, we analyzed security implications: explanations help analysts, but if exposed carelessly, they can also leak information to attackers.

**Main conclusion:** Explainability is useful for IDS, but explanations must be evaluated for stability, faithfulness, and security risk before being trusted.

---

## 1. Key Numbers to Know by Heart

### Dataset

| Item | Value |
|---|---:|
| Dataset | NSL-KDD |
| Features | 41 |
| Categorical features | `protocol_type`, `service`, `flag` |
| Train size | 151,165 records |
| Test size | 34,394 records |
| Train distribution | 80,792 normal / 70,373 anomaly |
| Test distribution | 22,531 anomaly / 11,863 normal |
| Task | Binary classification: normal vs anomaly |

### Model Results

| Model | Parameters | Weighted F1 | ROC-AUC | PR-AUC | Time |
|---|---:|---:|---:|---:|---:|
| MLP | 52,802 | 0.7639 | 0.9231 | 0.8699 | 145.1s |
| LSTM | 52,578 | **0.7800** | **0.9434** | **0.9222** | 162.9s |
| 1D-CNN | 91,074 | 0.7579 | 0.9410 | 0.9182 | 173.1s |

### Top SHAP Features — Anomaly Class

| Rank | Feature | Mean SHAP |
|---:|---|---:|
| 1 | `logged_in` | 0.0950 |
| 2 | `dst_host_rerror_rate` | 0.0619 |
| 3 | `protocol_type` | 0.0573 |
| 4 | `rerror_rate` | 0.0479 |
| 5 | `dst_host_serror_rate` | 0.0427 |
| 6 | `count` | 0.0398 |
| 7 | `serror_rate` | 0.0380 |
| 8 | `dst_host_same_srv_rate` | 0.0319 |
| 9 | `same_srv_rate` | 0.0256 |
| 10 | `dst_host_srv_serror_rate` | 0.0252 |

### Top LIME Features

| Feature | Frequency |
|---|---:|
| `wrong_fragment` | 30/30 |
| `rerror_rate` | 30/30 |
| `protocol_type` | 30/30 |
| `dst_host_rerror_rate` | 30/30 |
| `num_failed_logins` | 21/30 |
| `num_shells` | 21/30 |
| `logged_in` | 18/30 |
| `root_shell` | 18/30 |
| `su_attempted` | 17/30 |
| `hot` | 17/30 |

### XAI Stability / Faithfulness

| Metric | Result |
|---|---:|
| SHAP vs LIME Spearman correlation | 0.0714 |
| SHAP PCC at epsilon 0.01 | 0.6293 — stable |
| SHAP PCC at epsilon 0.03 | 0.5861 — unstable |
| SHAP PCC at epsilon 0.05 | 0.5676 — unstable |
| LIME stochastic stability | 0.6054 — borderline stable |
| Top-3 masking confidence drop | 0.3355 |
| Top-5 masking confidence drop | 0.3592 |
| Top-10 masking confidence drop | 0.4938 |

---

## 2. Full Pitch for Each Slide

Use these as scripts. Do not read them word-for-word like a robot; use them to understand what to say.

---

### Slide 1 — Title: Explainable Intrusion Detection System

**Pitch:**

Good morning. Our project is called **Explainable Intrusion Detection System**, or **X-IDS**. The idea is to build a deep learning IDS that detects whether a network connection is normal or anomalous, but also gives explanations for its decisions.

In cybersecurity, a prediction alone is not enough. If a model flags a connection as malicious, the security analyst needs to know *why*. So our work combines three parts: deep learning models for detection, explainability methods like SHAP and LIME, and stability/security analysis to see whether those explanations can actually be trusted.

We implemented the full pipeline on the NSL-KDD dataset, compared MLP, LSTM, and 1D-CNN models, and then analyzed explanations, their stability, faithfulness, and possible security risks.

**What to emphasize:**

- It is not just classification.
- It is detection + explanation + trust evaluation.
- You have a fully reproducible Colab pipeline.

**Possible question:** Why is your project called explainable IDS and not just IDS?

**Answer:** Because the goal is not only to predict normal/anomaly, but to explain which features influenced the prediction and evaluate if those explanations are reliable.

---

### Slide 2 — Motivation: IDS Accuracy Is Not Enough

**Pitch:**

Traditional intrusion detection systems generate alerts, but modern deep learning IDS models can be hard to interpret. This is a problem because cybersecurity decisions are high-stakes. If the model produces a false positive, analysts waste time. If it produces a false negative, an attack may pass undetected.

Explainability helps analysts understand whether the alert makes sense. For example, if a connection is classified as an attack because of high error rates and suspicious login behavior, that is useful evidence. But explainability also creates a risk: if an attacker sees the explanations, they may learn which features to manipulate.

So our motivation is: can we make IDS decisions interpretable, while also checking whether those explanations are stable and safe to expose?

**Concepts to know:**

- IDS = Intrusion Detection System.
- False positive = normal traffic flagged as attack.
- False negative = attack classified as normal.
- Explainability can improve trust but may leak information.

**Possible question:** Why do analysts need explanations?

**Strong answer:** Because IDS alerts must be investigated. Explanations help analysts prioritize alerts, understand model behavior, and detect whether the model is relying on meaningful security features or spurious correlations.

---

### Slide 3 — Research Questions

**Pitch:**

We structured the project around five research questions.

First, can deep learning models detect intrusions on NSL-KDD with acceptable performance? Second, which features drive the anomaly predictions? Third, do SHAP and LIME agree on which features are important? Fourth, are the explanations stable when we slightly perturb the inputs or rerun stochastic methods? And fifth, from a security perspective, are the important features controllable by attackers or mostly sensor-side statistics?

This structure is important because an explainable IDS should not be judged only by accuracy. It should also be judged by interpretability, consistency, and security implications.

**Possible question:** What is your main research question?

**Answer:** The main question is: can we make deep learning IDS predictions interpretable without losing detection performance, and are the explanations stable enough to be trusted in a security context?

---

### Slide 4 — Dataset: NSL-KDD

**Pitch:**

We used the NSL-KDD dataset, which is a standard benchmark for intrusion detection. Each sample represents a network connection described by 41 features. These include basic connection features like duration and protocol type, content features like failed logins, and traffic statistics like error rates and destination-host counts.

The task in our final pipeline is binary classification: normal versus anomaly. The training set contains 151,165 records and the test set contains 34,394 records. An important point is that the train and test distributions are different. In training, normal traffic is slightly more common, but in the test set anomalies dominate. This creates a distribution shift and makes generalization harder.

For preprocessing, we encoded the categorical features with LabelEncoder and scaled all features to [0,1] using MinMaxScaler. Scaling is especially important because features like byte counts and error rates have very different ranges.
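
To make the two preprocessing steps concrete for yourself, here is a minimal from-scratch sketch of what label encoding and min-max scaling do. It is not the notebook's code: the helper names `label_encode` and `min_max_scale` and the toy rows are assumptions for illustration, and the real pipeline presumably calls scikit-learn's `LabelEncoder` and `MinMaxScaler`, with the scaling range learned from the training split.

```python
# Minimal from-scratch sketch of label encoding + min-max scaling.
# The rows below are made-up toy values, not NSL-KDD records.

def label_encode(values):
    """Map each category to an integer index over the sorted unique values."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


protocol_type = ["tcp", "udp", "icmp", "tcp"]  # categorical feature
src_bytes = [0, 500, 1000, 250]                # wide-range numeric feature

encoded, classes = label_encode(protocol_type)
scaled_protocol = min_max_scale(encoded)
scaled_bytes = min_max_scale(src_bytes)

print(classes)          # the sorted categories define the integer codes
print(encoded)
print(scaled_protocol)  # the encoded column is scaled like any other
print(scaled_bytes)
```

Notice that the integer codes come from alphabetical order, which is exactly the artificial ordering mentioned as a limitation of LabelEncoder.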

**Things to know:**

- NSL-KDD is an improved version of KDD Cup 99.
- It has 41 features.
- Binary labels: normal/anomaly.
- Train/test distribution shift matters.

**Possible question:** Why did you use MinMaxScaler?

**Answer:** Because the features have very different numerical ranges. Scaling to [0,1] stabilizes neural network training and makes perturbation-based explanation evaluation meaningful, because epsilon noise has the same scale across features.

**Possible question:** Why LabelEncoder and not OneHotEncoder?

**Answer:** OneHotEncoder would expand the feature space and make explanations harder to interpret. LabelEncoder preserves the original 41-feature structure, which makes SHAP and LIME outputs easier to map back to IDS features. The drawback is that it introduces artificial ordering, which we mention as a limitation.

---

### Slide 5 — Pipeline Overview

**Pitch:**

This slide summarizes the full pipeline. We start by loading NSL-KDD from Hugging Face. Then we encode categorical variables and scale the features. After that, we train three deep learning models: MLP, LSTM, and 1D-CNN. Then we explain predictions using SHAP and LIME. Finally, we evaluate explanation stability and perform security analysis.

The important point is that this is not just theoretical. The final notebook implements the pipeline end-to-end in Colab, using a Tesla T4 GPU, with fixed random seed 42 for reproducibility.

**Possible question:** What makes your project reproducible?

**Answer:** We use a fixed random seed, documented preprocessing, fixed model architectures and hyperparameters, and a single notebook that runs data loading, training, explanation, stability evaluation, and security analysis.

---

### Slide 6 — Deep Learning Models

**Pitch:**

We compared three lightweight deep learning architectures.

The first is an MLP, which is a strong baseline for tabular data. It uses fully connected layers with BatchNorm, ReLU, and Dropout.

The second is an LSTM. Normally LSTMs are used for sequences, but here we treat the 41 features as a feature sequence. This allows the model to learn dependencies between feature groups.

The third is a 1D-CNN. It treats the feature vector like a one-dimensional signal and uses convolution filters to learn local patterns between neighboring features.

All three models were trained with the same optimizer, learning rate, batch size, number of epochs, and preprocessing pipeline. This makes the comparison fair.

**Concepts to know:**

- MLP = feed-forward neural network.
- LSTM = recurrent neural network with gates for sequential dependencies.
- CNN = convolutional neural network, here applied to 1D features.
- BatchNorm stabilizes training.
- Dropout reduces overfitting.
- Adam is adaptive gradient optimization.
- CrossEntropyLoss is standard for classification.

**Possible question:** Why use LSTM on tabular features?

**Answer:** It is not a natural time sequence, but NSL-KDD features are grouped semantically: basic features, content features, time-based traffic features, and host-based traffic features. Treating them as a sequence can allow the LSTM to learn dependencies between these feature groups.

**Possible question:** Why did you include CNN?

**Answer:** A 1D-CNN can learn local patterns between neighboring features, especially in grouped rate-based features. It provides an architecture comparison with different inductive bias.

---

### Slide 7 — Model Performance Results

**Pitch:**

Here are the main performance results. The LSTM performed best, with weighted F1-score 0.7800, ROC-AUC 0.9434, and PR-AUC 0.9222. The MLP achieved weighted F1 0.7639, and the 1D-CNN achieved 0.7579.

The LSTM was not the fastest, but it gave the best overall performance. The MLP was fastest, and the CNN had the most parameters but did not outperform the LSTM.

This result suggests that modeling feature dependencies with the LSTM was useful for this dataset. However, the difference is not huge, so all models are reasonably comparable.

**Possible question:** Why use weighted F1 instead of accuracy?

**Answer:** Because the class balance differs between splits: training is close to balanced, while roughly two thirds of the test set is anomalous. Weighted F1 accounts for both precision and recall per class, weighted by support. Accuracy alone can hide poor performance on the minority or important class.
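
If you are asked how weighted F1 is actually computed, the following sketch may help. It is a from-scratch illustration on made-up labels, not the notebook's evaluation code (which presumably relies on a library metric); the function names are assumptions.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, cls):
    """F1 of one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * count / total
        for cls, count in support.items()
    )


# Toy labels: 1 = anomaly, 0 = normal (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]

f1_anomaly = f1_for_class(y_true, y_pred, 1)
f1_normal = f1_for_class(y_true, y_pred, 0)
score = weighted_f1(y_true, y_pred)
print(f1_anomaly, f1_normal, score)
```

In this toy case the anomaly class has twice the support of the normal class, so its F1 counts twice as much in the weighted average.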

**Possible question:** What is ROC-AUC?

**Answer:** ROC-AUC measures how well the model ranks positive versus negative samples across classification thresholds. A value close to 1 means strong separability.
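
A useful way to remember ROC-AUC is as a pairwise ranking probability: the chance that a randomly chosen anomaly gets a higher score than a randomly chosen normal sample. The sketch below computes it that way on made-up scores. The function name `roc_auc` is an assumption, and the quadratic loop is for clarity only; it is not how a library would compute it on the full test set.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy anomaly scores (hypothetical): 1 = anomaly, 0 = normal.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]

auc = roc_auc(y_true, scores)
print(auc)
```

Here five of the six anomaly/normal pairs are ordered correctly; the one mistake is the anomaly scored 0.3 sitting below the normal sample scored 0.4.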

**Possible question:** What is PR-AUC and why is it useful?

**Answer:** PR-AUC is area under the precision-recall curve. It is especially useful for imbalanced classification because it focuses on precision and recall of the positive class.
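
One common way to summarize the precision-recall curve is average precision: walk down the samples from highest to lowest score and average the precision measured at each true anomaly. The sketch below shows that idea on made-up scores. Whether the notebook uses average precision or a trapezoidal area is not stated here, so treat the choice as an assumption; the sketch also assumes distinct scores.

```python
def average_precision(y_true, scores):
    """Mean of the precision values measured at each positive, by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this cut-off
    if not precisions:
        raise ValueError("no positive samples")
    return sum(precisions) / len(precisions)


# Toy anomaly scores (hypothetical): 1 = anomaly, 0 = normal.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]

ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # rough reference level for a random ranker
print(ap, prevalence)
```

Unlike ROC-AUC, the reference level for a random ranker is roughly the positive-class prevalence rather than 0.5, which is why PR-AUC should always be read next to the class balance.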

---

### Slide 8 — SHAP: Global Feature Importance

**Pitch:**

After training, we used SHAP to explain the MLP model. SHAP assigns each feature a contribution value, based on the idea of Shapley values from game theory. In simple terms, it estimates how much each feature contributes to the prediction.

For the anomaly class, the most important feature was `logged_in`, followed by `dst_host_rerror_rate`, `protocol_type`, `rerror_rate`, and `dst_host_serror_rate`. These features make sense for intrusion detection because login status and error rates are closely related to suspicious behavior.

This result shows that the model is not using random features; it is relying on meaningful network behavior indicators.

**Concepts to know:**

- SHAP = SHapley Additive exPlanations.
- It is based on Shapley values from cooperative game theory.
- It gives local explanations and can be aggregated into global importance.
- Mean absolute SHAP value measures average feature impact.

**Possible question:** Explain SHAP simply.

**Answer:** SHAP measures how much each feature contributes to moving the prediction away from the average prediction. Positive or negative SHAP values show whether a feature pushes the prediction toward or away from a class.
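
If the examiner pushes on what a Shapley value really is, it helps to have seen the exact computation once. The sketch below enumerates every feature subset for a tiny hypothetical three-feature model. It illustrates the underlying definition only: Kernel SHAP approximates this by sampling against a background dataset, whereas `toy_model`, the single all-zero `baseline`, and the function name `shapley_values` are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all subsets (feasible only for tiny n)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical anomaly score with an interaction between features 0 and 1."""
    return 0.2 * z[0] + 0.5 * z[1] + 0.3 * z[0] * z[1] - 0.1 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

Two things to notice: the interaction term is split equally between the two features involved, and the contributions add up to the difference between the prediction and the baseline prediction. That additivity is the "Additive" in SHAP.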

**Possible question:** Why KernelExplainer?

**Answer:** KernelExplainer is model-agnostic, so it can explain any model through input-output queries. That makes it suitable for comparing explanations consistently across different architectures.

---

### Slide 9 — LIME vs SHAP

**Pitch:**

We also used LIME, another post-hoc explainability method. LIME explains one prediction by creating perturbed samples around it, observing the model outputs, and fitting a simple local surrogate model.

The top LIME features included `wrong_fragment`, `rerror_rate`, `protocol_type`, and `dst_host_rerror_rate`, each appearing in 30 out of 30 explanations.

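However, when we compared SHAP and LIME feature rankings, the Spearman correlation was only 0.0714, with p-value 0.8665. A correlation this close to zero means the two rankings are essentially unrelated, and the high p-value means even this weak agreement is not statistically significant.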

This is an important result: explainability depends on the method. We should not blindly trust a single explanation method without checking consistency.

**Concepts to know:**

- LIME = Local Interpretable Model-Agnostic Explanations.
- LIME approximates the model locally with a simpler model.
- Spearman correlation compares ranking similarity.
- Low correlation means different feature rankings.
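
To see how a rank correlation like this is obtained, here is a small from-scratch sketch: convert each method's importance scores to ranks, then take the Pearson correlation of the ranks. The two score lists are made-up values for five hypothetical features, not the notebook's SHAP or LIME outputs, and the helper names are assumptions.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical importance scores for the same five features from two methods.
method_a = [0.9, 0.6, 0.5, 0.3, 0.1]
method_b = [0.2, 0.8, 0.1, 0.7, 0.4]

rho = spearman(method_a, method_b)
print(rho)
```

A value near +1 would mean both methods order the features the same way, near -1 the opposite order, and near 0 no consistent relationship between the two orderings.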

**Possible question:** Why do SHAP and LIME disagree?

**Answer:** They use different assumptions. SHAP estimates feature contributions using a game-theoretic approach, while LIME fits a local surrogate model based on random perturbations. Different sampling, locality definitions, and weighting can lead to different feature rankings.

**Possible question:** Which one is better?

**Answer:** Not universally. SHAP has stronger theoretical foundations and is more consistent, but can be expensive. LIME is intuitive and local but stochastic and sensitive to perturbation settings. In security-critical systems, we should compare both and evaluate stability.

---

### Slide 10 — Explanation Stability

**Pitch:**

An explanation should be stable: if two inputs are almost the same, their explanations should also be similar. We evaluated this by adding small bounded noise to inputs and comparing SHAP attributions using PCC, or Pearson correlation coefficient.

At epsilon 0.01, SHAP had PCC 0.6293, which is above the 0.6 threshold, so we considered it stable. But at epsilon 0.03 and 0.05, PCC dropped below 0.6, meaning explanations became unstable as perturbations increased.

For LIME, we evaluated stochastic stability by running it multiple times with different seeds and measuring Spearman correlation. The average was 0.6054, just above the threshold, so it is borderline stable.

The conclusion is that explanations are not automatically reliable. Their stability must be measured.

**Concepts to know:**

- Stability = similar inputs should have similar explanations.
- Perturbation = small change/noise added to input.
- PCC = Pearson correlation coefficient, measures linear similarity.
- SENS_MAX = maximum explanation shift under perturbations.
- Spearman correlation = similarity of rankings.
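
The stability protocol can be sketched in a few lines: explain the original input, add bounded noise, explain again, and compare the two attribution vectors. The example below is a toy under stated assumptions. The logistic `predict` model, its weights, and the gradient-times-input `explain` function are hypothetical stand-ins (the notebook uses SHAP on a trained network), and the maximum shift is only a sampled estimate of SENS_MAX.

```python
import math
import random

WEIGHTS = [1.5, -2.0, 0.8, 3.0]  # hypothetical model weights
BIAS = -0.3


def predict(x):
    """Toy logistic anomaly probability."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def explain(x):
    """Stand-in explainer: gradient-times-input attributions (not SHAP)."""
    p = predict(x)
    return [p * (1.0 - p) * w * v for w, v in zip(WEIGHTS, x)]


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def perturb(x, eps, rng):
    """Add uniform noise in [-eps, eps] and clip back into the scaled range [0, 1]."""
    return [min(1.0, max(0.0, v + rng.uniform(-eps, eps))) for v in x]


def stability(x, eps, trials=200, seed=42):
    """Return (mean PCC, largest sampled attribution shift) under bounded noise."""
    rng = random.Random(seed)
    base = explain(x)
    pccs, shifts = [], []
    for _ in range(trials):
        attr = explain(perturb(x, eps, rng))
        pccs.append(pearson(base, attr))
        shifts.append(math.dist(base, attr))  # L2 distance between attributions
    return sum(pccs) / trials, max(shifts)


x = [0.2, 0.9, 0.5, 0.1]  # one hypothetical scaled input
results = {}
for eps in (0.01, 0.03, 0.05):
    mean_pcc, sens_max = stability(x, eps)
    results[eps] = (mean_pcc, sens_max)
    verdict = "stable" if mean_pcc >= 0.6 else "unstable"
    print(f"eps={eps}: PCC={mean_pcc:.4f} ({verdict}), sampled SENS_MAX={sens_max:.4f}")
```

Because this toy model is smooth and tiny, it will look far more stable than the real network did, so do not compare its numbers with the notebook's. The point is the procedure: same input, bounded noise, correlation against the 0.6 threshold.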

**Possible question:** Why use epsilon values 0.01, 0.03, 0.05?

**Answer:** Because features were scaled to [0,1], so these epsilons represent small bounded perturbations of 1%, 3%, and 5% of each feature's range. They allow us to test how explanations react to increasingly large but still controlled changes.

**Possible question:** Why threshold 0.6?

**Answer:** We used 0.6 as a practical stability threshold inspired by robustness and interpretability evaluation frameworks such as SAFARI. It indicates moderate positive agreement.

---

### Slide 11 — Faithfulness: Feature Masking

**Pitch:**

Stability tells us whether explanations are consistent, but faithfulness tells us whether the highlighted features actually matter to the model.

To test faithfulness, we masked the top SHAP features and measured how much the model confidence dropped. If the explanation is meaningful, removing the most important features should reduce confidence.

The results show that masking the top 3 features caused an average confidence drop of 0.3355. Masking the top 5 caused 0.3592, and masking the top 10 caused 0.4938. So the more top-ranked features we remove, the larger the confidence drop.

This supports that SHAP is identifying features that the model really uses.
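
The masking test itself is simple enough to sketch. In the example below, the logistic `anomaly_confidence` model, the `importance` scores, and the choice to mask with `0.0` are all assumptions for illustration; the notebook's masking value and model are not reproduced here.

```python
import math

WEIGHTS = [2.5, 1.5, 1.0, 0.2]  # hypothetical model weights
BIAS = -2.0


def anomaly_confidence(x):
    """Toy logistic model standing in for the trained network's anomaly probability."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def mask_top_k(x, importance, k, fill=0.0):
    """Replace the k most important features with a fill value (assumed: 0.0)."""
    top = sorted(range(len(x)), key=lambda i: importance[i], reverse=True)[:k]
    return [fill if i in top else v for i, v in enumerate(x)]


def mean_confidence_drop(samples, importance, k):
    """Average decrease in anomaly confidence after masking the top-k features."""
    drops = [
        anomaly_confidence(x) - anomaly_confidence(mask_top_k(x, importance, k))
        for x in samples
    ]
    return sum(drops) / len(drops)


# Hypothetical scaled samples and global importance scores (e.g. mean |attribution|).
samples = [
    [0.9, 0.8, 0.7, 0.5],
    [0.6, 0.9, 0.4, 0.8],
    [0.8, 0.5, 0.9, 0.1],
]
importance = [0.40, 0.25, 0.15, 0.02]

drops = {k: mean_confidence_drop(samples, importance, k) for k in (1, 2, 3)}
for k, drop in drops.items():
    print(f"top-{k} masked: mean confidence drop = {drop:.4f}")
```

A stronger version of this test would also mask the same number of randomly chosen features: if random masking produced a similar drop, the explanation would not be telling you much.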

**Possible question:** What is faithfulness?

**Answer:** Faithfulness means the explanation reflects the model's actual decision process. If a feature is marked important, changing or removing it should significantly affect the model prediction.

**Possible question:** Why does top-10 masking have a bigger drop than top-3?

**Answer:** Because a larger number of influential features is removed, so the model loses more of the information it used for the prediction.

---

### Slide 12 — Security Implications

**Pitch:**

Explainability has two sides. It helps defenders, but it can also help attackers.

If explanations reveal that the model relies heavily on features an attacker can manipulate, then the attacker may try to change those features to evade detection. For example, if `src_bytes` or login-related features dominate, an attacker might craft traffic to change those values.

So we analyzed the top SHAP features by manipulability. Some features are manipulable, some are partially manipulable, and some are non-manipulable sensor-side statistics. Our model relied on several non-manipulable or partially manipulable features, such as destination-host error rates and host statistics. That is better for robustness than relying only on attacker-controlled fields.

But explanations should still be access-controlled, rate-limited, and monitored because repeated explanation queries could leak model strategy.

**Concepts to know:**

- Manipulable features = attacker can directly change them.
- Non-manipulable features = computed by sensor or depend on network aggregates.
- Explanation leakage = attacker learns model behavior from explanations.
- Evasion = attacker modifies input to be classified as normal.

**Possible question:** How can explanations help attackers?

**Answer:** If an attacker sees that certain features strongly drive anomaly detection, they can try to craft traffic that changes those features while preserving malicious behavior, increasing the chance of evasion.

**Possible question:** What mitigation would you propose?

**Answer:** Limit explanation access to trusted analysts, rate-limit explanation APIs, log queries, avoid exposing raw SHAP values externally, and combine ML explanations with rule-based IDS and human review.
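
If you are asked what these mitigations could look like in practice, the sketch below combines three of them: a role check, a sliding-window rate limit, and an audit log in front of an explanation endpoint. The class name `ExplanationGate`, the role names, and the limits are hypothetical; this is not part of the project's notebook.

```python
from collections import defaultdict, deque


class ExplanationGate:
    """Role check + sliding-window rate limit + audit log for explanation requests."""

    def __init__(self, allowed_roles, max_requests, window_seconds):
        self.allowed_roles = set(allowed_roles)
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # user -> timestamps of allowed requests
        self.audit_log = []                # every decision is recorded

    def allow(self, user, role, now):
        recent = self.history[user]
        # Forget requests that have fallen out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if role not in self.allowed_roles:
            decision = "denied:role"
        elif len(recent) >= self.max_requests:
            decision = "denied:rate"
        else:
            recent.append(now)
            decision = "allowed"
        self.audit_log.append((now, user, role, decision))
        return decision == "allowed"


gate = ExplanationGate(allowed_roles={"analyst"}, max_requests=2, window_seconds=60)
results = [
    gate.allow("alice", "analyst", now=0),
    gate.allow("alice", "analyst", now=10),
    gate.allow("alice", "analyst", now=20),  # third request inside the window
    gate.allow("bob", "guest", now=25),      # role is not permitted
    gate.allow("alice", "analyst", now=61),  # the first request has aged out
]
print(results)
for entry in gate.audit_log:
    print(entry)
```

Passing `now` explicitly keeps the example deterministic; a real service would read a clock. The audit log is what lets defenders spot someone probing the explainer with many near-identical queries.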

---

### Slide 13 — Limitations

**Pitch:**

There are several limitations.

First, NSL-KDD is a benchmark dataset, but real network traffic is more complex and changes over time. Second, LabelEncoder preserves interpretability but introduces artificial order for categorical features. Third, Kernel SHAP is computationally expensive, so we used sampled subsets. Fourth, LIME is stochastic, so results can vary depending on random seed and perturbation settings.

Finally, our project evaluates explainability and stability, but it is not a full adversarial robustness defense. A production IDS would require more realistic traffic, online evaluation, and adversarial testing.

**Possible question:** What would you improve if you had more time?

**Answer:** I would test on more modern datasets like CIC-IDS2017 or UNSW-NB15, use one-hot or embeddings for categorical features, evaluate multiclass attack categories, and add adversarial evasion experiments.

---

### Slide 14 — Conclusion

**Pitch:**

To conclude, our project shows that deep learning can be used for intrusion detection and that post-hoc explainability can reveal meaningful features behind predictions.

The LSTM achieved the best detection performance. SHAP produced interpretable global feature importance, and masking experiments supported faithfulness. However, SHAP and LIME did not strongly agree, and explanation stability decreased as perturbations increased.

So the main conclusion is: explainability is valuable, but in cybersecurity we cannot trust explanations blindly. We must evaluate their stability, faithfulness, and security risks before using them in operational decision-making.

**Good final sentence:**

Our final answer is that explainable IDS is possible, but trustworthy explainable IDS requires both accurate models and rigorous explanation evaluation.

---

### Slide 15 — Thank You / Questions

**Pitch:**

Thank you for listening. I am ready for your questions.

If asked to summarize again, say:

We built an explainable IDS using NSL-KDD, compared MLP, LSTM, and CNN, found that LSTM performed best, used SHAP and LIME to explain predictions, evaluated stability and faithfulness, and concluded that explanations are useful but must be controlled and validated in cybersecurity contexts.

---

## 3. Deep Concepts You Should Understand

### 3.1 Intrusion Detection System

An IDS monitors network activity and detects suspicious or malicious behavior. There are two broad types:

- **Signature-based IDS:** detects known attacks using predefined rules.
- **Anomaly-based IDS:** learns normal behavior and flags deviations.

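Our project is closer to anomaly-based IDS: the detector is learned from data rather than written as rules, although it is trained with supervised normal/anomaly labels.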

### 3.2 NSL-KDD Feature Groups

NSL-KDD features include:

1. **Basic connection features** — duration, protocol, service, flag, bytes.
2. **Content features** — login attempts, root shell, failed login, compromised count.
3. **Traffic features** — count, srv_count, error rates.
4. **Host-based features** — destination host counts and rates.

Why this matters: feature groups help interpret SHAP/LIME results.

### 3.3 Classification Metrics

#### Accuracy

Percentage of correct predictions. Can be misleading with imbalance.

#### Precision

Of everything predicted as attack, how many were actually attack?

High precision = few false alarms.

#### Recall

Of all real attacks, how many did we detect?

High recall = fewer missed attacks.

#### F1-score

Harmonic mean of precision and recall.

#### Weighted F1

F1 averaged across classes weighted by class support.

#### ROC-AUC

Measures ranking quality across thresholds using true positive rate and false positive rate.

#### PR-AUC

Precision-recall area. Very useful when data is imbalanced.

### 3.4 Why Explainability Matters in Security

In normal ML tasks, an explanation is nice to have. In security, it is more important because:

- analysts need evidence,
- models may learn shortcuts,
- false positives are expensive,
- attacks evolve,
- explanations can reveal vulnerabilities.

### 3.5 SHAP

SHAP assigns each feature a contribution value. It answers:

> How much did this feature push the prediction compared to the average prediction?

Strengths:

- theoretically grounded,
- both local and global explanations,
- consistent under certain assumptions.

Weaknesses:

- can be computationally expensive,
- depends on background samples,
- feature dependence can complicate interpretation.

### 3.6 LIME

LIME explains one prediction by:

1. perturbing the input around the instance,
2. getting model predictions for perturbations,
3. fitting a simple interpretable model locally.
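
The three steps above can be sketched as a small LIME-like procedure. This is not the `lime` library's implementation: the Gaussian perturbations, the exponential proximity kernel, the plain weighted least-squares surrogate, and the names `black_box`, `solve`, and `lime_like` are all assumptions chosen to keep the example dependency-free.

```python
import math
import random


def black_box(z):
    """Hypothetical model to explain; feature 1 has the strongest local effect."""
    s = 3.0 * z[1] - 1.0 * z[0] + 0.2 * z[2] - 1.0
    return 1.0 / (1.0 + math.exp(-s))


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def lime_like(model, x, n_samples=500, sigma=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around x; return its feature slopes."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, sigma) for v in x]   # step 1: perturb the instance
        rows.append([1.0] + z)                       # intercept column + features
        targets.append(model(z))                     # step 2: query the model
        distance = math.dist(x, z)
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # Step 3: weighted least squares, (X^T W X) w = X^T W y.
    k = len(x) + 1
    xtwx = [
        [sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(k)]
    return solve(xtwx, xtwy)[1:]                     # drop the intercept


x = [0.5, 0.5, 0.5]
coef = lime_like(black_box, x)
ranking = sorted(range(len(x)), key=lambda i: abs(coef[i]), reverse=True)
print(coef, ranking)
```

Changing `seed`, `sigma`, or `kernel_width` changes the sampled neighbourhood and therefore the fitted slopes, which is the source of the stochastic behaviour listed under the weaknesses.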

Strengths:

- intuitive,
- model-agnostic,
- fast for local explanations.

Weaknesses:

- stochastic,
- sensitive to perturbation distribution,
- can be unstable.

### 3.7 Stability

An explanation is stable if similar inputs produce similar explanations.

If explanation changes drastically for tiny input changes, analysts cannot trust it.

### 3.8 Faithfulness

An explanation is faithful if it reflects what the model actually uses.

Feature masking tests this:

- remove top features,
- check if confidence drops,
- bigger drop means more faithful explanation.

### 3.9 Security of Explanations

Explanations may leak:

- important features,
- model weaknesses,
- evasion strategy.

Defenses:

- access control,
- rate limiting,
- logging,
- aggregate explanations instead of raw values,
- combine with rule-based IDS.

---

## 4. Likely Exam Questions and Strong Answers

### Q1. What is the main goal of your project?

**Answer:** The goal is to build a deep learning IDS that can detect anomalies and explain its predictions using SHAP and LIME. We also evaluate whether those explanations are stable, faithful, and safe from a security perspective.

### Q2. Why did you choose NSL-KDD?

**Answer:** NSL-KDD is a standard benchmark for intrusion detection. It is cleaner than the original KDD Cup 99, mainly because redundant duplicate records were removed, and it includes 41 interpretable network features, which makes it suitable for explainability analysis.

### Q3. Why binary classification and not multiclass?

**Answer:** The final pipeline focuses on binary normal/anomaly detection because it is the core IDS problem and allows clearer evaluation of explainability. Multiclass attack classification is a natural extension.

### Q4. Why compare MLP, LSTM, and CNN?

**Answer:** They represent different inductive biases: MLP for general tabular learning, LSTM for sequential dependencies between feature groups, and 1D-CNN for local feature patterns. Comparing them shows whether architecture affects performance.

### Q5. Why did LSTM perform best?

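**Answer:** The LSTM may capture dependencies between ordered feature groups in NSL-KDD, such as basic features, content features, and host-based statistics. This can improve generalization compared to a model with no built-in notion of feature order. The margin over the MLP and CNN is small, so this is a plausible explanation rather than a proven one.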

### Q6. Why did CNN not win even though it has more parameters?

**Answer:** More parameters do not guarantee better performance. The CNN's local convolution assumption may not match the tabular feature structure as well as the LSTM's sequential dependency modeling.

### Q7. What is SHAP?

**Answer:** SHAP is an explainability method based on Shapley values. It estimates how much each feature contributes to a prediction compared to a baseline expected prediction.

### Q8. What is LIME?

**Answer:** LIME explains individual predictions by perturbing the input, observing model outputs, and fitting a simple local surrogate model to approximate the model near that instance.

### Q9. Why did SHAP and LIME disagree?

**Answer:** They use different explanation mechanisms. SHAP is game-theoretic and estimates feature contributions, while LIME builds a local surrogate from random perturbations. Their assumptions and sampling processes differ, so rankings can differ.

### Q10. Which explanation method do you trust more?

**Answer:** I would trust SHAP more for global analysis because it has stronger theoretical grounding and was used consistently. But I would not blindly trust either method; I would compare methods and evaluate stability and faithfulness.

### Q11. What does stability mean in your project?

**Answer:** Stability means that similar inputs should produce similar explanations. We tested it by perturbing inputs slightly and measuring correlation between original and perturbed explanations.

### Q12. What is SENS_MAX?

**Answer:** SENS_MAX measures the maximum change in explanation under bounded input perturbations. Lower SENS_MAX means the explanation is less sensitive and therefore more stable.

### Q13. What is PCC?

**Answer:** PCC is Pearson correlation coefficient. We used it to measure similarity between original SHAP values and perturbed SHAP values. Higher PCC means more stable explanations.

### Q14. Why did SHAP become unstable at larger epsilon?

**Answer:** Larger perturbations change the input more significantly, so the model may rely on different feature interactions. As a result, attributions shift and PCC decreases.

### Q15. What does faithfulness mean?

**Answer:** Faithfulness means the explanation reflects the model's real decision process. If top features are truly important, masking them should reduce prediction confidence.

### Q16. What did your masking experiment show?

**Answer:** Masking top SHAP features reduced confidence. Top-10 masking caused the biggest drop, 0.4938, which supports that SHAP identified meaningful features.

### Q17. What is the security risk of explainability?

**Answer:** Explanations can reveal which features the model relies on. An attacker could use that knowledge to manipulate controllable features and evade detection.

### Q18. How can you reduce explanation leakage?

**Answer:** Restrict explanation access to trusted analysts, rate-limit explanation APIs, log queries, avoid exposing raw explanation values externally, and combine ML explanations with traditional IDS rules.

### Q19. What are manipulable and non-manipulable features?

**Answer:** Manipulable features are features an attacker can directly influence, like bytes or payload-related fields. Non-manipulable features are computed by the IDS sensor or network aggregation, like destination-host statistics.

### Q20. What is the biggest limitation of your work?

**Answer:** NSL-KDD is an old benchmark dataset and may not fully represent modern traffic. Also, LabelEncoder introduces artificial ordering, and the explanation analysis used sampled subsets due to computational cost.

### Q21. If you had more time, what would you add?

**Answer:** I would test on newer datasets like CIC-IDS2017 or UNSW-NB15, evaluate multiclass attack detection, use better categorical encodings, and test adversarial evasion attacks using the explanation outputs.

### Q22. Why is PR-AUC important here?

**Answer:** Because intrusion detection datasets are often imbalanced. PR-AUC focuses on precision and recall, making it more informative than accuracy when class distributions are uneven.

### Q23. What does a low SHAP-LIME Spearman correlation mean practically?

**Answer:** It means the two explanation methods rank features differently. Practically, analysts may receive different justifications depending on the XAI method, so explanation method choice matters.

### Q24. Did your model rely on robust features?

**Answer:** Partially yes. Several top SHAP features are host-based or sensor-side statistics, which are harder for attackers to directly manipulate. However, some features are partially manipulable, so risk remains.

### Q25. How would this work in a real SOC?

**Answer:** The model would classify traffic and provide explanations to analysts. Explanations would be shown only internally, with access control and logging. Analysts would use them as supporting evidence, not as the only decision source.

---

## 5. Emergency Answers If You Panic

### If asked: What did you do?

We built a deep learning IDS on NSL-KDD, compared MLP/LSTM/CNN, explained predictions with SHAP and LIME, and evaluated explanation stability, faithfulness, and security implications.

### If asked: Best result?

The LSTM was best: weighted F1 **0.7800**, ROC-AUC **0.9434**, PR-AUC **0.9222**.

### If asked: Main conclusion?

Explainability helps understand IDS decisions, but explanations must be checked for stability and security risk before trusting them.

### If asked: SHAP vs LIME?

SHAP and LIME showed almost no agreement; Spearman correlation was only **0.0714**, so explanations depend on the method used.

### If asked: Stability?

SHAP was stable only for small perturbations, PCC **0.6293** at epsilon **0.01**, but unstable at larger epsilon. LIME was borderline stable at **0.6054**.

### If asked: Security risk?

Explanations can leak which features attackers should manipulate, so explanation access must be controlled and monitored.

---

## 6. Mini Glossary

| Term | Meaning |
|---|---|
| IDS | Intrusion Detection System |
| X-IDS | Explainable Intrusion Detection System |
| NSL-KDD | Benchmark dataset for network intrusion detection |
| MLP | Multi-Layer Perceptron, fully connected neural network |
| LSTM | Recurrent neural network with memory gates |
| 1D-CNN | Convolutional network over one-dimensional feature vector |
| SHAP | Feature attribution method based on Shapley values |
| LIME | Local surrogate explanation method |
| ROC-AUC | Threshold-independent ranking metric |
| PR-AUC | Precision-recall area, useful for imbalance |
| Weighted F1 | F1 averaged by class support |
| PCC | Pearson correlation coefficient |
| Spearman | Rank correlation coefficient |
| SENS_MAX | Maximum explanation sensitivity to perturbation |
| Faithfulness | Whether explanation features actually affect prediction |
| Evasion | Modifying attack traffic to avoid detection |
| Explanation leakage | Attacker learning model behavior from explanations |

---

## 7. Final Advice for the Exam

1. **Do not overclaim.** Say the model is useful, not perfect.
2. **Always connect explanations to security.** This is not generic XAI; it is XAI for IDS.
3. **Mention limitations confidently.** It makes you look serious.
4. **Use the numbers.** The teacher will trust you more if you cite exact results.
5. **If stuck, return to the main message:** performance + explainability + stability + security.

Final sentence to memorize:

> Our project shows that deep learning can detect intrusions and that SHAP/LIME can explain predictions, but in cybersecurity, explanations must be evaluated for stability, faithfulness, and leakage risk before they can be trusted.