
Teacher Presentation Guide: Explainable Intrusion Detection System (X-IDS)

Repo: cathrica/deep-learning-project
Project: ICCN-INE2 Deep Learning, Project 5: Explainable IDS
Dataset: NSL-KDD | Models: MLP, LSTM, 1D-CNN | XAI: SHAP + LIME


1. The 30-Second Elevator Pitch

We built an Explainable Intrusion Detection System that detects malicious network connections using deep learning, then explains why each decision was made using SHAP and LIME. We also evaluated whether those explanations are stable, faithful, and safe to expose in a security environment.

Best model: LSTM with weighted F1 = 0.7800, ROC-AUC = 0.9434, PR-AUC = 0.9222. SHAP and LIME did not agree (Spearman = 0.0714), and explanations lost stability as input perturbations grew. Security analysis showed that exposing raw explanations can help attackers evade detection, so access must be controlled.


2. Why This Project Matters (Motivation)

  • Traditional IDS alerts are black boxes: analysts get a flag but no evidence.
  • Deep learning improves detection but hides reasoning.
  • In cybersecurity, a false positive wastes analyst time; a false negative lets attacks through.
  • Explainability can help defenders prioritize alerts and verify model behavior.
  • Risk: if attackers see which features matter most, they can craft evasion attacks.
  • Our project asks: Can we explain IDS decisions without destroying trust or security?

3. Dataset: NSL-KDD

| Property | Value |
| --- | --- |
| Source | UNB Canadian Institute for Cybersecurity |
| HF Hub | Mireu-Lab/NSL-KDD |
| Records | Train: 151,165 / Test: 34,394 |
| Features | 41 (3 categorical + 38 numerical) |
| Categorical | protocol_type (3), service (70), flag (11) |
| Task | Binary classification: Normal vs Anomaly |
| Train distribution | 53% Normal / 47% Anomaly |
| Test distribution | 34% Normal / 66% Anomaly |

Important detail: The test set has a distribution shift: it contains a much higher share of anomalies (66%) than the training set (47%). This makes generalization harder and is worth mentioning as a realistic challenge.

Preprocessing Choices

| Step | Method | Why |
| --- | --- | --- |
| Categorical encoding | LabelEncoder | Preserves the 41-feature structure so SHAP/LIME outputs map cleanly to original features. One-hot encoding would replace the 3 categorical features with 84 binary columns (122 inputs in total) and hurt interpretability. |
| Scaling | MinMaxScaler [0,1] | Features have wildly different ranges (e.g., src_bytes up to 1.3B vs serror_rate 0–1). Scaling stabilizes training and makes ε-perturbations meaningful for stability testing. |
| Reproducibility | Seed 42, fixed splits | Keeps every experiment repeatable. |
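
The two preprocessing steps can be sketched in plain Python as follows. This is only an illustration of the idea: in practice scikit-learn's LabelEncoder and MinMaxScaler do the same job, and the toy rows and helper names here are assumptions, not the project's code.

```python
def fit_label_encoder(values):
    """Map each category to an integer index in sorted order (as LabelEncoder does)."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def fit_min_max(column):
    """Return (min, max) learned from the training column only."""
    return min(column), max(column)


def min_max_scale(value, lo, hi):
    """Scale to [0, 1]; a constant column maps to 0.0 to avoid dividing by zero."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


# Toy training rows: (protocol_type, src_bytes). The values are made up.
train_rows = [("tcp", 491), ("udp", 146), ("icmp", 0), ("tcp", 1_000_000)]

protocol_codes = fit_label_encoder(row[0] for row in train_rows)
lo, hi = fit_min_max([row[1] for row in train_rows])

encoded = [
    (protocol_codes[proto], min_max_scale(src_bytes, lo, hi))
    for proto, src_bytes in train_rows
]
```

Note that the integer codes follow alphabetical order (icmp < tcp < udp), which is exactly the artificial ordering that label encoding introduces.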

Teacher might ask: Why LabelEncoder instead of OneHot?
Answer: One-hot encoding would turn the 3 categorical features into 84 binary columns (122 inputs in total). SHAP and LIME would then explain binary columns instead of semantic features, making interpretation much harder for analysts. The trade-off is an artificial ordering in categorical variables, which we acknowledge as a limitation.


4. Models & Architecture Choices

We compared three lightweight architectures with the same training config:

| Parameter | Value |
| --- | --- |
| Optimizer | Adam |
| Learning rate | 1e-3 |
| Weight decay | 1e-4 |
| Batch size | 256 |
| Epochs | 50 |
| Loss | CrossEntropyLoss with inverse-frequency class weights |
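
One common way to compute inverse-frequency class weights is w_c = N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The sketch below assumes this formula (the guide does not state the exact variant used) and uses per-100 counts that match the 53% / 47% training split.

```python
def inverse_frequency_weights(class_counts):
    """w_c = N / (K * n_c): rarer classes get proportionally larger weights."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


# Per-100 counts matching the 53% Normal / 47% Anomaly training split.
weights = inverse_frequency_weights([53, 47])
```

In PyTorch, such a list would typically be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`. With a split this close to balanced, the two weights differ only slightly.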

4.1 MLP (Baseline)

Input(41) → Linear(256) → BatchNorm → ReLU → Dropout(0.3)
          → Linear(128) → BatchNorm → ReLU → Dropout(0.2)
          → Linear(64)  → ReLU
          → Linear(2 classes)
  • Parameters: ~50K
  • Why: Standard tabular baseline. BatchNorm stabilizes gradients; dropout regularizes.

4.2 LSTM (Best Performer)

Input(41) → reshape to (41, 1) → 2-layer LSTM(hidden=64, dropout=0.2)
          → last hidden state → Linear(2 classes)
  • Parameters: ~35K
  • Why: Treats the 41 features as a sequence. NSL-KDD features are semantically grouped (basic → content → time-based → host-based), and an LSTM can learn dependencies between these groups. This inductive bias likely helped it generalize best despite having fewer parameters than the CNN.

4.3 1D-CNN

Input(41) → reshape to (1, 41) → Conv1d(64, k=3, pad=1) → ReLU
          → Conv1d(128, k=3, pad=1) → ReLU → AdaptiveAvgPool1d(8)
          → Flatten → Linear(64) → ReLU → Linear(2 classes)
  • Parameters: ~45K
  • Why: Learns local patterns between neighboring features. Good for rate-based feature blocks. However, it underperformed the LSTM, showing that more parameters ≠ better if the architecture bias mismatches the data structure.
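
If a teacher asks where the parameter counts come from, they can be derived by hand from the layer lists. The sketch below assumes standard PyTorch conventions (biases everywhere, two bias vectors per LSTM layer, learnable BatchNorm scale and shift) and reads the layer lists above literally. Under those assumptions the totals come to about 52.8K (MLP), 50.6K (LSTM) and 90.7K (1D-CNN). If the actual implementation differs from the layer lists (for example a smaller hidden size or pooling width), the counts will differ, so it is worth confirming the quoted figures with `sum(p.numel() for p in model.parameters())`.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + bias


def batchnorm_params(n_features):
    return 2 * n_features  # learnable scale and shift


def lstm_layer_params(n_in, hidden):
    # Four gates, each with input weights, recurrent weights and two bias vectors
    # (PyTorch's nn.LSTM keeps separate b_ih and b_hh).
    return 4 * (hidden * n_in + hidden * hidden + 2 * hidden)


def conv1d_params(c_in, c_out, kernel):
    return c_out * c_in * kernel + c_out  # filters + bias


mlp = (linear_params(41, 256) + batchnorm_params(256)
       + linear_params(256, 128) + batchnorm_params(128)
       + linear_params(128, 64) + linear_params(64, 2))

# Input size 1 per step (each feature is one time step), then a stacked second layer.
lstm = lstm_layer_params(1, 64) + lstm_layer_params(64, 64) + linear_params(64, 2)

# AdaptiveAvgPool1d(8) leaves 128 channels x 8 positions = 1024 flattened inputs.
cnn = (conv1d_params(1, 64, 3) + conv1d_params(64, 128, 3)
       + linear_params(128 * 8, 64) + linear_params(64, 2))

print(mlp, lstm, cnn)
```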

Performance Results

| Model | Weighted F1 | ROC-AUC | PR-AUC | Training Time |
| --- | --- | --- | --- | --- |
| LSTM | 0.7800 | 0.9434 | 0.9222 | 162.9s |
| MLP | 0.7639 | 0.9231 | 0.8699 | 145.1s |
| 1D-CNN | 0.7579 | 0.9410 | 0.9182 | 173.1s |

Teacher might ask: Why did LSTM win despite fewer parameters?
Answer: A plausible explanation is that the LSTM's sequential processing matches the semantic grouping of NSL-KDD features. The CNN assumes local patterns between neighboring features, which is less natural for this tabular feature ordering. The MLP has no notion of feature order at all, so it gets no structural prior about the groups and must learn every dependency from the data.


5. Explainability: SHAP & LIME

We used post-hoc explainability (explaining an already-trained model rather than building an inherently interpretable one) so that we could keep the expressiveness of deep learning models.

SHAP (SHapley Additive exPlanations)

  • Method: KernelExplainer (model-agnostic)
  • What it does: Estimates how much each feature pushes the prediction away from the average prediction, based on game-theoretic Shapley values.
  • Top anomaly features: logged_in (0.0950), dst_host_rerror_rate (0.0619), protocol_type (0.0573), rerror_rate (0.0479), dst_host_serror_rate (0.0427)
  • Why these make sense: Login status and error rates are classic intrusion indicators.
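
To make the Shapley idea concrete, here is a minimal sketch that computes exact Shapley values for a three-feature toy model by enumerating all coalitions. KernelExplainer approximates this by sampling coalitions and averaging over a background dataset. The single baseline vector and the toy scoring function are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; 'absent' features are replaced by the baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical anomaly score with an interaction between features 1 and 2.
    return 0.1 + 0.5 * z[0] + 0.3 * z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is split equally between the two features involved. Exact enumeration needs 2^n model calls per feature set, which is why 41 features require a sampling approximation.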

LIME (Local Interpretable Model-Agnostic Explanations)

  • Method: LimeTabularExplainer
  • What it does: Perturbs the input, observes predictions, fits a simple linear model locally to approximate the black-box model near that point.
  • Top features (frequency in 30 explanations): wrong_fragment (30/30), rerror_rate (30/30), protocol_type (30/30), dst_host_rerror_rate (30/30)
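
The perturb, predict, and fit loop behind LIME can be sketched in plain Python as below. This is a simplified illustration under stated assumptions (Gaussian perturbations, an exponential proximity kernel, ordinary weighted least squares). The real LimeTabularExplainer differs in its sampling, discretization, and regression details, and the function names here are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_style_explanation(model, x, num_samples=500, noise=0.1,
                           kernel_width=0.25, seed=42):
    """Fit a proximity-weighted linear surrogate around x.

    Returns [intercept, coef_0, coef_1, ...].
    """
    rng = random.Random(seed)
    d = len(x)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(num_samples):
        z = [xi + rng.gauss(0.0, noise) for xi in x]
        dist_sq = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist_sq / kernel_width ** 2)  # closer samples count more
        row = [1.0] + z
        y = model(z)
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear_system(xtwx, xtwy)


def black_box(z):
    # Hypothetical model; linear here so the surrogate can recover it exactly.
    return 0.2 + 0.5 * z[0] - 0.3 * z[1]


intercept, *coefs = lime_style_explanation(black_box, [0.4, 0.7])
```

For a nonlinear model the fitted coefficients depend on the random samples and on the kernel width, which is where LIME's run-to-run variation comes from.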

Key Finding: SHAP vs LIME Disagreement

| Metric | Value |
| --- | --- |
| Spearman rank correlation | 0.0714 |
| p-value | 0.8665 |

Interpretation: The correlation is close to zero and not statistically significant (p ≈ 0.87), so the two feature rankings are essentially unrelated. This is critical: explanations are method-dependent. You cannot trust one method blindly.
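
For reference, Spearman's rank correlation compares two rankings through their rank differences. The sketch below uses the no-ties formula and made-up importance scores. In practice `scipy.stats.spearmanr` computes both the coefficient and the p-value.

```python
def ranks(scores):
    """Rank 1 = largest score. Assumes no ties, for simplicity."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(scores_a, scores_b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d_squared = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up importance scores for the same five features under two methods.
shap_scores = [0.50, 0.40, 0.30, 0.20, 0.10]
lime_scores = [0.10, 0.35, 0.05, 0.30, 0.20]
rho = spearman(shap_scores, lime_scores)
```

A value of 1 means identical rankings, -1 means reversed rankings, and values near 0 mean that knowing one ranking tells you almost nothing about the other.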

Teacher might ask: Which method do you trust more?
Answer: SHAP has stronger theoretical foundations (game theory, consistency properties). Note that KernelExplainer is itself a sampling-based approximation, so its output is only repeatable with a fixed seed. LIME is intuitive but stochastic and sensitive to perturbation settings. For security-critical decisions, I would prefer SHAP but still validate with stability and faithfulness tests.


6. Stability & Faithfulness

An explanation is only useful if it is reliable.

6.1 Stability: Perturbation Test

We added small ε-bounded noise to inputs and measured how much SHAP attributions changed using the Pearson Correlation Coefficient (PCC).

| Epsilon | PCC | Verdict |
| --- | --- | --- |
| 0.01 | 0.6293 | ✅ Stable (≥ 0.6 threshold) |
| 0.03 | 0.5861 | ❌ Unstable |
| 0.05 | 0.5676 | ❌ Unstable |

The 0.6 threshold is inspired by the SAFARI framework (Huang et al. 2022).

LIME stochastic stability: Mean Spearman across 20 runs = 0.6054, which is borderline stable.
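
The perturbation test can be sketched as follows. This is a minimal illustration, not the project's code: the uniform noise model, the clipping to [0, 1], the number of trials, and the `toy_explain` stand-in for a SHAP call are all assumptions.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two attribution vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def stability_pcc(explain, x, epsilon, trials=50, seed=42):
    """Mean PCC between the explanation of x and of epsilon-bounded perturbations of x."""
    rng = random.Random(seed)
    reference = explain(x)
    scores = []
    for _ in range(trials):
        # Each feature moves by at most epsilon and stays inside the scaled range.
        noisy = [min(1.0, max(0.0, xi + rng.uniform(-epsilon, epsilon))) for xi in x]
        scores.append(pearson(reference, explain(noisy)))
    return mean(scores)


def toy_explain(x):
    # Stand-in for a SHAP call: attribution = weight * feature value (hypothetical).
    weights = [0.9, -0.4, 0.2, 0.05]
    return [w * xi for w, xi in zip(weights, x)]


x = [0.2, 0.5, 0.7, 0.1]
for eps in (0.01, 0.03, 0.05):
    print(eps, round(stability_pcc(toy_explain, x, eps), 4), "threshold 0.6")
```

The toy explainer is smooth, so it stays far above the 0.6 threshold. The point of the sketch is the measurement loop, not the numbers.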

6.2 Faithfulness: Feature Masking

If SHAP says a feature is important, removing it should hurt confidence.

| Masked features | Confidence drop |
| --- | --- |
| Top 3 | 0.3355 |
| Top 5 | 0.3592 |
| Top 10 | 0.4938 |

Interpretation: The more top features we mask, the bigger the confidence drop, which is consistent with SHAP identifying features the model actually uses.
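
A masking test of this kind can be sketched as below. The mask value of 0.0, the logistic toy model, and the stand-in attributions are assumptions for illustration. The guide does not state which mask value the project used.

```python
import math


def confidence_drop(predict_proba, x, attributions, k, mask_value=0.0):
    """Mask the k features with the largest |attribution| and return the confidence drop."""
    top_k = sorted(range(len(x)), key=lambda i: -abs(attributions[i]))[:k]
    masked = [mask_value if i in top_k else xi for i, xi in enumerate(x)]
    return predict_proba(x) - predict_proba(masked)


def toy_predict_proba(x):
    # Hypothetical anomaly probability: logistic model over four scaled features.
    weights = [2.0, 1.5, 0.5, 0.1]
    logit = sum(w * xi for w, xi in zip(weights, x)) - 1.0
    return 1.0 / (1.0 + math.exp(-logit))


x = [0.9, 0.8, 0.6, 0.4]
attributions = [2.0 * 0.9, 1.5 * 0.8, 0.5 * 0.6, 0.1 * 0.4]  # stand-in for SHAP values
drops = [confidence_drop(toy_predict_proba, x, attributions, k) for k in (1, 2, 3)]
```

Comparing these drops against the drop from masking the same number of randomly chosen features is a useful sanity check on how large a drop is meaningful.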

Teacher might ask: What is the difference between stability and faithfulness?
Answer: Stability asks: "Do similar inputs get similar explanations?" Faithfulness asks: "Does the explanation actually reflect what the model cares about?" You need both for a trustworthy explanation.


7. Security Implications

7.1 The Dual-Edged Sword

  • Good: Explanations help analysts verify alerts and prioritize investigations.
  • Bad: If attackers see explanations, they learn which features to manipulate.

7.2 Feature Manipulability

| Category | Manipulable? | Examples |
| --- | --- | --- |
| Packet content | ✅ Yes | src_bytes, dst_bytes, hot |
| Connection behavior | ⚠️ Partially | duration, count, srv_count |
| Protocol fields | ⚠️ Constrained | protocol_type, flag |
| Network statistics | ❌ No | dst_host_count, dst_host_same_srv_rate |
| Error rates | ⚠️ Partially | serror_rate, rerror_rate |

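Good news: Our top SHAP features include host-based statistics (dst_host_rerror_rate, dst_host_serror_rate) that are aggregated on the sensor side and are hard for an attacker to set directly. This makes evasion harder than if the model relied only on attacker-controlled payload fields.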

7.3 Attack Scenarios

  1. Evasion via explanation leakage: Attacker queries the explanation API, sees that serror_rate and count drive detection, then crafts traffic to spoof those features.
  2. LIME inconsistency exploitation: LIME gives different rankings on rerun. Analysts waste time chasing inconsistent explanations.
  3. Backdoor with clean explanations: A poisoned model misclassifies triggered inputs but shows plausible benign SHAP values.

7.4 Mitigations

  • Restrict explanation access to trusted analysts
  • Rate-limit explanation APIs
  • Log all explanation queries
  • Aggregate explanations instead of exposing raw per-sample values
  • Never replace rule-based IDS with ML explanations alone
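
The first three mitigations (access restriction, rate limiting, logging) can be combined in a thin wrapper around the explanation function. The class below is a hypothetical sketch, not part of the project. A real deployment would put these controls in the API layer and write the audit log to durable storage.

```python
import time
from collections import defaultdict, deque


class ExplanationGateway:
    """Hypothetical wrapper: allow-list, sliding-window rate limit, and audit log."""

    def __init__(self, explain_fn, trusted_users, max_calls, window_seconds,
                 clock=time.monotonic):
        self.explain_fn = explain_fn
        self.trusted_users = set(trusted_users)
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = defaultdict(deque)  # user -> timestamps of recent calls
        self.audit_log = []              # (timestamp, user, outcome)

    def explain(self, user, sample):
        now = self.clock()
        if user not in self.trusted_users:
            self.audit_log.append((now, user, "denied"))
            raise PermissionError("user is not allowed to request explanations")
        recent = self.calls[user]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            self.audit_log.append((now, user, "rate_limited"))
            raise RuntimeError("explanation rate limit exceeded")
        recent.append(now)
        self.audit_log.append((now, user, "ok"))
        return self.explain_fn(sample)
```

Every request is logged, including denied and rate-limited ones, since repeated probing of the explanation endpoint is itself a signal worth investigating.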

8. Limitations (Say These Confidently)

  1. Dataset age: NSL-KDD is a benchmark from 2009. Modern traffic (TLS 1.3, encrypted payloads, IoT protocols) looks very different.
  2. LabelEncoder trade-off: Preserves interpretability but imposes artificial ordering on categories.
  3. Computational cost: Kernel SHAP is expensive; we used sampled subsets.
  4. LIME stochasticity: Results vary across random seeds.
  5. Scope: We evaluated explanation quality, not adversarial robustness of the classifier itself. That is a separate (harder) problem.

Teacher might ask: What would you improve with more time?
Answer: Test on modern datasets (CIC-IDS2017, UNSW-NB15), use embeddings or target encoding for categorical features, evaluate multiclass attack-type detection, and run adversarial evasion experiments using the top SHAP features.


9. Likely Teacher Questions & Model Answers

Q: What is your main contribution?

A: We didn't just build an IDS. We built an IDS + explainability pipeline + stability evaluation + security risk analysis. The contribution is showing that explainability in security requires trust evaluation, not just visualization.

Q: Why use deep learning if you need explainability?

A: Deep learning gives better detection performance. Post-hoc explainability (SHAP/LIME) lets us keep that performance while adding interpretability. Inherently interpretable models (decision trees, linear models) are generally less expressive, so they tend to trade away some detection performance on this kind of task.

Q: Why is PR-AUC more important than accuracy?

A: The class balance shifts between training (47% anomaly) and test (66% anomaly), so accuracy at a single threshold can hide poor performance on one class. PR-AUC summarizes precision and recall on the anomaly class across all thresholds, which is what matters when false negatives (missed attacks) are costly.
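
One common estimator of PR-AUC is average precision: the mean of the precision values measured at the rank of each true positive. The sketch below assumes untied scores and made-up labels. scikit-learn's `average_precision_score` implements the same step-wise definition, but the guide does not say which estimator the project used.

```python
def average_precision(y_true, scores):
    """Mean of precision-at-rank over all positives (label 1 = anomaly).

    Assumes untied scores, for simplicity.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    num_positives = sum(y_true)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_positives


# Made-up labels and anomaly scores for six connections.
y_true = [1, 0, 1, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
ap = average_precision(y_true, scores)
```

A useful reference point: a random ranker scores roughly the positive-class prevalence, which is about 0.66 on this test set, so PR-AUC values should be judged against that baseline rather than against 0.5.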

Q: What is the practical takeaway for a SOC analyst?

A: The model can flag anomalies and show which features drove the decision (e.g., error rates, login status). The analyst uses this as supporting evidence, not as sole proof. Explanations are shown internally only, with access control and logging.

Q: Why binary classification instead of 5-class (Normal, DoS, Probe, R2L, U2R)?

A: Binary normal/anomaly is the core IDS problem and keeps the explainability evaluation clean. Multiclass is a natural next step, but a hard one: U2R has only ~52 samples in training.

Q: What does it mean that SHAP and LIME disagree?

A: It means there is no single "true" explanation for a black-box model. Different methods make different assumptions. This is why we evaluate stability and faithfulness: to filter out unreliable explanations regardless of the method.

Q: How do you prevent attackers from using explanations against you?

A: Access control, rate limiting, logging, and aggregation. Our analysis also showed that the model relies partly on non-manipulable sensor-side statistics, which makes evasion harder than if it relied only on attacker-controlled fields.


10. Key Numbers Cheat Sheet

Memorize these for instant credibility:

| Fact | Number |
| --- | --- |
| Train records | 151,165 |
| Test records | 34,394 |
| Features | 41 |
| Best model | LSTM |
| Best weighted F1 | 0.7800 |
| Best ROC-AUC | 0.9434 |
| Best PR-AUC | 0.9222 |
| SHAP-LIME Spearman | 0.0714 |
| SHAP PCC at ε=0.01 | 0.6293 (stable) |
| SHAP PCC at ε=0.05 | 0.5676 (unstable) |
| LIME stochastic stability | 0.6054 (borderline) |
| Top-10 masking confidence drop | 0.4938 |
| Random seed | 42 |

11. Glossary of Terms

| Term | Definition |
| --- | --- |
| IDS | Intrusion Detection System: monitors network traffic for malicious activity |
| X-IDS | Explainable Intrusion Detection System |
| NSL-KDD | Standard benchmark dataset for intrusion detection |
| MLP | Multi-Layer Perceptron: fully connected neural network |
| LSTM | Long Short-Term Memory: recurrent network with memory gates |
| 1D-CNN | One-dimensional convolutional network |
| SHAP | Feature attribution based on Shapley values from game theory |
| LIME | Local surrogate model for explaining individual predictions |
| ROC-AUC | Threshold-independent ranking quality metric |
| PR-AUC | Precision-recall area: informative for imbalanced data |
| Weighted F1 | F1-score averaged by class support |
| PCC | Pearson correlation: measures explanation similarity under perturbation |
| Spearman | Rank correlation: compares feature ranking between methods |
| SENS_MAX | Maximum explanation shift under bounded perturbation |
| Faithfulness | Whether highlighted features actually affect model predictions |
| Evasion | Attacker modifying traffic to avoid detection |
| Explanation leakage | Attacker learning model behavior from exposed explanations |

12. Final One-Liner to Close

"Explainability makes IDS useful, but only stability, faithfulness, and security analysis make it trustworthy."

Good luck with the presentation! 🎓