# Teacher Presentation Guide — Explainable Intrusion Detection System (X-IDS)

**Repo:** [cathrica/deep-learning-project](https://huggingface.co/cathrica/deep-learning-project)  
**Project:** ICCN-INE2 Deep Learning — Project 5: Explainable IDS  
**Dataset:** NSL-KDD | **Models:** MLP, LSTM, 1D-CNN | **XAI:** SHAP + LIME

---

## 1. The 30-Second Elevator Pitch

> We built an **Explainable Intrusion Detection System** that detects malicious network connections using deep learning, then explains **why** each decision was made using SHAP and LIME. We also evaluated whether those explanations are stable, faithful, and safe to expose in a security environment.
>
> Best model: **LSTM** with weighted F1 = **0.7800**, ROC-AUC = **0.9434**, PR-AUC = **0.9222**.
> SHAP and LIME did **not** agree (Spearman = 0.0714), and explanations lost stability as input perturbations grew. Security analysis showed that exposing raw explanations can help attackers evade detection, so access must be controlled.

---

## 2. Why This Project Matters (Motivation)

- Many IDS alerts, especially from ML-based detectors, are black-box: analysts get a flag but no evidence.
- Deep learning improves detection but hides reasoning.
- In cybersecurity, a false positive wastes analyst time; a false negative lets attacks through.
- Explainability can help **defenders prioritize alerts** and **verify model behavior**.
- **Risk:** if attackers see which features matter most, they can craft evasion attacks.
- Our project asks: *Can we explain IDS decisions without destroying trust or security?*

---

## 3. Dataset — NSL-KDD

| Property | Value |
|---|---|
| Source | UNB Canadian Institute for Cybersecurity |
| HF Hub | `Mireu-Lab/NSL-KDD` |
| Records | Train: **151,165** / Test: **34,394** |
| Features | **41** (3 categorical + 38 numerical) |
| Categorical | `protocol_type` (3), `service` (70), `flag` (11) |
| Task | Binary classification: **Normal vs Anomaly** |
| Train distribution | 53% Normal / 47% Anomaly |
| Test distribution | 34% Normal / 66% Anomaly |

**Important detail:** The test set has a **distribution shift**: anomalies make up 66% of the test set but only 47% of the training set. This makes generalization harder and is worth mentioning as a realistic challenge.

### Preprocessing Choices

| Step | Method | Why |
|---|---|---|
| Categorical encoding | **LabelEncoder** | Preserves 41-feature structure so SHAP/LIME outputs map cleanly to original features. OneHot would expand the 3 categorical features into 84 binary columns (3 + 70 + 11), giving 122 columns in total, and hurt interpretability. |
| Scaling | **MinMaxScaler [0,1]** | Features have wildly different ranges (e.g., `src_bytes` up to 1.3B vs `serror_rate` 0–1). Scaling stabilizes training and makes ε-perturbations meaningful for stability testing. |
| Reproducibility | Seed **42**, fixed splits | Runs are repeatable, up to any nondeterministic GPU kernels. |

**Teacher might ask:** *Why LabelEncoder instead of OneHot?*  
**Answer:** OneHot would turn the 3 categorical features into 84 binary columns (3 + 70 + 11). SHAP and LIME would then attribute importance to individual binary columns, such as a single `service` value, instead of the semantic feature, making interpretation much harder for analysts. The trade-off is an artificial ordering in categorical variables, which we acknowledge as a limitation.
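
The preprocessing choices above can be sketched as follows. This is a minimal illustration on four made-up rows, not the project's actual pipeline: the toy values and column positions are assumptions, and only the column arithmetic comes from the dataset table.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy rows standing in for NSL-KDD records:
# (protocol_type, service, flag, src_bytes). Values are made up.
train = np.array([
    ["tcp", "http", "SF", 181],
    ["udp", "domain_u", "SF", 45],
    ["tcp", "private", "S0", 0],
    ["icmp", "ecr_i", "SF", 1032],
], dtype=object)

categorical_cols = [0, 1, 2]  # positions are assumptions for this toy example
encoded = np.empty(train.shape, dtype=float)

for col in range(train.shape[1]):
    if col in categorical_cols:
        # One integer code per category: the column count does not change.
        encoded[:, col] = LabelEncoder().fit_transform(train[:, col].astype(str))
    else:
        encoded[:, col] = train[:, col].astype(float)

# Rescale every column to [0, 1], so a perturbation of size eps means
# "eps of the observed range" for every feature.
scaled = MinMaxScaler().fit_transform(encoded)

# Column arithmetic from the dataset table: 3 categorical features with
# 3, 70 and 11 categories, plus 38 numerical features.
label_encoded_width = 3 + 38
one_hot_width = (3 + 70 + 11) + 38
print(scaled.shape, label_encoded_width, one_hot_width)
```

In a real pipeline the encoders and scaler would be fitted on the training split only and then reused on the test split.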

---

## 4. Models & Architecture Choices

We compared three lightweight architectures with the same training config:

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning rate | 1e-3 |
| Weight decay | 1e-4 |
| Batch size | 256 |
| Epochs | 50 |
| Loss | CrossEntropyLoss with inverse-frequency class weights |
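
A minimal PyTorch sketch of this configuration is shown below. The inverse-frequency formula `w_c = N / (K * n_c)` is one common choice and an assumption here, as are the class counts (taken from the 53% / 47% train split) and the placeholder linear model.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Assumed class counts mirroring the 53% / 47% train split
# (class 0 = Normal, class 1 = Anomaly).
class_counts = torch.tensor([53.0, 47.0])

# Inverse-frequency weights, w_c = N / (K * n_c): the rarer class weighs more.
class_weights = class_counts.sum() / (len(class_counts) * class_counts)

model = nn.Linear(41, 2)  # placeholder; any of the three architectures fits here
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# One optimization step on a random batch of 256 samples.
x = torch.rand(256, 41)
y = torch.randint(0, 2, (256,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(class_weights, loss.item())
```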

### 4.1 MLP (Baseline)

```
Input(41) → Linear(256) → BatchNorm → ReLU → Dropout(0.3)
          → Linear(128) → BatchNorm → ReLU → Dropout(0.2)
          → Linear(64)  → ReLU
          → Linear(2 classes)
```

- **Parameters:** ~50K
- **Why:** Standard tabular baseline. BatchNorm stabilizes gradients; dropout regularizes.
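
As a rough sketch, the diagram above maps onto PyTorch like this. The class name and the `count_parameters` helper are hypothetical; the helper is handy for double-checking the parameter figure before presenting.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Fully connected baseline following the layer diagram above."""

    def __init__(self, n_features: int = 41, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # raw logits, shape (batch, n_classes)


def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters (BatchNorm running stats are excluded)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


mlp = MLP().eval()  # eval mode disables dropout for this shape check
with torch.no_grad():
    logits = mlp(torch.rand(8, 41))
print(logits.shape, count_parameters(mlp))
```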

### 4.2 LSTM (Best Performer)

```
Input(41) → reshape to (41, 1) → 2-layer LSTM(hidden=64, dropout=0.2)
          → last hidden state → Linear(2 classes)
```

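- **Parameters:** ~35K (reported). Recount before presenting: the layers above give about 50.6K in PyTorch (2-layer `nn.LSTM` with 50,432, plus 130 in the output layer).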
- **Why:** Treats the 41 features as a sequence of 41 steps with one value per step. NSL-KDD features are semantically grouped (basic → content → time-based → host-based), so the LSTM can learn dependencies within and between these groups. This inductive bias likely helped it generalize best despite having fewer parameters than the CNN.
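
A minimal PyTorch sketch of this architecture follows, assuming `batch_first=True` and that "last hidden state" means the final hidden state of the top LSTM layer. The class name and `count_parameters` helper are hypothetical.

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    """Reads the 41 features as a sequence of 41 scalar steps."""

    def __init__(self, n_classes: int = 2, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=2,
                            dropout=0.2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq = x.unsqueeze(-1)          # (batch, 41) -> (batch, 41, 1)
        _, (h_n, _) = self.lstm(seq)   # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])      # final hidden state of the top layer


def count_parameters(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


lstm = LSTMClassifier().eval()  # eval mode disables the inter-layer dropout
with torch.no_grad():
    logits = lstm(torch.rand(8, 41))
print(logits.shape, count_parameters(lstm))
```

Because the model reads features in order, its output depends on the column ordering of the dataset, which is worth mentioning if asked about the sequence assumption.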

### 4.3 1D-CNN

```
Input(41) → reshape to (1, 41) → Conv1d(64, k=3, pad=1) → ReLU
          → Conv1d(128, k=3, pad=1) → ReLU → AdaptiveAvgPool1d(8)
          → Flatten → Linear(64) → ReLU → Linear(2 classes)
```

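- **Parameters:** ~45K (reported). Recount before presenting: the layers above give about 90.7K in PyTorch, most of it in the `Linear(1024 → 64)` layer that follows the flatten step.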
- **Why:** Learns local patterns between neighboring features, which suits blocks of rate-based features. However, it underperformed the LSTM on weighted F1, suggesting that more parameters ≠ better when the architectural bias does not match the data structure.
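
The same diagram can be sketched in PyTorch as below. The class name and `count_parameters` helper are hypothetical, and the flattened size of 128 × 8 = 1024 follows from the pooling layer in the diagram.

```python
import torch
import torch.nn as nn


class CNN1D(nn.Module):
    """Slides width-3 filters over the 41 features treated as one channel."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),   # (batch, 128, 41) -> (batch, 128, 8)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),              # (batch, 128, 8) -> (batch, 1024)
            nn.Linear(128 * 8, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x.unsqueeze(1)))  # add channel dim


def count_parameters(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


cnn = CNN1D().eval()
with torch.no_grad():
    logits = cnn(torch.rand(8, 41))
print(logits.shape, count_parameters(cnn))
```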

### Performance Results

| Model | Weighted F1 | ROC-AUC | PR-AUC | Training Time |
|---|---|---|---|---|
| **LSTM** | **0.7800** | **0.9434** | **0.9222** | 162.9s |
| MLP | 0.7639 | 0.9231 | 0.8699 | 145.1s |
| 1D-CNN | 0.7579 | 0.9410 | 0.9182 | 173.1s |

**Teacher might ask:** *Why did LSTM win despite fewer parameters?*  
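**Answer:** The LSTM's sequential processing matches the semantic grouping of NSL-KDD features. The CNN only sees windows of 3 neighboring features per layer, which is less natural for this tabular feature ordering. The MLP can combine any features but has no built-in notion of feature order or grouping. The margins are small (weighted F1 of 0.7800 vs 0.7639 and 0.7579), so present this as a plausible explanation rather than a proven one.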
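
For reference, the three metrics in the results table can be computed with scikit-learn as sketched here. The labels and scores are synthetic, the 0.5 threshold is an assumption, and `average_precision_score` is one common estimator of PR-AUC; the project may have used a different one.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

rng = np.random.default_rng(42)

# Synthetic stand-ins: 1 = Anomaly, 0 = Normal, and a score that is
# higher on average for anomalies.
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(0.35 * y_true + rng.normal(0.4, 0.2, size=1000), 0.0, 1.0)
y_pred = (scores >= 0.5).astype(int)  # hard labels at an assumed 0.5 threshold

# Weighted F1 uses hard predictions; the two AUCs use the raw scores.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)
print(weighted_f1, roc_auc, pr_auc)
```

This also explains how a model can have a high ROC-AUC and a modest weighted F1 at the same time: the F1 depends on one fixed threshold, while the AUCs summarize all thresholds.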

---

## 5. Explainability — SHAP & LIME

We used **post-hoc explainability** (explaining an already-trained model rather than building an inherently interpretable one) so that we could keep the expressive power of deep networks.

### SHAP (SHapley Additive exPlanations)

- **Method:** KernelExplainer (model-agnostic)
- **What it does:** Estimates how much each feature pushes the prediction away from the average prediction, based on game-theoretic Shapley values.
- **Top anomaly features:** `logged_in` (0.0950), `dst_host_rerror_rate` (0.0619), `protocol_type` (0.0573), `rerror_rate` (0.0479), `dst_host_serror_rate` (0.0427)
- **Why these make sense:** Login status and error rates are classic intrusion indicators.
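
To make the Shapley idea concrete, here is a brute-force sketch that computes exact Shapley values for a toy 4-feature model. It is not the `shap` library and not the project's code: the logistic model, its weights, and the background data are all hypothetical. KernelExplainer approximates the same quantity by sampling coalitions, since enumerating all subsets of 41 features is infeasible.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one sample.

    Features outside a coalition are filled in from the background rows
    and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical toy model: logistic regression where feature 3 is unused.
w = np.array([2.0, -1.0, 0.5, 0.0])

def predict(data):
    return 1.0 / (1.0 + np.exp(-(data @ w)))

rng = np.random.default_rng(42)
background = rng.random((5, 4))
x = np.array([0.9, 0.1, 0.5, 0.7])
phi = exact_shapley(predict, x, background)
print(phi)
```

The attributions sum to the gap between this sample's prediction and the average background prediction, which is the "pushes the prediction away from the average" reading given above.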

### LIME (Local Interpretable Model-Agnostic Explanations)

- **Method:** LimeTabularExplainer
- **What it does:** Perturbs the input, observes predictions, fits a simple linear model locally to approximate the black-box model near that point.
- **Top features (frequency in 30 explanations):** `wrong_fragment` (30/30), `rerror_rate` (30/30), `protocol_type` (30/30), `dst_host_rerror_rate` (30/30)
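
The perturb-and-fit idea can be sketched in a few lines. This is a simplified stand-in, not the `LimeTabularExplainer` implementation: the Gaussian sampling, the exponential kernel, the kernel width, and the toy logistic model are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge


def local_surrogate(predict, x, n_samples=2000, scale=0.1, kernel_width=0.75, seed=0):
    """Fit a weighted linear model to the black box around x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the input.
    neighbors = x + rng.normal(0.0, scale, size=(n_samples, len(x)))
    # 2. Weight each neighbor by its closeness to x.
    distances = np.linalg.norm(neighbors - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # 3. Fit a simple linear model to the black-box predictions.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(neighbors, predict(neighbors), sample_weight=weights)
    return surrogate.coef_  # local feature effects


# Hypothetical black box: logistic model where feature 3 is unused.
w = np.array([2.0, -1.0, 0.5, 0.0])

def predict(data):
    return 1.0 / (1.0 + np.exp(-(data @ w)))

x = np.array([0.5, 0.5, 0.5, 0.5])
coef_a = local_surrogate(predict, x, seed=0)
coef_b = local_surrogate(predict, x, seed=1)
print(coef_a, coef_b)
```

Running it with two seeds gives slightly different coefficients, which is the stochasticity that the stability analysis later measures.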

### Key Finding: SHAP vs LIME Disagreement

| Metric | Value |
|---|---|
| Spearman rank correlation | **0.0714** |
| p-value | 0.8665 |

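**Interpretation:** The two feature rankings are essentially uncorrelated: ρ ≈ 0.07 is close to zero, and p = 0.8665 means it is statistically indistinguishable from no correlation. This is critical: **explanations are method-dependent**, so no single method should be trusted blindly.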

**Teacher might ask:** *Which method do you trust more?*  
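**Answer:** SHAP has stronger theoretical foundations (game theory, consistency properties). Exact Shapley values are deterministic, although KernelExplainer approximates them by sampling, so its output can still vary slightly unless the seed and sample count are fixed. LIME is intuitive but more sensitive to its random perturbations and kernel settings. For security-critical decisions, I would prefer SHAP but still validate with stability and faithfulness tests.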
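
The agreement check itself is a single SciPy call. The importance values below are hypothetical placeholders, not the project's SHAP or LIME outputs.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-feature importance scores for the same 8 features,
# listed in the same feature order for both methods.
shap_importance = np.array([0.10, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01])
lime_importance = np.array([0.02, 0.09, 0.01, 0.07, 0.03, 0.08, 0.04, 0.05])

# Spearman compares ranks only, so the two methods may use different scales.
rho, p_value = spearmanr(shap_importance, lime_importance)
print(rho, p_value)

# Sanity check: a ranking always agrees perfectly with itself.
rho_self, _ = spearmanr(shap_importance, shap_importance)
```

A ρ near 1 means the methods rank features alike, near 0 means the rankings are unrelated, and near -1 means they are reversed.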

---

## 6. Stability & Faithfulness

An explanation is only useful if it is **reliable**.

### 6.1 Stability — Perturbation Test

We added small ε-bounded noise to inputs and measured the **Pearson Correlation Coefficient (PCC)** between the SHAP attributions of the original and the perturbed input. A PCC near 1 means the explanation barely changed.

| Epsilon | PCC | Verdict |
|---|---|---|
| 0.01 | **0.6293** | ✅ Stable (≥ 0.6 threshold) |
| 0.03 | 0.5861 | ❌ Unstable |
| 0.05 | 0.5676 | ❌ Unstable |

**Threshold 0.6** is inspired by the SAFARI framework (Huang et al. 2022).

**LIME stochastic stability:** Mean Spearman across 20 runs = **0.6054** — borderline stable.
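
The perturbation test can be sketched as below. The `explain` function is a stand-in (gradient × input on a toy logistic model) rather than SHAP, and the uniform noise, trial count, and toy weights are assumptions; only the 0.6 threshold and the ε values come from the text above.

```python
import numpy as np


def stability_pcc(explain, x, eps, n_trials=20, seed=42):
    """Mean Pearson correlation between attributions of x and of
    eps-bounded perturbations of x."""
    rng = np.random.default_rng(seed)
    base = explain(x)
    pccs = []
    for _ in range(n_trials):
        noise = rng.uniform(-eps, eps, size=x.shape)
        x_pert = np.clip(x + noise, 0.0, 1.0)  # stay in the MinMax-scaled range
        pccs.append(np.corrcoef(base, explain(x_pert))[0, 1])
    return float(np.mean(pccs))


# Stand-in explainer: gradient x input for a toy logistic model.
w = np.array([2.0, -1.0, 0.5, 1.5, -0.7, 0.3])

def explain(x):
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return w * x * p * (1.0 - p)

x = np.array([0.2, 0.8, 0.5, 0.1, 0.9, 0.4])
pcc_small = stability_pcc(explain, x, eps=0.01)
pcc_large = stability_pcc(explain, x, eps=0.3)
is_stable = pcc_small >= 0.6  # threshold from the table above
print(pcc_small, pcc_large, is_stable)
```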

### 6.2 Faithfulness — Feature Masking

If SHAP says a feature is important, removing it should hurt confidence.

| Masked features | Confidence drop |
|---|---|
| Top 3 | 0.3355 |
| Top 5 | 0.3592 |
| Top 10 | **0.4938** |

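**Interpretation:** The confidence drop grows as more top-ranked features are masked, and the top 3 alone account for most of it (0.3355 of 0.4938). This is consistent with SHAP identifying features the model actually uses; comparing against masking the same number of random features would make the claim stronger.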

**Teacher might ask:** *What is the difference between stability and faithfulness?*  
**Answer:** Stability asks: "Do similar inputs get similar explanations?" Faithfulness asks: "Does the explanation actually reflect what the model cares about?" You need both for a trustworthy explanation.
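
A minimal sketch of the masking test is given below. How a feature is "removed" is an assumption here (it is replaced by a baseline value such as the training mean), and the toy logistic model and attribution vector are hypothetical stand-ins for the trained network and its SHAP values.

```python
import numpy as np


def confidence_drop(predict_proba, x, attributions, k, baseline):
    """Drop in predicted anomaly probability after replacing the k
    features with the largest |attribution| by baseline values."""
    top_k = np.argsort(-np.abs(attributions))[:k]
    masked = x.copy()
    masked[top_k] = baseline[top_k]
    return predict_proba(x) - predict_proba(masked)


# Hypothetical model: logistic regression returning P(anomaly).
w = np.array([3.0, 2.0, 1.0, 0.5])
b = -2.0

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = np.array([0.9, 0.8, 0.7, 0.6])
baseline = np.full(4, 0.1)           # assumed "feature absent" value
attributions = w * (x - baseline)    # stand-in for SHAP values

drops = [confidence_drop(predict_proba, x, attributions, k, baseline)
         for k in (1, 2, 3)]
print(drops)
```

Repeating the same call with randomly chosen features instead of `top_k` gives the baseline that a faithful explanation should beat.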

---

## 7. Security Implications

### 7.1 The Dual-Edged Sword

- **Good:** Explanations help analysts verify alerts and prioritize investigations.
- **Bad:** If attackers see explanations, they learn which features to manipulate.

### 7.2 Feature Manipulability

| Category | Manipulable? | Examples |
|---|---|---|
| Packet content | ✅ Yes | `src_bytes`, `dst_bytes`, `hot` |
| Connection behavior | ⚠️ Partially | `duration`, `count`, `srv_count` |
| Protocol fields | ⚠️ Constrained | `protocol_type`, `flag` |
| Network statistics | ❌ No | `dst_host_count`, `dst_host_same_srv_rate` |
| Error rates | ⚠️ Partially | `serror_rate`, `rerror_rate` |

**Good news:** Two of our top five SHAP features (`dst_host_rerror_rate`, `dst_host_serror_rate`) are host-based statistics computed on the sensor side, which an attacker cannot set directly. This makes evasion harder than if the model relied only on attacker-controlled payload fields.

### 7.3 Attack Scenarios

1. **Evasion via explanation leakage:** Attacker queries the explanation API, sees that `serror_rate` and `count` drive detection, then crafts traffic to spoof those features.
2. **LIME inconsistency exploitation:** LIME gives different rankings on rerun. Analysts waste time chasing inconsistent explanations.
3. **Backdoor with clean explanations:** A poisoned model misclassifies triggered inputs but shows plausible benign SHAP values.

### 7.4 Mitigations

- Restrict explanation access to trusted analysts
- Rate-limit explanation APIs
- Log all explanation queries
- Aggregate explanations instead of exposing raw per-sample values
- Never replace rule-based IDS with ML explanations alone
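
The aggregation mitigation could look roughly like this. It is a sketch under assumptions: the function name, the `min_batch` guard, and the choice to return only feature names are hypothetical design decisions, not part of the project.

```python
import numpy as np


def aggregate_explanations(attributions, feature_names, top_k=3, min_batch=50):
    """Expose only the top-k feature names by mean |attribution| over a
    batch, never the raw per-sample values."""
    attributions = np.asarray(attributions)
    if attributions.shape[0] < min_batch:
        # Small batches would leak near per-sample information.
        raise ValueError("batch too small to aggregate safely")
    mean_abs = np.abs(attributions).mean(axis=0)
    order = np.argsort(-mean_abs)[:top_k]
    return [feature_names[i] for i in order]


rng = np.random.default_rng(42)
names = ["logged_in", "rerror_rate", "protocol_type", "duration"]
# Synthetic attributions whose typical magnitude differs per feature.
batch = rng.normal(size=(100, 4)) * np.array([0.1, 2.0, 1.0, 0.5])
summary = aggregate_explanations(batch, names, top_k=2)
print(summary)
```

Combined with access control, rate limiting, and query logging, this limits how much an attacker can learn about any single decision.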

---

## 8. Limitations (Say These Confidently)

1. **Dataset age:** NSL-KDD was released in 2009 and is derived from the KDD Cup 1999 data, so the underlying traffic is even older. Modern traffic (TLS 1.3, encrypted payloads, IoT protocols) looks very different.
2. **LabelEncoder trade-off:** Preserves interpretability but imposes artificial ordering on categories.
3. **Computational cost:** Kernel SHAP is expensive; we used sampled subsets.
4. **LIME stochasticity:** Results vary across random seeds.
5. **Scope:** We evaluated explanation quality, not adversarial robustness of the classifier itself. That is a separate (harder) problem.

**Teacher might ask:** *What would you improve with more time?*  
**Answer:** Test on modern datasets (CIC-IDS2017, UNSW-NB15), use embeddings or target encoding for categorical features, evaluate multiclass attack-type detection, and run adversarial evasion experiments using the top SHAP features.

---

## 9. Likely Teacher Questions & Model Answers

### Q: What is your main contribution?
**A:** We didn't just build an IDS. We built an IDS + explainability pipeline + stability evaluation + security risk analysis. The contribution is showing that explainability in security requires trust evaluation, not just visualization.

### Q: Why use deep learning if you need explainability?
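**A:** Deep networks can model non-linear feature interactions, and post-hoc explainability (SHAP/LIME) lets us keep that capacity while adding interpretability. Inherently interpretable models (decision trees, linear models) are the natural baseline; since our results table only covers the three deep models, be ready to say that the performance gap over them is an expectation, not something we measured.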

### Q: Why is PR-AUC more important than accuracy?
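**A:** Accuracy depends on a single decision threshold and on the class mix, and the class mix shifts here (47% anomaly in training, 66% in test). PR-AUC summarizes precision and recall for the anomaly class across all thresholds, which is what matters when false negatives (missed attacks) are costly. Be precise if pressed: the imbalance is mild, and in the test set Normal is the minority class.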

### Q: What is the practical takeaway for a SOC analyst?
**A:** The model can flag anomalies and show which features drove the decision (e.g., error rates, login status). The analyst uses this as supporting evidence, not as sole proof. Explanations are shown internally only, with access control and logging.

### Q: Why binary classification instead of 5-class (Normal, DoS, Probe, R2L, U2R)?
**A:** Binary normal/anomaly is the core IDS problem and keeps the explainability evaluation clean. Multiclass is a natural next step — U2R has only ~52 samples in training, which is extremely challenging.

### Q: What does it mean that SHAP and LIME disagree?
**A:** It means there is no single "true" explanation for a black-box model. Different methods make different assumptions. This is why we evaluate stability and faithfulness — to filter out unreliable explanations regardless of the method.

### Q: How do you prevent attackers from using explanations against you?
**A:** Access control, rate limiting, logging, and aggregation. We also analyzed that the model relies partly on non-manipulable sensor-side statistics, which makes evasion harder than if it relied only on attacker-controlled fields.

---

## 10. Key Numbers Cheat Sheet

Memorize these for instant credibility:

| Fact | Number |
|---|---|
| Train records | **151,165** |
| Test records | **34,394** |
| Features | **41** |
| Best model | **LSTM** |
| Best weighted F1 | **0.7800** |
| Best ROC-AUC | **0.9434** |
| Best PR-AUC | **0.9222** |
| SHAP-LIME Spearman | **0.0714** |
| SHAP PCC at ε=0.01 | **0.6293** (stable) |
| SHAP PCC at ε=0.05 | **0.5676** (unstable) |
| LIME stochastic stability | **0.6054** (borderline) |
| Top-10 masking confidence drop | **0.4938** |
| Random seed | **42** |

---

## 11. Glossary of Terms

| Term | Definition |
|---|---|
| **IDS** | Intrusion Detection System — monitors network traffic for malicious activity |
| **X-IDS** | Explainable Intrusion Detection System |
| **NSL-KDD** | Standard benchmark dataset for intrusion detection |
| **MLP** | Multi-Layer Perceptron — fully connected neural network |
| **LSTM** | Long Short-Term Memory — recurrent network with memory gates |
| **1D-CNN** | One-dimensional convolutional network |
| **SHAP** | Feature attribution based on Shapley values from game theory |
| **LIME** | Local surrogate model for explaining individual predictions |
| **ROC-AUC** | Threshold-independent ranking quality metric |
| **PR-AUC** | Precision-recall area — informative for imbalanced data |
| **Weighted F1** | F1-score averaged by class support |
| **PCC** | Pearson correlation — measures explanation similarity under perturbation |
| **Spearman** | Rank correlation — compares feature ranking between methods |
| **SENS_MAX** | Maximum explanation shift under bounded perturbation |
| **Faithfulness** | Whether highlighted features actually affect model predictions |
| **Evasion** | Attacker modifying traffic to avoid detection |
| **Explanation leakage** | Attacker learning model behavior from exposed explanations |

---

## 12. Final One-Liner to Close

> *"Explainability makes IDS useful, but only stability, faithfulness, and security analysis make it trustworthy."*

Good luck on the presentation! 🎓