Add project README with full structure and documentation
README.md
ADDED
@@ -0,0 +1,101 @@
# Explainable Intrusion Detection System (X-IDS)

**ICCN-INE2 Deep Learning Project - Project 5: Explainable IDS**

## Project Overview

This project builds a deep-learning Intrusion Detection System (IDS) on the NSL-KDD dataset, then applies post-hoc explainability methods (SHAP and LIME) to make its decisions interpretable. We also evaluate how stable the explanations are and analyze the security implications of exposing them.

## Core Research Question

> *Can we make IDS decisions interpretable without compromising detection performance, and are these explanations stable enough to be trusted in security-critical settings?*

## Repository Structure

```
.
├── README.md                  # This file
├── docs/
│   ├── project_plan.md        # Detailed project plan & methodology
│   ├── threat_model.md        # Threat model document
│   └── architecture.md        # Model architecture & design choices
├── data/
│   └── preprocess.py          # Data loading & preprocessing pipeline
├── models/
│   ├── mlp_baseline.py        # MLP baseline model
│   ├── lstm_model.py          # LSTM variant
│   └── cnn1d_model.py         # 1D-CNN variant
├── explainability/
│   ├── shap_analysis.py       # SHAP explanations
│   ├── lime_analysis.py       # LIME explanations
│   └── stability_eval.py      # Explanation stability evaluation
├── experiments/
│   ├── train_baseline.py      # Training script
│   ├── run_explainability.py  # Run all XAI methods
│   └── run_stability.py       # Stability evaluation experiments
├── results/                   # Generated results (figures, metrics)
├── requirements.txt           # Dependencies
└── reproduce.sh               # One-command reproducibility script
```

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Reproduce all experiments
bash reproduce.sh

# Or run step by step:
python data/preprocess.py                # Download & preprocess NSL-KDD
python experiments/train_baseline.py     # Train 3 models (MLP, LSTM, CNN)
python explainability/shap_analysis.py   # SHAP + LIME analysis
python explainability/stability_eval.py  # Stability evaluation
```

## Dataset

**NSL-KDD** (Network Security Laboratory - KDD): an improved version of KDD Cup 99.

- Source: [UNB Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/nsl.html)
- HF Hub: [`Mireu-Lab/NSL-KDD`](https://huggingface.co/datasets/Mireu-Lab/NSL-KDD)
- Train: 151,165 records | Test: 34,394 records
- 41 features (3 categorical + 38 numerical)
- Binary classification: Normal vs Anomaly
- 5-class: Normal, DoS, Probe, R2L, U2R
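
The two label schemes above can be derived from the raw attack names. The sketch below is illustrative only: it assumes the label column holds raw attack names as in the UNB `KDDTrain+` file (the HF Hub copy may already be binarized), and it lists only the attack names found in the training split, so test-only attacks fall back to a hypothetical `"Unknown"` bucket. The names `ATTACK_FAMILIES`, `to_five_class`, and `to_binary` are made up for this example and are not taken from `data/preprocess.py`.

```python
# Training-split attack names grouped by family (assumption: raw UNB labels).
ATTACK_FAMILIES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert the table once so each lookup is a single dict access.
LABEL_TO_FAMILY = {
    name: family
    for family, names in ATTACK_FAMILIES.items()
    for name in names
}


def to_five_class(label: str) -> str:
    """Map a raw attack name to Normal / DoS / Probe / R2L / U2R."""
    label = label.strip().lower()
    if label == "normal":
        return "Normal"
    # Attack names not listed above (e.g. test-only attacks) are flagged
    # instead of being silently assigned to a family.
    return LABEL_TO_FAMILY.get(label, "Unknown")


def to_binary(label: str) -> str:
    """Map a raw attack name to Normal / Anomaly."""
    return "Normal" if label.strip().lower() == "normal" else "Anomaly"


if __name__ == "__main__":
    for raw in ["normal", "neptune", "satan", "guess_passwd", "rootkit"]:
        print(raw, "->", to_five_class(raw), "/", to_binary(raw))
```

Keeping an explicit `"Unknown"` bucket makes it easy to spot labels that the mapping does not yet cover before training on them.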

## Models

| Model | Architecture | Parameters |
|-------|-------------|------------|
| MLP | 41→256→128→64→2 with BatchNorm + Dropout | ~50K |
| LSTM | 41-step sequence → 2-layer LSTM(64) → FC(2) | ~35K |
| 1D-CNN | Conv1d(64)→Conv1d(128)→AvgPool→FC(2) | ~45K |
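
As a sanity check on the MLP row, the parameter count can be worked out by hand. The sketch below assumes fully connected layers with biases and an affine BatchNorm (one scale and one shift per unit) after each of the three hidden layers; Dropout has no parameters. These are assumptions about the architecture, not a description of `models/mlp_baseline.py`.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift of an affine BatchNorm layer."""
    return 2 * n_features


def mlp_param_count(sizes, batchnorm=True) -> int:
    """Count trainable parameters for layer widths such as [41, 256, 128, 64, 2]."""
    total = 0
    n_layers = len(sizes) - 1
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        total += linear_params(n_in, n_out)
        is_hidden = i < n_layers - 1  # no BatchNorm after the output layer
        if batchnorm and is_hidden:
            total += batchnorm_params(n_out)
    return total


if __name__ == "__main__":
    widths = [41, 256, 128, 64, 2]
    print(mlp_param_count(widths, batchnorm=False))  # 52034
    print(mlp_param_count(widths, batchnorm=True))   # 52930
```

Under these assumptions the total is 52,930, which is in line with the ~50K figure in the table.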

## Explainability Methods

- **SHAP** (SHapley Additive exPlanations): KernelExplainer (model-agnostic)
- **LIME** (Local Interpretable Model-agnostic Explanations): Tabular explainer with perturbation sampling
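
To make the quantity behind SHAP concrete, the sketch below computes exact Shapley values for a toy three-feature scoring function by enumerating every feature coalition, replacing "absent" features with a baseline value. This is not the project's code and not the `shap` library: KernelExplainer approximates these values by sampling coalitions, because exact enumeration is exponential in the number of features and impractical for 41 inputs. The toy model, its weights, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical linear anomaly score over three features."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    baseline = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_score, x, baseline)
    print(phi)
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(sum(phi), toy_score(x) - toy_score(baseline))
```

For a linear model each attribution reduces to weight times (value minus baseline), which gives an easy way to check the implementation.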

## Evaluation Metrics

- **Classification**: Precision, Recall, F1-Score (per-class + weighted), PR-AUC, ROC-AUC
- **Explanation Quality**: Faithfulness (feature masking), Sensitivity (SENS_MAX), Stability (Pearson correlation coefficient, PCC, across perturbations)
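
The stability and sensitivity metrics can be sketched as follows. The example perturbs an input with small uniform noise, re-computes the attribution vector, and reports the mean PCC against the original attributions together with the largest L2 change in attributions, used here as a sampled estimate of SENS_MAX (random sampling can only give a lower bound on the true maximum). The noise radius, sample count, and the toy `explain` function are assumptions for illustration and do not reflect the settings in `explainability/stability_eval.py`.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def stability_scores(explain, x, radius=0.05, n_samples=20, seed=42):
    """Return (mean PCC, max L2 attribution change) over random perturbations."""
    rng = random.Random(seed)
    base = explain(x)
    pccs, dists = [], []
    for _ in range(n_samples):
        x_pert = [v + rng.uniform(-radius, radius) for v in x]
        attr = explain(x_pert)
        pccs.append(pearson(base, attr))
        dists.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(base, attr))))
    return sum(pccs) / len(pccs), max(dists)


def toy_explain(z):
    """Hypothetical attribution: weight times input for a linear score."""
    weights = [2.0, -1.0, 0.5]
    return [w * v for w, v in zip(weights, z)]


if __name__ == "__main__":
    mean_pcc, sens_max = stability_scores(toy_explain, [1.0, 2.0, 4.0])
    print(f"mean PCC = {mean_pcc:.4f}, sampled SENS_MAX = {sens_max:.4f}")
```

A mean PCC near 1 and a small maximum change indicate that nearby inputs receive similar explanations.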

## Reproducibility

- Random seed: 42 (fixed across all experiments)
- Python 3.10+ | PyTorch 2.x | scikit-learn 1.x
- All preprocessing steps documented
- Commands in `reproduce.sh`
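
A fixed seed is usually applied to every random number generator in use. The helper below is a minimal sketch of that idea, not the project's actual code: `set_seed` is a hypothetical name, and NumPy and PyTorch are seeded only if they are installed. Bit-for-bit reproducibility on GPU may need further settings that are not shown here.

```python
import os
import random


def set_seed(seed: int = 42) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs where available."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this call;
    # the current interpreter's hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed

    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch not installed


if __name__ == "__main__":
    set_seed(42)
    first = random.random()
    set_seed(42)
    second = random.random()
    print(first == second)
```

Calling the helper once at the top of each script keeps runs comparable across the training, explainability, and stability steps.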

## References

1. Tavallaee et al. (2009). *A Detailed Analysis of the KDD CUP 99 Data Set.* IEEE Symposium on CISDA.
2. Lundberg & Lee (2017). *A Unified Approach to Interpreting Model Predictions.* NeurIPS.
3. Ribeiro et al. (2016). *"Why Should I Trust You?": Explaining the Predictions of Any Classifier.* KDD.
4. Huang et al. (2023). *SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability.* ICCV.

## Author

ICCN-INE2 Student Project