cathrica committed on
Commit 38955fe · verified · 1 Parent(s): e43f573

Add project README with full structure and documentation

  1. README.md +101 -0
README.md ADDED
@@ -0,0 +1,101 @@
+ # Explainable Intrusion Detection System (X-IDS)
+
+ **ICCN-INE2 Deep Learning Project, Project 5: Explainable IDS**
+
+ ## Project Overview
+
+ This project builds an Intrusion Detection System (IDS) using deep learning on the NSL-KDD dataset, then applies post-hoc explainability methods (SHAP, LIME) to make its decisions interpretable. We evaluate the stability of the explanations and analyze the security implications of exposing them.
+
+ ## Core Research Question
+
+ > *Can we make IDS decisions interpretable without compromising detection performance, and are these explanations stable enough to be trusted in security-critical settings?*
+
+ ## Repository Structure
+
+ ```
+ .
+ ├── README.md                      # This file
+ ├── docs/
+ │   ├── project_plan.md            # Detailed project plan & methodology
+ │   ├── threat_model.md            # Threat model document
+ │   └── architecture.md            # Model architecture & design choices
+ ├── data/
+ │   └── preprocess.py              # Data loading & preprocessing pipeline
+ ├── models/
+ │   ├── mlp_baseline.py            # MLP baseline model
+ │   ├── lstm_model.py              # LSTM variant
+ │   └── cnn1d_model.py             # 1D-CNN variant
+ ├── explainability/
+ │   ├── shap_analysis.py           # SHAP explanations
+ │   ├── lime_analysis.py           # LIME explanations
+ │   └── stability_eval.py          # Explanation stability evaluation
+ ├── experiments/
+ │   ├── train_baseline.py          # Training script
+ │   ├── run_explainability.py      # Run all XAI methods
+ │   └── run_stability.py           # Stability evaluation experiments
+ ├── results/                       # Generated results (figures, metrics)
+ ├── requirements.txt               # Dependencies
+ └── reproduce.sh                   # One-command reproducibility script
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Reproduce all experiments
+ bash reproduce.sh
+
+ # Or run step by step:
+ python data/preprocess.py                # Download & preprocess NSL-KDD
+ python experiments/train_baseline.py     # Train 3 models (MLP, LSTM, CNN)
+ python explainability/shap_analysis.py   # SHAP + LIME analysis
+ python explainability/stability_eval.py  # Stability evaluation
+ ```
+
+ ## Dataset
+
+ **NSL-KDD** (Network Security Laboratory - KDD): an improved version of the KDD Cup 99 dataset.
+ - Source: [UNB Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/nsl.html)
+ - HF Hub: [`Mireu-Lab/NSL-KDD`](https://huggingface.co/datasets/Mireu-Lab/NSL-KDD)
+ - Train: 151,165 records | Test: 34,394 records
+ - 41 features (3 categorical + 38 numerical)
+ - Binary classification: Normal vs Anomaly
+ - 5-class: Normal, DoS, Probe, R2L, U2R
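The 5-class setup groups individual attack names into four attack families. A minimal sketch of that grouping follows, assuming the records still carry the original attack-name labels. The helper names are illustrative and not taken from `data/preprocess.py`, and the table covers only the attack names that appear in the NSL-KDD training files, so test-only attacks fall through to `"Unknown"`.

```python
# Illustrative mapping from NSL-KDD attack names to the five classes.
# Assumption: labels are raw attack names such as "neptune" or "normal".
# Only attacks present in the training files are listed here.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial of service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing / scanning
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote-to-local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User-to-root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_five_class(label):
    """Map a raw label to Normal/DoS/Probe/R2L/U2R, or "Unknown" if unlisted."""
    key = label.strip().lower().rstrip(".")
    return ATTACK_CATEGORY.get(key, "Unknown")


def to_binary(label):
    """Collapse every non-normal label (including unlisted ones) to Anomaly."""
    return "Normal" if to_five_class(label) == "Normal" else "Anomaly"


if __name__ == "__main__":
    for raw in ["normal", "neptune", "Satan", "rootkit", "mscan"]:
        print(raw, to_five_class(raw), to_binary(raw))
```

Treating unlisted labels as `Anomaly` in the binary task keeps unseen test attacks on the attack side, which is one possible design choice rather than the project's confirmed behavior.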
+
+ ## Models
+
+ | Model | Architecture | Parameters |
+ |-------|-------------|------------|
+ | MLP | 41→256→128→64→2 with BatchNorm + Dropout | ~50K |
+ | LSTM | 41-step sequence → 2-layer LSTM(64) → FC(2) | ~35K |
+ | 1D-CNN | Conv1d(64)→Conv1d(128)→AvgPool→FC(2) | ~45K |
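As a rough sanity check on the MLP row, the parameter count can be worked out by hand. This sketch assumes fully connected layers with biases and one BatchNorm layer (a learnable scale and shift per unit) after each hidden layer. The exact layer layout in `models/mlp_baseline.py` is not shown here, so treat the result as an estimate.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of one BatchNorm layer."""
    return 2 * n_features


# Layer widths from the table: 41 -> 256 -> 128 -> 64 -> 2
layer_sizes = [41, 256, 128, 64, 2]
hidden_sizes = layer_sizes[1:-1]

dense_total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
bn_total = sum(batchnorm_params(h) for h in hidden_sizes)  # assumed placement
mlp_total = dense_total + bn_total  # Dropout adds no parameters

print(dense_total, bn_total, mlp_total)  # 52034 896 52930
```

Under these assumptions the total lands near 53K, which is in line with the "~50K" figure in the table.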
+
+ ## Explainability Methods
+
+ - **SHAP** (SHapley Additive exPlanations): KernelExplainer (model-agnostic)
+ - **LIME** (Local Interpretable Model-agnostic Explanations): Tabular explainer with perturbation sampling
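To make concrete what SHAP attributions represent, here is a small dependency-free sketch that computes exact Shapley values by enumerating feature coalitions. This is not the project's code and not the SHAP library: KernelExplainer approximates these values by sampling coalitions, because exact enumeration is exponential in the number of features. The scoring function and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**len(x), so this only suits toy inputs.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(features):
    # Hypothetical anomaly score with one additive and one interaction term.
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions sum to `toy_score(x) - toy_score(baseline)`, and the interaction term `b * c` is split equally between the two features involved.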
+
+ ## Evaluation Metrics
+
+ - **Classification**: Precision, Recall, F1-Score (per-class + weighted), PR-AUC, ROC-AUC
+ - **Explanation Quality**: Faithfulness (feature masking), Sensitivity (SENS_MAX), Stability (Pearson correlation coefficient, PCC, across perturbations)
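The two perturbation-based metrics can be sketched as follows. This is a minimal illustration, not the contents of `explainability/stability_eval.py`. It assumes uniform noise inside a box of half-width `radius`, estimates SENS_MAX by Monte Carlo sampling (which gives a lower bound on the true maximum), and uses a hypothetical `toy_explain` function in place of SHAP or LIME.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors.

    Undefined (division by zero) if either vector is constant.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def perturb(x, radius, rng):
    """Add independent uniform noise in [-radius, radius] to each feature."""
    return [xi + rng.uniform(-radius, radius) for xi in x]


def stability_report(explain, x, radius=0.05, n_samples=50, seed=42):
    """Mean PCC and sampled max L2 change between explain(x) and explain(x')."""
    rng = random.Random(seed)
    base = explain(x)
    pccs, dists = [], []
    for _ in range(n_samples):
        attr = explain(perturb(x, radius, rng))
        pccs.append(pearson(base, attr))
        dists.append(math.dist(base, attr))
    return {"mean_pcc": sum(pccs) / len(pccs), "sens_max": max(dists)}


def toy_explain(x):
    # Hypothetical attribution: gradient * input for f(x) = sum(w_i * x_i**2).
    weights = [0.5, -1.0, 2.0, 0.1]
    return [2.0 * w * xi * xi for w, xi in zip(weights, x)]


report = stability_report(toy_explain, [1.0, 0.5, -0.3, 2.0])
print(report)
```

A mean PCC close to 1 and a small `sens_max` indicate that small input changes leave the explanation largely unchanged.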
+
+ ## Reproducibility
+
+ - Random seed: 42 (fixed across all experiments)
+ - Python 3.10+ | PyTorch 2.x | scikit-learn 1.x
+ - All preprocessing steps documented
+ - Commands in `reproduce.sh`
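Fixing the seed usually means seeding every random number generator the pipeline touches. A minimal sketch, assuming a hypothetical `set_seed` helper (the name and placement are not from the repository); NumPy and PyTorch are seeded only if they are installed:

```python
import os
import random


def set_seed(seed=42):
    """Seed Python, and NumPy / PyTorch when available."""
    random.seed(seed)
    # Affects hash randomization only in subprocesses started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        # Prefer deterministic cuDNN kernels over autotuned ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    set_seed(42)
    first = random.random()
    set_seed(42)
    print(first == random.random())
```

Even with all seeds fixed, some GPU operations remain nondeterministic, so small run-to-run differences in metrics are still possible.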
+
+ ## References
+
+ 1. Tavallaee et al. (2009). *A Detailed Analysis of the KDD CUP 99 Data Set.* IEEE Symposium on CISDA.
+ 2. Lundberg & Lee (2017). *A Unified Approach to Interpreting Model Predictions.* NeurIPS.
+ 3. Ribeiro et al. (2016). *"Why Should I Trust You?": Explaining the Predictions of Any Classifier.* KDD.
+ 4. Huang et al. (2023). *SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability.* ICCV.
+
+ ## Author
+
+ ICCN-INE2 Student Project