Threat Model: Explainable IDS
1. System Description
An ML-based Intrusion Detection System (IDS) monitors network traffic and classifies connections as Normal or one of four attack categories (DoS, Probe, R2L, U2R). The system uses post-hoc explainability methods (SHAP, LIME) to provide security analysts with interpretable justifications for each alert.
2. Assets Under Protection
| Asset | Value | Sensitivity |
|---|---|---|
| Network integrity | High | Disruption → service outage |
| IDS model parameters | Medium | Leak → evasion knowledge |
| SHAP/LIME explanations | Medium | Leak → feature manipulation strategy |
| Training data statistics | Low-Medium | Leak → distribution knowledge for crafting attacks |
3. Adversary Profiles
Adversary A: Network Attacker (External)
- Goal: Bypass IDS detection by sending malicious traffic that is classified as "Normal"
- Capabilities: Can craft and modify network packets (control over src_bytes, dst_bytes, duration, protocol, count, etc.)
- Knowledge: Black-box (no model access) or Grey-box (knows model type + feature set)
- Constraints: Cannot modify all features; some are protocol-determined (e.g., protocol_type, flag) or network-infrastructure-bound (e.g., dst_host_count depends on actual connections)
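
One way to encode Adversary A's constraints in an evaluation harness is to apply a candidate perturbation only to features the attacker can change, and to clamp the result to a valid range. The sketch below is illustrative: the function name, the mutable set, and the bounds are assumptions, not values taken from this threat model.

```python
def apply_constrained_perturbation(x, delta, mutable, bounds):
    """Return a copy of x with delta applied only to mutable features.

    x, delta: dicts mapping feature name -> float
    mutable:  set of feature names the attacker is assumed to control
    bounds:   dict mapping each mutable feature -> (lower, upper)
    """
    result = dict(x)
    for name, change in delta.items():
        if name not in mutable:
            continue  # attacker cannot touch this feature; keep original value
        lower, upper = bounds[name]
        result[name] = min(max(x[name] + change, lower), upper)
    return result


# Hypothetical sample and limits, for illustration only.
sample = {"src_bytes": 500.0, "duration": 2.0, "dst_host_count": 30.0}
requested = {"src_bytes": -400.0, "duration": 10.0, "dst_host_count": -25.0}
attacker_controlled = {"src_bytes", "duration"}
limits = {"src_bytes": (0.0, 1e6), "duration": (0.0, 5.0)}

adversarial = apply_constrained_perturbation(
    sample, requested, attacker_controlled, limits
)
print(adversarial)
```

Here the requested change to dst_host_count is ignored and duration is clamped to its assumed upper limit, so the resulting sample stays within what this adversary could plausibly produce.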
Adversary B: Explanation Exploiter (Internal/External)
- Goal: Use SHAP/LIME output to learn which features the model relies on, then craft evasion attacks
- Capabilities: Can query the model and observe explanations (e.g., deployed as analyst dashboard)
- Knowledge: White-box on explanations, grey-box on model
- Attack: Query with diverse inputs → aggregate SHAP values → identify top features → manipulate those features in attack traffic
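
The aggregation step in Adversary B's attack can be sketched as a ranking by mean absolute attribution over the observed queries. This is a minimal illustration in plain Python; the function name is hypothetical and the attribution values are made up, not measured results.

```python
def rank_features(attributions, feature_names, top_k=3):
    """Rank features by mean absolute attribution across observed queries.

    attributions: list of rows, one per query, each with one value per feature
    """
    if not attributions:
        raise ValueError("need at least one row of attributions")
    n_features = len(feature_names)
    if any(len(row) != n_features for row in attributions):
        raise ValueError("each row must have one value per feature")
    importance = [
        sum(abs(row[j]) for row in attributions) / len(attributions)
        for j in range(n_features)
    ]
    ranked = sorted(
        zip(feature_names, importance), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:top_k]


# Made-up per-query attributions that an attacker might collect from a dashboard.
names = ["src_bytes", "serror_rate", "count", "duration"]
observed = [
    [0.05, 0.50, -0.30, 0.01],
    [-0.02, 0.40, 0.35, 0.02],
    [0.03, -0.45, 0.31, 0.00],
]
top = rank_features(observed, names, top_k=2)
print([name for name, _ in top])
```

Taking absolute values before averaging matters: a feature that pushes strongly in opposite directions on different inputs would otherwise cancel out and look unimportant.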
Adversary C: Training Data Poisoner (Supply Chain)
- Goal: Insert backdoor so model shows clean explanations but misclassifies triggered inputs
- Capabilities: Can inject samples into training set
- Relevance: Even explanations can be fooled if the model itself is compromised (Baniecki et al., 2022)
4. Feature Manipulability Analysis
This analysis is critical for realistic adversarial evaluation: not all 41 NSL-KDD features can be freely modified by an attacker.
| Feature Category | Manipulable? | Examples | Justification |
|---|---|---|---|
| Packet content | ✅ Yes | src_bytes, dst_bytes, hot, num_failed_logins | Attacker controls payload |
| Connection behavior | ⚠️ Partially | duration, count, srv_count | Attacker can slow/speed connections but within limits |
| Protocol fields | ⚠️ Constrained | protocol_type, flag | Must be valid TCP/UDP/ICMP; flag must match connection state |
| Network statistics | ❌ No | dst_host_count, dst_host_srv_count | Aggregated by IDS sensor, not attacker-controlled |
| Error rates | ⚠️ Partially | serror_rate, rerror_rate | Attacker can trigger errors but rates depend on overall traffic |
Implication for SHAP/LIME: If the model relies heavily on non-manipulable features (dst_host_count, dst_host_same_srv_rate), it is more robust against evasion. If it relies on manipulable features (src_bytes, duration), evasion is easier.
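
The implication above can be turned into a rough numeric check: weight each feature's mean absolute attribution by how manipulable it is, then report the weighted share of total attribution. The sketch below is an assumption-laden illustration. The weights (1.0 for "Yes", 0.5 for "Partially" or "Constrained", 0.0 for "No") are hypothetical, and the attribution values are invented.

```python
# Hypothetical weights derived from the manipulability categories.
MANIPULABILITY = {
    "src_bytes": 1.0,
    "dst_bytes": 1.0,
    "duration": 0.5,
    "count": 0.5,
    "serror_rate": 0.5,
    "protocol_type": 0.5,
    "dst_host_count": 0.0,
    "dst_host_srv_count": 0.0,
}


def manipulable_share(mean_abs_attribution, manipulability, default=1.0):
    """Fraction of total attribution mass that sits on manipulable features.

    Features missing from the map get `default`; 1.0 is the conservative
    choice because it treats unknown features as fully attacker-controlled.
    """
    total = sum(mean_abs_attribution.values())
    if total == 0:
        return 0.0
    weighted = sum(
        value * manipulability.get(feature, default)
        for feature, value in mean_abs_attribution.items()
    )
    return weighted / total


# Invented attribution profile for illustration.
profile = {"src_bytes": 0.2, "duration": 0.2, "dst_host_count": 0.6}
share = manipulable_share(profile, MANIPULABILITY)
print(round(share, 3))
```

A share close to 1 would suggest the model leans on attacker-controlled features. What threshold counts as acceptable is a deployment decision that this sketch does not settle.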
5. Attack Scenarios
Scenario 1: Evasion via Explanation Leakage
- Attacker queries IDS explanation API with known attack samples
- SHAP reveals that serror_rate (weight=0.45) and count (weight=0.32) are the top features for DoS detection
- Attacker crafts DoS traffic with low serror_rate (connection completion spoofing) and varied count
- IDS misclassifies as Normal
Scenario 2: LIME Instability Exploitation
- LIME produces different top features for the same input across runs (stochastic)
- Analyst sees Feature A as top in run 1, Feature B in run 2
- Inconsistent investigation → missed detections or wasted resources
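
The instability in Scenario 2 can be quantified by explaining the same input several times and comparing the top-k feature sets. A minimal sketch follows, using mean pairwise Jaccard similarity as the stability score. The metric choice, the function name, and the example rankings are assumptions for illustration.

```python
from itertools import combinations


def top_k_stability(runs, k=3):
    """Mean pairwise Jaccard similarity of top-k feature sets across runs.

    runs: list of feature-name lists, each ordered from most to least important
    Returns 1.0 when every run agrees on the top-k set, lower otherwise.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    top_sets = [set(run[:k]) for run in runs]
    if any(not s for s in top_sets):
        raise ValueError("every run must list at least one feature")
    pairs = list(combinations(top_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Invented rankings from three explanation runs on one input.
rankings = [
    ["serror_rate", "count", "src_bytes"],
    ["count", "serror_rate", "duration"],
    ["serror_rate", "count", "src_bytes"],
]
stability_top2 = top_k_stability(rankings, k=2)
stability_top3 = top_k_stability(rankings, k=3)
print(stability_top2, round(stability_top3, 3))
```

The score ignores ordering within the top-k set, so two runs that swap first and second place still count as stable. A rank-aware measure would be stricter.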
Scenario 3: Backdoor with Clean Explanations
- Poisoned training data contains trigger pattern (e.g., specific src_bytes + service combination)
- Model correctly classifies and explains normal traffic
- On triggered inputs: misclassifies as Normal AND SHAP shows plausible benign features
- Analyst trusts explanation → attack goes undetected
6. Security Requirements
| Requirement | Priority | Mitigation |
|---|---|---|
| Explanation access control | High | Rate-limit explanation API, log queries |
| Explanation consistency | High | Prefer deterministic SHAP variants (e.g., TreeSHAP) over LIME for critical decisions; fix random seeds for sampling-based explainers such as KernelSHAP |
| Model integrity verification | Medium | Track training data provenance, validate model fingerprints |
| Robust feature reliance | Medium | Verify model doesn't over-rely on manipulable features |
| Defense-in-depth | High | Explanations supplement (don't replace) rule-based IDS |
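
The "rate-limit explanation API, log queries" mitigation could look roughly like the sliding-window limiter below. This is a sketch under stated assumptions: the class name, logger name, and limits are hypothetical, and timestamps are passed in explicitly so the behavior is easy to test.

```python
import logging
from collections import defaultdict, deque

logger = logging.getLogger("explanation_api")  # hypothetical logger name


class ExplanationRateLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> accepted timestamps

    def allow(self, client_id, now):
        history = self._history[client_id]
        # Drop accepted queries that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        allowed = len(history) < self.max_queries
        if allowed:
            history.append(now)
        # Log every attempt, including rejected ones, for later audit.
        logger.info(
            "explanation query client=%s allowed=%s t=%.3f",
            client_id, allowed, now,
        )
        return allowed


limiter = ExplanationRateLimiter(max_queries=2, window_seconds=60.0)
decisions = [
    limiter.allow("analyst-a", 0.0),
    limiter.allow("analyst-a", 1.0),
    limiter.allow("analyst-a", 2.0),   # third query inside the window
    limiter.allow("analyst-b", 2.0),   # separate client, separate budget
    limiter.allow("analyst-a", 60.0),  # first query has aged out
]
print(decisions)
```

In a real service the timestamp would typically come from a monotonic clock such as `time.monotonic()`. Per-client limits only slow Adversary B down; the query log is what lets defenders notice systematic probing.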
7. Assumptions & Scope
- NSL-KDD is a benchmark dataset; real deployment would require domain-specific feature analysis
- We evaluate post-hoc explainability only (not inherently interpretable models)
- We focus on explanation reliability, not adversarial robustness of the classifier itself (that's Project 1)