| # Threat Model: Explainable IDS |
|
|
| ## 1. System Description |
|
|
| An ML-based Intrusion Detection System (IDS) monitors network traffic and classifies connections as Normal or one of four attack categories (DoS, Probe, R2L, U2R). The system uses post-hoc explainability methods (SHAP, LIME) to provide security analysts with interpretable justifications for each alert. |
|
|
| ## 2. Assets Under Protection |
|
|
| | Asset | Value | Sensitivity | |
| |-------|-------|-------------| |
| | Network integrity | High | Disruption → service outage | |
| | IDS model parameters | Medium | Leak → evasion knowledge | |
| | SHAP/LIME explanations | Medium | Leak → feature manipulation strategy | |
| | Training data statistics | Low-Medium | Leak → distribution knowledge for crafting attacks | |
|
|
| ## 3. Adversary Profiles |
|
|
| ### Adversary A: Network Attacker (External) |
| - **Goal**: Bypass IDS detection by sending malicious traffic that is classified as "Normal" |
| - **Capabilities**: Can craft and modify network packets (control over src_bytes, dst_bytes, duration, protocol, count, etc.) |
| - **Knowledge**: Black-box (no model access) or Grey-box (knows model type + feature set) |
| - **Constraints**: Cannot modify all features; some are protocol-determined (e.g., protocol_type, flag) or network-infrastructure-bound (e.g., dst_host_count depends on actual connections) |
| |
| ### Adversary B: Explanation Exploiter (Internal/External) |
| - **Goal**: Use SHAP/LIME output to learn which features the model relies on, then craft evasion attacks |
| - **Capabilities**: Can query the model and observe explanations (e.g., deployed as analyst dashboard) |
| - **Knowledge**: White-box on explanations, grey-box on model |
| - **Attack**: Query with diverse inputs → aggregate SHAP values → identify top features → manipulate those features in attack traffic |
| |
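The aggregation step in Adversary B's attack chain can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `rank_features_by_attribution` and the attribution values are hypothetical, and the input is assumed to be per-query attribution vectors (such as SHAP values) already collected from an explanation interface.

```python
def rank_features_by_attribution(attributions, feature_names, top_k=3):
    """Rank features by mean absolute attribution over all observed queries.

    attributions: list of rows, one per query, each with one value per feature.
    Returns the top_k (feature_name, mean_abs_attribution) pairs.
    """
    n_features = len(feature_names)
    if not attributions or any(len(row) != n_features for row in attributions):
        raise ValueError("each row must hold one attribution per feature")
    mean_abs = [
        sum(abs(row[j]) for row in attributions) / len(attributions)
        for j in range(n_features)
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Made-up attribution values for three queries (assumption, not measured data).
feature_names = ["serror_rate", "count", "src_bytes", "duration"]
observed = [
    [0.50, -0.30, 0.05, 0.01],
    [0.40, 0.35, -0.02, 0.03],
    [-0.45, 0.31, 0.04, -0.02],
]
top_features = rank_features_by_attribution(observed, feature_names, top_k=2)
```

Taking absolute values before averaging matters here: attributions of opposite sign across queries would otherwise cancel and hide a feature the model relies on heavily.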
| ### Adversary C: Training Data Poisoner (Supply Chain) |
| - **Goal**: Insert backdoor so model shows clean explanations but misclassifies triggered inputs |
| - **Capabilities**: Can inject samples into training set |
| - **Relevance**: Even explanations can be fooled if the model itself is compromised (Baniecki et al., 2022) |
| |
| ## 4. Feature Manipulability Analysis |
| |
| This analysis is critical for realistic adversarial evaluation: not all 41 NSL-KDD features can be freely modified by an attacker. |
| |
| | Feature Category | Manipulable? | Examples | Justification | |
| |-----------------|-------------|----------|---------------| |
| | **Packet content** | ✅ Yes | `src_bytes`, `dst_bytes`, `hot`, `num_failed_logins` | Attacker controls payload | |
| | **Connection behavior** | ⚠️ Partially | `duration`, `count`, `srv_count` | Attacker can slow/speed connections but within limits | |
| | **Protocol fields** | ⚠️ Constrained | `protocol_type`, `flag` | Must be valid TCP/UDP/ICMP; flag must match connection state | |
| | **Network statistics** | ❌ No | `dst_host_count`, `dst_host_srv_count` | Aggregated by IDS sensor, not attacker-controlled | |
| | **Error rates** | ⚠️ Partially | `serror_rate`, `rerror_rate` | Attacker can trigger errors but rates depend on overall traffic | |
|
|
| **Implication for SHAP/LIME**: If the model relies heavily on non-manipulable features (dst_host_count, dst_host_same_srv_rate), it is more robust against evasion. If it relies on manipulable features (src_bytes, duration), evasion is easier. |
| |
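One way to turn this implication into a number is to measure how much of the model's total attribution falls on attacker-influenceable features. The sketch below is an assumption-laden illustration: the numeric weights (1.0 for "Yes", 0.5 for "Partially" or "Constrained", 0.0 for "No") are an arbitrary encoding of the table's categories, and `manipulable_share` and the two example attribution profiles are hypothetical.

```python
# Manipulability weights per feature, encoding the categories in the table.
# The numeric values are an assumption chosen for illustration only.
MANIPULABILITY = {
    "src_bytes": 1.0, "dst_bytes": 1.0, "hot": 1.0, "num_failed_logins": 1.0,
    "duration": 0.5, "count": 0.5, "srv_count": 0.5,
    "protocol_type": 0.5, "flag": 0.5,
    "serror_rate": 0.5, "rerror_rate": 0.5,
    "dst_host_count": 0.0, "dst_host_srv_count": 0.0,
}


def manipulable_share(mean_abs_attribution, manipulability=MANIPULABILITY):
    """Return the weighted fraction of attribution on manipulable features.

    mean_abs_attribution: dict mapping feature name to a non-negative
    global importance (for example, mean absolute SHAP value).
    Raises KeyError for features missing from the manipulability map, so
    unclassified features are never silently treated as safe.
    """
    total = sum(mean_abs_attribution.values())
    if total <= 0:
        raise ValueError("total attribution must be positive")
    weighted = sum(
        value * manipulability[name]
        for name, value in mean_abs_attribution.items()
    )
    return weighted / total


# Two made-up importance profiles (not measured results).
model_a = {"dst_host_count": 0.6, "dst_host_srv_count": 0.3, "src_bytes": 0.1}
model_b = {"src_bytes": 0.6, "duration": 0.3, "dst_host_count": 0.1}
share_a = manipulable_share(model_a)  # mostly non-manipulable reliance
share_b = manipulable_share(model_b)  # mostly manipulable reliance
```

A higher share suggests an easier evasion target under this encoding; the absolute value depends on the chosen weights, so it is best used to compare models rather than as a standalone score.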
| ## 5. Attack Scenarios |
| |
| ### Scenario 1: Evasion via Explanation Leakage |
| 1. Attacker queries IDS explanation API with known attack samples |
| 2. SHAP reveals `serror_rate` (weight=0.45) and `count` (weight=0.32) are top features for DoS detection |
| 3. Attacker crafts DoS traffic with low serror_rate (connection completion spoofing) and varied count |
| 4. IDS misclassifies as Normal |
| |
| ### Scenario 2: LIME Instability Exploitation |
| 1. LIME produces different top features for the same input across runs (stochastic) |
| 2. Analyst sees Feature A as top in run 1, Feature B in run 2 |
| 3. Inconsistent investigation → missed detections or wasted resources |
| |
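The instability described in Scenario 2 can be quantified by explaining the same input several times and comparing the resulting top-k feature sets. The sketch below uses mean pairwise Jaccard similarity as the stability score; this metric choice, the helper names, and the attribution values are assumptions for illustration, and the runs are assumed to have been produced by repeated explainer calls elsewhere.

```python
from itertools import combinations


def top_k_features(attribution, k):
    """Return the set of k features with the largest absolute attribution."""
    ranked = sorted(attribution, key=lambda name: abs(attribution[name]), reverse=True)
    return set(ranked[:k])


def mean_pairwise_jaccard(runs, k=3):
    """Average Jaccard similarity of top-k feature sets over all run pairs.

    runs: list of dicts (feature name -> attribution), one per explainer run
    on the same input. A score of 1.0 means every run agreed on the top-k set.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure stability")
    if k < 1:
        raise ValueError("k must be at least 1")
    top_sets = [top_k_features(run, k) for run in runs]
    scores = [len(a & b) / len(a | b) for a, b in combinations(top_sets, 2)]
    return sum(scores) / len(scores)


# Made-up explanations of one input from three runs (not measured data).
runs = [
    {"serror_rate": 0.40, "count": 0.30, "src_bytes": 0.10, "duration": 0.05},
    {"serror_rate": 0.35, "count": 0.10, "src_bytes": 0.30, "duration": 0.05},
    {"serror_rate": 0.42, "count": 0.28, "src_bytes": 0.12, "duration": 0.02},
]
stability = mean_pairwise_jaccard(runs, k=2)
```

In this example the second run swaps `count` for `src_bytes` in the top two, which pulls the score well below 1.0. A threshold for "too unstable to show an analyst" would have to be chosen empirically.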
| ### Scenario 3: Backdoor with Clean Explanations |
| 1. Poisoned training data contains trigger pattern (e.g., specific src_bytes + service combination) |
| 2. Model correctly classifies and explains normal traffic |
| 3. On triggered inputs: misclassifies as Normal AND SHAP shows plausible benign features |
| 4. Analyst trusts explanation → attack goes undetected |
|
|
| ## 6. Security Requirements |
|
|
| | Requirement | Priority | Mitigation | |
| |-------------|----------|------------| |
| | Explanation access control | High | Rate-limit explanation API, log queries | |
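| | Explanation consistency | High | Prefer deterministic SHAP variants (e.g., TreeSHAP) over LIME for critical decisions; fix random seeds where sampling-based explainers (KernelSHAP, LIME) are used | |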
| | Model integrity verification | Medium | Track training data provenance, validate model fingerprints | |
| | Robust feature reliance | Medium | Verify model doesn't over-rely on manipulable features | |
| | Defense-in-depth | High | Explanations supplement (don't replace) rule-based IDS | |
|
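As an illustration of the "Explanation access control" requirement, the sketch below combines a per-client sliding-window rate limit with a query log. It is a minimal in-memory example under stated assumptions: the class name `ExplanationRateLimiter`, the client identifier, and the limits are hypothetical, and a real deployment would need persistent, tamper-resistant logging and authenticated client identities.

```python
import time
from collections import defaultdict, deque


class ExplanationRateLimiter:
    """Sliding-window limiter that also records every explanation request."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._recent = defaultdict(deque)  # client_id -> accepted timestamps
        self.audit_log = []  # (timestamp, client_id, allowed) tuples

    def allow(self, client_id):
        """Return True if the client may query now; log the decision."""
        now = self.clock()
        recent = self._recent[client_id]
        # Drop accepted requests that have left the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        allowed = len(recent) < self.max_queries
        if allowed:
            recent.append(now)
        self.audit_log.append((now, client_id, allowed))
        return allowed


# Usage with a fake clock: 2 queries per 60 seconds (illustrative limits).
fake_now = [0.0]
limiter = ExplanationRateLimiter(2, 60.0, clock=lambda: fake_now[0])
decisions = [limiter.allow("analyst-1") for _ in range(3)]
fake_now[0] = 61.0  # the window has passed, so the client may query again
allowed_later = limiter.allow("analyst-1")
```

Denied requests are logged as well as accepted ones, since a burst of rejected queries is itself a signal that someone may be harvesting explanations as in Scenario 1.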
|
| ## 7. Assumptions & Scope |
|
|
| - NSL-KDD is a benchmark dataset; real deployment would require domain-specific feature analysis |
| - We evaluate post-hoc explainability only (not inherently interpretable models) |
| - We focus on explanation reliability, not adversarial robustness of the classifier itself (that's Project 1) |
|
|