
Threat Model: Explainable IDS

1. System Description

An ML-based Intrusion Detection System (IDS) monitors network traffic and classifies connections as Normal or one of four attack categories (DoS, Probe, R2L, U2R). The system uses post-hoc explainability methods (SHAP, LIME) to provide security analysts with interpretable justifications for each alert.

2. Assets Under Protection

| Asset | Value | Sensitivity |
|---|---|---|
| Network integrity | High | Disruption → service outage |
| IDS model parameters | Medium | Leak → evasion knowledge |
| SHAP/LIME explanations | Medium | Leak → feature manipulation strategy |
| Training data statistics | Low-Medium | Leak → distribution knowledge for crafting attacks |

3. Adversary Profiles

Adversary A: Network Attacker (External)

  • Goal: Bypass IDS detection by sending malicious traffic that is classified as "Normal"
  • Capabilities: Can craft and modify network packets (control over src_bytes, dst_bytes, duration, protocol, count, etc.)
  • Knowledge: Black-box (no model access) or Grey-box (knows model type + feature set)
  • Constraints: Cannot modify all features; some are protocol-determined (e.g., protocol_type, flag) and others are bound to network infrastructure (e.g., dst_host_count depends on actual connections)

Adversary B: Explanation Exploiter (Internal/External)

  • Goal: Use SHAP/LIME output to learn which features the model relies on, then craft evasion attacks
  • Capabilities: Can query the model and observe explanations (e.g., deployed as analyst dashboard)
  • Knowledge: White-box on explanations, grey-box on model
  • Attack: Query with diverse inputs → aggregate SHAP values → identify top features → manipulate those features in attack traffic
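
The aggregation step in Adversary B's attack can be sketched as below. This is a minimal illustration, assuming the adversary has already collected one attribution vector per query (for example, SHAP values shown on a dashboard). The function name `rank_features_by_attribution` and all numbers are hypothetical and are not taken from the project.

```python
from statistics import fmean


def rank_features_by_attribution(feature_names, attributions, top_k=3):
    """Rank features by mean absolute attribution across queries.

    attributions: list of per-query attribution vectors (one float per
    feature), e.g. SHAP values observed by the adversary.
    """
    if not attributions:
        raise ValueError("need at least one attribution vector")
    for row in attributions:
        if len(row) != len(feature_names):
            raise ValueError("attribution vector length must match feature_names")

    # Mean |attribution| per feature: the sign is dropped because this step
    # only asks how influential a feature is, not in which direction.
    mean_abs = [
        fmean(abs(row[j]) for row in attributions)
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Hypothetical attributions for three queries over four NSL-KDD features.
names = ["serror_rate", "count", "src_bytes", "duration"]
observed = [
    [0.50, -0.30, 0.05, 0.01],
    [0.40, 0.35, -0.10, 0.02],
    [0.45, -0.31, 0.00, 0.03],
]
top = rank_features_by_attribution(names, observed, top_k=2)
print(top)  # serror_rate first, then count
```

The same ranking is useful defensively: running it on the explanations the dashboard actually exposes shows how few queries are needed before the top features stabilise, which informs how strict the access controls need to be.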

Adversary C: Training Data Poisoner (Supply Chain)

  • Goal: Insert backdoor so model shows clean explanations but misclassifies triggered inputs
  • Capabilities: Can inject samples into training set
  • Relevance: Even explanations can be fooled if the model itself is compromised (Baniecki et al., 2022)

4. Feature Manipulability Analysis

Critical for realistic adversarial evaluation: not all 41 NSL-KDD features can be freely modified by an attacker.

| Feature Category | Manipulable? | Examples | Justification |
|---|---|---|---|
| Packet content | ✅ Yes | src_bytes, dst_bytes, hot, num_failed_logins | Attacker controls payload |
| Connection behavior | ⚠️ Partially | duration, count, srv_count | Attacker can slow/speed connections but within limits |
| Protocol fields | ⚠️ Constrained | protocol_type, flag | Must be valid TCP/UDP/ICMP; flag must match connection state |
| Network statistics | ❌ No | dst_host_count, dst_host_srv_count | Aggregated by IDS sensor, not attacker-controlled |
| Error rates | ⚠️ Partially | serror_rate, rerror_rate | Attacker can trigger errors but rates depend on overall traffic |

Implication for SHAP/LIME: If the model relies heavily on non-manipulable features (dst_host_count, dst_host_same_srv_rate), it is more robust against evasion. If it relies on manipulable features (src_bytes, duration), evasion is easier.
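
One way to turn this implication into a number is to measure how much of the model's total attribution sits on features the attacker can influence. The sketch below is illustrative only: the labels follow the table above, but the numeric weights (1.0, 0.5, 0.0), the function name `manipulable_share`, and the two example importance profiles are assumptions.

```python
# Manipulability weights per feature. Categories follow the table in this
# section; the numeric scale (1.0 = yes, 0.5 = partial/constrained, 0.0 = no)
# is an illustrative assumption, not a calibrated measure.
MANIPULABILITY = {
    "src_bytes": 1.0, "dst_bytes": 1.0, "hot": 1.0, "num_failed_logins": 1.0,
    "duration": 0.5, "count": 0.5, "srv_count": 0.5,
    "protocol_type": 0.5, "flag": 0.5,
    "serror_rate": 0.5, "rerror_rate": 0.5,
    "dst_host_count": 0.0, "dst_host_srv_count": 0.0,
    "dst_host_same_srv_rate": 0.0,
}


def manipulable_share(importances, manipulability=MANIPULABILITY):
    """Fraction of total |importance| placed on attacker-influenced features.

    importances: dict feature -> global importance (e.g. mean |SHAP|).
    Features missing from the map are treated as fully manipulable (1.0),
    a deliberately conservative assumption.
    """
    total = sum(abs(v) for v in importances.values())
    if total == 0:
        return 0.0
    exposed = sum(
        abs(v) * manipulability.get(name, 1.0) for name, v in importances.items()
    )
    return exposed / total


# Two hypothetical importance profiles.
relies_on_host_stats = {
    "dst_host_count": 0.6, "dst_host_same_srv_rate": 0.3, "src_bytes": 0.1,
}
relies_on_payload = {"src_bytes": 0.6, "duration": 0.3, "dst_host_count": 0.1}

share_a = manipulable_share(relies_on_host_stats)  # low exposure
share_b = manipulable_share(relies_on_payload)     # high exposure
```

A higher share suggests evasion is easier under this threat model; the value is only as meaningful as the manipulability weights behind it.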

5. Attack Scenarios

Scenario 1: Evasion via Explanation Leakage

  1. Attacker queries IDS explanation API with known attack samples
  2. SHAP reveals serror_rate (weight=0.45) and count (weight=0.32) are top features for DoS detection
  3. Attacker crafts DoS traffic with low serror_rate (connection completion spoofing) and varied count
  4. IDS misclassifies as Normal
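
When this scenario is reproduced in an evaluation harness, proposed feature changes should stay within the manipulability limits from Section 4; otherwise the test overstates what an attacker can do. The sketch below shows one way to enforce that. The relative-change limits, the function name `apply_constrained_perturbation`, and the sample values are hypothetical.

```python
INF = float("inf")


def apply_constrained_perturbation(sample, proposed, max_rel_change):
    """Return a perturbed copy of `sample` that respects manipulability limits.

    max_rel_change: feature -> largest allowed relative change.
      INF means freely manipulable, 0.0 means not attacker-controlled.
    Features absent from max_rel_change are left untouched.
    """
    result = dict(sample)
    for feature, target in proposed.items():
        limit = max_rel_change.get(feature, 0.0)
        original = sample[feature]
        if limit == 0.0:
            continue  # not attacker-controlled: keep the observed value
        if limit == INF:
            result[feature] = target
            continue
        # Partially manipulable: clamp the target to a band around the
        # original value. Note that a zero original cannot move under a
        # purely relative limit.
        span = abs(original) * limit
        result[feature] = min(max(target, original - span), original + span)
    return result


# Hypothetical DoS-like record and the values the attacker would like to reach.
record = {"serror_rate": 0.9, "count": 200, "src_bytes": 0, "dst_host_count": 255}
wanted = {"serror_rate": 0.1, "count": 20, "src_bytes": 500, "dst_host_count": 10}
limits = {"serror_rate": 0.5, "count": 0.5, "src_bytes": INF, "dst_host_count": 0.0}

perturbed = apply_constrained_perturbation(record, wanted, limits)
```

Here `serror_rate` and `count` only move halfway toward the attacker's targets and `dst_host_count` does not move at all, so whether the perturbed record is still detected depends on how the model weighs the features that remain out of reach.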

Scenario 2: LIME Instability Exploitation

  1. LIME produces different top features for the same input across runs (stochastic)
  2. Analyst sees Feature A as top in run 1, Feature B in run 2
  3. Inconsistent investigation → missed detections or wasted resources
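
The instability described here can be quantified by explaining the same input several times and comparing the resulting top-k feature sets. The sketch below uses mean pairwise Jaccard similarity as the stability score; the choice of metric, the function names, and the example attributions are assumptions, not the project's evaluation protocol.

```python
from itertools import combinations


def top_k_features(attribution, k):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attribution, key=lambda f: abs(attribution[f]), reverse=True)
    return set(ranked[:k])


def mean_topk_jaccard(runs, k=3):
    """Average pairwise Jaccard similarity of top-k feature sets across runs.

    runs: list of dicts (feature -> attribution), one per explanation run on
    the same input. A score of 1.0 means every run agrees on the top-k set.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure stability")
    tops = [top_k_features(run, k) for run in runs]
    scores = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return sum(scores) / len(scores)


# Hypothetical attributions from three explanation runs on one input.
runs = [
    {"serror_rate": 0.40, "count": 0.30, "src_bytes": 0.10},
    {"serror_rate": 0.35, "count": 0.10, "src_bytes": 0.30},
    {"serror_rate": 0.42, "count": 0.28, "src_bytes": 0.05},
]
stability = mean_topk_jaccard(runs, k=2)
```

In this example the second run swaps `count` for `src_bytes`, so two of the three pairs overlap in only one of three features and the score falls well below 1.0.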

Scenario 3: Backdoor with Clean Explanations

  1. Poisoned training data contains trigger pattern (e.g., specific src_bytes + service combination)
  2. Model correctly classifies and explains normal traffic
  3. On triggered inputs: misclassifies as Normal AND SHAP shows plausible benign features
  4. Analyst trusts explanation → attack goes undetected

6. Security Requirements

| Requirement | Priority | Mitigation |
|---|---|---|
| Explanation access control | High | Rate-limit explanation API, log queries |
| Explanation consistency | High | Prefer deterministic SHAP variants (e.g., TreeSHAP) over LIME for critical decisions; sampling-based SHAP (KernelSHAP) is also stochastic |
| Model integrity verification | Medium | Track training data provenance, validate model fingerprints |
| Robust feature reliance | Medium | Verify model doesn't over-rely on manipulable features |
| Defense-in-depth | High | Explanations supplement (don't replace) rule-based IDS |
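
The first requirement (rate-limit the explanation API and log queries) could look roughly like the sketch below. It is a minimal in-memory illustration, assuming a per-client sliding window; the class name `ExplanationGate`, its parameters, and the log format are hypothetical, and a real deployment would need persistent storage and authenticated client identities.

```python
import time
from collections import defaultdict, deque


class ExplanationGate:
    """Sliding-window rate limiter that also records every explanation query."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable for testing
        self._recent = defaultdict(deque)   # client_id -> timestamps in window
        self.audit_log = []                 # (timestamp, client_id, allowed)

    def allow(self, client_id):
        now = self.clock()
        recent = self._recent[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        allowed = len(recent) < self.max_queries
        if allowed:
            recent.append(now)
        # Denied queries are logged too: a burst of denials is itself a
        # signal that someone may be probing the explainer.
        self.audit_log.append((now, client_id, allowed))
        return allowed


# Usage with a fake clock so the behaviour is reproducible.
fake_now = [0.0]
gate = ExplanationGate(max_queries=2, window_seconds=10, clock=lambda: fake_now[0])
first = gate.allow("analyst-1")    # allowed
second = gate.allow("analyst-1")   # allowed
third = gate.allow("analyst-1")    # denied: budget of 2 used up
fake_now[0] = 10.0
fourth = gate.allow("analyst-1")   # allowed again: window has elapsed
```

A limiter like this raises the cost of the query-and-aggregate attack from Adversary B but does not prevent it; the audit log is what lets slow, distributed probing be noticed after the fact.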

7. Assumptions & Scope

  • NSL-KDD is a benchmark dataset; a real deployment would require domain-specific feature analysis
  • We evaluate post-hoc explainability only (not inherently interpretable models)
  • We focus on explanation reliability, not adversarial robustness of the classifier itself (that's Project 1)