Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks
Abstract
Lightweight machine learning models for IIoT intrusion detection show limited generalization across networks due to reliance on coarse port-category features and imbalanced class distributions, with adversarial robustness not correlating with cross-network performance.
Lightweight machine learning models are increasingly proposed for intrusion detection in Industrial Internet of Things (IIoT) networks due to their suitability for resource-constrained edge deployment. Most reported results evaluate these models only within their training network, leaving behavior on unseen networks unverified. This study trains four lightweight architectures on one IIoT dataset and evaluates them, without retraining, on two structurally distinct IIoT datasets using a feature representation restricted to attributes available across all three sources. Explainability analysis across two top-performing models shows both rely overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate in the two target domains, indicating that coarsening port resolution relocates rather than removes a documented shortcut. Evaluation under naturally imbalanced class distributions reveals a further effect: the evaluation protocol used can reverse which target network appears to pose the greater generalization challenge. Adversarial robustness and recovery through limited target-domain exposure are also assessed; robustness to adversarial perturbation is unrelated to cross-network generalization, and recovery through adaptation varies considerably by architecture. These findings suggest deployment readiness should be assessed using cross-network evaluation under realistic class distributions, rather than within-domain accuracy alone.
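The port-category comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the category scheme, so the four buckets below are an assumption loosely following the IANA system, user, and dynamic port ranges plus a catch-all for missing values. The flow records, the field names `dst_port` and `label`, and the helper names are hypothetical toy inputs, not data or code from the paper.

```python
def bucket_port(port):
    """Map a raw port to a coarse category (assumed scheme, not the paper's)."""
    if port is None or not 0 <= port <= 65535:
        return "none"          # missing or invalid port
    if port <= 1023:
        return "well_known"    # IANA system ports
    if port <= 49151:
        return "registered"    # IANA user ports
    return "dynamic"           # IANA dynamic/private ports


def attack_rate(flows, bucket):
    """Fraction of attack flows whose destination port falls in `bucket`."""
    attacks = [f for f in flows if f["label"] == 1]
    if not attacks:
        return 0.0
    hits = sum(1 for f in attacks if bucket_port(f["dst_port"]) == bucket)
    return hits / len(attacks)


# Toy flows (hypothetical): the source domain concentrates attacks in one bucket,
# the target domain almost never uses that bucket for attacks.
source = (
    [{"dst_port": 80, "label": 1}] * 80
    + [{"dst_port": 50000, "label": 1}] * 20
    + [{"dst_port": 443, "label": 0}] * 50
)
target = (
    [{"dst_port": 502, "label": 1}] * 2
    + [{"dst_port": 40000, "label": 1}] * 98
    + [{"dst_port": 443, "label": 0}] * 50
)

src_rate = attack_rate(source, "well_known")
tgt_rate = attack_rate(target, "well_known")
print(f"source rate={src_rate:.2f}, target rate={tgt_rate:.2f}, "
      f"ratio={src_rate / tgt_rate:.1f}x")
```

A large ratio of this kind means a model can score well in the source domain by keying on the bucket alone, which is why coarsening the port feature moves the shortcut rather than removing it.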
Community
Most lightweight IDS papers for IIoT report near-perfect accuracy, but the models are almost always tested on the same network they were trained on. We wanted to know: does that performance actually survive when the model meets a different industrial network?
We trained 4 lightweight architectures (Decision Tree, small MLP, 1D-CNN, LSTM) on Edge-IIoTset and evaluated them, without retraining, on two independent datasets (Gotham 2025, WUSTL-IIoT-2021).
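The protocol (fit once on the source network, then score on other networks without retraining, using only features every dataset shares) can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for the four architectures, and the rows, feature names, and helper functions are all hypothetical toy values rather than anything from the actual datasets.

```python
LABEL = "label"


def shared_features(*datasets):
    """Feature names present in every dataset (rows are dicts)."""
    common = set(datasets[0][0])
    for rows in datasets[1:]:
        common &= set(rows[0])
    common.discard(LABEL)
    return sorted(common)


def fit_centroids(rows, feats):
    """Stand-in model: one mean vector per class, computed on source rows only."""
    model = {}
    for cls in {r[LABEL] for r in rows}:
        members = [r for r in rows if r[LABEL] == cls]
        model[cls] = [sum(r[f] for r in members) / len(members) for f in feats]
    return model


def predict(model, row, feats):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(cls):
        return sum((row[f] - c) ** 2 for f, c in zip(feats, model[cls]))
    return min(model, key=dist)


def f1(rows, model, feats):
    """F1 for the attack class (label 1); the model is never refit here."""
    tp = fp = fn = 0
    for r in rows:
        pred = predict(model, r, feats)
        if pred == 1 and r[LABEL] == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif r[LABEL] == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy rows (hypothetical). Each dataset has one feature the other lacks.
source = [
    {"pkt_rate": 10, "mean_len": 100, "tcp_flags": 2, LABEL: 0},
    {"pkt_rate": 12, "mean_len": 110, "tcp_flags": 2, LABEL: 0},
    {"pkt_rate": 90, "mean_len": 60, "tcp_flags": 18, LABEL: 1},
    {"pkt_rate": 95, "mean_len": 50, "tcp_flags": 18, LABEL: 1},
]
target = [
    {"pkt_rate": 11, "mean_len": 100, "modbus_fc": 3, LABEL: 0},
    {"pkt_rate": 13, "mean_len": 105, "modbus_fc": 3, LABEL: 0},
    {"pkt_rate": 15, "mean_len": 95, "modbus_fc": 16, LABEL: 1},  # low-rate attack
    {"pkt_rate": 88, "mean_len": 58, "modbus_fc": 16, LABEL: 1},
]

feats = shared_features(source, target)
model = fit_centroids(source, feats)      # trained on the source domain only
in_domain = f1(source, model, feats)
cross_domain = f1(target, model, feats)   # evaluated without retraining
print(feats, round(in_domain, 3), round(cross_domain, 3))
```

The point of the sketch is the separation of steps: the feature set is fixed by intersection before training, and the target rows are only ever passed to the scoring function.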
Key findings:
- Collapse is severe and consistent: F1 drops from ~0.97 in-domain to 0.09–0.28 cross-domain across all 4 models
- The "port shortcut" isn't fixed by coarsening: even after bucketing raw ports into 4 categories (a common mitigation), SHAP analysis shows models still lean overwhelmingly on port-bucket features, and the top feature appears 96–435x more often in source-domain attack traffic than in the targets
- Balanced vs. natural evaluation can flip your conclusions: under balanced sampling, WUSTL looks easier than Gotham; under natural (imbalanced) distributions, that ranking reverses entirely
- Robustness ≠ generalization ≠ efficiency: these are independent axes; the best cross-domain performer (SmallLSTM) is also the least adversarially robust
- Few-shot recovery is architecture-dependent: Decision Tree and LSTM recover substantially with limited target-domain data; the 1D-CNN barely improves at all
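Why balanced and natural evaluation can rank two target networks differently follows from how F1 depends on class prevalence: with a fixed detection rate and false-alarm rate, precision falls as attacks become rarer. The worked example below is a sketch with invented numbers. The rates, the attack shares, and the names `target_a` and `target_b` are hypothetical and are not results from the paper.

```python
def f1_from_rates(tpr, fpr, attack_share):
    """F1 for the attack class given per-class rates and the attack prevalence."""
    tp = attack_share * tpr                # attacks correctly flagged
    fn = attack_share * (1.0 - tpr)        # attacks missed
    fp = (1.0 - attack_share) * fpr        # benign flows flagged as attacks
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-network behaviour of one fixed model.
networks = {
    "target_a": {"tpr": 0.30, "fpr": 0.02, "natural_attack_share": 0.01},
    "target_b": {"tpr": 0.20, "fpr": 0.001, "natural_attack_share": 0.10},
}

# Balanced protocol: resample so attacks are half of the evaluation set.
balanced = {
    name: f1_from_rates(n["tpr"], n["fpr"], 0.5) for name, n in networks.items()
}
# Natural protocol: keep each network's own attack share.
natural = {
    name: f1_from_rates(n["tpr"], n["fpr"], n["natural_attack_share"])
    for name, n in networks.items()
}

for name in networks:
    print(f"{name}: balanced F1={balanced[name]:.3f}, natural F1={natural[name]:.3f}")
```

In this toy setup `target_a` scores higher under balanced sampling, while `target_b` scores higher under natural prevalence, because the higher false-alarm rate on `target_a` dominates once benign traffic makes up nearly all of the evaluation set.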