TemporalDrift-ETM: Malware Traffic Classifier
Paper: TemporalDrift-ETM: A Dual-Layer Framework for Concept Drift Analysis
and Malware Evolution Tracking Using Network Flow Data
Author: M Abdur Rabbi Tota (ID 2001011025)
Institution: Rangamati Science and Technology University, Bangladesh
Supervisor: Md Mynoddin, Assistant Professor
What this Space does
Upload a CSV file of network flow records with NTLFlowLyzer features, in the format of the BCCC-Mal-NetMem-2025 / Malicious-CSVs2025 dataset. The selected model classifies each flow into one of eight malware families:
backdoor | exploit | hacktool | hoax | rootkit | trojan | virus | worm
Models available
| Model | Description |
|---|---|
| 🌲 Random Forest - Drift-Resilient (recommended) | Retrained on early + mid sessions; more robust to concept drift |
| 🌲 Random Forest - Standard | Trained on the early session only |
| ⚡ XGBoost - Standard | XGBoost trained on the early session |
All models use a shared preprocessing pipeline: MinMaxScaler → top-35 feature selection (ranked by Random Forest importance).
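
The shared pipeline above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the Space's actual code: the synthetic data, the `n_estimators` value, the `preprocess` helper, and the choice to rank features on the scaled matrix are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

TOP_K = 35  # number of features kept, as stated above

# Synthetic stand-in for the 76 behavioral features (not real flow data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 76))
y_train = rng.integers(0, 8, size=400)  # 8 malware families

# 1) Fit the scaler on the training data only, then scale it.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# 2) Rank features by Random Forest importance and keep the top 35.
#    Indices are sorted so the selected columns keep their original order.
ranker = RandomForestClassifier(n_estimators=50, random_state=0)
ranker.fit(X_scaled, y_train)
top_idx = np.sort(np.argsort(ranker.feature_importances_)[::-1][:TOP_K])


def preprocess(X):
    """Apply the already-fitted scaler, then select the top-k columns."""
    return scaler.transform(X)[:, top_idx]


# New flows go through the same fitted objects; nothing is refitted here.
X_new = rng.normal(size=(5, 76))
X_ready = preprocess(X_new)
```

Because the scaler is fitted once and only reused at inference, new flows whose values fall outside the training range will map outside [0, 1].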
CSV format
Your file must contain the 76 NTLFlowLyzer behavioral columns.
Meta-columns (Flow ID, Src IP, Dst IP, Src Port, Dst Port,
Protocol, Timestamp, Malware Family) are automatically removed if present.
Click the "Load Sample CSV" button in the app to download a template.
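
To check a file before uploading, something like the sketch below may help. It uses only the Python standard library; the helper names (`read_feature_rows`, `has_expected_width`) and the demo feature names are hypothetical, and exact, case-sensitive matching of the meta-column names is an assumption about how the app behaves.

```python
import csv
import io

# Meta-columns listed above; they are dropped when present.
META_COLUMNS = {
    "Flow ID", "Src IP", "Dst IP", "Src Port", "Dst Port",
    "Protocol", "Timestamp", "Malware Family",
}
EXPECTED_FEATURES = 76  # behavioral columns required by the models


def read_feature_rows(csv_text):
    """Return (feature_names, rows) with meta-columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features = [c for c in (reader.fieldnames or []) if c not in META_COLUMNS]
    rows = [[row[c] for c in features] for row in reader]
    return features, rows


def has_expected_width(features):
    """True when exactly 76 behavioral columns remain."""
    return len(features) == EXPECTED_FEATURES


# Tiny demo with two made-up feature names, not real NTLFlowLyzer columns.
demo = (
    "Flow ID,Src IP,feat_a,feat_b,Malware Family\n"
    "f1,10.0.0.1,0.5,12,trojan\n"
)
features, rows = read_feature_rows(demo)
```

For a real file, pass its text (for example from `open(path, newline="").read()`) and confirm that `has_expected_width(features)` is true before uploading.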
Dataset & Citation
- Dataset: Malicious-CSVs2025 (BCCC-Mal-NetMem-2025), York University, Canada
- Reference: Lashkari et al., Journal of Supercomputing, 2025
If you use this model, please cite the TemporalDrift-ETM paper:
@article{tota2025temporaldrift,
title = {TemporalDrift-ETM: A Dual-Layer Framework for Concept Drift Analysis
and Malware Evolution Tracking Using Network Flow Data},
author = {Tota, M Abdur Rabbi},
journal = {ACM Transactions on Privacy and Security},
year = {2025},
note = {Rangamati Science and Technology University, Bangladesh}
}
Limitations
- The MinMaxScaler was fitted on the early session of Malicious-CSVs2025 only. Performance may degrade on traffic that has drifted significantly from that distribution.
- The drift-resilient RF model partially mitigates this by including mid-session data.
- NaN values in input features are filled with 0 during inference, whereas training imputed them with the early-session column median. This mismatch may affect predictions for flows with many missing values.
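
The difference between the two NaN-handling strategies in the last item can be seen in a small NumPy sketch. The arrays are made-up values chosen for illustration, and the median-based variant is shown only for comparison; it is not what the Space does at inference.

```python
import numpy as np

# Hypothetical training matrix (rows = flows, columns = features).
X_train = np.array([[1.0, 10.0],
                    [3.0, np.nan],
                    [5.0, 30.0]])

# Per-column median of the training data, ignoring NaN entries.
train_median = np.nanmedian(X_train, axis=0)

# Hypothetical flows seen at inference, each with one missing value.
X_new = np.array([[np.nan, 20.0],
                  [2.0, np.nan]])

# Inference-time behavior described above: NaN -> 0.
filled_zero = np.nan_to_num(X_new, nan=0.0)

# Training-time behavior described above: NaN -> training column median.
filled_median = np.where(np.isnan(X_new), train_median, X_new)
```

Here the missing entry in the first column becomes 0 under the inference rule but 3 under the training rule, which illustrates how the same flow can reach the model with different feature values.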