TemporalDrift-ETM - Malware Traffic Classifier

Paper: TemporalDrift-ETM: A Dual-Layer Framework for Concept Drift Analysis
and Malware Evolution Tracking Using Network Flow Data

Author: M Abdur Rabbi Tota (ID 2001011025)
Institution: Rangamati Science and Technology University, Bangladesh
Supervisor: Md Mynoddin, Assistant Professor


What this Space does

Upload a CSV file of network flow records in the BCCC-Mal-NetMem-2025 / Malicious-CSVs2025 dataset format (NTLFlowLyzer features), and the selected model classifies each flow into one of the following eight malware families:

backdoor | exploit | hacktool | hoax | rootkit | trojan | virus | worm


Models available

Model | Description
🌲 Random Forest - Drift-Resilient (recommended) | Retrained on early + mid sessions; more robust to concept drift
🌲 Random Forest - Standard | Trained on early session only
⚡ XGBoost - Standard | Trained on early session

All models use a shared preprocessing pipeline: MinMaxScaler → top-35 feature selection (by Random Forest importance).
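The sketch below shows one way to reproduce this ordering with scikit-learn. It assumes the top-35 features are ranked by the `feature_importances_` of a `RandomForestClassifier` fitted on the scaled training data; the helper names `fit_preprocessing` and `transform`, the forest settings, and the random demo data are illustrative assumptions, not the Space's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

TOP_K = 35  # number of features kept, as stated in this card


def fit_preprocessing(X_train, y_train, top_k=TOP_K, seed=0):
    """Hypothetical helper: fit a MinMaxScaler, then pick the top_k features
    by Random Forest importance computed on the scaled training data."""
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X_train)
    ranker = RandomForestClassifier(n_estimators=100, random_state=seed)
    ranker.fit(X_scaled, y_train)
    # Column indices of the top_k most important features, most important first.
    top_idx = np.argsort(ranker.feature_importances_)[::-1][:top_k]
    return scaler, top_idx


def transform(scaler, top_idx, X):
    """Apply the already-fitted scaler, then keep only the selected columns."""
    return scaler.transform(X)[:, top_idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 76))    # 76 behavioral features per flow
    y = rng.integers(0, 8, size=200)  # 8 malware families
    scaler, top_idx = fit_preprocessing(X, y)
    print(transform(scaler, top_idx, X).shape)  # (200, 35)
```

The scaler is fitted once and reused unchanged at inference, which is why every model inherits the early-session distribution noted under Limitations.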


CSV format

Your file must contain the 76 NTLFlowLyzer behavioral columns.
Meta-columns (Flow ID, Src IP, Dst IP, Src Port, Dst Port, Protocol, Timestamp, Malware Family) are automatically removed if present.
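A minimal pandas sketch of this input handling, assuming the uploaded CSV has already been read into a `DataFrame`. The helper `clean_upload` is hypothetical; its zero fill mirrors the inference-time NaN handling stated under Limitations, and the column-count check only confirms that 76 feature columns remain, not their names or order.

```python
import numpy as np
import pandas as pd

# Meta-columns named in this card; each is dropped only if present.
META_COLUMNS = [
    "Flow ID", "Src IP", "Dst IP", "Src Port",
    "Dst Port", "Protocol", "Timestamp", "Malware Family",
]


def clean_upload(df: pd.DataFrame, expected_features: int = 76) -> pd.DataFrame:
    """Hypothetical helper: strip meta-columns, check the feature count,
    and fill missing values with 0 as done at inference time."""
    features = df.drop(columns=META_COLUMNS, errors="ignore")
    if features.shape[1] != expected_features:
        raise ValueError(
            f"expected {expected_features} feature columns, got {features.shape[1]}"
        )
    return features.fillna(0)


if __name__ == "__main__":
    demo = pd.DataFrame({
        "Flow ID": ["a", "b"],
        "Src IP": ["10.0.0.1", "10.0.0.2"],
        "f1": [1.0, np.nan],
        "f2": [np.nan, 3.0],
    })
    print(clean_upload(demo, expected_features=2))
```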

Click the "Load Sample CSV" button in the app to download a template.


Dataset & Citation

  • Dataset: Malicious-CSVs2025 (BCCC-Mal-NetMem-2025), York University, Canada
  • Reference: Lashkari et al., Journal of Supercomputing, 2025

If you use this model, please cite the TemporalDrift-ETM paper:

@article{tota2025temporaldrift,
  title   = {TemporalDrift-ETM: A Dual-Layer Framework for Concept Drift Analysis
             and Malware Evolution Tracking Using Network Flow Data},
  author  = {Tota, M Abdur Rabbi},
  journal = {ACM Transactions on Privacy and Security},
  year    = {2025},
  note    = {Rangamati Science and Technology University, Bangladesh}
}

Limitations

  • The MinMaxScaler was fitted on the early session of Malicious-CSVs2025 only. Performance may degrade on traffic that has drifted significantly from that distribution.
  • The drift-resilient RF model partially mitigates this by including mid-session data.
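  • NaN values in input features are filled with 0 during inference, whereas training imputed them with the early-session column median, so predictions on rows with missing values may not match training-time behavior.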
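Because a MinMaxScaler maps each feature's observed training range onto [0, 1], scaled values outside that interval indicate that new traffic exceeds what the early session contained. The hypothetical helper `out_of_range_fraction` below reports that fraction per feature; it is a crude range check built on that assumption, not the drift analysis of the TemporalDrift-ETM paper, and the demo data are synthetic.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def out_of_range_fraction(scaler, X_new, tol=1e-9):
    """Hypothetical helper: per-feature fraction of rows in X_new whose
    scaled value falls outside [0, 1], i.e. beyond the fitted range."""
    scaled = scaler.transform(X_new)
    # tol absorbs floating-point error at the exact training min/max.
    outside = (scaled < -tol) | (scaled > 1.0 + tol)
    return outside.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    early = rng.uniform(0, 10, size=(500, 3))   # stand-in "early session"
    scaler = MinMaxScaler().fit(early)
    later = early + np.array([0.0, 0.0, 5.0])   # shift only the third feature
    print(out_of_range_fraction(scaler, later))  # first two entries are 0.0
```

A high fraction on many features suggests the scaled inputs, and therefore the selected top-35 features, sit outside the range the models were trained on.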