Model Card: MalwareNet

Model Details

Model name: MalwareNet
Repository: PepiPetrov/MalwareNet
Task: Binary PE malware classification
Model type: Compact neural classifier with hierarchical gated feature branches
Input: 2,568-dimensional EMBER v3 / EMBER2024 feature vector from PE files
Output: Calibrated malware probability from 0.0 to 1.0
Deployment format: ONNX, with Platt scaling and sigmoid included
Desktop deployment: Native Rust egui app, no Python runtime required

MalwareNet is designed to classify Windows PE files such as .exe and .dll as benign or malicious using extracted EMBER2024 features. The repository reports 273,452 parameters, AUC-ROC of 0.9911, ECE of 0.0079, and CPU inference latency of 0.041 ms per file.

Intended Use

MalwareNet is intended for:

Malware research
Defensive security tooling
PE file triage
Educational malware-ML experiments
Fast local scoring of Windows executables and DLLs

It should be used as a decision-support system, not as a sole authority for blocking, deleting, quarantining, or attributing files.

Out-of-Scope Use

MalwareNet is not intended for:

Fully automated production enforcement without human review
Attribution of malware authors or campaigns
Detecting non-PE malware formats
Proving that a file is safe
Evasion research against real deployed systems
Use on files where EMBER feature extraction fails or is unsupported

Architecture

MalwareNet splits the 2,568-dimensional EMBER feature vector into five semantic feature groups:

global features
byte histogram / byte features
string features
section features
import features

Each group is processed by a GatedFeatureBlock. The branch outputs are fused by a FusionBlock, then projected to a binary logit. The exported ONNX graph includes Platt scaling and sigmoid activation, so inference returns a calibrated malware probability.
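
The branch-then-fuse layout described above can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: the group sizes, the hidden width, the gate formula (a sigmoid gate multiplied by a tanh value path), and the reduction of the fusion step to a single linear projection. The repository's GatedFeatureBlock and FusionBlock may be defined differently.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, x):
    # Plain matrix-vector product W x + b on Python lists.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_block(x, params):
    # Hypothetical gate: sigmoid(Wg x + bg) * tanh(Wv x + bv), elementwise.
    gate = [sigmoid(z) for z in linear(params["Wg"], params["bg"], x)]
    value = [math.tanh(z) for z in linear(params["Wv"], params["bv"], x)]
    return [g * v for g, v in zip(gate, value)]


def random_params(rng, n_in, n_out):
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return {"Wg": matrix(), "bg": [0.0] * n_out, "Wv": matrix(), "bv": [0.0] * n_out}


# Toy group sizes for illustration only, NOT the real EMBER2024 layout.
GROUPS = {"global": 2, "byte": 3, "string": 2, "section": 2, "import": 3}
HIDDEN = 4  # hypothetical per-branch output width


def forward(x, branch_params, fusion_w, fusion_b):
    outputs, start = [], 0
    for name, size in GROUPS.items():
        # Each branch only sees its own slice of the feature vector.
        outputs.extend(gated_block(x[start:start + size], branch_params[name]))
        start += size
    # Fusion is simplified to one linear projection onto a single logit.
    return sum(w * h for w, h in zip(fusion_w, outputs)) + fusion_b


rng = random.Random(0)
branch_params = {name: random_params(rng, size, HIDDEN) for name, size in GROUPS.items()}
fusion_w = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN * len(GROUPS))]
x = [rng.uniform(-1.0, 1.0) for _ in range(sum(GROUPS.values()))]
logit = forward(x, branch_params, fusion_w, 0.0)
```

The point of the sketch is the data flow: each semantic group gets its own parameters, and the groups interact only at the fusion step.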

Training Data

The model is trained using the EMBER2024 PE dataset. The repository instructs users to download PE files from FutureComputing4AI/EMBER2024 and vectorize them with thrember.

Known dataset scope:

Windows PE files
Win32, Win64, and .NET samples
Feature-based representation, not raw bytes at inference time

Evaluation

Reported repository metrics:

Metric	Value
Parameters	273,452
Test AUC-ROC	0.9911
Expected Calibration Error	0.0079
PGD AUC, ε=0.1	≥ 0.9910
FGSM AUC, ε=0.1	≥ 0.9910
Best sparse LTH ticket	0.9904 val AUC at 48.18% sparsity
CPU latency	0.041 ms / file
CPU throughput	~24,400 files / sec
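
For readers unfamiliar with the Expected Calibration Error row, the following is a minimal sketch of one common binary variant: predictions are grouped into equal-width probability bins, and the gap between mean predicted probability and observed malware rate is averaged with bin-size weights. The bin count of 10 and the choice of this variant are assumptions; the repository may compute ECE differently.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE for binary labels (0 = benign, 1 = malicious)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        malware_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / total) * abs(mean_prob - malware_rate)
    return ece


toy_ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
```

In the toy call, both occupied bins are off by 0.1, so the result is approximately 0.1. A perfectly calibrated set of predictions would give 0.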

Adversarial robustness is evaluated with FGSM and PGD over ε values from 0.001 to 0.1. The repository reports less than 0.02% relative AUC degradation under 10-step PGD at ε=0.1.
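
As a rough illustration of what FGSM and PGD do in feature space, the sketch below applies both to a tiny logistic model standing in for the real network. The stand-in model, the L-infinity norm, and the PGD step size (ε/4) are assumptions; only the ε range and the 10-step count come from the reported evaluation. Perturbing features this way says nothing about whether a working PE file with those features exists.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def malware_score(w, b, x):
    # Stand-in classifier: logistic regression on the feature vector.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(w, b, x, y):
    # Gradient of binary cross-entropy with respect to x is (p - y) * w.
    p = malware_score(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    # Single step of size eps along the sign of the loss gradient.
    grad = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def pgd(w, b, x, y, eps, steps=10, alpha=None):
    alpha = eps / 4 if alpha is None else alpha  # assumed step size
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, grad)]
        # Project back into the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0, 0.5], 0.1   # toy weights
x, y = [0.3, -0.2, 0.8], 1     # toy sample labeled malicious
x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_pgd = pgd(w, b, x, y, eps=0.1)
```

Both attacks increase the loss for the true label, so the toy malicious sample receives a lower score after perturbation while every feature stays within ε of its original value.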

Calibration

The model uses Platt scaling fitted on the validation set. The calibrated probability is baked into the exported ONNX model, reducing the chance that downstream users accidentally consume uncalibrated logits.
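
Platt scaling maps a raw logit z to a probability through sigmoid(a·z + b), with a and b fitted on held-out data. The sketch below fits the two parameters by plain gradient descent on the negative log-likelihood. The sign convention, the optimizer, the learning rate, and the toy data are assumptions for illustration, not the repository's fitting code.

```python
import math


def platt_probability(logit, a, b):
    """Calibrated probability: sigmoid of an affine transform of the logit."""
    return 1.0 / (1.0 + math.exp(-(a * logit + b)))


def fit_platt(logits, labels, lr=0.1, epochs=2000):
    """Fit (a, b) by gradient descent on the mean negative log-likelihood."""
    a, b = 1.0, 0.0
    n = len(logits)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for z, y in zip(logits, labels):
            err = platt_probability(z, a, b) - y  # dNLL/d(a*z + b)
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy validation logits and labels (not separable, so the fit stays finite).
val_logits = [-3.0, -1.0, -0.5, 0.5, 1.0, 3.0]
val_labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(val_logits, val_labels)
calibrated = [platt_probability(z, a, b) for z in val_logits]
```

Because the transform is monotonic for a > 0, calibration changes the probabilities but not the ranking of files, which is why AUC-ROC is unaffected by it.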

Limitations

The model depends on EMBER-style static PE features.
It may miss malware that uses packing, obfuscation, feature manipulation, or novel compiler/linker patterns.
High AUC does not guarantee low false positives in a real deployment distribution.
Real-world performance may differ across regions, enterprise environments, time periods, and malware families.
The model does not execute samples or observe runtime behavior.
It does not prove that low-scoring files are benign.

Security Considerations

Malware classifiers are vulnerable to distribution shift and adversarial manipulation. Although this repository includes FGSM/PGD evaluation and a PE mutation proof of concept, the README notes that the mutation script is not a full feasible-attack framework and would require stronger semantics-preserving mutation policies plus a functionality oracle.

Recommended deployment controls:

Use alongside static signatures, sandboxing, reputation systems, and human analyst review.
Log model version, feature extractor version, threshold, and score.
Monitor false positives and false negatives continuously.
Re-evaluate on fresh malware and benignware samples.
Avoid exposing raw scores or decision thresholds through a public interface that an attacker could query as an oracle.
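
The logging recommendation above could look like the following sketch, which builds one structured record per scored file. The field names and the version strings are hypothetical placeholders, not identifiers used by the repository.

```python
import json
from datetime import datetime, timezone


def build_score_record(sha256, score, threshold, model_version, extractor_version):
    """Collect the fields needed to audit or replay a scoring decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256,
        "score": round(score, 6),
        "threshold": threshold,
        "flagged": score >= threshold,
        "model_version": model_version,
        "feature_extractor_version": extractor_version,
    }


record = build_score_record(
    sha256="0" * 64,                          # placeholder file hash
    score=0.93,
    threshold=0.80,                           # placeholder operating threshold
    model_version="malwarenet-example-1",     # hypothetical version label
    extractor_version="thrember-example",     # hypothetical version label
)
log_line = json.dumps(record, sort_keys=True)
```

Recording the threshold alongside the score keeps past decisions interpretable after the operating point changes.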

Ethical Considerations

This model is intended for defensive and research use. Because malware ML systems can be studied for evasion, any release of artifacts, mutation tools, or evaluation scripts should be accompanied by responsible-use guidance and safeguards.

Recommended Thresholding

No universal threshold is specified. Operators should select a threshold based on their own tolerance for false positives and false negatives.

Suggested policy:

Low threshold: triage queue or analyst review
Medium threshold: warning / suspicious label
High threshold: quarantine recommendation, preferably with corroborating evidence
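
A tiered policy like the one above can be expressed as a small mapping from score to action. The three cut-off values below are arbitrary placeholders chosen for illustration; they are not recommended operating points and should be replaced with thresholds selected on local validation data.

```python
def triage_action(score, review_at=0.30, warn_at=0.60, quarantine_at=0.90):
    """Map a calibrated malware probability to a policy tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= quarantine_at:
        return "recommend_quarantine"  # ideally with corroborating evidence
    if score >= warn_at:
        return "warn_suspicious"
    if score >= review_at:
        return "analyst_review"
    return "no_action"


actions = [triage_action(s) for s in (0.05, 0.45, 0.75, 0.97)]
```

Checking the highest tier first keeps the tiers mutually exclusive without needing explicit upper bounds.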

Reproducibility

The repository provides scripts for:

hyperparameter tuning with Optuna
training, calibration, and ONNX export
held-out test evaluation
adversarial robustness evaluation
latency benchmarking
Lottery Ticket Hypothesis pruning

Deployment

The ONNX model accepts a float32 tensor of shape (1, 2568) and returns a calibrated malware probability. The repository also includes a Rust desktop app that embeds the model and scores selected PE files locally.
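
A minimal sketch of scoring one feature vector from Python with ONNX Runtime is shown below. It assumes that numpy and onnxruntime are installed, that the model file path is supplied by the caller (no file name is given here), and that the first graph output holds the probability. Feature extraction itself is not shown.

```python
EXPECTED_DIM = 2568  # feature vector length stated in this model card


def validate_features(features):
    """Return the features as a list of floats, or raise on a wrong length."""
    values = [float(v) for v in features]
    if len(values) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} features, got {len(values)}")
    return values


def score_features(model_path, features):
    """Run the exported model on one feature vector and return its score."""
    # Imported lazily so validate_features works without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    # Build the float32 batch of shape (1, 2568) expected by the model.
    batch = np.asarray([validate_features(features)], dtype=np.float32)
    outputs = session.run(None, {input_name: batch})
    # Assumption: the first output contains the calibrated probability.
    return float(np.asarray(outputs[0]).reshape(-1)[0])
```

Reading the input name from the session avoids hard-coding a tensor name that this model card does not specify.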

Maintenance

Recommended future updates:

Add dataset version hash and exact split details.
Publish confusion matrix and threshold-specific metrics.
Add per-family or per-source evaluation.
Track performance over time on newly collected samples.
Document feature-extraction failure modes.
Add model versioning and artifact checksums.
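
For the artifact-checksum item, one straightforward option is a SHA-256 digest of each released file, computed with the Python standard library as sketched below. The choice of SHA-256 and the chunk size are assumptions; the repository does not currently specify a checksum scheme.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Publishing the digest next to each model version lets users confirm that the ONNX file they score with is the one that was evaluated.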

Citation

When using MalwareNet, cite the repository and the EMBER2024 dataset/tooling used to train and extract features.