Model Card: MalwareNet
Model Details
Model name: MalwareNet
Repository: PepiPetrov/MalwareNet
Task: Binary PE malware classification
Model type: Compact neural classifier with hierarchical gated feature branches
Input: 2,568-dimensional EMBER v3 / EMBER2024 feature vector from PE files
Output: Calibrated malware probability from 0.0 to 1.0
Deployment format: ONNX, with Platt scaling and sigmoid included
Desktop deployment: Native Rust egui app, no Python runtime required
MalwareNet is designed to classify Windows PE files such as .exe and .dll as benign or malicious using extracted EMBER2024 features. The repository reports 273,452 parameters, AUC-ROC of 0.9911, ECE of 0.0079, and CPU inference latency of 0.041 ms per file.
Intended Use
MalwareNet is intended for:
- Malware research
- Defensive security tooling
- PE file triage
- Educational malware-ML experiments
- Fast local scoring of Windows executables and DLLs
It should be used as a decision-support system, not as a sole authority for blocking, deleting, quarantining, or attributing files.
Out-of-Scope Use
MalwareNet is not intended for:
- Fully automated production enforcement without human review
- Attribution of malware authors or campaigns
- Detecting non-PE malware formats
- Proving that a file is safe
- Evasion research against real deployed systems
- Use on files where EMBER feature extraction fails or is unsupported
Architecture
MalwareNet splits the 2,568-dimensional EMBER feature vector into five semantic feature groups:
- global features
- byte histogram / byte features
- string features
- section features
- import features
Each group is processed by a GatedFeatureBlock. The branch outputs are fused by a FusionBlock, then projected to a binary logit. The exported ONNX graph includes Platt scaling and sigmoid activation, so inference returns a calibrated malware probability.
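The data flow described above (split, gate, fuse, squash) can be sketched in plain Python. This is an illustrative toy, not the repository's implementation: the group sizes are placeholders chosen only so that they sum to 2,568, and the `tanh` value path, the single output unit per branch, the linear fusion, and the random weights are all assumptions. Platt scaling is omitted here for brevity.

```python
import math
import random

# Placeholder group sizes (assumption): they only need to sum to 2,568.
# They are NOT the real EMBER2024 feature-group boundaries.
GROUP_SIZES = {"global": 100, "byte": 512, "string": 200, "section": 500, "import": 1256}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split_groups(vector, sizes):
    """Slice a flat feature vector into named, contiguous groups."""
    if sum(sizes.values()) != len(vector):
        raise ValueError("group sizes must sum to the vector length")
    groups, start = {}, 0
    for name, size in sizes.items():
        groups[name] = vector[start:start + size]
        start += size
    return groups


def gated_branch(x, value_w, gate_w):
    """Generic gated unit: a tanh value path scaled by a sigmoid gate in (0, 1)."""
    value = math.tanh(sum(w * xi for w, xi in zip(value_w, x)))
    gate = sigmoid(sum(w * xi for w, xi in zip(gate_w, x)))
    return gate * value


def forward(vector, params, fusion_w, fusion_b=0.0):
    """Split, run one gated unit per group, fuse linearly to a logit, apply sigmoid."""
    groups = split_groups(vector, GROUP_SIZES)
    branch_out = [gated_branch(groups[name], *params[name]) for name in GROUP_SIZES]
    logit = fusion_b + sum(w * h for w, h in zip(fusion_w, branch_out))
    return sigmoid(logit)


# Random toy weights and features, seeded for repeatability.
rng = random.Random(0)
params = {
    name: ([rng.gauss(0, 0.01) for _ in range(size)],
           [rng.gauss(0, 0.01) for _ in range(size)])
    for name, size in GROUP_SIZES.items()
}
fusion_w = [rng.gauss(0, 1) for _ in GROUP_SIZES]
features = [rng.random() for _ in range(2568)]
print(forward(features, params, fusion_w))
```

The point of the gate is that each branch can learn to suppress its own contribution when a feature group is uninformative for a given file, before the fusion step combines the branches.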
Training Data
The model is trained using the EMBER2024 PE dataset. The repository instructs users to download PE files from FutureComputing4AI/EMBER2024 and vectorize them with thrember.
Known dataset scope:
- Windows PE files
- Win32, Win64, and .NET samples
- Feature-based representation, not raw bytes at inference time
Evaluation
Reported repository metrics:
| Metric | Value |
|---|---|
| Parameters | 273,452 |
| Test AUC-ROC | 0.9911 |
| Expected Calibration Error | 0.0079 |
| PGD AUC, ε=0.1 | ≥ 0.9910 |
| FGSM AUC, ε=0.1 | ≥ 0.9910 |
| Best sparse LTH ticket | 0.9904 val AUC at 48.18% sparsity |
| CPU latency | 0.041 ms / file |
| CPU throughput | ~24,400 files / sec |
Adversarial robustness is evaluated with FGSM and PGD over ε values from 0.001 to 0.1. The repository reports less than 0.02% relative AUC degradation under 10-step PGD at ε=0.1.
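The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers from the table; it assumes files are scored one after another, so that throughput is the reciprocal of latency, and it treats 0.9910 as the lower bound on adversarial AUC.

```python
latency_ms = 0.041        # reported CPU latency per file
clean_auc = 0.9911        # reported test AUC-ROC
adv_auc_floor = 0.9910    # reported lower bound for PGD/FGSM AUC at eps = 0.1

# Throughput implied by latency (assumes sequential, single-file scoring).
files_per_sec = 1000.0 / latency_ms

# Worst-case relative AUC drop implied by the reported lower bound.
relative_drop_pct = (clean_auc - adv_auc_floor) / clean_auc * 100.0

print(f"{files_per_sec:,.0f} files/sec")
print(f"{relative_drop_pct:.4f}% relative AUC drop")
```

The implied throughput is roughly 24,390 files per second, in line with the ~24,400 in the table, and the implied relative drop is about 0.01%, which sits inside the stated bound of less than 0.02%.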
Calibration
The model uses Platt scaling fitted on the validation set. The calibration step is baked into the exported ONNX model, so the graph outputs a calibrated probability directly. This reduces the chance that downstream users accidentally consume uncalibrated logits.
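For readers unfamiliar with the two calibration terms used in this card, here is a minimal sketch in plain Python. Platt scaling is written in its usual form, a logistic function applied to an affine transform of the logit; the ECE variant shown uses equal-width bins over the positive-class probability. The coefficients `a` and `b`, the bin count, and the toy data are hypothetical, and the repository's own fitting and evaluation code may differ.

```python
import math


def platt_scale(logit, a, b):
    """Map a raw logit to a probability with a fitted slope `a` and offset `b`."""
    z = a * logit + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |positive rate - mean probability| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(positive_rate - mean_prob)
    return ece


# Hypothetical coefficients and toy data, for illustration only.
a, b = 1.3, -0.2
logits = [-4.0, -1.0, 0.5, 3.0]
labels = [0, 0, 1, 1]
probs = [platt_scale(z, a, b) for z in logits]
print(probs, expected_calibration_error(probs, labels))
```

A low ECE means that, among files scored near a given probability, roughly that fraction are actually malicious, which is what makes score thresholds interpretable.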
Limitations
- The model depends on EMBER-style static PE features.
- It may miss malware that uses packing, obfuscation, feature manipulation, or novel compiler/linker patterns.
- High AUC does not guarantee low false positives in a real deployment distribution.
- Real-world performance may differ across regions, enterprise environments, time periods, and malware families.
- The model does not execute samples or observe runtime behavior.
- It does not prove that low-scoring files are benign.
Security Considerations
Malware classifiers are vulnerable to distribution shift and adversarial manipulation. Although this repository includes FGSM/PGD evaluation and a PE mutation proof of concept, the README notes that the mutation script is not a full feasible-attack framework and would require stronger semantics-preserving mutation policies plus a functionality oracle.
Recommended deployment controls:
- Use alongside static signatures, sandboxing, reputation systems, and human analyst review.
- Log model version, feature extractor version, threshold, and score.
- Monitor false positives and false negatives continuously.
- Re-evaluate on fresh malware and benignware samples.
- Avoid exposing raw scores or exact decision thresholds through a public interface, where they could serve as an oracle for evasion.
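The logging control above can be as simple as one structured record per scored file. The sketch below is one possible shape, not a format defined by the repository: the field names, version strings, and threshold are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_score_record(sha256, score, threshold, model_version, extractor_version):
    """Return one JSON line describing a single scoring decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256,
        "score": round(float(score), 6),
        "threshold": float(threshold),
        "flagged": float(score) >= float(threshold),
        "model_version": model_version,
        "feature_extractor_version": extractor_version,
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical values, for illustration only.
line = build_score_record(
    sha256="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    score=0.9731,
    threshold=0.9,
    model_version="malwarenet-example-1",
    extractor_version="thrember-example",
)
print(line)
```

Recording the threshold alongside the score lets past decisions be re-examined after the threshold or model version changes.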
Ethical Considerations
This model is intended for defensive and research use. Because malware ML systems can be studied for evasion, any release of artifacts, mutation tools, or evaluation scripts should be accompanied by responsible-use guidance and safeguards.
Recommended Thresholding
No universal threshold is specified. Operators should select a threshold based on their own tolerance for false positives and false negatives.
Suggested policy:
- Score above a low threshold: route to a triage queue or analyst review
- Score above a medium threshold: show a warning / suspicious label
- Score above a high threshold: recommend quarantine, preferably with corroborating evidence
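A tiered policy of this kind could be expressed as a small mapping from probability to action. The cut-off values and action names below are hypothetical placeholders, not recommendations from the repository; each operator would tune them on their own validation data.

```python
# Hypothetical cut-offs: tune these on your own validation data.
REVIEW_THRESHOLD = 0.30
WARN_THRESHOLD = 0.70
QUARANTINE_THRESHOLD = 0.95


def triage_action(probability):
    """Map a calibrated malware probability to a recommended action tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= QUARANTINE_THRESHOLD:
        return "recommend_quarantine"
    if probability >= WARN_THRESHOLD:
        return "warn_suspicious"
    if probability >= REVIEW_THRESHOLD:
        return "analyst_review"
    return "no_action"


for p in (0.05, 0.45, 0.80, 0.99):
    print(p, triage_action(p))
```

Checking the highest tier first keeps the tiers mutually exclusive, and the top tier returns a recommendation rather than an enforcement action, in keeping with the decision-support role described under Intended Use.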
Reproducibility
The repository provides scripts for:
- hyperparameter tuning with Optuna
- training, calibration, and ONNX export
- held-out test evaluation
- adversarial robustness evaluation
- latency benchmarking
- Lottery Ticket Hypothesis pruning
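As a rough idea of what a latency benchmark involves, the following generic timing harness warms up, repeats the call, and reports the median. It is not the repository's benchmarking script: the helper name, warm-up and repeat counts, and the stand-in workload are assumptions, and a real measurement would time a call into the ONNX session instead.

```python
import statistics
import time


def benchmark_ms(fn, arg, warmup=10, repeats=100):
    """Return the median wall-clock time of fn(arg) in milliseconds."""
    for _ in range(warmup):
        fn(arg)  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload (assumption): replace with a call into the ONNX session.
median_ms = benchmark_ms(sum, [0.0] * 2568)
print(f"median latency: {median_ms:.4f} ms")
```

The median is used rather than the mean because occasional scheduler or garbage-collection pauses can skew an average over many very short calls.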
Deployment
The ONNX model accepts a float32 tensor of shape (1, 2568) and returns a calibrated malware probability. The repository also includes a Rust desktop app that embeds the model and scores selected PE files locally.
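A minimal sketch of scoring one file from Python with ONNX Runtime, based on the input contract above. It assumes `numpy` and `onnxruntime` are installed and that the feature vector has already been extracted; the helper names and the model path in the usage note are hypothetical, and the exact output tensor shape is not stated in this card, so the code simply takes the first element of the first output.

```python
import numpy as np

FEATURE_DIM = 2568  # input width stated in this model card


def to_model_input(features):
    """Validate one feature vector and shape it as a (1, 2568) float32 batch."""
    x = np.asarray(features, dtype=np.float32)
    if x.shape != (FEATURE_DIM,):
        raise ValueError(f"expected {FEATURE_DIM} features, got shape {x.shape}")
    if not np.all(np.isfinite(x)):
        raise ValueError("feature vector contains NaN or infinite values")
    return x.reshape(1, FEATURE_DIM)


def score(model_path, features):
    """Run the ONNX model on CPU and return the malware probability as a float."""
    import onnxruntime as ort  # imported lazily so the helper above needs only numpy

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the name instead of hard-coding it
    outputs = session.run(None, {input_name: to_model_input(features)})
    return float(np.asarray(outputs[0]).reshape(-1)[0])
```

A call would look like `score("malwarenet.onnx", vector)`, where the file name is a placeholder. When scoring many files, create the `InferenceSession` once and reuse it, since session construction is far more expensive than a single inference.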
Maintenance
Recommended future updates:
- Add dataset version hash and exact split details.
- Publish confusion matrix and threshold-specific metrics.
- Add per-family or per-source evaluation.
- Track performance over time on newly collected samples.
- Document feature-extraction failure modes.
- Add model versioning and artifact checksums.
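The artifact-checksum item could be supported with a few lines of standard-library Python. This sketch assumes SHA-256 as the digest and uses hypothetical helper names; the repository does not currently specify a checksum scheme.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_hex):
    """Return True when the file's SHA-256 matches the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Publishing the digest of each released ONNX file next to its version tag would let users confirm that the model they score with is the one that was evaluated.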
Citation
When using MalwareNet, cite the repository and the EMBER2024 dataset/tooling used to train and extract features.