petarpepi/MalwareNet

petarpepi committed 3 days ago

Commit

747ac96


1 Parent(s): fb9ef02

Upload MODEL_CARD.md


Files changed (1)

MODEL_CARD.md +145 -0

MODEL_CARD.md ADDED

	@@ -0,0 +1,145 @@

+# Model Card: MalwareNet
+## Model Details
+**Model name:** MalwareNet
+**Repository:** PepiPetrov/MalwareNet
+**Task:** Binary PE malware classification
+**Model type:** Compact neural classifier with hierarchical gated feature branches
+**Input:** 2,568-dimensional EMBER v3 / EMBER2024 feature vector from PE files
+**Output:** Calibrated malware probability from 0.0 to 1.0
+**Deployment format:** ONNX, with Platt scaling and sigmoid included
+**Desktop deployment:** Native Rust egui app, no Python runtime required
+MalwareNet is designed to classify Windows PE files such as `.exe` and `.dll` as benign or malicious using extracted EMBER2024 features. The repository reports 273,452 parameters, AUC-ROC of 0.9911, ECE of 0.0079, and CPU inference latency of 0.041 ms per file.
+## Intended Use
+MalwareNet is intended for:
+- Malware research
+- Defensive security tooling
+- PE file triage
+- Educational malware-ML experiments
+- Fast local scoring of Windows executables and DLLs
+It should be used as a decision-support system, not as a sole authority for blocking, deleting, quarantining, or attributing files.
+## Out-of-Scope Use
+MalwareNet is not intended for:
+- Fully automated production enforcement without human review
+- Attribution of malware authors or campaigns
+- Detecting non-PE malware formats
+- Proving that a file is safe
+- Evasion research against real deployed systems
+- Use on files where EMBER feature extraction fails or is unsupported
+## Architecture
+MalwareNet splits the 2,568-dimensional EMBER feature vector into five semantic feature groups:
+- global features
+- byte histogram / byte features
+- string features
+- section features
+- import features
+Each group is processed by a `GatedFeatureBlock`. The branch outputs are fused by a `FusionBlock`, then projected to a binary logit. The exported ONNX graph includes Platt scaling and sigmoid activation, so inference returns a calibrated malware probability.
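As a rough illustration of the split, gate, and fuse flow described above, here is a toy pure-Python sketch. Everything in it is an assumption made for illustration: the group sizes, the `tanh(value) * sigmoid(gate)` form of the gate, and the weighted-sum fusion are stand-ins and are not taken from the repository's `GatedFeatureBlock` or `FusionBlock`.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def split_groups(features, group_sizes):
    """Cut a flat feature vector into consecutive groups of the given sizes."""
    if len(features) != sum(group_sizes):
        raise ValueError("feature vector length does not match group sizes")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(features[start:start + size])
        start += size
    return groups


def gated_unit(chunk, w_value, w_gate):
    """Hypothetical gate: a bounded value signal scaled by a (0, 1) gate."""
    return math.tanh(dot(w_value, chunk)) * sigmoid(dot(w_gate, chunk))


def forward(features, group_sizes, branch_params, w_fusion, bias):
    """One gated unit per group, then a weighted sum standing in for fusion."""
    groups = split_groups(features, group_sizes)
    branch_outputs = [
        gated_unit(chunk, w_value, w_gate)
        for chunk, (w_value, w_gate) in zip(groups, branch_params)
    ]
    return dot(w_fusion, branch_outputs) + bias  # binary logit


# Toy dimensions only: five groups over a 10-dimensional vector, not 2,568.
rng = random.Random(0)
group_sizes = [2, 3, 1, 2, 2]
branch_params = [
    ([rng.uniform(-1, 1) for _ in range(n)], [rng.uniform(-1, 1) for _ in range(n)])
    for n in group_sizes
]
w_fusion = [rng.uniform(-1, 1) for _ in group_sizes]
features = [rng.uniform(0, 1) for _ in range(sum(group_sizes))]

logit = forward(features, group_sizes, branch_params, w_fusion, bias=0.0)
probability = sigmoid(logit)
```

The real model uses learned layers per branch; the sketch only shows why each branch sees its own slice of the vector before fusion.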
+## Training Data
+The model is trained using the EMBER2024 PE dataset. The repository instructs users to download PE files from `FutureComputing4AI/EMBER2024` and vectorize them with `thrember`.
+Known dataset scope:
+- Windows PE files
+- Win32, Win64, and .NET samples
+- Feature-based representation, not raw bytes at inference time
+## Evaluation
+Reported repository metrics:
+| Metric | Value |
+|---|---:|
+| Parameters | 273,452 |
+| Test AUC-ROC | 0.9911 |
+| Expected Calibration Error | 0.0079 |
+| PGD AUC, ε=0.1 | ≥ 0.9910 |
+| FGSM AUC, ε=0.1 | ≥ 0.9910 |
+| Best sparse LTH ticket | 0.9904 val AUC at 48.18% sparsity |
+| CPU latency | 0.041 ms / file |
+| CPU throughput | ~24,400 files / sec |
+Adversarial robustness is evaluated with FGSM and PGD over ε values from 0.001 to 0.1. The repository reports less than 0.02% relative AUC degradation under 10-step PGD at ε=0.1.
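The reported figures can be cross-checked with simple arithmetic. The sketch below assumes throughput is just the reciprocal of single-file latency (the repository's benchmark may batch differently) and uses the table's lower bound of 0.9910 as the adversarial AUC.

```python
# Values copied from the metrics table above.
latency_ms = 0.041
clean_auc = 0.9911
adversarial_auc_floor = 0.9910  # reported as ">= 0.9910" at eps = 0.1

# Assumption: sequential scoring, so throughput = 1 / latency.
throughput_per_sec = 1000.0 / latency_ms

# Relative AUC degradation, in percent, at the reported lower bound.
relative_drop_pct = (clean_auc - adversarial_auc_floor) / clean_auc * 100.0

print(f"throughput: about {throughput_per_sec:.0f} files/sec")
print(f"relative AUC drop: at most about {relative_drop_pct:.4f}%")
```

Under these assumptions the throughput comes out near 24,390 files per second, in line with the reported ~24,400, and the relative drop is roughly 0.01%, which is below the stated 0.02% bound.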
+## Calibration
+The model uses Platt scaling fitted on the validation set. The calibration step is built into the exported ONNX graph, so downstream users receive calibrated probabilities and are less likely to consume uncalibrated logits by accident.
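For readers unfamiliar with the two ideas, here is a minimal sketch of applying Platt scaling to a logit and of computing a binned Expected Calibration Error. The parameterization `sigmoid(a * logit + b)`, the parameter names `a` and `b`, and the 10 equal-width bins are assumptions; the repository's fitted values and exact ECE binning are not stated in this card.

```python
import math


def platt_probability(logit: float, a: float, b: float) -> float:
    """Map a raw logit to a probability via sigmoid(a * logit + b)."""
    z = a * logit + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


def expected_calibration_error(probs, labels, n_bins: int = 10) -> float:
    """Binary ECE: weighted gap between mean predicted probability and
    observed positive rate, over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_confidence = sum(p for p, _ in bucket) / len(bucket)
        positive_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(positive_rate - mean_confidence)
    return ece
```

In practice `a` and `b` would be fitted on validation logits, for example by logistic regression, and the ECE would be measured on held-out data.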
+## Limitations
+- The model depends on EMBER-style static PE features.
+- It may miss malware that uses packing, obfuscation, feature manipulation, or novel compiler/linker patterns.
+- High AUC does not guarantee low false positives in a real deployment distribution.
+- Real-world performance may differ across regions, enterprise environments, time periods, and malware families.
+- The model does not execute samples or observe runtime behavior.
+- It does not prove that low-scoring files are benign.
+## Security Considerations
+Malware classifiers are vulnerable to distribution shift and adversarial manipulation. Although this repository includes FGSM/PGD evaluation and a PE mutation proof of concept, the README notes that the mutation script is not a full feasible-attack framework and would require stronger semantics-preserving mutation policies plus a functionality oracle.
+Recommended deployment controls:
+- Use alongside static signatures, sandboxing, reputation systems, and human analyst review.
+- Log model version, feature extractor version, threshold, and score.
+- Monitor false positives and false negatives continuously.
+- Re-evaluate on fresh malware and benignware samples.
+- Avoid exposing raw scores or exact thresholds through a public interface that attackers could query as an oracle.
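The logging recommendation above could look something like the following sketch. The field names, the version strings, and the example threshold are hypothetical placeholders, not values defined by the repository.

```python
import json
from datetime import datetime, timezone


def build_score_record(sha256: str, score: float, threshold: float,
                       model_version: str, extractor_version: str) -> dict:
    """Assemble one audit record per scored file (hypothetical schema)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256,
        "model_version": model_version,
        "feature_extractor_version": extractor_version,
        "threshold": threshold,
        "score": score,
        "flagged": score >= threshold,
    }


# Placeholder values for illustration only.
record = build_score_record(
    sha256="0" * 64,
    score=0.97,
    threshold=0.90,
    model_version="example-model-version",
    extractor_version="example-extractor-version",
)
line = json.dumps(record, sort_keys=True)  # one JSON object per log line
```

Recording the threshold next to the score makes it possible to re-evaluate past decisions after a threshold change.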
+## Ethical Considerations
+This model is intended for defensive and research use. Because malware ML systems can be studied for evasion, any release of artifacts, mutation tools, or evaluation scripts should be accompanied by responsible-use guidance and safeguards.
+## Recommended Thresholding
+No universal threshold is specified. Operators should select a threshold based on their own tolerance for false positives and false negatives.
+Suggested policy:
+- Low threshold: triage queue or analyst review
+- Medium threshold: warning / suspicious label
+- High threshold: quarantine recommendation, preferably with corroborating evidence
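The tiered policy above can be sketched as a small mapping from calibrated probability to a recommended action. The cutoff values `0.50`, `0.80`, and `0.95` and the action labels are hypothetical; operators would pick their own from threshold-specific metrics on their data.

```python
def recommend_action(probability: float,
                     review_at: float = 0.50,
                     warn_at: float = 0.80,
                     quarantine_at: float = 0.95) -> str:
    """Map a calibrated malware probability to a tiered recommendation.

    All three cutoffs are illustrative defaults, not recommended values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if not review_at <= warn_at <= quarantine_at:
        raise ValueError("cutoffs must be non-decreasing")

    if probability >= quarantine_at:
        return "quarantine-recommendation"  # ideally with corroborating evidence
    if probability >= warn_at:
        return "warning"
    if probability >= review_at:
        return "analyst-review"
    return "no-action"
```

For example, `recommend_action(0.85)` returns `"warning"` with these defaults, and the result remains a recommendation for a human or downstream system rather than an enforcement decision.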
+## Reproducibility
+The repository provides scripts for:
+- hyperparameter tuning with Optuna
+- training, calibration, and ONNX export
+- held-out test evaluation
+- adversarial robustness evaluation
+- latency benchmarking
+- Lottery Ticket Hypothesis pruning
+## Deployment
+The ONNX model accepts a `float32` tensor of shape `(1, 2568)` and returns a calibrated malware probability. The repository also includes a Rust desktop app that embeds the model and scores selected PE files locally.
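A minimal Python sketch of scoring one feature vector with ONNX Runtime might look like this. It assumes the `onnxruntime` and `numpy` packages are installed, that the model file path is supplied by the caller, and that the first graph output holds the probability; the input name is read from the session rather than guessed. Feature extraction with `thrember` is out of scope here.

```python
import math
from typing import Sequence

FEATURE_DIM = 2568  # input width stated in the model card


def validate_features(features: Sequence[float]) -> list:
    """Check length and finiteness before handing the vector to the model."""
    if len(features) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(features)}")
    values = [float(v) for v in features]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("features must be finite numbers")
    return values


def score_features(model_path: str, features: Sequence[float]) -> float:
    """Run the ONNX model on one vector and return the malware probability."""
    # Imported lazily so validate_features works without these packages.
    import numpy as np
    import onnxruntime as ort

    batch = np.asarray([validate_features(features)], dtype=np.float32)  # (1, 2568)
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: batch})
    # Assumption: the first output is the calibrated probability.
    return float(np.asarray(outputs[0]).reshape(-1)[0])
```

When scoring many files, creating the `InferenceSession` once and reusing it avoids reloading the model on every call.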
+## Maintenance
+Recommended future updates:
+- Add dataset version hash and exact split details.
+- Publish confusion matrix and threshold-specific metrics.
+- Add per-family or per-source evaluation.
+- Track performance over time on newly collected samples.
+- Document feature-extraction failure modes.
+- Add model versioning and artifact checksums.
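For the artifact-checksum item, one common approach is to publish a SHA-256 digest next to each released file. The helper below is a sketch using only the standard library; the function name and chunk size are arbitrary choices, and the path is whatever artifact the maintainer wants to pin.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Users can then compare the digest of their downloaded model against the published value before loading it.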
+## Citation
+When using MalwareNet, cite the repository and the EMBER2024 dataset/tooling used to train and extract features.