petarpepi committed on
Commit 747ac96 · verified · 1 Parent(s): fb9ef02

Upload MODEL_CARD.md

Files changed (1)
  1. MODEL_CARD.md +145 -0
MODEL_CARD.md ADDED
@@ -0,0 +1,145 @@
1
+ # Model Card: MalwareNet
2
+
3
+ ## Model Details
4
+
5
+ **Model name:** MalwareNet
6
+ **Repository:** PepiPetrov/MalwareNet
7
+ **Task:** Binary PE malware classification
8
+ **Model type:** Compact neural classifier with hierarchical gated feature branches
9
+ **Input:** 2,568-dimensional EMBER v3 / EMBER2024 feature vector from PE files
10
+ **Output:** Calibrated malware probability from 0.0 to 1.0
11
+ **Deployment format:** ONNX, with Platt scaling and sigmoid included
12
+ **Desktop deployment:** Native Rust egui app, no Python runtime required
13
+
14
+ MalwareNet is designed to classify Windows PE files such as `.exe` and `.dll` as benign or malicious using extracted EMBER2024 features. The repository reports 273,452 parameters, AUC-ROC of 0.9911, ECE of 0.0079, and CPU inference latency of 0.041 ms per file.
15
+
16
+ ## Intended Use
17
+
18
+ MalwareNet is intended for:
19
+
20
+ - Malware research
21
+ - Defensive security tooling
22
+ - PE file triage
23
+ - Educational malware-ML experiments
24
+ - Fast local scoring of Windows executables and DLLs
25
+
26
+ It should be used as a decision-support system, not as a sole authority for blocking, deleting, quarantining, or attributing files.
27
+
28
+ ## Out-of-Scope Use
29
+
30
+ MalwareNet is not intended for:
31
+
32
+ - Fully automated production enforcement without human review
33
+ - Attribution of malware authors or campaigns
34
+ - Detecting non-PE malware formats
35
+ - Proving that a file is safe
36
+ - Evasion research against real deployed systems
37
+ - Use on files where EMBER feature extraction fails or is unsupported
38
+
39
+ ## Architecture
40
+
41
+ MalwareNet splits the 2,568-dimensional EMBER feature vector into five semantic feature groups:
42
+
43
+ - global features
44
+ - byte histogram / byte features
45
+ - string features
46
+ - section features
47
+ - import features
48
+
49
+ Each group is processed by a `GatedFeatureBlock`. The branch outputs are fused by a `FusionBlock`, then projected to a binary logit. The exported ONNX graph includes Platt scaling and sigmoid activation, so inference returns a calibrated malware probability.
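
The branch-then-fuse flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the card does not give the internals of `GatedFeatureBlock` or `FusionBlock`, so the sigmoid-gate-times-tanh-value form, the concatenation-based fusion, and every function and parameter name below are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_block(x, w_value, b_value, w_gate, b_gate):
    """Hypothetical gated branch: a sigmoid gate scales a tanh value path."""
    value = [math.tanh(v) for v in _linear(w_value, b_value, x)]
    gate = [_sigmoid(g) for g in _linear(w_gate, b_gate, x)]
    return [g * v for g, v in zip(gate, value)]


def fuse_and_project(branch_outputs, w_out, b_out):
    """Hypothetical fusion: concatenate branch outputs, project to one logit."""
    fused = [v for branch in branch_outputs for v in branch]
    return sum(w * v for w, v in zip(w_out, fused)) + b_out
```

In this sketch each of the five feature groups would pass through its own `gated_block`, and the resulting list of branch outputs would go to `fuse_and_project` to produce the raw logit that calibration later turns into a probability.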
50
+
51
+ ## Training Data
52
+
53
+ The model is trained using the EMBER2024 PE dataset. The repository instructs users to download PE files from `FutureComputing4AI/EMBER2024` and vectorize them with `thrember`.
54
+
55
+ Known dataset scope:
56
+
57
+ - Windows PE files
58
+ - Win32, Win64, and .NET samples
59
+ - Feature-based representation, not raw bytes at inference time
60
+
61
+ ## Evaluation
62
+
63
+ Reported repository metrics:
64
+
65
+ | Metric | Value |
66
+ |---|---:|
67
+ | Parameters | 273,452 |
68
+ | Test AUC-ROC | 0.9911 |
69
+ | Expected Calibration Error | 0.0079 |
70
+ | PGD AUC, ε=0.1 | ≥ 0.9910 |
71
+ | FGSM AUC, ε=0.1 | ≥ 0.9910 |
72
+ | Best sparse LTH ticket | 0.9904 val AUC at 48.18% sparsity |
73
+ | CPU latency | 0.041 ms / file |
74
+ | CPU throughput | ~24,400 files / sec |
75
+
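
For readers unfamiliar with the Expected Calibration Error row in the table above, here is a minimal sketch of one common way to compute it. The card does not say which binning scheme the repository uses, so the 10 equal-width bins over the positive-class probability are an assumption, and `expected_calibration_error` is a hypothetical helper name.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE over positive-class probabilities (assumed variant).

    probs:  predicted malware probabilities in [0, 1]
    labels: ground-truth labels, 1 for malware and 0 for benign
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")

    # Assign each sample to an equal-width probability bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    # Weighted average of |mean predicted probability - observed positive rate|.
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        positive_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - positive_rate)
    return ece
```

A value near zero means that, within each bin, the average predicted probability is close to the observed fraction of malware.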
76
+ Adversarial robustness is evaluated with FGSM and PGD over ε values from 0.001 to 0.1. The repository reports less than 0.02% relative AUC degradation under 10-step PGD at ε=0.1.
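
To make the FGSM perturbation concrete, the sketch below applies a single FGSM step to a toy logistic-regression scorer. It is not the repository's evaluation code: the linear model, the function name `fgsm_step`, and the example weights are all assumptions chosen so that the gradient can be written by hand. PGD repeats a step like this several times, projecting back into the ε-ball after each one.

```python
import math


def sigmoid(z):
    """Logistic function (toy inputs only, no overflow handling)."""
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_step(x, y, w, b, eps):
    """One FGSM step against a toy logistic model p = sigmoid(w.x + b).

    For binary cross-entropy, d(loss)/d(x_i) = (p - y) * w_i, so the attack
    moves each feature by eps in the direction of that gradient's sign.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, signs)]


# Hypothetical two-feature example: a malicious sample (y = 1) is pushed
# toward a lower malware score.
x_adv = fgsm_step([0.5, -0.2], 1, [2.0, -1.0], 0.0, eps=0.1)
```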
77
+
78
+ ## Calibration
79
+
80
+ The model uses Platt scaling fitted on the validation set. The Platt scaling and sigmoid steps are baked into the exported ONNX model, reducing the chance that downstream users accidentally consume uncalibrated logits.
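
As a reminder of what Platt scaling does to a raw logit, here is a minimal sketch. The coefficients `a` and `b` stand for the two parameters fitted on the validation set; their actual values are not given in this card, and the sign convention `sigmoid(a * logit + b)` is an assumption (some references write the exponent with the opposite sign).

```python
import math


def platt_probability(logit, a, b):
    """Map a raw logit to a calibrated probability: sigmoid(a * logit + b)."""
    z = a * logit + b
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

With `a = 1` and `b = 0` this reduces to a plain sigmoid. A fitted `a` below 1 would soften overconfident scores, while `b` shifts the decision point.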
81
+
82
+ ## Limitations
83
+
84
+ - The model depends on EMBER-style static PE features.
85
+ - It may miss malware that uses packing, obfuscation, feature manipulation, or novel compiler/linker patterns.
86
+ - High AUC does not guarantee low false positives in a real deployment distribution.
87
+ - Real-world performance may differ across regions, enterprise environments, time periods, and malware families.
88
+ - The model does not execute samples or observe runtime behavior.
89
+ - It does not prove that low-scoring files are benign.
90
+
91
+ ## Security Considerations
92
+
93
+ Malware classifiers are vulnerable to distribution shift and adversarial manipulation. Although this repository includes FGSM/PGD evaluation and a PE mutation proof of concept, the README notes that the mutation script is not a full feasible-attack framework and would require stronger semantics-preserving mutation policies plus a functionality oracle.
94
+
95
+ Recommended deployment controls:
96
+
97
+ - Use alongside static signatures, sandboxing, reputation systems, and human analyst review.
98
+ - Log model version, feature extractor version, threshold, and score.
99
+ - Monitor false positives and false negatives continuously.
100
+ - Re-evaluate on fresh malware and benignware samples.
101
+ - Avoid exposing raw confidence thresholds as a public oracle.
102
+
103
+ ## Ethical Considerations
104
+
105
+ This model is intended for defensive and research use. Because malware ML systems can be studied for evasion, any release of artifacts, mutation tools, or evaluation scripts should be accompanied by responsible-use guidance and safeguards.
106
+
107
+ ## Recommended Thresholding
108
+
109
+ No universal threshold is specified. Operators should select a threshold based on their own tolerance for false positives and false negatives.
110
+
111
+ Suggested policy:
112
+
113
+ - Low threshold: triage queue or analyst review
114
+ - Medium threshold: warning / suspicious label
115
+ - High threshold: quarantine recommendation, preferably with corroborating evidence
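
The tiered policy above could be wired up along these lines. The cut-off values are placeholders, not recommendations from the repository; the card deliberately specifies no universal threshold, so operators would substitute values chosen from their own validation data. The function name and action labels are likewise hypothetical.

```python
def triage_action(score, review_at=0.30, warn_at=0.60, quarantine_at=0.90):
    """Map a calibrated malware probability to a tiered action.

    All three cut-offs are illustrative placeholders.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= quarantine_at:
        return "recommend_quarantine"  # ideally with corroborating evidence
    if score >= warn_at:
        return "warn_suspicious"
    if score >= review_at:
        return "analyst_review"
    return "no_action"
```

Keeping the cut-offs as parameters makes it straightforward to log the threshold alongside the score, as the deployment controls recommend.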
116
+
117
+ ## Reproducibility
118
+
119
+ The repository provides scripts for:
120
+
121
+ - hyperparameter tuning with Optuna
122
+ - training, calibration, and ONNX export
123
+ - held-out test evaluation
124
+ - adversarial robustness evaluation
125
+ - latency benchmarking
126
+ - Lottery Ticket Hypothesis pruning
127
+
128
+ ## Deployment
129
+
130
+ The ONNX model accepts a `float32` tensor of shape `(1, 2568)` and returns a calibrated malware probability. The repository also includes a Rust desktop app that embeds the model and scores selected PE files locally.
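
A minimal sketch of scoring one feature vector from Python with ONNX Runtime follows. It assumes `numpy` and `onnxruntime` are installed, that the feature vector has already been extracted, and that the first graph output holds the probability. The model path and both function names are hypothetical.

```python
import math

EXPECTED_DIM = 2568  # feature-vector length stated in this card


def validate_features(features):
    """Check length and finiteness before building the input tensor."""
    values = [float(v) for v in features]
    if len(values) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} features, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("features must be finite numbers")
    return values


def score_features(model_path, features):
    """Run the exported model on one vector and return the probability."""
    # Imported here so the validation helper works without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    batch = np.asarray([validate_features(features)], dtype=np.float32)  # (1, 2568)
    outputs = session.run(None, {input_name: batch})
    # Assumption: the first output is the calibrated malware probability.
    return float(np.asarray(outputs[0]).reshape(-1)[0])
```

Usage would look like `score_features("malwarenet.onnx", vector)`, where the file name is a placeholder for wherever the exported model is stored.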
131
+
132
+ ## Maintenance
133
+
134
+ Recommended future updates:
135
+
136
+ - Add dataset version hash and exact split details.
137
+ - Publish confusion matrix and threshold-specific metrics.
138
+ - Add per-family or per-source evaluation.
139
+ - Track performance over time on newly collected samples.
140
+ - Document feature-extraction failure modes.
141
+ - Add model versioning and artifact checksums.
142
+
143
+ ## Citation
144
+
145
+ When using MalwareNet, cite the repository and the EMBER2024 dataset/tooling used to train and extract features.