license: apache-2.0
datasets:
- joyce8/EMBER2024
language:
- en
tags:
- malware-detection
- cybersecurity
- onnxruntime
- lightgbm
- pytorch
- tabnet
- binary-classification
pipeline_tag: text-classification
library_name: onnxruntime
EMBER2024 Malware Detection Models
A collection of four model architectures (DNN, TabNet, Hybrid GBDT2NN, LightGBM) trained and evaluated on all eight subsets of the EMBER2024 dataset: six file formats (Win32, Win64, .NET, APK, ELF, PDF), a combined PE group, and an all-types set. Each model is converted into a deployment-ready format.
Training environment: GPU server (CUDA 13)
Dataset paper: Joyce et al., KDD 2025 (arXiv:2506.05074)
Models
| Directory | Architecture | Deployment Format | Parameters |
|---|---|---|---|
| dnn/ | Feed-Forward DNN (PReLU + Dropout) | ONNX (INT8 Static / FP32) | 13.2 M (PE) / 0.98 M (non-PE) |
| tabnet/ | TabNet (Arik & Pfister, 2021) | ONNX FP32 | ~3 M |
| hybrid/ | GBDT2NN (DeepGBM, KDD 2019) | ONNX (nn_part) + LightGBM booster | ~1 M NN |
| lightgbm/ | LightGBM (pretrained, joyce8/EMBER2024-benchmark-models) | Treelite .tl | - |
Subset List
| Subset | Target File Type | Input Dim |
|---|---|---|
| PE | All PE binaries (Win32 + Win64 + .NET) | 2,568 |
| Win32 | Windows 32-bit PE | 2,568 |
| Win64 | Windows 64-bit PE | 2,568 |
| .NET | .NET assemblies | 2,568 |
| APK | Android APK | 696 |
| ELF | Linux ELF | 696 |
| PDF | PDF documents | 696 |
| all | All file types combined | 2,568 |
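Because the input width depends only on the subset, a feature matrix can be checked before it is passed to a model. The sketch below simply encodes the table above; `INPUT_DIM`, `expected_dim`, and `check_features` are illustrative names, not part of this repository. The shape is passed as a plain tuple so the helper works with any array library.

```python
# Input widths copied from the subset table; all names here are illustrative.
INPUT_DIM = {
    "PE": 2568, "Win32": 2568, "Win64": 2568, ".NET": 2568, "all": 2568,
    "APK": 696, "ELF": 696, "PDF": 696,
}

def expected_dim(subset: str) -> int:
    """Return the feature width a model for `subset` expects."""
    try:
        return INPUT_DIM[subset]
    except KeyError:
        raise ValueError(f"unknown subset: {subset!r}") from None

def check_features(shape: tuple, subset: str) -> None:
    """Raise ValueError if an (N, dim) shape does not match the subset."""
    dim = expected_dim(subset)
    if len(shape) != 2 or shape[1] != dim:
        raise ValueError(f"{subset} models expect shape (N, {dim}), got {shape}")

check_features((1, 2568), "PE")   # passes silently
check_features((4, 696), "ELF")   # passes silently
```

With NumPy inputs, the call would be `check_features(X.shape, "PE")`.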
Directory Structure
Filename convention: {model}_{subset}[_suffix].{ext}
The .NET subset is rendered as dotnet in filenames.
dnn/
├── dnn_PE.onnx            # INT8 Static (deployment; PE/Win32/Win64/dotnet/all)
├── dnn_PE_fp32.onnx       # FP32 ONNX (reference; bundled only for INT8 subsets)
├── dnn_PE.pt              # PyTorch checkpoint
├── dnn_PE_metrics.json    # Evaluation results (AUC, TPR@1%FPR)
├── dnn_PE_benchmark.json  # Size & latency
├── dnn_APK.onnx           # FP32 (non-PE: INT8 AUC loss too large)
├── dnn_APK.pt
└── ...
tabnet/
├── tabnet_PE.onnx         # FP32 ONNX (140 MB, from sparsemax unfolding)
├── tabnet_PE.zip          # pytorch-tabnet native (7.4 MB, lightweight)
└── ...
hybrid/
├── hybrid_PE_nnpart.onnx  # GBDT2NN nn_part ONNX (5.1 MB)
├── hybrid_PE_lgbm.model   # LightGBM booster (3.6 MB)
├── hybrid_PE.pt           # PyTorch checkpoint
└── ...
lightgbm/
├── lightgbm_PE.tl         # Treelite serialization (platform-independent; compilation required)
└── ...
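The filename convention above can be captured in a small helper when fetching several artifacts programmatically. This is a minimal sketch: `artifact_path` is a hypothetical name, and it only formats strings according to the convention. It does not check that a given file actually exists in the repository.

```python
def artifact_path(model: str, subset: str, suffix: str = "", ext: str = "onnx") -> str:
    """Build '{model}/{model}_{subset}[_suffix].{ext}' following the convention."""
    subset_tag = "dotnet" if subset == ".NET" else subset  # .NET is rendered as dotnet
    stem = f"{model}_{subset_tag}" + (f"_{suffix}" if suffix else "")
    return f"{model}/{stem}.{ext}"

print(artifact_path("dnn", "PE"))                      # dnn/dnn_PE.onnx
print(artifact_path("hybrid", "PE", "nnpart"))         # hybrid/hybrid_PE_nnpart.onnx
print(artifact_path("hybrid", "PE", "lgbm", "model"))  # hybrid/hybrid_PE_lgbm.model
print(artifact_path("lightgbm", ".NET", ext="tl"))     # lightgbm/lightgbm_dotnet.tl
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.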
Performance Results (EMBER2024 test set)
Metrics: ROC-AUC, TPR @ 1% FPR (paper §4.1), and challenge-set detection rate at the FPR=1% threshold.
Challenge set: 6,315 evasive malware samples (positives only; Win32 3,225 / .NET 829 / Win64 814 / PDF 805 / ELF 386 / APK 256).
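To make the two threshold-based metrics concrete, the sketch below fixes a threshold on benign test scores so that at most 1% of them are flagged, then reuses that threshold for both the test malware and the challenge set. It is a simplified, standard-library illustration on toy scores. The function names are hypothetical, and the paper's exact procedure (tie handling, interpolation) may differ.

```python
def threshold_at_fpr(benign_scores, target_fpr=0.01):
    """Score threshold such that at most target_fpr of benign samples exceed it."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(target_fpr * len(ranked) + 1e-9)  # benign samples we may flag
    if allowed >= len(ranked):
        return float("-inf")
    # Flagging scores strictly above the (allowed + 1)-th highest benign score
    # yields at most `allowed` false positives.
    return ranked[allowed]

def detection_rate(malware_scores, threshold):
    """Fraction of malware scores strictly above the threshold."""
    return sum(s > threshold for s in malware_scores) / len(malware_scores)

# Toy scores for illustration only (not EMBER2024 data).
benign = [i / 100 for i in range(100)]           # 0.00 .. 0.99
test_malware = [0.50, 0.985, 0.99, 1.00]
challenge = [0.10, 0.97, 0.995, 0.999]

thr = threshold_at_fpr(benign, 0.01)             # fixed on the test set
tpr_at_1pct = detection_rate(test_malware, thr)  # TPR @ 1% FPR
challenge_rate = detection_rate(challenge, thr)  # same threshold, reused
print(thr, tpr_at_1pct, challenge_rate)
```

Since the challenge set contains positives only, its detection rate is the fraction of samples scoring above the test-set threshold.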
DNN
| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|---|---|---|---|---|
| PE | 0.9969 | 0.9472 | INT8 Static ONNX | 13.3 MB |
| Win32 | 0.9965 | 0.9479 | INT8 Static ONNX | 13.3 MB |
| Win64 | 0.9969 | 0.9617 | INT8 Static ONNX | 13.3 MB |
| .NET | 0.9920 | 0.8444 | INT8 Static ONNX | 13.3 MB |
| all | 0.9938 | 0.8870 | INT8 Static ONNX | 13.3 MB |
| APK | 0.9761 | 0.7682 | FP32 ONNX | 3.9 MB |
| ELF | 0.9840 | 0.8103 | FP32 ONNX | 3.9 MB |
| PDF | 0.9795 | 0.8902 | FP32 ONNX | 3.9 MB |
Non-PE subsets (APK/ELF/PDF) use 696-dim inputs and have too few parameters, so INT8 quantization causes a large AUC drop; they are kept in FP32.
Figures are for the INT8 models (fixed 100K-sample set). ΔAUC vs FP32 stays within 0.19 pp.
For the .NET and all subsets, INT8 quantization causes a relatively larger drop in TPR@1%FPR (they still pass the AUC gate: |ΔAUC| < 0.5 pp).
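The AUC gate mentioned above is a simple comparison in percentage points, where 1 pp corresponds to 0.01 of AUC. A minimal sketch, assuming that reading of "pp": the function names and the FP32 value used in the example are hypothetical, since this card reports only the INT8 figures.

```python
def delta_auc_pp(auc_fp32: float, auc_int8: float) -> float:
    """AUC change caused by quantization, in percentage points (1 pp = 0.01 AUC)."""
    return (auc_int8 - auc_fp32) * 100.0

def passes_auc_gate(auc_fp32: float, auc_int8: float, gate_pp: float = 0.5) -> bool:
    """True when |delta AUC| stays strictly below the gate."""
    return abs(delta_auc_pp(auc_fp32, auc_int8)) < gate_pp

# 0.9920 is the INT8 .NET figure from the table; 0.9935 is a made-up FP32 value.
print(passes_auc_gate(auc_fp32=0.9935, auc_int8=0.9920))  # True
```

A gate on AUC alone does not bound the change in TPR@1%FPR, which is why a model can pass it and still lose noticeable detection rate at the operating threshold.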
TabNet
| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|---|---|---|---|---|
| PE | 0.9948 | 0.9195 | FP32 ONNX | 140 MB |
| Win32 | 0.9949 | 0.9317 | FP32 ONNX | 140 MB |
| Win64 | 0.9944 | 0.9318 | FP32 ONNX | 140 MB |
| .NET | 0.9923 | 0.8700 | FP32 ONNX | 140 MB |
| all | 0.9922 | 0.8912 | FP32 ONNX | 140 MB |
| APK | 0.9741 | 0.7028 | FP32 ONNX | 13.5 MB |
| ELF | 0.9793 | 0.5460 | FP32 ONNX | 13.5 MB |
| PDF | 0.9810 | 0.8597 | FP32 ONNX | 13.5 MB |
The 140 MB ONNX size for the PE-family subsets is structural: the sparsemax attention loop is unfolded into the ONNX graph. If size matters, use tabnet_PE.zip (7.4 MB) directly.
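Loading the lightweight archive could look like the sketch below. It assumes the .zip files were written by pytorch-tabnet's `save_model()` and that column 1 of `predict_proba` is the malware class. Neither is confirmed by this card, and `load_tabnet_from_zip` is a hypothetical helper name.

```python
def load_tabnet_from_zip(filename: str = "tabnet/tabnet_PE.zip"):
    """Download a pytorch-tabnet archive and return a ready-to-use classifier."""
    # Imported lazily so the function can be defined without the optional deps.
    from huggingface_hub import hf_hub_download
    from pytorch_tabnet.tab_model import TabNetClassifier

    zip_path = hf_hub_download(repo_id="cycloevan/ember-model", filename=filename)
    clf = TabNetClassifier()
    clf.load_model(zip_path)  # assumes the archive was produced by save_model()
    return clf

# Usage (requires network access, torch, and pytorch-tabnet):
#   import numpy as np
#   clf = load_tabnet_from_zip()
#   X = np.random.randn(1, 2568).astype(np.float32)
#   prob = clf.predict_proba(X)[:, 1]  # assumes column 1 is the malware class
```

This path trades the ONNX Runtime dependency for a PyTorch one, so it suits environments where PyTorch is already installed.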
Hybrid (GBDT2NN)
| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|---|---|---|---|---|
| PE | 0.9982 | 0.9736 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| Win32 | 0.9982 | 0.9734 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| Win64 | 0.9982 | 0.9811 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| .NET | 0.9961 | 0.9466 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| all | 0.9972 | 0.9513 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| APK | 0.9828 | 0.8003 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| ELF | 0.9899 | 0.8827 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| PDF | 0.9879 | 0.9283 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
LightGBM (Treelite-compiled)
| Subset | ROC-AUC | TPR@1%FPR | Size (.tl) | Size (original .model) |
|---|---|---|---|---|
| PE | 0.9983 | 0.9686 | 5.3 MB | 3.8 MB |
| Win32 | 0.9985 | 0.9722 | 5.3 MB | 3.7 MB |
| Win64 | 0.9988 | 0.9830 | 5.3 MB | 3.7 MB |
| .NET | 0.9980 | 0.9561 | 5.3 MB | 3.7 MB |
| all | 0.9970 | 0.9450 | 5.3 MB | 3.8 MB |
| APK | 0.9861 | 0.8157 | 5.3 MB | 3.7 MB |
| ELF | 0.9929 | 0.9140 | 5.3 MB | 3.8 MB |
| PDF | 0.9913 | 0.9275 | 5.3 MB | 3.7 MB |
Original LightGBM models: joyce8/EMBER2024-benchmark-models. The .tl files are serialized with Treelite 3.9.1 and are platform-independent; they must be compiled into a shared library on each target platform.
Challenge Set Detection Rate
Challenge set: 6,315 evasive malware (all positive). The FPR=1% threshold from the test set is applied.
| Subset | DNN | TabNet | Hybrid | LightGBM |
|---|---|---|---|---|
| .NET | 58.6% | 70.0% | 80.6% | 79.6% |
| APK | 27.3% | 29.3% | 34.4% | 33.6% |
| ELF | 11.7% | 4.4% | 23.8% | 30.3% |
| PDF | 41.5% | 40.1% | 56.9% | 57.1% |
| PE | 38.5% | 36.9% | 58.2% | 58.8% |
| Win32 | 36.6% | 45.3% | 58.4% | 69.9% |
| Win64 | 46.3% | 44.1% | 59.5% | 59.7% |
| all | 35.3% | 42.3% | 54.1% | 48.4% |
Inference Performance (Apple M1, darwin-arm64)
warm_batch1 latency: batch size = 1, measured after cache warm-up. Figures may differ in the deployment environment (x86_64 Linux).
Latency (ms, warm batch=1)
| Subset | DNN | TabNet | Hybrid | LightGBM |
|---|---|---|---|---|
| .NET | 0.248 | 5.465 | 0.151 | 0.050 |
| APK | 0.035 | 0.846 | 0.145 | 0.031 |
| ELF | 0.039 | 0.505 | 0.160 | 0.036 |
| PDF | 0.036 | 2.230 | 0.172 | 0.048 |
| PE | 0.290 | 4.402 | 0.138 | 0.028 |
| Win32 | 0.288 | 4.693 | 0.141 | 0.044 |
| Win64 | 0.220 | 5.621 | 0.422 | 0.039 |
| all | 0.254 | 4.788 | 0.147 | 0.068 |
TabNet latency is high because the sparsemax attention is unfolded into the ONNX graph (structural).
Hybrid = nn_part ONNX inference only (LightGBM leaf extraction excluded).
LightGBM latency is for the compiled .dylib; the uploaded file is .tl (compilation required).
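To re-measure warm batch=1 latency on your own hardware, a harness along these lines is one option. It is a sketch, not the benchmark script behind the numbers above: the warm-up count, the repeat count, and the use of the median are assumptions, and `warm_latency_ms` is a hypothetical name.

```python
import statistics
import time

def warm_latency_ms(run_once, warmup: int = 50, repeats: int = 500) -> float:
    """Median wall-clock latency of run_once() in milliseconds, after warm-up."""
    for _ in range(warmup):              # untimed calls to warm caches and lazy init
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)

# Stand-in workload; with ONNX Runtime this would be something like
#   lambda: sess.run(["logit"], {"features": X})
latency = warm_latency_ms(lambda: sum(range(1000)), warmup=5, repeats=20)
print(f"{latency:.3f} ms")
```

For the Hybrid model, timing only the nn_part session reproduces the convention used in the table; include the `booster.predict(..., pred_leaf=True)` call in `run_once` to measure end-to-end latency.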
Model File Sizes (deployment format)
| Subset | DNN | TabNet .onnx | TabNet .zip | Hybrid (nn+lgbm) | LightGBM .tl |
|---|---|---|---|---|---|
| PE family | 13.3 MB (INT8) | 140.2 MB | 7.4 MB | 5.3 + 3.8 MB | 5.3 MB |
| non-PE | 3.9 MB (FP32) | 13.5 MB | 3.2 MB | 5.3 + 3.7 MB | 5.3 MB |
Usage
Install Dependencies
# Quote version specifiers so the shell does not treat ">" as a redirect
pip install "onnxruntime>=1.20" numpy huggingface_hub
# For LightGBM / Hybrid inference
pip install "treelite==3.9.1" "treelite_runtime==3.9.1" "lightgbm>=4.6"
# To use the TabNet checkpoint directly
pip install "pytorch-tabnet>=4.1"
DNN Inference (ONNX Runtime)
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
# PE subset: INT8 Static
model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="dnn/dnn_PE.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
# X: np.ndarray shape (N, 2568), dtype float32
X = np.random.randn(1, 2568).astype(np.float32)
logit = sess.run(["logit"], {"features": X})[0]  # shape (N, 1)
prob = 1 / (1 + np.exp(-logit.ravel()))          # sigmoid → [0, 1]
print(f"malware probability: {prob[0]:.4f}")
# APK subset: FP32
model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="dnn/dnn_APK.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
X = np.random.randn(1, 696).astype(np.float32)  # non-PE: dim=696
prob = 1 / (1 + np.exp(-sess.run(["logit"], {"features": X})[0].ravel()))
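The models emit pre-sigmoid logits, and the expression `1 / (1 + np.exp(-logit))` can overflow inside `exp` (with a NumPy RuntimeWarning) when a logit is strongly negative, even though the final value is still correct. If that matters in your pipeline, a branch-based sigmoid avoids exponentiating large positive numbers. The scalar sketch below uses only the standard library, and `stable_sigmoid` is an illustrative name.

```python
import math

def stable_sigmoid(logit: float) -> float:
    """Sigmoid that never exponentiates a large positive number."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # logit < 0, so z lies in [0, 1)
    return z / (1.0 + z)

print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
print(stable_sigmoid(1000.0))   # 1.0
```

For whole arrays, `np.exp(-np.logaddexp(0, -logit))` computes the same quantity in a numerically stable way.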
TabNet Inference (ONNX Runtime)
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="tabnet/tabnet_PE.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
X = np.random.randn(1, 2568).astype(np.float32)
# output: logit (pre-sigmoid)
logit = sess.run(["logit"], {"features": X})[0]
prob = 1 / (1 + np.exp(-logit.ravel()))
Hybrid Inference (ONNX + LightGBM)
import numpy as np
import lightgbm as lgb
import onnxruntime as ort
from huggingface_hub import hf_hub_download
# 1. Extract leaf indices with the LightGBM booster
booster = lgb.Booster(model_file=hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="hybrid/hybrid_PE_lgbm.model",
))
X_raw = np.random.randn(1, 2568).astype(np.float64)
leaf_indices = booster.predict(X_raw, pred_leaf=True).astype(np.int64)  # (N, n_trees)
# 2. Final classification with the GBDT2NN ONNX model
nn_sess = ort.InferenceSession(hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="hybrid/hybrid_PE_nnpart.onnx",
), providers=["CPUExecutionProvider"])
logit = nn_sess.run(["logit"], {"leaf_indices": leaf_indices})[0]
prob = 1 / (1 + np.exp(-logit.ravel()))
print(f"malware probability: {prob[0]:.4f}")
LightGBM Inference (Treelite-compiled, fast inference)
# 1. Compile Treelite .tl → platform-specific shared library (one-time)
import sys
from pathlib import Path

import numpy as np
import treelite
import treelite_runtime
from huggingface_hub import hf_hub_download

tl_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="lightgbm/lightgbm_PE.tl",
)
tl_model = treelite.Model.deserialize(tl_path)
lib_ext = ".dylib" if sys.platform == "darwin" else ".so"
# Swap only the file extension; str.replace would also rewrite any ".tl" earlier in the path
lib_path = str(Path(tl_path).with_suffix(lib_ext))
tl_model.export_lib(
    toolchain="clang" if sys.platform == "darwin" else "gcc",
    libpath=lib_path,
    verbose=False,
)
# 2. Inference
predictor = treelite_runtime.Predictor(lib_path, verbose=False)
X = np.random.randn(1, 2568).astype(np.float32)
prob = predictor.predict(treelite_runtime.DMatrix(X))
print(f"malware probability: {prob[0]:.4f}")
Note: Requires treelite==3.9.1 and treelite_runtime==3.9.1. Version 4.x does not support export_lib().
Training & Evaluation Environment
| Item | Details |
|---|---|
| Dataset | EMBER2024: train 52 weeks (2.6 M), test 12 weeks (606 K), challenge 6,315 |
| Feature dim | PE 2,568 (v3) / non-PE 696 (valid prefix) |
| Split policy | Fixed temporal order (temporal split), no random shuffling |
| Training environment | GPU server (CUDA 13) |
| Frameworks | PyTorch 2.11.0, pytorch-tabnet 4.1, LightGBM 4.6 |
| Random seed | 42 |
| DNN architecture | 2 × [Linear(d→d) + BatchNorm + PReLU(α=0.25) + Dropout(0.5)] → Linear(d→1), where d = 2,568 (PE) / 696 (non-PE) |
| Hybrid | LightGBM leaf extraction → shared leaf Embedding (dim 8) → concat → MLP[256, 128] (BatchNorm + PReLU) → Linear(→1) |
| Evaluation metrics | ROC-AUC, PR-AUC, TPR @ 1% FPR (paper §4.1) |
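The DNN parameter counts in the Models table can be roughly reproduced from the architecture row above. The arithmetic below is a sketch under two assumptions this card does not confirm: each BatchNorm layer has a learnable scale and shift, and each PReLU layer has a single shared slope. `dnn_param_count` is an illustrative name.

```python
def dnn_param_count(d: int, hidden_blocks: int = 2) -> int:
    """Trainable parameters of blocks [Linear(d, d) + BatchNorm + PReLU] plus Linear(d, 1)."""
    linear = d * d + d   # weight matrix + bias
    batchnorm = 2 * d    # per-feature scale and shift
    prelu = 1            # assumption: one shared slope per PReLU layer
    head = d + 1         # Linear(d, 1) weight + bias
    return hidden_blocks * (linear + batchnorm + prelu) + head

print(dnn_param_count(2568))  # 13207227
print(dnn_param_count(696))   # 973707
```

This gives about 13.2 M for d = 2,568 and about 0.97 M for d = 696, close to the reported 13.2 M and 0.98 M. The small difference for the non-PE case may come from details such as per-channel PReLU slopes or counted buffers.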
Known Limitations
- TabNet ONNX size: unfolding the sparsemax attention loop inflates the PE-family ONNX to 140 MB. The original tabnet_PE.zip (7.4 MB) is lighter.
- Treelite .tl: the uploaded LightGBM artifact is a platform-independent serialization. You must compile it into a shared library (.dylib/.so) on each target platform before inference; see the LightGBM usage example. (The reported LightGBM latency is for a .dylib compiled on Mac ARM64.)
- DNN non-PE INT8: the 696-dim models suffer large AUC loss from quantization, so they are kept in FP32.
- Hybrid inference: not a single ONNX file; it runs in two stages, LightGBM leaf extraction followed by the nn_part ONNX model.
- Challenge detection rate: measured using the FPR=1% threshold from the test set. Values may vary across subsets due to distribution differences.
Citation
@inproceedings{joyce2025ember2024,
title = {EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers},
author = {Joyce, Robert J. and Miller, Gideon and Roth, Phil and Zak, Richard and Zaresky-Williams, Elliott and Anderson, Hyrum and Raff, Edward and Holt, James},
booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '25)},
year = {2025},
doi = {10.1145/3711896.3737431},
url = {https://arxiv.org/abs/2506.05074}
}
License
Code and model weights: Apache 2.0
Original LightGBM models (hybrid/hybrid_*_lgbm.model): subject to the joyce8/EMBER2024-benchmark-models license.