---
license: apache-2.0
datasets:
  - joyce8/EMBER2024
language:
  - en
tags:
  - malware-detection
  - cybersecurity
  - onnxruntime
  - lightgbm
  - pytorch
  - tabnet
  - binary-classification
pipeline_tag: tabular-classification
library_name: onnxruntime
---

# EMBER2024 Malware Detection Models

A collection of four model architectures (DNN, TabNet, Hybrid GBDT2NN, LightGBM) trained and evaluated on all eight subsets of the [EMBER2024](https://huggingface.co/datasets/joyce8/EMBER2024) dataset — six file formats (Win32, Win64, .NET, APK, ELF, PDF) plus a combined `PE` group and an `all`-types set — and converted into deployment-ready formats.

> **Training environment**: GPU server (CUDA 13)  
> **Dataset paper**: [Joyce et al., KDD 2025 (arXiv:2506.05074)](https://arxiv.org/abs/2506.05074)

---

## Models

| Directory | Architecture | Deployment Format | Parameters |
|-----------|--------------|-------------------|------------|
| `dnn/` | Feed-Forward DNN (PReLU + Dropout) | ONNX (INT8 Static / FP32) | 13.2 M (PE) / 0.98 M (non-PE) |
| `tabnet/` | TabNet ([Arik & Pfister, 2021](https://arxiv.org/abs/1908.07442)) | ONNX FP32 | ~3 M |
| `hybrid/` | GBDT2NN ([DeepGBM, KDD 2019](https://www.microsoft.com/en-us/research/publication/deepgbm-a-deep-learning-framework-distilled-by-gbdt-for-online-prediction-tasks/)) | ONNX (nn_part) + LightGBM booster | ~1 M NN |
| `lightgbm/` | LightGBM (pretrained, [joyce8/EMBER2024-benchmark-models](https://huggingface.co/joyce8/EMBER2024-benchmark-models)) | Treelite `.tl` | — |

### Subset List

| Subset | Target File Type | Input Dim |
|--------|------------------|-----------|
| `PE` | All PE binaries (Win32 + Win64 + .NET) | 2,568 |
| `Win32` | Windows 32-bit PE | 2,568 |
| `Win64` | Windows 64-bit PE | 2,568 |
| `.NET` | .NET assemblies | 2,568 |
| `APK` | Android APK | 696 |
| `ELF` | Linux ELF | 696 |
| `PDF` | PDF documents | 696 |
| `all` | All file types combined | 2,568 |
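
The table above can be captured as a small lookup for validating a feature vector before it reaches a model. This is a minimal sketch; `INPUT_DIM` and `check_features` are illustrative names and are not part of this repository.

```python
# Input width per subset, copied from the Subset List table.
INPUT_DIM = {
    "PE": 2568, "Win32": 2568, "Win64": 2568, ".NET": 2568, "all": 2568,
    "APK": 696, "ELF": 696, "PDF": 696,
}


def check_features(subset, row):
    """Return the expected width, or raise ValueError if `row` does not match it."""
    if subset not in INPUT_DIM:
        raise ValueError(f"unknown subset: {subset!r}")
    expected = INPUT_DIM[subset]
    if len(row) != expected:
        raise ValueError(f"{subset} expects {expected} features, got {len(row)}")
    return expected


print(check_features("APK", [0.0] * 696))
```

Checking the width up front gives a clearer error than the shape mismatch an inference runtime would otherwise report.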

---

## Directory Structure

Filename convention: `{model}_{subset}[_suffix].{ext}`  
The `.NET` subset is rendered as `dotnet` in filenames.

```
dnn/
├── dnn_PE.onnx              # INT8 Static (deployment; PE/Win32/Win64/dotnet/all)
├── dnn_PE_fp32.onnx         # FP32 ONNX   (reference; bundled only for INT8 subsets)
├── dnn_PE.pt                # PyTorch checkpoint
├── dnn_PE_metrics.json      # Evaluation results (AUC, TPR@1%FPR)
├── dnn_PE_benchmark.json    # Size & latency
├── dnn_APK.onnx             # FP32 (non-PE — INT8 AUC loss too large)
├── dnn_APK.pt
└── ...

tabnet/
├── tabnet_PE.onnx           # FP32 ONNX (140 MB — sparsemax unfolding)
├── tabnet_PE.zip            # pytorch-tabnet native (7.4 MB, lightweight)
└── ...

hybrid/
├── hybrid_PE_nnpart.onnx    # GBDT2NN nn_part ONNX (5.3 MB)
├── hybrid_PE_lgbm.model     # LightGBM booster (3.8 MB)
├── hybrid_PE.pt             # PyTorch checkpoint
└── ...

lightgbm/
├── lightgbm_PE.tl           # Treelite serialization (platform-independent; must be compiled on each target platform)
└── ...
```
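
The naming convention can be expressed as a small helper that builds repo-relative paths. This is a sketch for illustration only: `artifact_path` is a hypothetical name, and it assumes every file follows the `{model}_{subset}[_suffix].{ext}` pattern shown above.

```python
def artifact_path(model, subset, suffix="", ext="onnx"):
    """Build a repo-relative path such as 'dnn/dnn_PE_fp32.onnx'."""
    # The .NET subset is spelled "dotnet" in filenames.
    name = "dotnet" if subset == ".NET" else subset
    stem = f"{model}_{name}" + (f"_{suffix}" if suffix else "")
    return f"{model}/{stem}.{ext}"


print(artifact_path("dnn", "PE"))
print(artifact_path("hybrid", "PE", suffix="lgbm", ext="model"))
print(artifact_path("tabnet", ".NET", ext="zip"))
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.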

---

## Performance Results (EMBER2024 test set)

> Metrics: ROC-AUC, TPR @ 1% FPR (paper §4.1), and challenge-set detection rate at the FPR=1% threshold.  
> Challenge set: 6,315 evasive malware samples (positives only; Win32 3,225 / .NET 829 / Win64 814 / PDF 805 / ELF 386 / APK 256).
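
The two threshold-based metrics can be sketched in a few lines. The code below is an illustration under stated assumptions, not the evaluation script behind the tables: it treats a sample as flagged when its score is strictly above the threshold, it picks the lowest threshold that keeps benign false positives at or below 1%, and the function names and toy scores are made up.

```python
def threshold_at_fpr(benign_scores, max_fpr=0.01):
    """Lowest threshold t such that at most max_fpr of benign scores exceed t."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))      # false positives we may tolerate
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def detection_rate(malicious_scores, threshold):
    """Fraction of malicious samples scored strictly above the threshold."""
    hits = sum(score > threshold for score in malicious_scores)
    return hits / len(malicious_scores)


# Toy scores (hypothetical): 100 benign test samples plus a few malicious ones.
benign_test = [i / 100 for i in range(100)]
malware_test = [0.5, 0.985, 0.99, 1.0]
challenge = [0.2, 0.4, 0.97, 0.99]

t = threshold_at_fpr(benign_test, max_fpr=0.01)
print("threshold:", t)
print("TPR @ 1% FPR:", detection_rate(malware_test, t))
print("challenge detection rate:", detection_rate(challenge, t))
```

The threshold is fixed on the test set and then reused unchanged on the challenge set, which contains positives only and therefore cannot define an FPR of its own.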

### DNN

| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|--------|---------|-----------|-------------------|------|
| PE | 0.9969 | 0.9472 | INT8 Static ONNX | 13.3 MB |
| Win32 | 0.9965 | 0.9479 | INT8 Static ONNX | 13.3 MB |
| Win64 | 0.9969 | 0.9617 | INT8 Static ONNX | 13.3 MB |
| .NET | 0.9920 | 0.8444 | INT8 Static ONNX | 13.3 MB |
| all | 0.9938 | 0.8870 | INT8 Static ONNX | 13.3 MB |
| APK | 0.9761 | 0.7682 | FP32 ONNX | 3.9 MB |
| ELF | 0.9840 | 0.8103 | FP32 ONNX | 3.9 MB |
| PDF | 0.9795 | 0.8902 | FP32 ONNX | 3.9 MB |

> The non-PE subsets (APK/ELF/PDF) use 696-dim inputs and have too few parameters, so INT8 quantization causes a large AUC drop; they are kept in FP32.  
> Figures for the PE-family subsets (PE, Win32, Win64, .NET, all) are for the INT8 models (fixed 100K-sample set). ΔAUC vs. FP32 stays within 0.19 pp.  
> For the .NET and `all` subsets, INT8 quantization causes a relatively larger drop in TPR@1%FPR, although both still pass the AUC gate (|ΔAUC| < 0.5 pp).
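
The AUC gate mentioned above amounts to a one-line comparison in percentage points. A minimal sketch follows; the function names are illustrative, and the FP32 AUC values in the example are hypothetical because the card reports only the INT8 figures.

```python
def delta_auc_pp(auc_fp32, auc_int8):
    """AUC change from quantization, in percentage points."""
    return (auc_int8 - auc_fp32) * 100.0


def passes_auc_gate(auc_fp32, auc_int8, max_abs_delta_pp=0.5):
    """True if the quantized model stays within the allowed AUC change."""
    return abs(delta_auc_pp(auc_fp32, auc_int8)) < max_abs_delta_pp


# Hypothetical FP32 baselines paired with INT8 results.
for name, fp32, int8 in [("small drop", 0.9975, 0.9969), ("large drop", 0.9900, 0.9800)]:
    verdict = "pass" if passes_auc_gate(fp32, int8) else "fail"
    print(f"{name}: {delta_auc_pp(fp32, int8):+.2f} pp -> {verdict}")
```

A gate on AUC alone does not bound the change in TPR@1%FPR, which is why the two metrics are reported separately.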

### TabNet

| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|--------|---------|-----------|-------------------|------|
| PE | 0.9948 | 0.9195 | FP32 ONNX | 140 MB |
| Win32 | 0.9949 | 0.9317 | FP32 ONNX | 140 MB |
| Win64 | 0.9944 | 0.9318 | FP32 ONNX | 140 MB |
| .NET | 0.9923 | 0.8700 | FP32 ONNX | 140 MB |
| all | 0.9922 | 0.8912 | FP32 ONNX | 140 MB |
| APK | 0.9741 | 0.7028 | FP32 ONNX | 13.5 MB |
| ELF | 0.9793 | 0.5460 | FP32 ONNX | 13.5 MB |
| PDF | 0.9810 | 0.8597 | FP32 ONNX | 13.5 MB |

> The 140 MB ONNX size for the PE-family subsets is structural: the sparsemax attention loop is unfolded into the ONNX graph. If size matters, use `tabnet_PE.zip` (7.4 MB) directly.
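
Loading the native checkpoint might look like the sketch below. It rests on assumptions the card does not confirm: that the `.zip` was written by `TabNetClassifier.save_model`, that column 1 of `predict_proba` is the malware class, and that the `.NET` file is named with `dotnet`. The helper names are hypothetical.

```python
def tabnet_zip_filename(subset):
    """Repo-relative path of the native checkpoint, e.g. 'tabnet/tabnet_PE.zip'."""
    name = "dotnet" if subset == ".NET" else subset
    return f"tabnet/tabnet_{name}.zip"


def load_native_tabnet(subset, repo_id="cycloevan/ember-model"):
    """Download the .zip checkpoint and restore it with pytorch-tabnet."""
    # Imported lazily so this module still loads without the optional packages.
    from huggingface_hub import hf_hub_download
    from pytorch_tabnet.tab_model import TabNetClassifier

    path = hf_hub_download(repo_id=repo_id, filename=tabnet_zip_filename(subset))
    clf = TabNetClassifier()
    clf.load_model(path)  # assumes the archive came from TabNetClassifier.save_model
    return clf


# Usage (downloads the checkpoint, so it needs network access):
#   clf = load_native_tabnet("PE")
#   prob = clf.predict_proba(X)[:, 1]   # X: float32 array of shape (N, 2568)
```

This path trades the ONNX Runtime dependency for PyTorch and `pytorch-tabnet`, so it fits environments where those are already installed.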

### Hybrid (GBDT2NN)

| Subset | ROC-AUC | TPR@1%FPR | Deployment Format | Size |
|--------|---------|-----------|-------------------|------|
| PE | 0.9982 | 0.9736 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| Win32 | 0.9982 | 0.9734 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| Win64 | 0.9982 | 0.9811 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| .NET | 0.9961 | 0.9466 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| all | 0.9972 | 0.9513 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| APK | 0.9828 | 0.8003 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |
| ELF | 0.9899 | 0.8827 | nn_part ONNX + LightGBM booster | 5.3 + 3.8 MB |
| PDF | 0.9879 | 0.9283 | nn_part ONNX + LightGBM booster | 5.3 + 3.7 MB |

### LightGBM (Treelite-compiled)

| Subset | ROC-AUC | TPR@1%FPR | Size (.tl) | Size (original .model) |
|--------|---------|-----------|------------|------------------------|
| PE | 0.9983 | 0.9686 | 5.3 MB | 3.8 MB |
| Win32 | 0.9985 | 0.9722 | 5.3 MB | 3.7 MB |
| Win64 | 0.9988 | 0.9830 | 5.3 MB | 3.7 MB |
| .NET | 0.9980 | 0.9561 | 5.3 MB | 3.7 MB |
| all | 0.9970 | 0.9450 | 5.3 MB | 3.8 MB |
| APK | 0.9861 | 0.8157 | 5.3 MB | 3.7 MB |
| ELF | 0.9929 | 0.9140 | 5.3 MB | 3.8 MB |
| PDF | 0.9913 | 0.9275 | 5.3 MB | 3.7 MB |

> Original LightGBM models: [joyce8/EMBER2024-benchmark-models](https://huggingface.co/joyce8/EMBER2024-benchmark-models). The `.tl` files are serialized with Treelite 3.9.1. They are platform-independent, so they must be compiled into a shared library on each target platform before inference.

### Challenge Set Detection Rate

> Challenge set: 6,315 evasive malware samples (all positive). The threshold that yields FPR = 1% on the test set is applied unchanged.

| Subset | DNN | TabNet | Hybrid | LightGBM |
|--------|-----|--------|--------|----------|
| `.NET` | 58.6% | 70.0% | 80.6% | 79.6% |
| `APK`  | 27.3% | 29.3% | 34.4% | 33.6% |
| `ELF`  | 11.7% |  4.4% | 23.8% | 30.3% |
| `PDF`  | 41.5% | 40.1% | 56.9% | 57.1% |
| `PE`   | 38.5% | 36.9% | 58.2% | 58.8% |
| `Win32`| 36.6% | 45.3% | 58.4% | 69.9% |
| `Win64`| 46.3% | 44.1% | 59.5% | 59.7% |
| `all`  | 35.3% | 42.3% | 54.1% | 48.4% |

---

## Inference Performance (Apple M1, darwin-arm64)

> `warm_batch1` latency: batch size = 1, measured after cache warm-up. Results may differ in the deployment environment (x86_64 Linux).

### Latency (ms, warm batch=1)

| Subset | DNN | TabNet | Hybrid | LightGBM |
|--------|-----|--------|--------|----------|
| `.NET` | 0.248 | 5.465 | 0.151 | 0.050 |
| `APK`  | 0.035 | 0.846 | 0.145 | 0.031 |
| `ELF`  | 0.039 | 0.505 | 0.160 | 0.036 |
| `PDF`  | 0.036 | 2.230 | 0.172 | 0.048 |
| `PE`   | 0.290 | 4.402 | 0.138 | 0.028 |
| `Win32`| 0.288 | 4.693 | 0.141 | 0.044 |
| `Win64`| 0.220 | 5.621 | 0.422 | 0.039 |
| `all`  | 0.254 | 4.788 | 0.147 | 0.068 |

> TabNet latency is high because the sparsemax attention is unfolded into the ONNX graph (structural).  
> Hybrid = nn_part ONNX inference only (LightGBM leaf extraction excluded).  
> LightGBM latency is for the compiled `.dylib`; the uploaded file is `.tl` and must be compiled on the target platform first.
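
A warm batch-1 measurement can be reproduced with a small timing harness. The sketch below is an assumption about the method, since the card does not state the warm-up count, the number of timed runs, or whether the mean or median was reported; `warm_latency_ms` and its defaults are illustrative.

```python
import statistics
import time


def warm_latency_ms(predict, warmup=50, iters=500):
    """Median wall-clock latency of `predict()` in milliseconds, after warm-up."""
    for _ in range(warmup):              # discard cold-start runs
        predict()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Stand-in workload; swap in a closure around a single-row inference call.
print(f"{warm_latency_ms(lambda: sum(range(1000))):.4f} ms")
```

The median is used here because it is less sensitive than the mean to occasional scheduler stalls.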

### Model File Sizes (deployment format)

| Subset | DNN | TabNet `.onnx` | TabNet `.zip` | Hybrid (nn+lgbm) | LightGBM `.tl` |
|--------|-----|----------------|---------------|------------------|----------------|
| PE family | 13.3 MB (INT8) | 140.2 MB | 7.4 MB | 5.3 + 3.8 MB | 5.3 MB |
| non-PE    |  3.9 MB (FP32) |  13.5 MB | 3.2 MB | 5.3 + 3.7 MB | 5.3 MB |

---

## Usage

### Install Dependencies

```bash
# Quote version specifiers so the shell does not treat ">" as a redirect
pip install "onnxruntime>=1.20" numpy huggingface_hub
# For LightGBM / Hybrid inference
pip install "treelite==3.9.1" "treelite_runtime==3.9.1" "lightgbm>=4.6"
# To use the TabNet checkpoint directly
pip install "pytorch-tabnet>=4.1"
```

### DNN Inference (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# PE subset — INT8 Static
model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="dnn/dnn_PE.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# X: np.ndarray shape (N, 2568), dtype float32
X = np.random.randn(1, 2568).astype(np.float32)
logit = sess.run(["logit"], {"features": X})[0]          # shape (N, 1)
prob  = 1 / (1 + np.exp(-logit.ravel()))                  # sigmoid → [0, 1]
print(f"malware probability: {prob[0]:.4f}")
```

```python
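import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# APK subset: FP32 model, 696-dim input
model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="dnn/dnn_APK.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
X = np.random.randn(1, 696).astype(np.float32)          # non-PE: dim=696
logit = sess.run(["logit"], {"features": X})[0]          # shape (N, 1)
prob  = 1 / (1 + np.exp(-logit.ravel()))                 # sigmoid → [0, 1]
print(f"malware probability: {prob[0]:.4f}")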
```

### TabNet Inference (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="tabnet/tabnet_PE.onnx",
)
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
X = np.random.randn(1, 2568).astype(np.float32)
# output: logit (pre-sigmoid)
logit = sess.run(["logit"], {"features": X})[0]
prob  = 1 / (1 + np.exp(-logit.ravel()))
```

### Hybrid Inference (ONNX + LightGBM)

```python
import numpy as np
import lightgbm as lgb
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# 1. Extract leaf indices with the LightGBM booster
booster = lgb.Booster(model_file=hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="hybrid/hybrid_PE_lgbm.model",
))
X_raw = np.random.randn(1, 2568).astype(np.float64)
leaf_indices = booster.predict(X_raw, pred_leaf=True).astype(np.int64)  # (N, n_trees)

# 2. Final classification with the GBDT2NN ONNX model
nn_sess = ort.InferenceSession(hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="hybrid/hybrid_PE_nnpart.onnx",
), providers=["CPUExecutionProvider"])
logit = nn_sess.run(["logit"], {"leaf_indices": leaf_indices})[0]
prob  = 1 / (1 + np.exp(-logit.ravel()))
print(f"malware probability: {prob[0]:.4f}")
```

### LightGBM Inference (Treelite-compiled — fast inference)

```python
# 1. Compile Treelite .tl → platform-specific shared library (one-time)
import sys
from pathlib import Path

import numpy as np
import treelite
import treelite_runtime
from huggingface_hub import hf_hub_download

tl_path = hf_hub_download(
    repo_id="cycloevan/ember-model",
    filename="lightgbm/lightgbm_PE.tl",
)
tl_model = treelite.Model.deserialize(tl_path)
lib_ext   = ".dylib" if sys.platform == "darwin" else ".so"
# Swap only the final ".tl" extension; str.replace would rewrite every ".tl" in the path
lib_path  = str(Path(tl_path).with_suffix(lib_ext))
tl_model.export_lib(
    toolchain="clang" if sys.platform == "darwin" else "gcc",
    libpath=lib_path,
    verbose=False,
)

# 2. Inference
predictor = treelite_runtime.Predictor(lib_path, verbose=False)
X = np.random.randn(1, 2568).astype(np.float32)
prob = predictor.predict(treelite_runtime.DMatrix(X))
print(f"malware probability: {prob[0]:.4f}")
```

> **Note**: Requires `treelite==3.9.1` + `treelite_runtime==3.9.1`. Version 4.x does not support `export_lib()`.

---

## Training & Evaluation Environment

| Item | Details |
|------|---------|
| Dataset | [EMBER2024](https://huggingface.co/datasets/joyce8/EMBER2024) — train 52 weeks (2.6 M), test 12 weeks (606 K), challenge 6,315 |
| Feature dim | PE 2,568 (v3) / non-PE 696 (valid prefix) |
| Split policy | Fixed temporal order (temporal split), no random shuffling |
| Training environment | GPU server (CUDA 13) |
| Frameworks | PyTorch 2.11.0, pytorch-tabnet 4.1, LightGBM 4.6 |
| Random seed | 42 |
| DNN architecture | 2 × [Linear(d→d) + BatchNorm + PReLU(α=0.25) + Dropout(0.5)] → Linear(d→1), where d = 2,568 (PE) / 696 (non-PE) |
| Hybrid | LightGBM leaf extraction → shared leaf Embedding (dim 8) → concat → MLP[256, 128] (BatchNorm + PReLU) → Linear(→1) |
| Evaluation metrics | ROC-AUC, PR-AUC, **TPR @ 1% FPR** (paper §4.1) |
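
As a rough cross-check of the reported parameter counts, the DNN row can be tallied by hand. The sketch assumes PyTorch-style layers: each BatchNorm contributes one scale and one shift per feature, each PReLU has a single shared slope, and Dropout has no parameters. These details are assumptions, and `dnn_trainable_params` is an illustrative name.

```python
def dnn_trainable_params(d, hidden_blocks=2):
    """Trainable parameters of hidden_blocks x [Linear(d, d) + BN + PReLU] + Linear(d, 1)."""
    linear = d * d + d      # weight matrix + bias of Linear(d -> d)
    batchnorm = 2 * d       # scale + shift (running statistics are buffers, not counted)
    prelu = 1               # assumption: one shared slope per PReLU layer
    head = d + 1            # Linear(d -> 1)
    return hidden_blocks * (linear + batchnorm + prelu) + head


for name, d in [("PE", 2568), ("non-PE", 696)]:
    n = dnn_trainable_params(d)
    print(f"{name}: {n / 1e6:.2f} M params, ~{n * 4 / 1e6:.1f} MB as FP32")
```

Under these assumptions the tally comes to about 13.2 M for d = 2,568 and about 0.97 M for d = 696, in line with the reported 13.2 M / 0.98 M. At 4 bytes per FP32 weight, the 696-dim model works out to roughly 3.9 MB.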

---

## Known Limitations

- **TabNet ONNX size**: unfolding the sparsemax attention loop inflates the PE-family ONNX to 140 MB. The original `tabnet_PE.zip` (7.4 MB) is lighter.
- **Treelite `.tl`**: the uploaded LightGBM artifact is a platform-independent serialization. You must compile it into a shared library (`.dylib`/`.so`) on each target platform before inference — see the LightGBM usage example. (The reported LightGBM latency is for a `.dylib` compiled on Mac ARM64.)
- **DNN non-PE INT8**: the 696-dim models suffer large AUC loss from quantization, so they are kept in FP32.
- **Hybrid inference**: not a single ONNX file — two stages: LightGBM leaf extraction + nn_part ONNX.
- **Challenge detection rate**: measured using the FPR=1% threshold from the test set. Values may vary across subsets due to distribution differences.

---

## Citation

```bibtex
@inproceedings{joyce2025ember2024,
  title     = {EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers},
  author    = {Joyce, Robert J. and Miller, Gideon and Roth, Phil and Zak, Richard and Zaresky-Williams, Elliott and Anderson, Hyrum and Raff, Edward and Holt, James},
  booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '25)},
  year      = {2025},
  doi       = {10.1145/3711896.3737431},
  url       = {https://arxiv.org/abs/2506.05074}
}
```

---

## License

Code and model weights: Apache 2.0  
Original LightGBM models (`hybrid/hybrid_*_lgbm.model`): subject to the [joyce8/EMBER2024-benchmark-models](https://huggingface.co/joyce8/EMBER2024-benchmark-models) license.