# Malware Classification CNN on Malimg
Trained models for 25-class malware family classification on the Malimg dataset. Four checkpoints from a phase-based optimization study, going from a baseline CNN (89.06%) to an EfficientNetB0-based model (98.48% with TTA).
GitHub (code, reports, training curves): github.com/ffftuanxxx/malware-classification-CNN-optimized
## Checkpoints
| File | Phase | Architecture | Val Accuracy | Macro F1 | Size |
|---|---|---|---|---|---|
| `baseline/pesi.h5` | Baseline | 3-block Conv + Flatten + Dense(256) (59M params) | 89.06% | 86.45% | 227 MB |
| `phase1/best_model.h5` | Phase 1 | Conv + BN + GAP + Dense (<1M params) | 67.28% | 39.28% | 5.2 MB |
| `phase2/best_model.h5` | Phase 2 | EfficientNetB0 + Dense head, two-stage fine-tune | 94.91% | 83.95% | 30 MB |
| `phase3/best_model.h5` | Phase 3 | Phase 2 + Focal Loss + oversampling + Cosine + TTA | 98.48% | 95.78% | 30 MB |
All metrics are on the 923-sample Malimg val split.
## How to Load
```python
from huggingface_hub import hf_hub_download
import tensorflow as tf

# Download any checkpoint
path = hf_hub_download(
    repo_id="XRailgunX/malware-cnn-malimg",
    filename="phase3/best_model.h5",
)

# Phase 1/2/3 saved full models; load directly
model = tf.keras.models.load_model(path, compile=False)

# Baseline is weights-only; rebuild the architecture first
# (see run_malimg_classifier.py in the GitHub repo), then:
# model.load_weights(path)
```
For Phase 3, which uses a custom Focal Loss, pass `compile=False` when loading (the loss function is not serialized), then recompile if you plan to continue training.
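Recompiling can be sketched with a standard multi-class focal loss using the γ=2.0, α=0.25 settings listed under Training Details; this is an illustrative formulation, not necessarily the repo's exact implementation:

```python
import tensorflow as tf

def focal_loss(gamma=2.0, alpha=0.25):
    """Standard multi-class focal loss (sketch; the repo's exact
    implementation may differ). Expects one-hot labels, softmax outputs."""
    def loss_fn(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        ce = -y_true * tf.math.log(y_pred)            # per-class cross-entropy
        weight = alpha * tf.pow(1.0 - y_pred, gamma)  # down-weight easy examples
        return tf.reduce_sum(weight * ce, axis=-1)
    return loss_fn

# After loading with compile=False, recompile for continued fine-tuning:
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
#               loss=focal_loss(), metrics=["accuracy"])
```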
## Input Format

- Input shape: `(256, 256, 3)` RGB (grayscale Malimg images replicated to 3 channels for ImageNet-pretrained backbones).
- Preprocessing:
  - Baseline / Phase 1: rescale to `[0, 1]` (`x / 255.0`).
  - Phase 2 / Phase 3 (EfficientNet-based): keep raw `[0, 255]` float; EfficientNet has a built-in `Normalization` layer. Applying `rescale=1./255` externally will cause predictions to collapse.
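The two preprocessing paths can be sketched as a small helper (the function name and the resize step are illustrative assumptions, not the repo's API):

```python
import numpy as np
import tensorflow as tf

def preprocess(img_gray, model_family):
    """img_gray: (H, W) uint8 grayscale byte image.
    Returns a (1, 256, 256, 3) float32 batch for the given checkpoint."""
    x = tf.image.resize(img_gray[..., np.newaxis], (256, 256))  # -> float32 (256, 256, 1)
    x = tf.repeat(x, 3, axis=-1)      # replicate grayscale to 3 channels
    if model_family in ("baseline", "phase1"):
        x = x / 255.0                 # rescale to [0, 1]
    # phase2 / phase3: keep raw [0, 255]; EfficientNet normalizes internally
    return tf.expand_dims(x, 0)
```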
## Classes
25 Malimg malware families (class indices in alphabetical order): Adialer.C, Agent.FYI, Allaple.A, Allaple.L, Alueron.gen!J, Autorun.K, C2LOP.P, C2LOP.gen!g, Dialplatform.B, Dontovo.A, Fakerean, Instantaccess, Lolyda.AA1, Lolyda.AA2, Lolyda.AA3, Lolyda.AT, Malex.gen!J, Obfuscator.AD, Rbot!gen, Skintrim.N, Swizzor.gen!E, Swizzor.gen!I, VB.AT, Wintrim.BX, Yuner.A.
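Since class indices follow alphabetical (Python `sorted()`) order, decoding a softmax vector is a simple lookup; `decode` is an illustrative helper:

```python
import numpy as np

# 25 Malimg families; indices follow alphabetical (sorted) order
CLASS_NAMES = sorted([
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
])

def decode(probs):
    """probs: (25,) softmax output -> (family name, confidence)."""
    idx = int(np.argmax(probs))
    return CLASS_NAMES[idx], float(probs[idx])
```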
## Training Details (Phase 3, best model)

- Backbone: EfficientNetB0 (ImageNet pretrained)
- Head: GlobalAveragePooling2D → Dense(256, relu) + BN + Dropout(0.5) → Dense(25, softmax)
- Stage 1 (10 epochs, frozen base, Adam 1e-3): 68% → 91% val acc
- Stage 2 (30 epochs, top 20 layers unfrozen, cosine decay from 1e-4 to 1.3e-6): best val_loss 0.01090 at epoch 24
- Loss: Focal Loss (γ=2.0, α=0.25)
- Augmentation: `width/height_shift=0.1`, `horizontal_flip=True`, `zoom=0.1`, `brightness=[0.9, 1.1]` (no vertical flip, rotation, or color jitter)
- Minority oversampling: bootstrap each class with <200 samples up to 200
- TTA at inference: 5 random augmentations averaged → +1.41pp over single-view prediction
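The 5-view TTA can be sketched with numpy approximations of the training-time augmentations (flip, brightness, and a shift approximated by `np.roll`, which wraps at edges); `predict_tta` is an illustrative helper, not the repo's API:

```python
import numpy as np

def predict_tta(model, x, n_views=5, rng=None):
    """x: (1, 256, 256, 3) preprocessed batch. Averages softmax
    predictions over n_views randomly augmented copies."""
    rng = rng or np.random.default_rng(0)
    probs = []
    for _ in range(n_views):
        v = x.copy()
        if rng.random() < 0.5:
            v = v[:, :, ::-1, :]                # horizontal flip
        v = v * rng.uniform(0.9, 1.1)           # brightness in [0.9, 1.1]
        shift = tuple(int(s) for s in rng.integers(-25, 26, size=2))
        v = np.roll(v, shift, axis=(1, 2))      # ~10% shift (wraps at edges)
        probs.append(model.predict(v, verbose=0))
    return np.mean(probs, axis=0)               # averaged (1, 25) softmax
```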
Training curves, per-class F1, and the full technical report are in the GitHub repo.
## Per-class F1 (Phase 3)
Perfect (F1 = 1.00) on 19 of 25 classes. Remaining:
| Class | Support | F1 |
|---|---|---|
| Autorun.K | 10 | 0.86 |
| C2LOP.P | 14 | 0.86 |
| C2LOP.gen!g | 20 | 0.95 |
| Swizzor.gen!E | 12 | 0.77 |
| Swizzor.gen!I | 13 | 0.53 |
| Yuner.A | 80 | 0.98 |
Swizzor.gen!I is the hardest remaining case: a near-visual duplicate of Swizzor.gen!E. The single-model ceiling on Malimg appears to sit around this level; reaching 99%+ typically requires multi-model ensembles or byte-level auxiliary features.
## Limitations
- Trained and evaluated only on Malimg (25 PE malware families, grayscale byte images). Does not transfer to mobile malware, scripts, or obfuscated packers outside this distribution.
- Not calibrated for out-of-distribution detection. Softmax confidence on non-Malimg inputs is unreliable.
- Evaluation is on the val split that was also used for model selection, so the reported numbers are slightly optimistic vs a held-out test run.
## Citation
If you build on this work, please also cite the upstream baseline repo cridin1/malware-classification-CNN and the Malimg dataset (Nataraj et al., 2011).