Malware Classification CNN on Malimg

Trained models for 25-class malware family classification on the Malimg dataset. Four checkpoints from a phase-based optimization study, going from a baseline CNN (89.06%) to an EfficientNetB0-based model (98.48% with TTA).

GitHub (code, reports, training curves): github.com/ffftuanxxx/malware-classification-CNN-optimized

Checkpoints

| File | Phase | Architecture | Val Accuracy | Macro F1 | Size |
|---|---|---|---|---|---|
| baseline/pesi.h5 | Baseline | 3-block Conv + Flatten + Dense(256) (59M params) | 89.06% | 86.45% | 227 MB |
| phase1/best_model.h5 | Phase 1 | Conv + BN + GAP + Dense (<1M params) | 67.28% | 39.28% | 5.2 MB |
| phase2/best_model.h5 | Phase 2 | EfficientNetB0 + Dense head, two-stage fine-tune | 94.91% | 83.95% | 30 MB |
| phase3/best_model.h5 | Phase 3 | Phase 2 + Focal Loss + oversampling + Cosine + TTA | 98.48% | 95.78% | 30 MB |

All metrics are on the 923-sample Malimg val split.

How to Load

```python
from huggingface_hub import hf_hub_download
import tensorflow as tf

# Download any checkpoint
path = hf_hub_download(repo_id="XRailgunX/malware-cnn-malimg",
                       filename="phase3/best_model.h5")

# Phase 1/2/3 saved full models; load directly
model = tf.keras.models.load_model(path, compile=False)

# Baseline is weights-only; rebuild the architecture first
# (see run_malimg_classifier.py in the GitHub repo), then:
# model.load_weights(path)
```

For Phase 3, which uses a custom Focal Loss, pass compile=False when loading (custom loss functions are not serialized with the model), then recompile if you plan to continue training.
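A recompile for continued training might look like the sketch below. This focal-loss definition is a generic reimplementation using the reported hyperparameters (γ=2.0, α=0.25), not necessarily the exact function in the GitHub repo:

```python
import tensorflow as tf

def focal_loss(gamma=2.0, alpha=0.25):
    """Generic focal loss for one-hot targets (hypothetical sketch;
    the exact training-time definition lives in the GitHub repo)."""
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)
        # Down-weight easy examples: small weight when y_pred is confident
        weight = alpha * tf.pow(1.0 - y_pred, gamma)
        return tf.reduce_sum(weight * cross_entropy, axis=-1)
    return loss

# After load_model(path, compile=False):
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
#               loss=focal_loss(), metrics=["accuracy"])
```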

Input Format

  • Input shape: (256, 256, 3) RGB (grayscale Malimg images replicated to 3 channels for ImageNet-pretrained backbones).
  • Preprocessing:
    • Baseline / Phase 1: rescale to [0, 1] (x / 255.0).
    • Phase 2 / Phase 3 (EfficientNet-based): keep raw [0, 255] float; EfficientNet has a built-in Normalization layer. Applying rescale=1./255 externally will cause predictions to collapse.
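The two preprocessing paths can be sketched as follows (a minimal NumPy-only helper; resizing is assumed done upstream, and the phase names are illustrative):

```python
import numpy as np

def preprocess(img_u8, phase):
    """img_u8: (256, 256) uint8 grayscale Malimg image.
    Returns a (1, 256, 256, 3) float32 batch-ready array."""
    # Replicate grayscale to 3 channels for ImageNet-pretrained backbones
    x = np.repeat(img_u8[..., None], 3, axis=-1).astype("float32")
    if phase in ("baseline", "phase1"):
        x /= 255.0  # rescale to [0, 1]
    # phase2 / phase3: leave raw [0, 255]; EfficientNet normalizes internally
    return x[None, ...]  # add batch dimension
```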

Classes

25 Malimg malware families (class indices in alphabetical order): Adialer.C, Agent.FYI, Allaple.A, Allaple.L, Alueron.gen!J, Autorun.K, C2LOP.P, C2LOP.gen!g, Dialplatform.B, Dontovo.A, Fakerean, Instantaccess, Lolyda.AA1, Lolyda.AA2, Lolyda.AA3, Lolyda.AT, Malex.gen!J, Obfuscator.AD, Rbot!gen, Skintrim.N, Swizzor.gen!E, Swizzor.gen!I, VB.AT, Wintrim.BX, Yuner.A.
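To map a predicted index back to a family name, the alphabetical ordering above can be captured as a list (a sketch; verify it matches the label order your own data pipeline produced):

```python
# Class index -> family name, in the alphabetical order listed above
MALIMG_CLASSES = [
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
]

# Usage: idx = int(probs.argmax()); name = MALIMG_CLASSES[idx]
```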

Training Details (Phase 3, best model)

  • Backbone: EfficientNetB0 (ImageNet pretrained)
  • Head: GlobalAveragePooling2D → Dense(256, relu) + BN + Dropout(0.5) → Dense(25, softmax)
  • Stage 1 (10 epochs, frozen base, Adam 1e-3): 68% → 91% val acc
  • Stage 2 (30 epochs, top 20 layers unfrozen, Cosine decay from 1e-4 to 1.3e-6): best val_loss 0.01090 at epoch 24
  • Loss: Focal Loss (γ=2.0, α=0.25)
  • Augmentation: width/height_shift=0.1, horizontal_flip=True, zoom=0.1, brightness=[0.9,1.1] (NO vertical flip, rotation, or color jitter)
  • Minority oversampling: bootstrap each class with <200 samples to 200
  • TTA at inference: 5 random augmentations averaged → +1.41pp over single-view prediction
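The TTA step above amounts to averaging softmax outputs over several augmented views. A minimal NumPy sketch (predict_tta, predict_fn, and augment_fn are hypothetical names; the repo's exact TTA code may differ):

```python
import numpy as np

def predict_tta(predict_fn, x, augment_fn, n_views=5, rng=None):
    """Average class probabilities over n_views randomly augmented copies.
    predict_fn: maps a (1, 256, 256, 3) batch to (1, n_classes) probabilities.
    augment_fn: applies one random augmentation (shift/flip/zoom/brightness)."""
    rng = rng or np.random.default_rng()
    views = [augment_fn(x, rng) for _ in range(n_views)]
    probs = np.stack([predict_fn(v)[0] for v in views])  # (n_views, n_classes)
    return probs.mean(axis=0)
```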

Training curves, per-class F1, and full technical report are in the GitHub repo.

Per-class F1 (Phase 3)

Perfect (F1 = 1.00) on 19 of 25 classes. Remaining:

| Class | Support | F1 |
|---|---|---|
| Autorun.K | 10 | 0.86 |
| C2LOP.P | 14 | 0.86 |
| C2LOP.gen!g | 20 | 0.95 |
| Swizzor.gen!E | 12 | 0.77 |
| Swizzor.gen!I | 13 | 0.53 |
| Yuner.A | 80 | 0.98 |

Swizzor.gen!I is the hardest remaining case: a near-visual duplicate of Swizzor.gen!E. The single-model upper bound on Malimg seems to sit around this level; reaching 99%+ typically requires multi-model ensembles or byte-level auxiliary features.

Limitations

  • Trained and evaluated only on Malimg (25 PE malware families, grayscale byte images). Does not transfer to mobile malware, scripts, or obfuscated packers outside this distribution.
  • Not calibrated for out-of-distribution detection. Softmax confidence on non-Malimg inputs is unreliable.
  • Evaluation is on the val split that was also used for model selection, so the reported numbers are slightly optimistic vs a held-out test run.

Citation

If you build on this work, please also cite the upstream baseline repo cridin1/malware-classification-CNN and the Malimg dataset (Nataraj et al., 2011).
