Native `.keras` MNIST model with hidden trigger-based output manipulation

Target

Keras native .keras model format

PoC file

mnist_native_keras_weight_backdoor_safe_mode.keras

SHA256

5aebff1450e2c75085be53ed2f00dd926ab197dcaaa8b5b2ab95261e179ab028
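
Before loading the file, it may be worth confirming that the downloaded copy matches the digest above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are illustrative and are not part of the attached scripts.

```python
import hashlib

# Digest published above for the PoC file.
EXPECTED_SHA256 = "5aebff1450e2c75085be53ed2f00dd926ab197dcaaa8b5b2ab95261e179ab028"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA-256 digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("mnist_native_keras_weight_backdoor_safe_mode.keras")` should return `True` for an unmodified copy of the PoC file.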

Summary

This PoC is a native .keras MNIST classifier that loads successfully with keras.saving.load_model(..., safe_mode=True). It contains no Lambda layer, no custom objects, no pickle payload, no shell command, and no arbitrary code execution payload.

The malicious behavior is embedded in the model weights. Clean MNIST inputs are classified normally, but when the hidden trigger is present (the 4x4 block of pixels [24:28, 24:28], i.e. the bottom-right corner of the image, set to 1.0) the model silently forces the prediction to target class 7.

Trigger

Set pixels [24:28, 24:28] to 1.0 in a 28x28 MNIST grayscale input.
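
The trigger can be stamped onto any input with a single NumPy slice assignment. This is a minimal sketch that assumes the image is a single unbatched 28x28 array with pixel values scaled to [0, 1]; the function name `apply_trigger` is illustrative and is not taken from the attached scripts.

```python
import numpy as np


def apply_trigger(image, value=1.0):
    """Return a copy of a 28x28 image with the trigger patch applied.

    Rows 24-27 and columns 24-27 (the bottom-right 4x4 block) are set
    to `value`. The input array is left unmodified.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.shape != (28, 28):
        raise ValueError(f"expected shape (28, 28), got {image.shape}")
    triggered = image.copy()
    triggered[24:28, 24:28] = value
    return triggered
```

Only 16 of the 784 pixels change, and they sit in a corner that is usually background in MNIST digits.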

Security impact

A malicious native .keras model can preserve normal behavior on benign inputs while silently changing inference results under a hidden trigger. A downstream system that evaluates the model on clean data sees normal accuracy and trusts it, while any input carrying the trigger is forced to the attacker's chosen class.

This is not ACE/RCE. It is a weight-level backdoor: output manipulation delivered through the model file itself.

Evidence

Clean accuracy: 0.949375 (8000 test samples)
Trigger attack success rate: 0.9993042994295255 (7182 of 7187 non-target samples)
Clean target-class rate: 0.099875 (799 of 8000 clean predictions)
Single clean prediction: 5
Single triggered prediction: 7
Target class: 7
Output manipulated: true
Loaded with safe_mode=True: true
Uses Lambda layer: false
Uses custom objects: false
Uses code execution payload: false

Metrics

{
  "clean_accuracy": 0.949375,
  "trigger_attack_success_rate": 0.9993042994295255,
  "clean_target_class_rate": 0.099875,
  "target_class": 7,
  "trigger": "Set pixels [24:28, 24:28] to 1.0 in MNIST 28x28 grayscale input.",
  "test_samples": 8000,
  "trigger_test_samples_non_target_only": 7187,
  "clean_prediction_distribution": {
    "0": 790,
    "1": 913,
    "2": 841,
    "3": 791,
    "4": 809,
    "5": 724,
    "6": 719,
    "7": 799,
    "8": 821,
    "9": 793
  },
  "trigger_prediction_distribution": {
    "0": 0,
    "1": 0,
    "2": 0,
    "3": 0,
    "4": 0,
    "5": 3,
    "6": 0,
    "7": 7182,
    "8": 2,
    "9": 0
  }
}
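
The two headline rates can be recomputed from the prediction distributions above as a consistency check. This sketch copies the counts from the JSON verbatim; the variable names are illustrative.

```python
# Counts copied from "clean_prediction_distribution" above.
clean_dist = {
    "0": 790, "1": 913, "2": 841, "3": 791, "4": 809,
    "5": 724, "6": 719, "7": 799, "8": 821, "9": 793,
}
# Counts copied from "trigger_prediction_distribution" above
# (only samples whose true label is not the target class).
trigger_dist = {
    "0": 0, "1": 0, "2": 0, "3": 0, "4": 0,
    "5": 3, "6": 0, "7": 7182, "8": 2, "9": 0,
}
target = "7"

test_samples = sum(clean_dist.values())
trigger_samples = sum(trigger_dist.values())

# Fraction of clean inputs predicted as the target class.
clean_target_class_rate = clean_dist[target] / test_samples
# Fraction of triggered non-target inputs predicted as the target class.
trigger_attack_success_rate = trigger_dist[target] / trigger_samples

print(test_samples, trigger_samples)
print(clean_target_class_rate, trigger_attack_success_rate)
```

The totals come to 8000 and 7187, matching `test_samples` and `trigger_test_samples_non_target_only`. A clean target-class rate close to 0.1 is what a roughly balanced ten-class test set would give, so the model does not favor class 7 without the trigger.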

Single-sample proof

{
  "source_dataset": "MNIST",
  "original_label": 5,
  "clean_predicted_class": 5,
  "triggered_predicted_class": 7,
  "target_class": 7,
  "output_manipulated": true,
  "safe_mode_true_load_successful": true
}

ModelScan result

ModelScan output file: modelscan_full_extras_output.txt

The available ModelScan run printed "No issues found."

The scan output also reported skipped internal files, so the result should be read narrowly: ModelScan did not flag this PoC as malicious in the available run, but I am not claiming a universal scanner bypass.

The demonstrated security impact is semantic output manipulation caused by a malicious .keras model file, not unsafe code deserialization.

Reproduction

Install:

pip install keras jax jaxlib numpy h5py

Run:

python reproduce.py

Expected output includes:

{
  "clean_predicted_class": 5,
  "triggered_predicted_class": 7,
  "target_class": 7,
  "output_manipulated": true,
  "safe_mode_true": true
}
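
For readers who want to see the shape of the check without opening reproduce.py, the comparison can be sketched as below. This is not the attached script: the function names are hypothetical, and it assumes that sample_clean.npy and sample_triggered.npy each hold one unbatched sample in the shape the model expects, and that the model outputs one score per class.

```python
import numpy as np


def compare_predictions(predict_fn, clean, triggered, target_class=7):
    """Compare predicted classes for one clean and one triggered sample.

    `predict_fn` maps a batch of inputs to per-class scores of shape (N, 10).
    """
    clean_pred = int(np.argmax(predict_fn(clean[None, ...]), axis=-1)[0])
    trig_pred = int(np.argmax(predict_fn(triggered[None, ...]), axis=-1)[0])
    return {
        "clean_predicted_class": clean_pred,
        "triggered_predicted_class": trig_pred,
        "target_class": target_class,
        # Manipulated: the trigger changed the prediction to the target class.
        "output_manipulated": clean_pred != trig_pred and trig_pred == target_class,
    }


def run_against_poc(model_path, clean_path, triggered_path):
    """Load the PoC with safe_mode=True and compare the two attached samples."""
    import keras  # imported lazily so the helper above works without Keras

    model = keras.saving.load_model(model_path, safe_mode=True)
    clean = np.load(clean_path)          # assumed: one unbatched sample
    triggered = np.load(triggered_path)  # assumed: same sample plus trigger
    return compare_predictions(
        lambda x: model.predict(x, verbose=0), clean, triggered
    )
```

Keeping the comparison separate from model loading means the same check can be pointed at any model that exposes a batch prediction function.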

Why this is not the known Keras Lambda ACE duplicate

This PoC does not use Lambda, safe_mode=False, pickle, Python bytecode, shell commands, HDF5 Lambda serialization, or custom deserialization. The output manipulation is encoded in ordinary .keras model weights.
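
The absence of a Lambda layer can be checked without loading the model, since a native .keras file is a zip archive whose config.json describes the architecture. The sketch below lists the archive members and collects every `class_name` found in the config; the helper names are illustrative. It only inspects the declared architecture and says nothing about what the weights encode, which is the point of this PoC.

```python
import json
import zipfile


def collect_class_names(node):
    """Recursively collect every "class_name" string in a parsed config tree."""
    names = []
    if isinstance(node, dict):
        if isinstance(node.get("class_name"), str):
            names.append(node["class_name"])
        for value in node.values():
            names.extend(collect_class_names(value))
    elif isinstance(node, list):
        for item in node:
            names.extend(collect_class_names(item))
    return names


def archive_summary(path):
    """Summarize a .keras archive: member files and class names in config.json."""
    with zipfile.ZipFile(path) as archive:
        members = archive.namelist()
        config = json.loads(archive.read("config.json"))
    names = collect_class_names(config)
    return {
        "members": members,
        "class_names": sorted(set(names)),
        "has_lambda": "Lambda" in names,
    }
```

For this PoC, `archive_summary(...)["has_lambda"]` would be expected to be `False`, consistent with the claim above. The collected names also include non-layer classes such as initializers, so the list is a superset of the layer types.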

Attachments

mnist_native_keras_weight_backdoor_safe_mode.keras
reproduce.py
metrics.json
single_sample_result.json
training_history.json
sample_clean.npy
sample_triggered.npy
modelscan_output.txt
modelscan_full_extras_output.txt
DUPLICATE_CHECK.md
VERDICT.json
