| # Integer Overflow in Proxy Buffer Allocation Leading to Heap Buffer Overflow |
|
|
| ## Target |
| - **Project:** h5py/h5py |
| - **URL:** https://github.com/h5py/h5py |
| - **Component:** `_proxy.templ.pyx` - `create_buffer()` function |
| - **CWE:** CWE-190 (Integer Overflow or Wraparound), CWE-122 (Heap-based Buffer Overflow) |
|
|
| ## Severity: HIGH |
|
|
| ## Summary |
|
|
| The `create_buffer()` function in `h5py/h5py/_proxy.templ.pyx` computes the buffer size as `size * npoints` (where both are `size_t`) without checking for integer overflow. When a crafted HDF5 file specifies a dataset with a very large type size or number of points, the multiplication can wrap around, causing `malloc()` to allocate a much smaller buffer than expected. Subsequent `H5Dread()` or `H5Tconvert()` operations then write past the end of this undersized buffer, resulting in a heap buffer overflow. |
|
|
| ## Vulnerable Code |
|
|
| ### File: `h5py/h5py/_proxy.templ.pyx` (lines 294-308) |
| |
| ```cython |
| cdef void* create_buffer(size_t ipt_size, size_t opt_size, size_t nl) except NULL: |
| |
|     cdef size_t final_size |
|     cdef void* buf |
| |
|     if ipt_size >= opt_size: |
|         final_size = ipt_size*nl  # <-- INTEGER OVERFLOW: no bounds check |
|     else: |
|         final_size = opt_size*nl  # <-- INTEGER OVERFLOW: no bounds check |
| |
|     buf = malloc(final_size) |
|     if buf == NULL: |
|         raise MemoryError("Failed to allocate conversion buffer") |
| |
|     return buf |
| ``` |
| |
| ### Callers (all in the same file): |
|
|
| **Line 53 (attribute read/write proxy):** |
| ```cython |
| conv_buf = create_buffer(asize, msize, npoints) |
| ``` |
|
|
| **Line 60:** |
| ```cython |
| back_buf = create_buffer(msize, asize, npoints) |
| ``` |
|
|
| **Line 137 (dataset read/write proxy):** |
| ```cython |
| conv_buf = create_buffer(H5Tget_size(dstype), H5Tget_size(mtype), npoints) |
| ``` |
|
|
| **Line 146:** |
| ```cython |
| back_buf = create_buffer(H5Tget_size(dstype), H5Tget_size(mtype), npoints) |
| ``` |
|
|
| **Line 210 (vlen string read/write):** |
| ```cython |
| conv_buf = create_buffer(H5Tget_size(dstype), H5Tget_size(h5_vlen_string), npoints) |
| ``` |
|
|
| ### The overflow scenario: |
|
|
| The values `ipt_size` / `opt_size` come from `H5Tget_size()` which returns the size of the HDF5 datatype as stored in the file. The value `nl` (npoints) comes from `H5Sget_select_npoints()` which returns the number of selected elements in the dataspace. Both are controlled by the contents of the HDF5 file. |
|
|
| On a 64-bit system, if `ipt_size = 0x100000001` (4 GiB + 1, achievable with a compound type containing many members) and `nl = 0x100000000` (2^32 points), then: |
| ``` |
| final_size = 0x100000001 * 0x100000000 = 0x10000000100000000  (2^64 + 2^32) |
| ``` |
| This exceeds the range of a 64-bit unsigned `size_t` and wraps to `0x100000000` (4 GiB), so `malloc()` returns a buffer far smaller than the roughly 2^64 bytes the computation was meant to reserve. The subsequent `H5Dread()` writes the full expected amount of data, overflowing the heap. |
|
|
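| Cases with much smaller type sizes also exist on 64-bit. For example, a compound type of 65536 bytes with 281474976710656 (2^48) points gives `65536 * 281474976710656 = 2^64`, which wraps to exactly 0. The result of `malloc(0)` is implementation-defined: it may return NULL, which triggers the `MemoryError`, or a valid minimal allocation, which passes the NULL check. Point counts slightly above 2^48 wrap to small positive sizes; for instance, 2^48 + 1 points wraps to 65536 bytes. |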
|
|
| On **32-bit** systems this is trivially exploitable: |
| - Type size: 65536 (compound type) |
| - npoints: 65537 |
| - Product: 65536 * 65537 = 4,295,032,832 which wraps to 65536 on 32-bit `size_t` |
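
The wraparound arithmetic in the cases above can be checked with a short sketch. Python integers do not overflow, so reducing the exact product modulo 2^bits models what an unsigned `size_t` of that width would hold. The helper name `wrapped_size` is hypothetical and exists only for this illustration.

```python
def wrapped_size(elem_size: int, npoints: int, bits: int) -> int:
    """Return elem_size * npoints reduced modulo 2**bits, as an unsigned size_t would."""
    return (elem_size * npoints) % (1 << bits)


# 64-bit case from the text: (2**32 + 1) * 2**32 = 2**64 + 2**32
case_large_type = wrapped_size(0x100000001, 0x100000000, 64)

# 64-bit case: 65536 * 2**48 = 2**64, which wraps to exactly 0
case_zero = wrapped_size(65536, 2**48, 64)

# One additional point leaves room for a single 65536-byte element
case_one_element = wrapped_size(65536, 2**48 + 1, 64)

# 32-bit case: 65536 * 65537 = 2**32 + 65536
case_32bit = wrapped_size(65536, 65537, 32)

print(hex(case_large_type), case_zero, case_one_element, case_32bit)
# prints: 0x100000000 0 65536 65536
```

In each case the wrapped value is the size that would be passed to `malloc()`, while the exact product is the amount of data the conversion expects to fit.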
|
|
| ## Exploitation |
|
|
| 1. Craft an HDF5 file with: |
|    - A compound datatype with many fields to inflate `H5Tget_size()` |
|    - A dataspace with dimensions chosen so that `npoints * type_size` overflows `size_t` |
| 2. Open the file with h5py and read the dataset |
| 3. The proxy code path is triggered when the dataset type requires conversion (compound types, vlen types, references) |
| 4. `create_buffer()` allocates a small buffer due to the overflow |
| 5. `H5Dread()` writes past the end of the buffer, corrupting the heap |
|
|
| ### Proof of Concept (conceptual): |
|
|
| ```python |
| import h5py |
| import numpy as np |
| |
| # Conceptual only: on a 32-bit system, or with carefully chosen values on 64-bit, |
| # create a file with a large compound type and enough elements |
| # that type_size * npoints overflows size_t. |
| |
| # The proxy buffer is used only when type conversion is needed, |
| # so the file's compound type must differ from the memory type |
| # (e.g., different field ordering or extra fields). |
| |
| with h5py.File('overflow.h5', 'w') as f: |
|     # Create compound dtype with large total size |
|     fields = [(f'field_{i}', 'f8') for i in range(8192)]  # 8192 * 8 = 65536 bytes |
|     dt = np.dtype(fields) |
| |
|     # On 32-bit: 65536 * 65537 wraps to 65536 |
|     # Create dataset with 65537 elements |
|     f.create_dataset('data', shape=(65537,), dtype=dt) |
| |
| # Read the dataset back. This plain read uses the stored dtype; the overflow |
| # is reached only when the read goes through the conversion proxy (see above). |
| with h5py.File('overflow.h5', 'r') as f: |
|     # If the proxy path is taken, create_buffer() receives the overflowing size |
|     data = f['data'][:]  # Heap overflow would occur here |
| ``` |
|
|
| ## Impact |
|
|
| - **Heap buffer overflow**: Can corrupt heap metadata, potentially leading to arbitrary code execution |
| - **Denial of Service**: Crash via heap corruption or segfault |
| - Triggered by simply reading a crafted HDF5 file with h5py |
| - The compound type conversion proxy path is commonly used when reading HDF5 files created by different software versions or with different field orderings |
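
Because the trigger is an ordinary read, a caller who must open untrusted files could approximate the missing check before reading. The following is a minimal sketch under stated assumptions, not part of h5py: the function name `conversion_buffer_overflows` is hypothetical, and it assumes the buffer size is computed as the larger of the two element sizes times the number of points, as in the vulnerable code above.

```python
import ctypes

# Width of size_t on the running platform (32 or 64 on common systems)
SIZE_T_BITS = 8 * ctypes.sizeof(ctypes.c_size_t)


def conversion_buffer_overflows(file_elem_size: int, mem_elem_size: int,
                                npoints: int, bits: int = SIZE_T_BITS) -> bool:
    """Return True if max(element sizes) * npoints does not fit in `bits` unsigned bits."""
    elem_size = max(file_elem_size, mem_elem_size)
    # Python integers are arbitrary precision, so this product is exact
    return elem_size * npoints >= (1 << bits)


# Hypothetical use with an open h5py dataset `dset` (not executed here):
#   if conversion_buffer_overflows(dset.dtype.itemsize, dset.dtype.itemsize, dset.size):
#       raise ValueError("refusing to read: buffer size would overflow size_t")

print(conversion_buffer_overflows(65536, 8, 65537, bits=32))
```

This only mirrors the size computation; it does not determine whether a given read actually takes the proxy path, and it uses the in-memory item size as a stand-in for the file type size.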
|
|
| ## Remediation |
|
|
| Add overflow checking to `create_buffer()`: |
|
|
| ```cython |
| cdef void* create_buffer(size_t ipt_size, size_t opt_size, size_t nl) except NULL: |
|     cdef size_t final_size |
|     cdef size_t elem_size |
|     cdef void* buf |
| |
|     elem_size = ipt_size if ipt_size >= opt_size else opt_size |
| |
|     # Check for overflow before multiplication |
|     if nl != 0 and elem_size > (<size_t>-1) / nl: |
|         raise OverflowError("Buffer size calculation would overflow") |
| |
|     final_size = elem_size * nl |
| |
|     buf = malloc(final_size) |
|     if buf == NULL: |
|         raise MemoryError("Failed to allocate conversion buffer") |
| |
|     return buf |
| ``` |
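
The division-based guard can be sanity-checked against the exact product. The sketch below models the guard in Python, with integer division standing in for unsigned C division, and compares it exhaustively with the true overflow condition for an 8-bit "size_t". The function names are hypothetical and used only for this check.

```python
def guard_rejects(elem_size: int, nl: int, size_max: int) -> bool:
    """Model of the proposed check: nl != 0 and elem_size > SIZE_MAX / nl."""
    return nl != 0 and elem_size > size_max // nl


def truly_overflows(elem_size: int, nl: int, size_max: int) -> bool:
    """Exact condition, computed with arbitrary-precision integers."""
    return elem_size * nl > size_max


# Exhaustive comparison over every pair of 8-bit unsigned values
SIZE_MAX_8 = 2**8 - 1
mismatches = [
    (e, n)
    for e in range(SIZE_MAX_8 + 1)
    for n in range(SIZE_MAX_8 + 1)
    if guard_rejects(e, n, SIZE_MAX_8) != truly_overflows(e, n, SIZE_MAX_8)
]
print(len(mismatches))  # prints: 0

# The cases from the overflow scenario are rejected
print(guard_rejects(65536, 65537, 2**32 - 1))  # prints: True
print(guard_rejects(65536, 2**48, 2**64 - 1))  # prints: True
```

The guard rejects a pair exactly when the product exceeds the maximum, so it neither misses an overflow nor refuses a size that fits. A request where `nl` or the element size is 0 still reaches `malloc(0)`, whose result is implementation-defined.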
|
|
| ## References |
|
|
| - HDF5 compound types: https://docs.hdfgroup.org/hdf5/develop/group___h5_t.html |
| - Similar CVEs in HDF5 processing: CVE-2021-46243, CVE-2021-46244 |
| |