| # Integer Overflow in Tensor Element Count Leads to Heap Buffer Overflow in ExecuTorch PTE Loading |
|
|
| ## Target |
| pytorch/executorch |
|
|
| ## Vulnerability Type |
| Integer Overflow to Buffer Overflow (CWE-190, CWE-122) |
|
|
| ## Severity |
| **CRITICAL** (CVSS 3.1: 9.8 -- AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H) |
|
|
| A crafted .pte model file can trigger an integer overflow in tensor size calculations, causing a small buffer to be allocated for what should be a very large tensor. Subsequent data loading writes past the end of this undersized buffer, achieving heap corruption that can lead to arbitrary code execution. |
|
|
| ## Summary |
|
|
| When ExecuTorch loads a .pte (FlatBuffer-based) model file, it deserializes tensor metadata including dimension sizes and scalar type. The `compute_numel()` function in `tensor_impl.cpp` multiplies all dimension sizes together to compute the total number of elements (`numel`), but performs this multiplication using signed `ssize_t` arithmetic **without any overflow check**. The result is then used to compute `nbytes = numel * elementSize(type)` in `TensorImpl::nbytes()`, also without overflow protection. |
|
|
| Critically, the overflow checks were **deliberately commented out** in `program_validation.cpp` (lines 35-58 and 67-79), leaving a known gap in the validation pipeline. |
|
|
| ## Root Cause |
|
|
| ### File 1: `runtime/core/portable_type/tensor_impl.cpp`, lines 30-44 |
|
|
| ```cpp |
| ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) { |
| ET_CHECK_MSG( |
| dim == 0 || sizes != nullptr, |
| "Sizes must be provided for non-scalar tensors"); |
| ssize_t numel = 1; |
| for (const auto i : c10::irange(dim)) { |
| ET_CHECK_MSG( |
| sizes[i] >= 0, |
| "Size must be non-negative, got %zd at dimension %zd", |
| static_cast<ssize_t>(sizes[i]), |
| i); |
| numel *= sizes[i]; // <-- NO OVERFLOW CHECK |
| } |
| return numel; |
| } |
| ``` |
|
|
| The only validation is that each individual size is non-negative; there is no check on the running product. With `SizesType = int32_t` (max ~2.1 billion) and `ssize_t` being 64-bit on most platforms, an attacker can craft sizes whose product overflows `ssize_t`. For example, `{65536, 65536, 65536, 65536}` produces a product of 2^16 × 2^16 × 2^16 × 2^16 = 2^64, which wraps to exactly 0 in 64-bit arithmetic; nearby size combinations wrap to other small values.
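|
| The wrap can be reproduced with a standalone replica of the vulnerable loop (a simplified sketch; the names are ours, not the ExecuTorch source). The product is computed here in `uint64_t` so the wrap is well-defined; the real loop uses signed `ssize_t`, where the same wrap is formally undefined behavior but occurs in practice on two's-complement targets:
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
|
| // Simplified replica of the unchecked product in compute_numel().
| uint64_t naive_numel(const int32_t* sizes, int dim) {
|   uint64_t numel = 1;
|   for (int i = 0; i < dim; ++i) {
|     numel *= (uint64_t)sizes[i];  // no overflow check, mirrors the loop above
|   }
|   return numel;
| }
|
| int main() {
|   const int32_t sizes[] = {65536, 65536, 65536, 65536};  // 2^16 each
|   // 2^16 * 2^16 * 2^16 * 2^16 = 2^64, which wraps to 0 in 64 bits.
|   printf("numel = %llu\n", (unsigned long long)naive_numel(sizes, 4));
|   return 0;
| }
| ```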
|
|
| ### File 2: `runtime/core/portable_type/tensor_impl.cpp`, lines 71-73 |
|
|
| ```cpp |
| size_t TensorImpl::nbytes() const { |
| return numel_ * elementSize(type_); // <-- NO OVERFLOW CHECK |
| } |
| ``` |
|
|
| Even if `numel_` did not overflow, `nbytes()` multiplies by the element size without checking. A `numel_` of `2^61` with an 8-byte element size yields `2^64`, which wraps to 0 in a 64-bit `size_t`.
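|
| This second multiplication can be checked in isolation (standalone sketch, assuming a 64-bit `size_t`; the variable names are ours):
|
| ```cpp
| #include <cstddef>
| #include <cstdio>
|
| int main() {
|   size_t numel = (size_t)1 << 61;  // plausible post-validation numel_
|   size_t elem = 8;                 // e.g. 8-byte int64/double elements
|   size_t nbytes = numel * elem;    // 2^61 * 8 = 2^64 wraps to 0
|   printf("nbytes = %zu\n", nbytes);
|   return 0;
| }
| ```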
|
|
| ### File 3: `runtime/executor/program_validation.cpp`, lines 35-79 |
| |
| The overflow checks that would catch this were **explicitly commented out**: |
| |
| ```cpp |
| // ssize_t numel = 1; |
| // ... |
| // bool overflow = |
| // c10::mul_overflows(numel, static_cast<ssize_t>(size), &numel); |
| // if (overflow) { |
| // ... |
| // return Error::InvalidProgram; |
| // } |
| ``` |
| |
| And: |
| |
| ```cpp |
| // size_t nbytes; |
| // bool nbytes_overflow = c10::mul_overflows( |
| // static_cast<size_t>(numel), |
| // executorch::runtime::elementSize(scalar_type), |
| // &nbytes); |
| // if (nbytes_overflow) { |
| // ... |
| // return Error::InvalidProgram; |
| // } |
| ``` |
| |
| These commented-out checks show the developers were aware of the risk but left the protection disabled. |
| |
| ## Exploitation Flow |
| |
| 1. **Attacker crafts a .pte file** with a Tensor whose `sizes` field contains values whose product overflows `ssize_t`. For example, sizes = `{65536, 65536, 65536, 65536}` with scalar_type Float32 (4 bytes): the running product reaches 2^64 and wraps to 0.
| |
| 2. **During `parseTensor()`** (`tensor_parser_portable.cpp`): |
| - Sizes are validated only for non-negativity (lines 125-132) -- all pass.
| - `TensorImpl` constructor calls `compute_numel()` which overflows silently, producing a small or zero `numel_`. |
| - `tensor_impl->nbytes()` returns a very small value (e.g., 0 or 32). |
| |
| 3. **`getTensorDataPtr()`** is called with this tiny `nbytes`: |
| - For constant tensors: `program->get_constant_buffer_data(data_buffer_idx, nbytes)` -- the bounds check at line 398 (`offset + nbytes <= size`) passes because `nbytes` is tiny. |
| - For memory-planned tensors: `allocator->get_offset_address(memory_id, memory_offset, nbytes)` -- the bounds check passes because `nbytes` is tiny. |
| |
| 4. **During execution**, kernel operators read/write the tensor using the actual (huge) logical dimensions but the physical buffer is tiny. Any kernel that iterates over the tensor (e.g., copy, add, matmul) writes far beyond the allocated buffer, causing **heap buffer overflow**. |
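|
| Step 4 can be illustrated without performing the corrupting writes (a standalone sketch with stand-in sizes; a real kernel would write at each out-of-range index instead of counting them):
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
|
| int main() {
|   // Stand-in values: the kernel iterates over the logical element count,
|   // but the physical buffer was sized from the wrapped nbytes.
|   uint64_t logical_numel = 1ull << 20;  // huge logical size (scaled down)
|   size_t buffer_elems = 8;              // tiny buffer from wrapped nbytes
|   uint64_t oob = 0;
|   for (uint64_t i = 0; i < logical_numel; ++i) {
|     if (i >= buffer_elems) {
|       ++oob;  // a real kernel would write here: heap buffer overflow
|     }
|   }
|   printf("out-of-bounds writes: %llu\n", (unsigned long long)oob);
|   return 0;
| }
| ```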
| |
| ### Concrete Example on 64-bit System |
| |
| - `sizes = [65536, 65536, 65536, 65536]` (four int32 values)
| - `numel = 65536 * 65536 * 65536 * 65536 = 2^64`, which wraps to `0` as a 64-bit `ssize_t`
| - `nbytes = 0 * 4 = 0`
| - The resulting `nbytes` of 0 satisfies every downstream bounds check
| - The tensor logically spans 2^64 elements, but zero bytes are reserved for it
| |
| Even simpler on 32-bit embedded targets (ExecuTorch's primary deployment): |
| - `ssize_t` is 32-bit, `SizesType` is `int32_t` |
| - `sizes = [65536, 65536]` -> `numel = 65536 * 65536 = 4294967296` which wraps to `0` as int32 |
| - `nbytes = 0 * 4 = 0` |
| - Zero bytes allocated, any write is a heap overflow |
| |
| ## Impact |
| |
| - **Heap Buffer Overflow**: Kernel operations write past allocated buffer boundaries |
| - **Arbitrary Code Execution**: Standard heap corruption exploitation techniques apply |
| - **Denial of Service**: Immediate crash on memory access violation |
| - **Affects embedded/mobile devices**: ExecuTorch targets resource-constrained environments (Android, iOS, microcontrollers) where ASLR/heap protections may be weaker |
| |
| ## Affected Code Path |
| |
| ``` |
| Program::load() |
| -> Method::load() |
| -> Method::init() |
| -> Method::parse_values() |
| -> deserialization::parseTensor() [tensor_parser_portable.cpp] |
| -> TensorImpl::TensorImpl() [tensor_impl.cpp] |
| -> compute_numel() [OVERFLOW HERE] |
| -> TensorImpl::nbytes() [OVERFLOW HERE] |
| -> getTensorDataPtr() [undersized allocation] |
| ``` |
| |
| ## Remediation |
|
|
| 1. **Uncomment and enable the overflow checks** in `program_validation.cpp` (lines 35-79). Replace the `c10::mul_overflows` dependency if needed with a portable implementation. |
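|
| If `c10::mul_overflows` is not available to the runtime, a portable replacement is straightforward. The sketch below is ours (the name `mul_overflows_nonneg` and its signature are assumptions, not an existing ExecuTorch API); it assumes non-negative inputs, which holds here because sizes are already validated to be >= 0:
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
| #include <limits>
|
| // Portable stand-in for c10::mul_overflows (hypothetical name/signature).
| // Returns true if a * b would overflow T; assumes a >= 0 and b >= 0.
| template <typename T>
| bool mul_overflows_nonneg(T a, T b, T* out) {
| #if defined(__GNUC__) || defined(__clang__)
|   return __builtin_mul_overflow(a, b, out);  // compiler builtin when available
| #else
|   if (a != 0 && b > std::numeric_limits<T>::max() / a) {
|     return true;  // a * b would exceed the representable range
|   }
|   *out = a * b;
|   return false;
| #endif
| }
|
| int main() {
|   int64_t r = 0;
|   // 2^48 * 2^16 = 2^64 overflows int64_t, so the crafted sizes are rejected.
|   bool bad = mul_overflows_nonneg<int64_t>(
|       (int64_t)1 << 48, (int64_t)1 << 16, &r);
|   printf("overflow: %s\n", bad ? "yes" : "no");
|   return 0;
| }
| ```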
|
|
| 2. **Add overflow checks to `compute_numel()`**:
| ```cpp
| ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) {
|   ET_CHECK_MSG(
|       dim == 0 || sizes != nullptr,
|       "Sizes must be provided for non-scalar tensors");
|   ssize_t numel = 1;
|   for (const auto i : c10::irange(dim)) {
|     ET_CHECK_MSG(
|         sizes[i] >= 0,
|         "Size must be non-negative, got %zd at dimension %zd",
|         static_cast<ssize_t>(sizes[i]),
|         i);
|     // Reject any size that would overflow the running product.
|     ET_CHECK_MSG(
|         sizes[i] == 0 || numel <= SSIZE_MAX / sizes[i],
|         "numel overflow at dimension %zd",
|         i);
|     numel *= sizes[i];
|   }
|   return numel;
| }
| ```
| |
| 3. **Add overflow check to `TensorImpl::nbytes()`**: |
| ```cpp |
| size_t TensorImpl::nbytes() const { |
| size_t elem = elementSize(type_); |
| ET_CHECK_MSG(elem == 0 || static_cast<size_t>(numel_) <= SIZE_MAX / elem, |
| "nbytes overflow"); |
| return static_cast<size_t>(numel_) * elem; |
| } |
| ``` |
| |
| ## References |
| |
| - `runtime/core/portable_type/tensor_impl.cpp` -- `compute_numel()` overflow |
| - `runtime/core/portable_type/tensor_impl.cpp` -- `TensorImpl::nbytes()` overflow |
| - `runtime/executor/program_validation.cpp` -- commented-out overflow checks |
| - `runtime/executor/tensor_parser_portable.cpp` -- `parseTensor()` overflow propagation |
| - `runtime/executor/tensor_parser_exec_aten.cpp` -- `getTensorDataPtr()` undersized allocation |
| - `runtime/core/exec_aten/util/dim_order_util.h` -- `dim_order_to_stride_nocheck()` stride overflow |
| |