
Integer Overflow in Tensor Element Count Leads to Heap Buffer Overflow in ExecuTorch PTE Loading

Target

pytorch/executorch

Vulnerability Type

Integer Overflow to Buffer Overflow (CWE-190, CWE-122)

Severity

CRITICAL (CVSS 3.1: 9.8 -- AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

A crafted .pte model file can trigger an integer overflow in tensor size calculations, causing a small buffer to be allocated for what should be a very large tensor. Subsequent data loading writes past the end of this undersized buffer, achieving heap corruption that can lead to arbitrary code execution.

Summary

When ExecuTorch loads a .pte (FlatBuffer-based) model file, it deserializes tensor metadata including dimension sizes and scalar type. The compute_numel() function in tensor_impl.cpp multiplies all dimension sizes together to compute the total number of elements (numel), but performs this multiplication using signed ssize_t arithmetic without any overflow check. The result is then used to compute nbytes = numel * elementSize(type) in TensorImpl::nbytes(), also without overflow protection.

Critically, the overflow checks were deliberately commented out in program_validation.cpp (lines 35-58 and 67-79), leaving a known gap in the validation pipeline.

Root Cause

File 1: runtime/core/portable_type/tensor_impl.cpp, lines 30-44

ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) {
  ET_CHECK_MSG(
      dim == 0 || sizes != nullptr,
      "Sizes must be provided for non-scalar tensors");
  ssize_t numel = 1;
  for (const auto i : c10::irange(dim)) {
    ET_CHECK_MSG(
        sizes[i] >= 0,
        "Size must be non-negative, got %zd at dimension %zd",
        static_cast<ssize_t>(sizes[i]),
        i);
    numel *= sizes[i];  // <-- NO OVERFLOW CHECK
  }
  return numel;
}

The only validation is that individual sizes are non-negative. There is no check on the running product. With SizesType = int32_t (max ~2.1 billion) and ssize_t being 64-bit on most platforms, an attacker can craft sizes whose product exceeds SSIZE_MAX: for example, sizes = {65536, 65536, 65536, 65536} produce a logical numel of 2^64, which wraps to 0 in 64-bit arithmetic, and nearby size combinations wrap to other small values.

File 2: runtime/core/portable_type/tensor_impl.cpp, lines 71-73

size_t TensorImpl::nbytes() const {
  return numel_ * elementSize(type_);  // <-- NO OVERFLOW CHECK
}

Even if numel_ did not overflow, nbytes() multiplies by element size without checking. A numel_ of 2^61 with an 8-byte element size overflows 64-bit size_t to 0.

File 3: runtime/executor/program_validation.cpp, lines 35-79

The overflow checks that would catch this were explicitly commented out:

// ssize_t numel = 1;
// ...
// bool overflow =
//     c10::mul_overflows(numel, static_cast<ssize_t>(size), &numel);
// if (overflow) {
//   ...
//   return Error::InvalidProgram;
// }

And:

// size_t nbytes;
// bool nbytes_overflow = c10::mul_overflows(
//     static_cast<size_t>(numel),
//     executorch::runtime::elementSize(scalar_type),
//     &nbytes);
// if (nbytes_overflow) {
//   ...
//   return Error::InvalidProgram;
// }

These commented-out checks show the developers were aware of the risk but left the protection disabled.

Exploitation Flow

  1. Attacker crafts a .pte file with a Tensor whose sizes field contains values that multiply to overflow ssize_t. For example, sizes = {65536, 65536, 65536, 65536}, whose product of 2^64 wraps to 0, with scalar_type Float32 (4 bytes).

  2. During parseTensor() (tensor_parser_portable.cpp):

    • Sizes are validated only for non-negativity (lines 125-132) -- all pass.
    • TensorImpl constructor calls compute_numel() which overflows silently, producing a small or zero numel_.
    • tensor_impl->nbytes() returns a very small value (e.g., 0 or 32).
  3. getTensorDataPtr() is called with this tiny nbytes:

    • For constant tensors: program->get_constant_buffer_data(data_buffer_idx, nbytes) -- the bounds check at line 398 (offset + nbytes <= size) passes because nbytes is tiny.
    • For memory-planned tensors: allocator->get_offset_address(memory_id, memory_offset, nbytes) -- the bounds check passes because nbytes is tiny.
  4. During execution, kernel operators read/write the tensor using the actual (huge) logical dimensions but the physical buffer is tiny. Any kernel that iterates over the tensor (e.g., copy, add, matmul) writes far beyond the allocated buffer, causing heap buffer overflow.

Concrete Example on 64-bit System

  • sizes = [2147483647, 2147483647, 4] (three int32 values)
  • The mathematical product is 2147483647 * 2147483647 * 4 = 18446744056529682436, which exceeds SSIZE_MAX and wraps to -17179869180 as a 64-bit ssize_t
  • nbytes = numel_ * elementSize is computed with the same unchecked wrapping arithmetic, so the value handed to buffer allocation bears no relation to the tensor's logical size
  • By tuning sizes so that the product (times element size) is an exact multiple of 2^64 -- e.g. sizes = [65536, 65536, 65536, 65536], whose numel of 2^64 wraps to 0 -- the attacker drives nbytes to 0 or any chosen small value
  • The tensor is logically ~16 exabytes, but only a handful of bytes (or none) are allocated

Even simpler on 32-bit embedded targets (ExecuTorch's primary deployment):

  • ssize_t is 32-bit, SizesType is int32_t
  • sizes = [65536, 65536] -> numel = 65536 * 65536 = 4294967296 which wraps to 0 as int32
  • nbytes = 0 * 4 = 0
  • Zero bytes allocated, any write is a heap overflow

Impact

  • Heap Buffer Overflow: Kernel operations write past allocated buffer boundaries
  • Arbitrary Code Execution: Standard heap corruption exploitation techniques apply
  • Denial of Service: Immediate crash on memory access violation
  • Affects embedded/mobile devices: ExecuTorch targets resource-constrained environments (Android, iOS, microcontrollers) where ASLR/heap protections may be weaker

Affected Code Path

Program::load()
  -> Method::load()
    -> Method::init()
      -> Method::parse_values()
        -> deserialization::parseTensor()  [tensor_parser_portable.cpp]
          -> TensorImpl::TensorImpl()      [tensor_impl.cpp]
            -> compute_numel()             [OVERFLOW HERE]
          -> TensorImpl::nbytes()          [OVERFLOW HERE]
          -> getTensorDataPtr()            [undersized allocation]

Remediation

  1. Uncomment and re-enable the overflow checks in program_validation.cpp (lines 35-58 and 67-79). If needed, replace the c10::mul_overflows dependency with a portable implementation.

  2. Add overflow checks to compute_numel():

ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) {
  ET_CHECK_MSG(
      dim == 0 || sizes != nullptr,
      "Sizes must be provided for non-scalar tensors");
  ssize_t numel = 1;
  for (const auto i : c10::irange(dim)) {
    ET_CHECK_MSG(
        sizes[i] >= 0,
        "Size must be non-negative, got %zd at dimension %zd",
        static_cast<ssize_t>(sizes[i]),
        i);
    // Reject any size that would overflow the running product.
    ET_CHECK_MSG(
        sizes[i] == 0 || numel <= SSIZE_MAX / sizes[i],
        "numel overflow at dimension %zd",
        i);
    numel *= sizes[i];
  }
  return numel;
}
  3. Add an overflow check to TensorImpl::nbytes():

size_t TensorImpl::nbytes() const {
  size_t elem = elementSize(type_);
  ET_CHECK_MSG(elem == 0 || static_cast<size_t>(numel_) <= SIZE_MAX / elem,
               "nbytes overflow");
  return static_cast<size_t>(numel_) * elem;
}

References

  • runtime/core/portable_type/tensor_impl.cpp -- compute_numel() overflow
  • runtime/core/portable_type/tensor_impl.cpp -- TensorImpl::nbytes() overflow
  • runtime/executor/program_validation.cpp -- commented-out overflow checks
  • runtime/executor/tensor_parser_portable.cpp -- parseTensor() overflow propagation
  • runtime/executor/tensor_parser_exec_aten.cpp -- getTensorDataPtr() undersized allocation
  • runtime/core/exec_aten/util/dim_order_util.h -- dim_order_to_stride_nocheck() stride overflow