| # Integer Overflow in Tensor Element Count Leads to Heap Buffer Overflow in ExecuTorch PTE Loading |
|
|
| ## Target |
| pytorch/executorch |
|
|
| ## Vulnerability Type |
| Integer Overflow to Buffer Overflow (CWE-190, CWE-122) |
|
|
| ## Severity |
| **CRITICAL** (CVSS 3.1: 9.8 -- AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H) |
|
|
| A crafted .pte model file can trigger an integer overflow in tensor size calculations, causing a small buffer to be allocated for what should be a very large tensor. Subsequent data loading writes past the end of this undersized buffer, achieving heap corruption that can lead to arbitrary code execution. |
|
|
| ## Summary |
|
|
| When ExecuTorch loads a .pte (FlatBuffer-based) model file, it deserializes tensor metadata including dimension sizes and scalar type. The `compute_numel()` function in `tensor_impl.cpp` multiplies all dimension sizes together to compute the total number of elements (`numel`), but performs this multiplication using signed `ssize_t` arithmetic **without any overflow check**. The result is then used to compute `nbytes = numel * elementSize(type)` in `TensorImpl::nbytes()`, also without overflow protection. |
|
|
| Critically, the overflow checks were **deliberately commented out** in `program_validation.cpp` (lines 35-58 and 67-79), leaving a known gap in the validation pipeline. |
|
|
| ## Root Cause |
|
|
| ### File 1: `runtime/core/portable_type/tensor_impl.cpp`, lines 30-44 |
|
|
| ```cpp |
| ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) { |
| ET_CHECK_MSG( |
| dim == 0 || sizes != nullptr, |
| "Sizes must be provided for non-scalar tensors"); |
| ssize_t numel = 1; |
| for (const auto i : c10::irange(dim)) { |
| ET_CHECK_MSG( |
| sizes[i] >= 0, |
| "Size must be non-negative, got %zd at dimension %zd", |
| static_cast<ssize_t>(sizes[i]), |
| i); |
| numel *= sizes[i]; // <-- NO OVERFLOW CHECK |
| } |
| return numel; |
| } |
| ``` |
|
|
| The only validation is that each individual size is non-negative; there is no check on the running product. With `SizesType = int32_t` (max ~2.1 billion) and `ssize_t` being 64-bit on most platforms, an attacker can craft sizes whose product overflows `ssize_t`. For example, `{65536, 65536, 65536, 65536}` produces a product of 2^16 × 2^16 × 2^16 × 2^16 = 2^64, which wraps to exactly 0 in 64-bit arithmetic; nearby size combinations wrap to other small values.
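|
| The wrap can be reproduced with a standalone replica of the vulnerable loop (a simplified sketch; the names are ours, not the ExecuTorch source). The product is computed here in `uint64_t` so the wrap is well-defined; the real loop uses signed `ssize_t`, where the same wrap is formally undefined behavior but occurs in practice on two's-complement targets:
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
|
| // Simplified replica of the unchecked product in compute_numel().
| uint64_t naive_numel(const int32_t* sizes, int dim) {
|   uint64_t numel = 1;
|   for (int i = 0; i < dim; ++i) {
|     numel *= (uint64_t)sizes[i];  // no overflow check, mirrors the loop above
|   }
|   return numel;
| }
|
| int main() {
|   const int32_t sizes[] = {65536, 65536, 65536, 65536};  // 2^16 each
|   // 2^16 * 2^16 * 2^16 * 2^16 = 2^64, which wraps to 0 in 64 bits.
|   printf("numel = %llu\n", (unsigned long long)naive_numel(sizes, 4));
|   return 0;
| }
| ```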
|
|
| ### File 2: `runtime/core/portable_type/tensor_impl.cpp`, lines 71-73 |
|
|
| ```cpp |
| size_t TensorImpl::nbytes() const { |
| return numel_ * elementSize(type_); // <-- NO OVERFLOW CHECK |
| } |
| ``` |
|
|
| Even if `numel_` did not overflow, `nbytes()` multiplies by the element size without checking. A `numel_` of `2^61` with an 8-byte element size yields `2^64`, which wraps to 0 in a 64-bit `size_t`.
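|
| This second multiplication can be checked in isolation (standalone sketch, assuming a 64-bit `size_t`; the variable names are ours):
|
| ```cpp
| #include <cstddef>
| #include <cstdio>
|
| int main() {
|   size_t numel = (size_t)1 << 61;  // plausible post-validation numel_
|   size_t elem = 8;                 // e.g. 8-byte int64/double elements
|   size_t nbytes = numel * elem;    // 2^61 * 8 = 2^64 wraps to 0
|   printf("nbytes = %zu\n", nbytes);
|   return 0;
| }
| ```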
|
|
| ### File 3: `runtime/executor/program_validation.cpp`, lines 35-79 |
| |
| The overflow checks that would catch this were **explicitly commented out**: |
| |
| ```cpp |
| // ssize_t numel = 1; |
| // ... |
| // bool overflow = |
| // c10::mul_overflows(numel, static_cast<ssize_t>(size), &numel); |
| // if (overflow) { |
| // ... |
| // return Error::InvalidProgram; |
| // } |
| ``` |
| |
| And: |
| |
| ```cpp |
| // size_t nbytes; |
| // bool nbytes_overflow = c10::mul_overflows( |
| // static_cast<size_t>(numel), |
| // executorch::runtime::elementSize(scalar_type), |
| // &nbytes); |
| // if (nbytes_overflow) { |
| // ... |
| // return Error::InvalidProgram; |
| // } |
| ``` |
| |
| These commented-out checks show the developers were aware of the risk but left the protection disabled. |
| |
| ## Exploitation Flow |
| |
| 1. **Attacker crafts a .pte file** with a Tensor whose `sizes` field contains values whose product overflows `ssize_t`. For example, sizes = `{65536, 65536, 65536, 65536}` with scalar_type Float32 (4 bytes): the running product reaches 2^64 and wraps to 0.
| |
| 2. **During `parseTensor()`** (`tensor_parser_portable.cpp`): |
| - Sizes are validated only for non-negativity (lines 125-132) -- all pass.
| - `TensorImpl` constructor calls `compute_numel()` which overflows silently, producing a small or zero `numel_`. |
| - `tensor_impl->nbytes()` returns a very small value (e.g., 0 or 32). |
| |
| 3. **`getTensorDataPtr()`** is called with this tiny `nbytes`: |
| - For constant tensors: `program->get_constant_buffer_data(data_buffer_idx, nbytes)` -- the bounds check at line 398 (`offset + nbytes <= size`) passes because `nbytes` is tiny. |
| - For memory-planned tensors: `allocator->get_offset_address(memory_id, memory_offset, nbytes)` -- the bounds check passes because `nbytes` is tiny. |
| |
| 4. **During execution**, kernel operators read/write the tensor using the actual (huge) logical dimensions but the physical buffer is tiny. Any kernel that iterates over the tensor (e.g., copy, add, matmul) writes far beyond the allocated buffer, causing **heap buffer overflow**. |
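|
| Step 4 can be illustrated without performing the corrupting writes (a standalone sketch with stand-in sizes; a real kernel would write at each out-of-range index instead of counting them):
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
|
| int main() {
|   // Stand-in values: the kernel iterates over the logical element count,
|   // but the physical buffer was sized from the wrapped nbytes.
|   uint64_t logical_numel = 1ull << 20;  // huge logical size (scaled down)
|   size_t buffer_elems = 8;              // tiny buffer from wrapped nbytes
|   uint64_t oob = 0;
|   for (uint64_t i = 0; i < logical_numel; ++i) {
|     if (i >= buffer_elems) {
|       ++oob;  // a real kernel would write here: heap buffer overflow
|     }
|   }
|   printf("out-of-bounds writes: %llu\n", (unsigned long long)oob);
|   return 0;
| }
| ```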
| |
| ### Concrete Example on 64-bit System |
| |
| - `sizes = [65536, 65536, 65536, 65536]` (four int32 values)
| - `numel = 65536 * 65536 * 65536 * 65536 = 2^64`, which wraps to `0` as a 64-bit `ssize_t`
| - `nbytes = 0 * 4 = 0`
| - The resulting `nbytes` of 0 satisfies every downstream bounds check
| - The tensor logically spans 2^64 elements, but zero bytes are reserved for it
| |
| Even simpler on 32-bit embedded targets (ExecuTorch's primary deployment): |
| - `ssize_t` is 32-bit, `SizesType` is `int32_t` |
| - `sizes = [65536, 65536]` -> `numel = 65536 * 65536 = 4294967296` which wraps to `0` as int32 |
| - `nbytes = 0 * 4 = 0` |
| - Zero bytes allocated, any write is a heap overflow |
| |
| ## Impact |
| |
| - **Heap Buffer Overflow**: Kernel operations write past allocated buffer boundaries |
| - **Arbitrary Code Execution**: Standard heap corruption exploitation techniques apply |
| - **Denial of Service**: Immediate crash on memory access violation |
| - **Affects embedded/mobile devices**: ExecuTorch targets resource-constrained environments (Android, iOS, microcontrollers) where ASLR/heap protections may be weaker |
| |
| ## Affected Code Path |
| |
| ``` |
| Program::load() |
| -> Method::load() |
| -> Method::init() |
| -> Method::parse_values() |
| -> deserialization::parseTensor() [tensor_parser_portable.cpp] |
| -> TensorImpl::TensorImpl() [tensor_impl.cpp] |
| -> compute_numel() [OVERFLOW HERE] |
| -> TensorImpl::nbytes() [OVERFLOW HERE] |
| -> getTensorDataPtr() [undersized allocation] |
| ``` |
| |
| ## Remediation |
|
|
| 1. **Uncomment and enable the overflow checks** in `program_validation.cpp` (lines 35-79). Replace the `c10::mul_overflows` dependency if needed with a portable implementation. |
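|
| If `c10::mul_overflows` is not available to the runtime, a portable replacement is straightforward. The sketch below is ours (the name `mul_overflows_nonneg` and its signature are assumptions, not an existing ExecuTorch API); it assumes non-negative inputs, which holds here because sizes are already validated to be >= 0:
|
| ```cpp
| #include <cstdint>
| #include <cstdio>
| #include <limits>
|
| // Portable stand-in for c10::mul_overflows (hypothetical name/signature).
| // Returns true if a * b would overflow T; assumes a >= 0 and b >= 0.
| template <typename T>
| bool mul_overflows_nonneg(T a, T b, T* out) {
| #if defined(__GNUC__) || defined(__clang__)
|   return __builtin_mul_overflow(a, b, out);  // compiler builtin when available
| #else
|   if (a != 0 && b > std::numeric_limits<T>::max() / a) {
|     return true;  // a * b would exceed the representable range
|   }
|   *out = a * b;
|   return false;
| #endif
| }
|
| int main() {
|   int64_t r = 0;
|   // 2^48 * 2^16 = 2^64 overflows int64_t, so the crafted sizes are rejected.
|   bool bad = mul_overflows_nonneg<int64_t>(
|       (int64_t)1 << 48, (int64_t)1 << 16, &r);
|   printf("overflow: %s\n", bad ? "yes" : "no");
|   return 0;
| }
| ```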
|
|
| 2. **Add overflow checks to `compute_numel()`**:
| ```cpp
| ssize_t compute_numel(const TensorImpl::SizesType* sizes, ssize_t dim) {
|   ET_CHECK_MSG(
|       dim == 0 || sizes != nullptr,
|       "Sizes must be provided for non-scalar tensors");
|   ssize_t numel = 1;
|   for (const auto i : c10::irange(dim)) {
|     ET_CHECK_MSG(
|         sizes[i] >= 0,
|         "Size must be non-negative, got %zd at dimension %zd",
|         static_cast<ssize_t>(sizes[i]),
|         i);
|     // Reject any size that would overflow the running product.
|     ET_CHECK_MSG(
|         sizes[i] == 0 || numel <= SSIZE_MAX / sizes[i],
|         "numel overflow at dimension %zd",
|         i);
|     numel *= sizes[i];
|   }
|   return numel;
| }
| ```
| |
| 3. **Add overflow check to `TensorImpl::nbytes()`**: |
| ```cpp |
| size_t TensorImpl::nbytes() const { |
| size_t elem = elementSize(type_); |
| ET_CHECK_MSG(elem == 0 || static_cast<size_t>(numel_) <= SIZE_MAX / elem, |
| "nbytes overflow"); |
| return static_cast<size_t>(numel_) * elem; |
| } |
| ``` |
| |
| ## References |
| |
| - `runtime/core/portable_type/tensor_impl.cpp` -- `compute_numel()` overflow |
| - `runtime/core/portable_type/tensor_impl.cpp` -- `TensorImpl::nbytes()` overflow |
| - `runtime/executor/program_validation.cpp` -- commented-out overflow checks |
| - `runtime/executor/tensor_parser_portable.cpp` -- `parseTensor()` overflow propagation |
| - `runtime/executor/tensor_parser_exec_aten.cpp` -- `getTensorDataPtr()` undersized allocation |
| - `runtime/core/exec_aten/util/dim_order_util.h` -- `dim_order_to_stride_nocheck()` stride overflow |
| |