TENSORRT-09: PriorBox plugin deserialize() heap out-of-bounds read
Summary
NVIDIA TensorRT ships a built-in PriorBox plugin (plugin/priorBoxPlugin) that implements
SSD-style prior-box layers and is restored from a serialized engine or plugin blob.
PriorBox::deserialize() reads three attacker-controlled 32-bit counts from the
serialized buffer (numMinSize, numMaxSize, numAspectRatios), then loops that many
times calling the shared helper read<float>(), which performs a memcpy and advances a
raw pointer without checking that the read stays inside the buffer. The only bounds
check (PLUGIN_VALIDATE(d == data + length)) runs after all the reads have already
happened. A crafted buffer with a large numMinSize and a short total length therefore
causes an out-of-bounds heap read while the engine or plugin is being deserialized,
before the declared length is compared against anything.
Vulnerable code (cloned NVIDIA/TensorRT OSS release, plugin/priorBoxPlugin/priorBoxPlugin.cpp):
    void PriorBox::deserialize(uint8_t const* data, size_t length)
    {
        auto const* d{data};
        mParam = read<PriorBoxParameters>(d);
        auto readArray = [&d](int32_t size, std::vector<float>& dstVec, float*& dstPtr) {
            PLUGIN_VALIDATE(size >= 0);
            dstVec.resize(size);
            for (int32_t i = 0; i < size; i++)
            {
                dstVec[i] = read<float>(d); // unchecked memcpy, no length test
            }
            dstPtr = dstVec.data();
        };
        readArray(mParam.numMinSize, mMinSizeCPU, mParam.minSize);
        readArray(mParam.numMaxSize, mMaxSizeCPU, mParam.maxSize);
        readArray(mParam.numAspectRatios, mAspectRatiosCPU, mParam.aspectRatios);
        mH = read<int32_t>(d);
        mW = read<int32_t>(d);
        PLUGIN_VALIDATE(d == data + length); // only checked AFTER all the reads above
    }
and the shared helper (plugin/common/plugin.h):
    template <typename OutType, typename BufferType>
    OutType read(BufferType const*& buffer)
    {
        static_assert(sizeof(BufferType) == 1, "BufferType must be a 1 byte type.");
        OutType val{};
        std::memcpy(&val, static_cast<void const*>(buffer), sizeof(OutType));
        buffer += sizeof(OutType);
        return val;
    }
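The missing check can be illustrated with a bounds-aware variant of the helper. This is a minimal sketch only: checkedRead() and its end parameter are hypothetical names introduced here, not part of TensorRT, and throwing std::out_of_range is one possible failure policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical bounds-checked counterpart to read<T>(); NOT a TensorRT API.
// `end` is assumed to point one past the last valid byte of the buffer.
template <typename OutType, typename BufferType>
OutType checkedRead(BufferType const*& buffer, BufferType const* end)
{
    static_assert(sizeof(BufferType) == 1, "BufferType must be a 1 byte type.");
    // Compare the remaining byte count against the read size BEFORE memcpy.
    if (buffer > end || static_cast<std::size_t>(end - buffer) < sizeof(OutType))
    {
        throw std::out_of_range("serialized buffer too short");
    }
    OutType val{};
    std::memcpy(&val, static_cast<void const*>(buffer), sizeof(OutType));
    buffer += sizeof(OutType);
    return val;
}
```

With a helper of this shape, a caller would pass data + length as end, so an oversized count fails on the first read that would cross the end of the buffer instead of after the loop has finished.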
The entry point reached from IRuntime::deserializeCudaEngine() for a PriorBox layer is:
    PriorBoxPluginCreator::deserializePlugin(name, serialData, serialLength)
      -> PriorBox::PriorBox(data, length)
        -> PriorBox::deserialize(data, length)
          -> readArray(mParam.numMinSize, ...) { ... read<float>(d); ... }
What this proof of concept demonstrates
This PoC calls the real, unmodified PriorBoxPluginCreator::deserializePlugin() (the
exact production entry point above) directly with a crafted 80-byte buffer and produces
a live AddressSanitizer heap-buffer-overflow report, confirming that the out-of-bounds
read is reachable in practice rather than only inferred from reading the code.
Files:
- gen_payload.py builds tensorrt09_priorbox_payload.bin, an 80-byte serialized PriorBoxParameters header with numMinSize = 8192 (and numMaxSize = 0, numAspectRatios = 0), i.e. a declared array count far larger than the buffer that actually backs it.
- tensorrt09_priorbox_payload.bin is the generated 80-byte payload.
- standalone_build/harness_standalone.cpp loads that payload into its own new unsigned char[80] heap allocation and calls PriorBoxPluginCreator::deserializePlugin("pb", data, 80) on it, exactly like the real TensorRT runtime would during engine deserialization.
- standalone_build/build.sh compiles the harness together with the actual, unmodified plugin/priorBoxPlugin/priorBoxPlugin.cpp and plugin/vc/checkMacrosPlugin.cpp from the cloned NVIDIA/TensorRT repository, with AddressSanitizer enabled (-fsanitize=address).
- standalone_build/stubs/ and standalone_build/stub_defs.cpp provide minimal link-time stand-ins for the handful of CUDA Toolkit / closed-source TensorRT runtime symbols (cudaMalloc, cudaMemcpy, cudaFree, cudaDeviceGetAttribute, cudaGetDevice, cudaGetErrorString, cudaDeviceReset, priorBoxInference, nvinfer1::getLogger()) that this environment's host does not have installed (no CUDA Toolkit, no full closed-source libnvinfer / libnvinfer_plugin). None of these stubbed symbols execute on this crash path: the AddressSanitizer abort happens on the very first out-of-bounds read<float>() call inside the readArray loop, strictly before PriorBox::setupDeviceMemory() (the only place any CUDA call is made) and strictly before any PLUGIN_VALIDATE failure path (the only place the logger would be used). They exist purely so the real vulnerable source compiles and links into a runnable binary; they are declarations/no-op definitions, not a reimplementation of the vulnerable logic itself.
- standalone_build/asan_crash_log.txt is the captured output of an actual run.
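The gap between what the payload declares and what it contains can be worked out with simple arithmetic. The sketch below assumes the 80-byte header size stated above and the read order of deserialize() (header, three float arrays, then two int32_t values); declaredSerializedSize() and lengthMatchesCounts() are hypothetical helpers written for illustration, not TensorRT functions. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (NOT a TensorRT API): number of bytes deserialize()
// would consume for a given header size and declared counts.
// Returns std::nullopt if any count is negative.
std::optional<std::uint64_t> declaredSerializedSize(std::size_t headerSize,
    std::int32_t numMinSize, std::int32_t numMaxSize, std::int32_t numAspectRatios)
{
    if (numMinSize < 0 || numMaxSize < 0 || numAspectRatios < 0)
    {
        return std::nullopt;
    }
    // Summed in 64-bit so three int32_t counts cannot overflow.
    std::uint64_t const numFloats = static_cast<std::uint64_t>(numMinSize)
        + static_cast<std::uint64_t>(numMaxSize)
        + static_cast<std::uint64_t>(numAspectRatios);
    // Header, then the float arrays, then the trailing mH and mW fields.
    return headerSize + numFloats * sizeof(float) + 2 * sizeof(std::int32_t);
}

// Hypothetical up-front check: does the buffer length match what the
// counts declare? This could run before any array element is read.
bool lengthMatchesCounts(std::size_t length, std::size_t headerSize,
    std::int32_t numMinSize, std::int32_t numMaxSize, std::int32_t numAspectRatios)
{
    auto const need = declaredSerializedSize(headerSize, numMinSize, numMaxSize, numAspectRatios);
    return need.has_value() && *need == length;
}
```

For the PoC payload, declaredSerializedSize(80, 8192, 0, 0) evaluates to 80 + 8192 * 4 + 8 = 32856 bytes, while only 80 bytes are supplied, so lengthMatchesCounts(80, 80, 8192, 0, 0) is false.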
How to reproduce
On Linux (or WSL) with g++ and libasan available:
    cd standalone_build
    TRT_REPO=/path/to/cloned/NVIDIA/TensorRT ./build.sh
    ASAN_OPTIONS=halt_on_error=1 ./harness_standalone ../tensorrt09_priorbox_payload.bin
Observed result
AddressSanitizer reports a heap-buffer-overflow READ of size 4, 0 bytes after an 80-byte heap region, with this call stack:
READ of size 4 at ... thread T0
#0 float nvinfer1::plugin::read<float, unsigned char>(unsigned char const*&)
#1 operator() priorBoxPlugin.cpp:152 (the readArray lambda body)
#2 nvinfer1::plugin::PriorBox::deserialize(unsigned char const*, unsigned long) priorBoxPlugin.cpp:156
#3 nvinfer1::plugin::PriorBox::PriorBox(void const*, unsigned long) priorBoxPlugin.cpp:139
#4 std::make_unique<nvinfer1::plugin::PriorBox, ...>(...)
#5 nvinfer1::plugin::PriorBoxPluginCreator::deserializePlugin(char const*, void const*, unsigned long) priorBoxPlugin.cpp:516
#6 main
The full output is in standalone_build/asan_crash_log.txt.
Impact
Any application that loads a TensorRT engine or a standalone serialized PriorBox plugin
blob from an untrusted source (a common pattern, since engines are distributed and
loaded much like other ML model files) can be driven into an out-of-bounds heap read
during deserialization purely by crafting the numMinSize, numMaxSize, or
numAspectRatios fields. Depending on heap layout, this can crash the process (denial of
service) or copy adjacent heap memory into the plugin's internal float arrays. Because
those arrays influence later plugin output, the latter is a potential memory disclosure
primitive.
The same shared read<T>() helper with the same missing bounds check is used by several
other bundled TensorRT plugins (GenerateDetection, ProposalLayer, PyramidROIAlign,
RPROI, MultilevelCropAndResize, ResizeNearest, EfficientNMS); PriorBox is reported here
as the representative instance.