TensorRT Region Plugin: Integer Overflow -> Heap Buffer Overflow

Vulnerability Summary

Component: plugin/regionPlugin/regionPlugin.cpp in NVIDIA TensorRT OSS

Type: Integer Overflow (CWE-190) leading to Heap Buffer Overflow (CWE-122)

Entry Point: RegionPluginCreator::deserializePlugin() → Region::Region(void const* buffer, size_t length)

Trigger: Loading a crafted TensorRT engine file (.engine / .trt) containing a malicious Region_TRT plugin

Impact: Heap corruption → potential Remote Code Execution

Root Cause

In regionPlugin.cpp, the Region deserialization constructor reads smTreeTemp->n directly from attacker-controlled serialized data (line 117) without any validation:

smTreeTemp->n = read<int32_t>(d);  // attacker-controlled

if (leafPresent)
{
    allocateChunk(smTreeTemp->leaf, smTreeTemp->n);  // malloc(n * sizeof(int32_t))
}

The allocateChunk template function (lines 39-42) performs:

template <typename T>
void allocateChunk(T*& ptr, int32_t count)
{
    ptr = static_cast<T*>(malloc(count * sizeof(T)));  // integer overflow possible
}

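Bug 1 - No bounds check: n is never compared against the remaining buffer length, so any int32_t value is accepted. A negative n is converted to a very large size_t when the allocation size is computed; the signed loop below does not iterate in that case.

Bug 2 - Integer overflow: When n = 0x40000001, the exact product n * sizeof(int32_t) is 0x100000004. Where that multiplication is carried out in 32 bits (for example, with a 32-bit size_t), the result truncates to 0x4, so malloc(4) returns a 4-byte buffer. With a 64-bit size_t the product does not wrap and the request is roughly 4 GiB instead.

The subsequent loop then writes n elements (0x40000001, about 1.07 billion int32_t values) into this 4-byte buffer: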

for (int32_t i = 0; i < smTreeTemp->n; i++)
{
    if (smTreeTemp->leaf)
    {
        smTreeTemp->leaf[i] = read<int32_t>(d);  // HEAP OVERFLOW
    }
}

The same pattern affects parent, child, group, name, groupSize, and groupOffset arrays.

The PLUGIN_VALIDATE(d == a + length) check at line 227 runs after the overflow has already occurred.
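The arithmetic behind Bug 2 can be checked in isolation. The following is a minimal sketch, not code from regionPlugin.cpp: the helper names sizeIn32Bits and sizeIn64Bits are hypothetical, and they only model the multiplication count * sizeof(int32_t) for a 32-bit and a 64-bit size type respectively.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of TensorRT): model count * sizeof(int32_t)
// when the size type is 32 bits wide versus 64 bits wide.
constexpr uint32_t sizeIn32Bits(int32_t count)
{
    // Unsigned 32-bit multiplication wraps modulo 2^32.
    return static_cast<uint32_t>(count) * static_cast<uint32_t>(sizeof(int32_t));
}

constexpr uint64_t sizeIn64Bits(int32_t count)
{
    // For a non-negative count the 64-bit product cannot wrap.
    return static_cast<uint64_t>(count) * static_cast<uint64_t>(sizeof(int32_t));
}

// Compile-time checks of the values quoted in the text.
static_assert(sizeIn32Bits(0x40000001) == 0x4u, "32-bit product wraps to 4 bytes");
static_assert(sizeIn64Bits(0x40000001) == 0x100000004ull, "64-bit product is about 4 GiB");
```

Because both checks are static_assert declarations, the snippet verifies the numbers at compile time and needs no main function.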

Affected Fields (all exploitable via same pattern)

| Field | Line | Allocation | Loop Write |
| --- | --- | --- | --- |
| leaf | 121 | allocateChunk(leaf, n) | L152-157 |
| parent | 129 | allocateChunk(parent, n) | L158-161 |
| child | 137 | allocateChunk(child, n) | L162-165 |
| group | 145 | allocateChunk(group, n) | L166-169 |
| name | 174 | allocateChunk(name, n) | L183-190 |
| groupSize | 196 | allocateChunk(groupSize, groups) | L212-215 |
| groupOffset | 204 | allocateChunk(groupOffset, groups) | L216-219 |

Attack Vector

  1. Attacker crafts a TensorRT engine file containing a Region_TRT plugin
  2. The serialized plugin data has n = 0x40000001 (or other overflow-triggering value)
  3. Victim loads the engine file using TensorRT runtime (trtexec, Python API, or C++ API)
  4. RegionPluginCreator::deserializePlugin() is called automatically
  5. Region::Region(buffer, length) triggers heap overflow
  6. Heap metadata corruption → potential arbitrary write → RCE

Files

  • trigger.cpp — Standalone C++ PoC demonstrating the heap overflow (no TensorRT installation needed)
  • gen_payload.py — Python script generating a malicious serialized payload
  • malicious_region_payload.bin — Pre-generated malicious plugin serialization data
  • trigger_deserialize.py — Python trigger using real TensorRT deserialize_plugin() API

Note: malicious_region_payload.bin is the raw serialized data for the Region_TRT plugin — the exact bytes that RegionPluginCreator::deserializePlugin() receives. In a real attack, this data would be embedded inside a .engine file and deserialized automatically when loading the model.

Reproduction

Method 1: Standalone ASan PoC (no TensorRT needed)

clang++ -fsanitize=address -g -O0 -o trigger trigger.cpp
./trigger

Method 2: Real TensorRT Deserialization

# Build TensorRT OSS plugins (optionally with ASan)
git clone https://github.com/NVIDIA/TensorRT.git && cd TensorRT
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address" -DBUILD_PARSERS=OFF -DBUILD_SAMPLES=OFF
make -j$(nproc) nvinfer_plugin

# Generate payload and trigger
cd /path/to/this/repo
python3 gen_payload.py
LD_LIBRARY_PATH=/path/to/TensorRT/build/out python3 trigger_deserialize.py

Expected Output (Method 1)

=== TensorRT Region Plugin Heap Overflow PoC ===
[*] ...
[*] Simulating integer overflow: allocating 4 elements but writing 128
[*] Writing 128 elements into 4-element buffer...
=================================================================
==XXXXX==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
WRITE of size 4 at ... thread T0

Generate Standalone Payload

python3 gen_payload.py
# Output: malicious_region_payload.bin

Suggested Fix

Add bounds validation for n and groups before allocation:

smTreeTemp->n = read<int32_t>(d);

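// Validate n against the number of int32_t values left in the buffer.
// The comparison is done in size_t so a large remainder is not truncated.
size_t const remaining = length - static_cast<size_t>(d - a);
if (smTreeTemp->n < 0 || static_cast<size_t>(smTreeTemp->n) > remaining / sizeof(int32_t))
{
    PLUGIN_VALIDATE(false && "Invalid softmaxTree.n in serialized data");
    return;
}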
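Since groups needs the same treatment as n, the check can be factored into one predicate. This is a hedged sketch under stated assumptions: countFitsInBuffer is a hypothetical helper that does not exist in TensorRT, and its parameters mirror the a (buffer start), d (read cursor), and length values used in the constructor.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of TensorRT): returns true when `count`
// elements of `elemSize` bytes each fit in the unread part of the buffer.
// `begin` is the start of the serialized data, `cursor` the current read
// position, and `length` the total buffer size in bytes.
inline bool countFitsInBuffer(
    int32_t count, size_t elemSize, char const* begin, char const* cursor, size_t length)
{
    if (count < 0 || elemSize == 0 || cursor < begin)
    {
        return false;
    }
    size_t const consumed = static_cast<size_t>(cursor - begin);
    if (consumed > length)
    {
        return false;
    }
    // Divide instead of multiplying so the comparison itself cannot overflow.
    return static_cast<size_t>(count) <= (length - consumed) / elemSize;
}
```

A call such as countFitsInBuffer(smTreeTemp->n, sizeof(int32_t), a, d, length) would then gate each allocation. Note that it only bounds one array at a time; several arrays sized by the same n consume proportionally more input.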

Also add an overflow check to allocateChunk:

template <typename T>
void allocateChunk(T*& ptr, int32_t count)
{
    if (count <= 0 || (size_t)count > SIZE_MAX / sizeof(T)) {
        ptr = nullptr;
        return;
    }
    ptr = static_cast<T*>(malloc((size_t)count * sizeof(T)));
}
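An alternative design ties the allocation size and the number of writes to a single validated value. The sketch below is an illustration only, not the plugin's code or an NVIDIA-endorsed fix: readInt32Array is a hypothetical function, and it assumes the values are stored contiguously, which differs from the per-element loop shown in the Root Cause section.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of TensorRT): copies `count` int32_t values
// starting at `cursor`, refusing any count that the remaining input cannot
// back. On success `cursor` is advanced past the consumed bytes.
inline std::vector<int32_t> readInt32Array(char const*& cursor, char const* end, int32_t count)
{
    if (count < 0 || cursor > end)
    {
        throw std::invalid_argument("negative count or cursor past end");
    }
    size_t const remaining = static_cast<size_t>(end - cursor);
    if (static_cast<size_t>(count) > remaining / sizeof(int32_t))
    {
        throw std::out_of_range("count exceeds remaining serialized data");
    }
    // The vector owns exactly `count` elements, so the copy below cannot
    // write past the allocation.
    std::vector<int32_t> values(static_cast<size_t>(count));
    size_t const bytes = values.size() * sizeof(int32_t);
    if (bytes > 0)
    {
        std::memcpy(values.data(), cursor, bytes);
    }
    cursor += bytes;
    return values;
}
```

With this shape, a count of 0x40000001 paired with a short payload is rejected before any memory is allocated, and the copy length is derived from the same checked value as the allocation.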