TensorRT QKV runnerStateBuffer PluginField length validation PoC package

Status

This repository is a triager-side source-level PoC package, not a final mutated TensorRT .engine file.

The bug candidate is a validation gap in NVIDIA TensorRT OSS QKV plugin runtime deserialization:

runnerStateBuffer is serialized as a PluginFieldType::kUNKNOWN field whose length is mRunnerStateBuffer.size().
Runtime plugin creators store only fc->fields[i].data.
They do not validate fc->fields[i].type or fc->fields[i].length.
The constructor later deserializes through the dispatcher using dispatcher->getSerializationSize() as the size, not the actual serialized field length.

The included ASan harness models the exact plugin-level validation gap and demonstrates an out-of-bounds read if a short runnerStateBuffer reaches the runtime creator.
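
The mismatch between the two sizes can be illustrated without touching memory at all. The following is a minimal sketch, not the included harness and not TensorRT code: FieldModel and overreadBytes are hypothetical names introduced here, and the struct only imitates the data/length pair of a plugin field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a serialized plugin field; NOT TensorRT's PluginField.
struct FieldModel
{
    void const* data;    // the pointer the runtime creator keeps
    std::int32_t length; // the byte count the runtime creator ignores
};

// Returns how many bytes a deserializer would read past the end of the field
// if it trusts expectedSize (the dispatcher's own size) instead of field.length.
// Returns 0 when the field is at least as long as expectedSize.
// Nothing is dereferenced here; this only does the size arithmetic.
inline std::size_t overreadBytes(FieldModel const& field, std::size_t expectedSize)
{
    assert(field.data != nullptr);
    std::size_t const actual = field.length > 0 ? static_cast<std::size_t>(field.length) : 0;
    return expectedSize > actual ? expectedSize - actual : 0;
}
```

With a 4-byte field and an expected size of 16, the function reports 12 bytes that would lie outside the supplied data.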

Affected source snapshot

Tested TensorRT source snapshot / commit:

5302b288c5603895508711bdf2df11dc29c8ad92

Primary files:

plugin/bertQKVToContextPlugin/qkvToContextPlugin.cpp
plugin/bertQKVToContextPlugin/mhaRunner.cu
plugin/common/plugin.cpp
include/NvInferPluginBase.h
include/NvInferRuntime.h

Files in this package

README.md
qkv_runnerstate_harness.cpp
qkv_runnerstate_fixed_harness.cpp
expected_vulnerable_asan_output.txt
expected_fixed_output.txt
qkv_runnerstate_proposed_patch.diff
tensorrt_qkv_key_lines.txt
description_for_huntr.md
runtime_poc_requirements.md
NO_REAL_ENGINE_FILE_INCLUDED.txt

Reproduce the plugin-level issue

On Linux with clang or g++ and AddressSanitizer:

clang++ -std=c++17 -fsanitize=address -g -O1 qkv_runnerstate_harness.cpp -o qkv_runnerstate_harness
./qkv_runnerstate_harness

Expected vulnerable result:

ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 4

The expected output is also included in:

expected_vulnerable_asan_output.txt

Reproduce the fixed behavior

clang++ -std=c++17 -fsanitize=address -g -O1 qkv_runnerstate_fixed_harness.cpp -o qkv_runnerstate_fixed_harness
./qkv_runnerstate_fixed_harness

Expected fixed behavior:

Rejected safely: invalid runnerStateBuffer length: rejected before deserialize

Important limitation

This package does not include a malformed TensorRT .engine/.trt/.mytrt file.

The remaining triage question is whether TensorRT core allows a malformed serialized plugin field with a short runnerStateBuffer.length to reach:

createPlugin(..., TensorRTPhase::kRUNTIME)

If TensorRT core does not validate that field length before invoking the plugin creator, then the QKV plugin code can overread because it ignores the actual PluginField.length.

Suggested triager-side runtime test

NVIDIA/TensorRT triage can test the candidate by creating or mutating a serialized engine containing a QKV plugin runtime field collection where:

field.name   = "runnerStateBuffer"
field.type   = PluginFieldType::kUNKNOWN
field.data   = non-null
field.length = shorter than dispatcher->getSerializationSize()

Then deserialize the engine and verify whether the plugin creator reaches the runtime constructor and whether the dispatcher deserializer reads beyond the supplied field data.
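
As a rough illustration of the mutation described above, the sketch below truncates a known-good runner state blob and records the shortened length. MutatedField and truncateRunnerState are hypothetical helpers invented for this example; they do not read or write a real serialized engine, and how the field is actually embedded in an engine is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one mutated field; member names mirror the list above.
struct MutatedField
{
    std::string name;
    std::vector<std::uint8_t> bytes; // backing storage for field.data
    std::int32_t length;             // declared field.length
};

// Keeps only the first `keep` bytes of a valid runner state blob.
// The backing storage is sized to exactly `keep` bytes so that a sanitizer
// has a chance to flag any read beyond the declared length.
inline MutatedField truncateRunnerState(std::vector<std::uint8_t> const& validState, std::size_t keep)
{
    assert(keep > 0 && keep < validState.size());
    MutatedField field;
    field.name = "runnerStateBuffer";
    field.bytes.assign(validState.begin(), validState.begin() + static_cast<std::ptrdiff_t>(keep));
    field.length = static_cast<std::int32_t>(keep);
    return field;
}
```

The type is left out of the model on purpose: the mutation only needs to change the length while keeping the data pointer non-null.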

Suggested fix

Validate the runtime runnerStateBuffer field metadata before deserialization:

PLUGIN_VALIDATE(fc->fields[i].type == PluginFieldType::kUNKNOWN,
    "invalid runnerStateBuffer field type");
runnerStateBufferLength = fc->fields[i].length;
runnerStateBuffer = static_cast<void const*>(fc->fields[i].data);

Before constructing/deserializing:

PLUGIN_VALIDATE(runnerStateBufferLength == expectedSerializationSize,
    "invalid runnerStateBuffer field length");

Prefer passing the validated actual length into the constructor/deserializer instead of recomputing an expected length independently of the serialized field metadata.
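
One way to carry the validated length into the deserializer is a small bounded reader that refuses any read extending past the supplied byte count. This is a sketch under assumptions: BoundedReader is a hypothetical class written for this README, not an existing TensorRT utility, and it requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Hypothetical helper: reads fixed-size values from a buffer of known length.
class BoundedReader
{
public:
    BoundedReader(void const* data, std::size_t length)
        : mCursor(static_cast<unsigned char const*>(data))
        , mRemaining(length)
    {
        assert(data != nullptr || length == 0);
    }

    // Returns the next T, or std::nullopt if fewer than sizeof(T) bytes remain.
    // A failed read leaves the cursor and the remaining count unchanged.
    template <typename T>
    std::optional<T> read()
    {
        static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
        if (sizeof(T) > mRemaining)
        {
            return std::nullopt;
        }
        T value;
        std::memcpy(&value, mCursor, sizeof(T));
        mCursor += sizeof(T);
        mRemaining -= sizeof(T);
        return value;
    }

    std::size_t remaining() const
    {
        return mRemaining;
    }

private:
    unsigned char const* mCursor;
    std::size_t mRemaining;
};
```

A deserializer built on such a reader fails closed on a short field: the first read that would cross the end returns an empty optional instead of touching memory outside the field.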
