Vulnerability Report: DoS via Unbounded Initializer Allocation in ONNX Runtime

Summary

The ONNX Runtime model loader is vulnerable to a Denial of Service (DoS) attack because it allocates memory for model initializers without bounding the requested size. By crafting an ONNX model whose initializer declares extreme tensor dimensions, an attacker can trigger a std::bad_alloc exception, crashing the inference session during initialization.

Description

The vulnerability exists in the core graph loading logic of ONNX Runtime. When a model contains an initializer (a constant tensor), the runtime attempts to allocate memory based on the dimensions given in the Protobuf message before verifying whether the total size is reasonable or whether the data is actually present in the file.

In onnxruntime/core/graph/graph.cc (or related initializer loading code), the multiplication of dimensions can lead to massive values. While ONNX Runtime has some integer overflow checks, it fails to bound the total allocation size against available system memory or a sane maximum limit for model metadata.
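To make the arithmetic concrete, the following sketch computes the buffer size that a declared shape implies. The shape and the element-size table are illustrative assumptions, not values taken from the PoC. The result stays far below 2**63, so a check aimed only at integer overflow would not reject it.

```python
from math import prod

# Bytes per element for a few tensor element types (illustrative subset).
ELEM_SIZE = {"float32": 4, "float16": 2, "int64": 8, "uint8": 1}

def implied_bytes(dims, dtype="float32"):
    """Return the buffer size in bytes that the declared shape implies."""
    if any(d < 0 for d in dims):
        raise ValueError("negative dimension")
    # prod([]) is 1, which matches a scalar tensor holding one element.
    return prod(dims) * ELEM_SIZE[dtype]

# Hypothetical attacker-chosen shape: two dimensions of 2**20 elements each.
dims = [1 << 20, 1 << 20]
size = implied_bytes(dims)
print(f"{size} bytes = {size // 2**40} TiB")  # 4398046511104 bytes = 4 TiB
```

A handful of bytes in the dims field is therefore enough to request terabytes, regardless of how small the model file itself is.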

Root Cause

Unvalidated memory allocation based on attacker-controlled TensorProto.dims during session initialization.

Affected Code Location

Files: onnxruntime/core/graph/model_load_utils.h and onnxruntime/core/session/inference_session.cc

Confirmed on: ONNX Runtime v1.24.4 (latest)

Impact

A remote attacker who can supply a malicious ONNX model can cause a complete Denial of Service of any application that loads it with ONNX Runtime (e.g., cloud inference servers, edge devices, local AI tools).

CVSS v3.1 Score

Base Score: 7.5 (High)

Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Reproduction Steps

  1. Install ONNX Runtime: pip install onnxruntime
  2. Generate the exploit model using the provided poc_oom_gen.py.
  3. Load the model:
import onnxruntime as ort
try:
    ort.InferenceSession('oom.onnx')
except Exception as e:
    print(f"Crash confirmed: {e}")
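A caught Python exception and a process-level abort are different outcomes, so it can help to run the load in a child interpreter and inspect its exit status. The sketch below is one way to do that using only the standard library; the helper name run_isolated and the 60-second timeout are assumptions, not part of the original PoC.

```python
import subprocess
import sys

def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a child interpreter; return (exit code, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr

# Same load as in step 3, but without the try/except so failures surface in the exit code.
LOAD_SNIPPET = "import onnxruntime as ort; ort.InferenceSession('oom.onnx')"

if __name__ == "__main__":
    code, err = run_isolated(LOAD_SNIPPET)
    # 0: loaded cleanly; 1: uncaught Python exception;
    # negative (POSIX only): the child was killed by a signal.
    print("exit code:", code)
    print(err[-500:])  # tail of the child's error output
```

Running the load this way also keeps the parent process alive if the child is terminated by the operating system for exhausting memory.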

Proof of Concept

See poc_oom_gen.py and verification_results.md.

Remediation

Implement a maximum allocation limit for initializers during model loading. The total size implied by the dimensions (element count times element size) should be validated against a global limit such as MAX_MODEL_METADATA_SIZE before any std::vector or buffer allocation.
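The check described above can be sketched in Python as follows. This is an illustration of the validation logic only, not ONNX Runtime code: the InitializerInfo type, the check_initializer helper, and the 2 GiB cap are hypothetical, and the report does not assign a value to MAX_MODEL_METADATA_SIZE. The payload comparison assumes the tensor data is stored inline as raw bytes; externally stored data would need a different source for the actual size.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical cap, chosen only for illustration.
MAX_INITIALIZER_BYTES = 2 * 1024**3  # 2 GiB

@dataclass
class InitializerInfo:
    """Minimal stand-in for the TensorProto fields that matter here."""
    dims: list
    elem_size: int      # bytes per element
    payload_bytes: int  # bytes of tensor data actually present in the file

def check_initializer(info, limit=MAX_INITIALIZER_BYTES):
    """Raise ValueError before any allocation if the declared shape is implausible."""
    if any(d < 0 for d in info.dims):
        raise ValueError("negative dimension")
    required = prod(info.dims) * info.elem_size
    if required > limit:
        raise ValueError(f"initializer needs {required} bytes, limit is {limit}")
    if required != info.payload_bytes:
        raise ValueError(
            f"shape implies {required} bytes, file holds {info.payload_bytes}"
        )
    return required
```

Comparing against the bytes actually present is the stricter of the two tests: a model that declares a huge shape but ships no data fails even when the declared size is under the cap.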