Vulnerability Report: Unauthenticated RCE in TensorRT-LLM (MGMN Leader Node)

Summary

I have identified a Critical remote code execution (RCE) vulnerability in the TensorRT-LLM Multi-GPU Multi-Node (MGMN) launcher. The vulnerability is in the mgmn_leader_node.py script, which initializes an IPC server without enforcing HMAC authentication. Combined with insecure environment variable handling, this lets an attacker who can influence the launcher's environment force the server to bind to an external interface. The attacker can then execute arbitrary code by sending it unauthenticated data over the network.

Vulnerability Details

Component: tensorrt_llm/llmapi/mgmn_leader_node.py and tensorrt_llm/llmapi/mpi_session.py

Root Cause:

  1. Insecure Default Initialization: In tensorrt_llm/llmapi/mgmn_leader_node.py, the RemoteMpiCommSessionServer is initialized without passing an hmac_key.

    # mgmn_leader_node.py
    server = RemoteMpiCommSessionServer(
        comm=sub_comm,
        n_workers=num_ranks,
        addr=get_spawn_proxy_process_ipc_addr_env(),
        is_comm=True)  # MISSING: no hmac_key argument is passed

  2. Security Fallback Failure: In tensorrt_llm/llmapi/mpi_session.py, the __init__ method sets use_hmac_encryption to False if no key is provided. This disables the signature check on the IPC socket, allowing unauthenticated pickle.loads deserialization.

    # mpi_session.py
    self.queue = ZeroMqQueue(..., use_hmac_encryption=bool(hmac_key))

  3. Insecure Environment Variable Handling: The bind address is derived from TLLM_SPAWN_PROXY_PROCESS_IPC_ADDR (in utils.py). Anyone able to set that variable in the process environment before the service starts therefore decides which interface and port the server binds to.
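Root cause 2 turns on what the HMAC check provides. When a key is set, each message carries a tag that must verify before its bytes are unpickled. The following is a generic, standard-library sketch of that pattern, not ZeroMqQueue's actual implementation or wire format. It assumes a 32-byte HMAC-SHA256 tag prepended to the pickled payload, and the names `sign`, `verify_and_load`, and `TAG_LEN` are hypothetical.

```python
import hashlib
import hmac
import os
import pickle

TAG_LEN = 32  # HMAC-SHA256 digest size in bytes (assumed framing, not ZeroMqQueue's)


def sign(key: bytes, payload: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag computed over the pickled payload."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify_and_load(key: bytes, message: bytes):
    """Check the tag first; only authenticated bytes ever reach pickle.loads."""
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison; returns False (no exception) on length mismatch.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to deserialize")
    return pickle.loads(payload)


if __name__ == "__main__":
    key = os.urandom(32)  # hypothetical per-session secret shared with workers out of band
    msg = sign(key, pickle.dumps({"cmd": "status"}))
    print(verify_and_load(key, msg))  # {'cmd': 'status'}

    # Verifying with a different key models a sender that does not know the secret.
    try:
        verify_and_load(os.urandom(32), msg)
    except ValueError as exc:
        print(exc)  # HMAC mismatch: refusing to deserialize
```

With no key, the equivalent of `verify_and_load` collapses to a bare `pickle.loads(message)`, so any bytes that reach the socket are deserialized. The tag protects only while the key stays secret: anyone who holds it can still send arbitrary pickles.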

Attack Scenario

  1. An attacker sets the environment variable:

    export TLLM_SPAWN_PROXY_PROCESS_IPC_ADDR="tcp://0.0.0.0:4444"

  2. The victim (or an automated orchestration system) executes mgmn_leader_node.py.
  3. The server binds to port 4444 on all interfaces with HMAC authentication disabled (use_hmac_encryption=False).
  4. The attacker connects to port 4444 and sends a malicious pickle payload that runs shell commands when deserialized (e.g., a reverse shell).
  5. The ZeroMqQueue class deserializes the payload without verification, executing the attacker's code with the privileges of the TensorRT-LLM process.
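Steps 1 and 3 depend on a wildcard TCP endpoint from the environment being accepted as the bind address. The sketch below shows one way to classify such an address before binding. It is not TensorRT-LLM code: the helper `is_exposed_bind_addr` is hypothetical, and it assumes ZeroMQ-style endpoints (`tcp://host:port`, `ipc://…`, `inproc://…`).

```python
import ipaddress
from urllib.parse import urlsplit


def is_exposed_bind_addr(addr: str) -> bool:
    """Return True if a ZeroMQ-style endpoint would listen beyond loopback (hypothetical helper)."""
    parts = urlsplit(addr)
    if parts.scheme != "tcp":
        # ipc:// and inproc:// endpoints are not reachable over the network.
        return False
    host = parts.hostname or ""
    if host in ("*", ""):
        return True  # ZeroMQ wildcard or missing host: all interfaces
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames or interface names (e.g. eth0): treat as exposed to be conservative.
        return True
    # Covers 0.0.0.0 and [::] (unspecified) as well as ordinary external addresses.
    return not ip.is_loopback


if __name__ == "__main__":
    for endpoint in ("tcp://0.0.0.0:4444", "tcp://127.0.0.1:4444", "ipc:///tmp/tllm.sock"):
        print(endpoint, is_exposed_bind_addr(endpoint))
```

Under these assumptions, the step 1 value `tcp://0.0.0.0:4444` is flagged as exposed, while loopback and `ipc://` endpoints are not.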

Impact

This vulnerability allows arbitrary code execution (ACE). In shared cluster environments (e.g., Slurm/Kubernetes), it allows a low-privileged user to escalate privileges or move laterally to other nodes running TensorRT-LLM.
